⛵ Sailor: A New Multilingual Open LLM for South-East Asia
Last month we released a new family of multilingual language models called **Sailor**, ranging from 0.5B to 7B parameters and continually pre-trained from the Qwen1.5 models. In our extensive benchmarking, the Sailor models show strong performance on South-East Asian languages, taking us one step closer to multilingual LLMs that can serve the diverse needs of the region and beyond.
Today, we're more than excited to share the key technical details behind the Sailor models!
**Key highlights**:
- Data curation: merging short examples, document-level code-switching, and aggressive data cleaning and deduplication.
- Tokenization robustness: we find that BPE dropout is highly effective at handling prompt variations (a minimal sketch follows this list).
- Optimizing the data mixture: we propose a new approach to automatically balance capabilities across different languages.
- Recipe for continual pre-training: we discover a metric that helps predict how well the Sailor models will perform on the original domain (e.g., English) after continual pre-training.
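To make the tokenization-robustness idea concrete, here is a minimal sketch of BPE dropout using the Hugging Face `tokenizers` library; the toy corpus, vocabulary size, and dropout rate are illustrative assumptions, not the exact Sailor setup.

```python
# Minimal sketch of BPE dropout with the Hugging Face `tokenizers` library.
# The toy corpus, vocab size, and dropout rate are illustrative assumptions.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Build a small BPE tokenizer with dropout enabled. With dropout, merges are
# randomly skipped at encode time, so the same string can be segmented
# differently across passes, making the model less sensitive to prompt variations.
tokenizer = Tokenizer(BPE(unk_token="[UNK]", dropout=0.1))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=500, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(
    ["Selamat pagi, apa kabar?", "Chào buổi sáng", "สวัสดีตอนเช้า"] * 100,
    trainer=trainer,
)

# The same input may yield different segmentations on each call.
for _ in range(3):
    print(tokenizer.encode("Selamat pagi, apa kabar?").tokens)
```

In practice you would enable dropout only during training-time tokenization and encode deterministically at inference.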
We are thrilled to share these technical details with the community and invite you to explore the Sailor models. We hope the Sailor models take us one step closer to multilingual LLMs for the whole world!
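If you want to try them out, here is a minimal loading sketch with the `transformers` library; the `sail/Sailor-7B` repo id, dtype, and prompt are assumptions, so check the model card for the recommended usage.

```python
# Minimal sketch: loading a Sailor checkpoint with transformers.
# The repo id and the Indonesian prompt are assumptions; see the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sail/Sailor-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Ibu kota Indonesia adalah"  # "The capital of Indonesia is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```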
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by final loss and language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English → English) and a stronger distribution shift (English → German) at the 405M parameter model scale with large dataset sizes (hundreds of billions of tokens). Selecting the weak but realistic shift for larger-scale experiments, we also find that our continual learning strategies match the re-training baseline for a 10B parameter LLM. Our results demonstrate that LLMs can be successfully updated via simple and scalable continual learning strategies, matching the re-training baseline using only a fraction of the compute. Finally, inspired by previous work, we propose alternatives to the cosine learning rate schedule that help circumvent forgetting induced by LR re-warming and that are not bound to a fixed token budget.
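To make that recipe concrete, here is a simplified sketch of LR re-warming, cosine re-decay, and replay mixing; the peak LR, warmup length, and replay ratio are illustrative assumptions rather than the values used in the paper.

```python
# Simplified sketch of LR re-warming + cosine re-decay + replay for continual
# pre-training. All constants below are illustrative assumptions.
import math
import random

def rewarm_cosine_lr(step, total_steps, peak_lr=3e-4, min_lr=3e-5, warmup_steps=1000):
    """Re-warm the LR from ~0 up to peak_lr, then cosine re-decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

def sample_batch(new_data, old_data, replay_ratio=0.25, batch_size=8):
    """Mix replayed examples from the previous distribution into each batch."""
    n_replay = int(batch_size * replay_ratio)
    return random.sample(old_data, n_replay) + random.sample(new_data, batch_size - n_replay)

# Inspect the schedule at a few points of the continual pre-training run.
total_steps = 10_000
for step in range(0, total_steps + 1, 2_500):
    print(f"step {step:>6}: lr = {rewarm_cosine_lr(step, total_steps):.2e}")
```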
Komodo-7B is here! Today we are releasing the base version of Komodo-7B along with the technical report.
Komodo-7B is a family of LLMs that consists of Komodo-7B-Base and Komodo-7B-Instruct.
Komodo-7B performs very well across multiple languages of Indonesia, including Indonesian, Acehnese, Balinese, Banjarese, Buginese, Dayak Ngaju, Javanese, Lampungnese, Madurese, Minangkabau, Sundanese, and Toba Batak.
Our model outperforms various existing large language models, including some multilingual models.
SpeechBrain 1.0: a toolkit with hundreds of recipes and pretrained models for audio-related tasks, such as speech recognition, diarization, and enhancement. New major release! HF repos: https://huggingface.co/speechbrain · Website: https://speechbrain.github.io/
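To get a feel for the toolkit, here is a minimal sketch of running a pretrained SpeechBrain ASR model; the checkpoint name and audio path are assumptions, so swap in whichever recipe and model you need.

```python
# Minimal sketch: transcribing an audio file with a pretrained SpeechBrain model.
# The checkpoint and the audio path are assumptions; pick any ASR model from
# the SpeechBrain organization on the Hub.
from speechbrain.inference.ASR import EncoderDecoderASR

asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)
print(asr.transcribe_file("example.wav"))  # hypothetical local audio file
```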
Panda-70M is a new large-scale video dataset comprising 70 million high-quality video clips, each paired with a textual caption, designed for pre-training video understanding models.
Key Points:
* Automatic Caption Generation: utilizes an automatic pipeline with multiple cross-modality teacher models to generate captions for video clips.
* Fine-tuned Caption Selection: employs a fine-tuned retrieval model to select the most appropriate caption from multiple candidates for each video clip (a toy version is sketched below).
* Improved Performance: pre-training on Panda-70M shows significant performance gains in video captioning, text-video retrieval, and text-driven video generation.
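To illustrate the caption-selection step, here is a toy sketch that scores candidate captions against sampled video frames with a pretrained CLIP model and keeps the best one; the checkpoint and frame handling are assumptions, not the actual Panda-70M retrieval model.

```python
# Toy sketch of retrieval-based caption selection: score candidate captions
# against sampled video frames and keep the highest-scoring one.
# The CLIP checkpoint and frame sampling are assumptions, not the Panda-70M pipeline.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def select_caption(frames: list[Image.Image], candidates: list[str]) -> str:
    """Return the candidate caption best matching the clip's sampled frames."""
    inputs = processor(text=candidates, images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_text  # (num_captions, num_frames)
    scores = logits.mean(dim=1)                   # average similarity over frames
    return candidates[scores.argmax().item()]

# Usage (hypothetical frames extracted elsewhere):
# best = select_caption(frames, ["a dog runs on the beach", "a man cooks pasta"])
# print(best)
```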
Today, we are thrilled to officially launch the "2A2I" Arabic Artificial Intelligence Initiative. This is a community-driven initiative founded on the philosophy of "Small team, Big work." Our goal is to elevate Arabic AI (LLMs, diffusion models, ASR, etc.) to the same level as English (and also Chinese).
Naturally, our focus today is primarily on datasets. We aim to provide high-quality datasets, especially for LLMs, this month to support our future efforts. In line with this, we're excited to introduce the Arabic version of H4-no_robots, which you can find here: 2A2I/H4_no_robots (and yes, we know it's not "no_robots" anymore). Stay tuned for more exciting, high-quality datasets in the next couple of weeks (+4 million rows).
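If you want to take a look at the data, here is a minimal loading sketch with the `datasets` library; the split name is an assumption, so check the dataset card.

```python
# Minimal sketch: loading the Arabic no_robots dataset with the datasets library.
# The "train" split name is an assumption; check the dataset card for details.
from datasets import load_dataset

ds = load_dataset("2A2I/H4_no_robots", split="train")
print(ds[0])
```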
In parallel, we're also developing a model that we hope will set a new high standard for Arabic LLMs. This model is planned for release in the coming months.
If you're interested in Arabic AI and want to help push things forward as well, fill out this form and let us know your motivation and your exciting ideas.
If you have any questions, feel free to reach out to us at the email address below.
Additionally, if you believe in this mission as we do and would like to support the community with compute resources or any other form of help, please contact us at the same email address below or reach out to me on LinkedIn.