nicolo

nicolollo
AI & ML interests

None yet

Recent Activity

Reacted to vincentg64's post with 🧠 5 days ago
There is no such thing as a Trained LLM https://mltblog.com/3CEJ9Pt

What I mean here is that traditional LLMs are trained on tasks irrelevant to what they will do for the user. It's like training a plane to operate efficiently on the runway, but not to fly. In short, it is almost impossible to train an LLM, and evaluating one is just as challenging. Moreover, training is not even necessary. In this article, I dive into all these topics.

➡️ Training LLMs for the wrong tasks

Since the early days with BERT, training an LLM has typically consisted of predicting the next tokens in a sentence, or removing some tokens and having your algorithm fill in the blanks. You optimize the underlying deep neural networks to perform these supervised learning tasks as well as possible. Typically, this involves growing the list of tokens in the training set to billions or trillions, increasing the cost and time to train. Recently, however, there has been a tendency to work with smaller datasets, by distilling the input sources and token lists. After all, out of one trillion tokens, 99% are noise and do not contribute to improving the results for the end user; they may even contribute to hallucinations. Keep in mind that human beings have a vocabulary of about 30,000 keywords, and that the number of potential standardized prompts on a specialized corpus (and thus the number of potential answers) is less than a million.

➡️ Read the full article at https://mltblog.com/3CEJ9Pt, also covering issues with evaluation metrics and the benefits of untrained LLMs.
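A minimal sketch of the next-token objective the post describes, assuming a toy PyTorch setup: the tiny GRU here is a stand-in for a real transformer, and all names and sizes are illustrative, not any particular LLM.

```python
# Hedged sketch of the next-token prediction objective described above:
# the model is optimized to predict token t+1 from tokens 1..t.
import torch
import torch.nn as nn

vocab_size, d_model = 30_000, 64  # toy sizes for illustration only


class TinyCausalLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for a transformer
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # logits for the *next* token at each position


model = TinyCausalLM()
tokens = torch.randint(0, vocab_size, (2, 16))  # fake batch of token ids
logits = model(tokens[:, :-1])                  # predict from each prefix
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),             # (batch * seq, vocab)
    tokens[:, 1:].reshape(-1),                  # targets shifted by one
)
loss.backward()  # this supervised signal is what "training" usually means
```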
liked a Space 6 days ago
AtlaAI/judge-arena
Reacted to reach-vb's post with 👍 12 days ago
What a brilliant week for Open Source AI!

- Qwen 2.5 Coder by Alibaba - 0.5B / 1.5B / 3B / 7B / 14B / 32B (Base + Instruct) code generation LLMs, with the 32B tackling giants like Gemini 1.5 Pro and Claude Sonnet https://huggingface.co/collections/Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f
- LLM2CLIP by Microsoft - leverage LLMs to train ultra-powerful CLIP models! Boosts performance over the previous SOTA by ~17% https://huggingface.co/collections/microsoft/llm2clip-672323a266173cfa40b32d4c
- Athene v2 Chat & Agent by NexusFlow - SOTA general LLMs fine-tuned from Qwen 2.5 72B that excel at chat plus function calling / JSON / agents https://huggingface.co/collections/Nexusflow/athene-v2-6735b85e505981a794fb02cc
- Orca AgentInstruct by Microsoft - 1 million instruction pairs covering text editing, creative writing, coding, reading comprehension, etc., permissively licensed https://huggingface.co/datasets/microsoft/orca-agentinstruct-1M-v1
- Ultravox by Fixie.ai - 70B / 8B models approaching GPT-4o level; pick any LLM and train an adapter with Whisper as the audio encoder https://huggingface.co/collections/reach-vb/ultravox-audio-language-model-release-67373b602af0a52b2a88ae71
- JanusFlow 1.3B by DeepSeek - the next iteration of their unified multimodal LLM Janus, now with Rectified Flow https://huggingface.co/deepseek-ai/JanusFlow-1.3B
- Common Corpus by PleIAs - 2,003,039,184,047 multilingual, commercially permissive, high-quality tokens! https://huggingface.co/datasets/PleIAs/common_corpus

I'm sure I missed a lot - can't wait for next week! Put what I missed in the comments! 🤗
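For anyone wanting to try one of these releases, here is a hedged sketch of loading the Qwen 2.5 Coder instruct model from the collection linked above via 🤗 transformers; the dtype and device settings are assumptions for a typical single-GPU setup, and device_map="auto" additionally requires accelerate.

```python
# Sketch: running one of the releases listed above with transformers.
# The repo id is from the Qwen collection linked in the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # one of the sizes listed above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # assumption: let HF pick the checkpoint dtype
    device_map="auto",    # assumption: single-GPU setup with accelerate
)

messages = [{"role": "user", "content": "Write a function that checks if a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```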

Organizations

None yet

nicolollo's activity

upvoted an article about 1 month ago

🇮🇹🇯🇵🇧🇷 Generating multilingual instruction datasets with Magpie 🐦‍⬛

By anakin87
upvoted an article 4 months ago
upvoted an article 5 months ago