nicolo
nicolollo
AI & ML interests
None yet
Recent Activity
Reacted to
vincentg64's
post
with 🧠
5 days ago
There is no such thing as a Trained LLM https://mltblog.com/3CEJ9Pt
What I mean here is that traditional LLMs are trained on tasks irrelevant to what they will do for the user. It’s like training a plane to efficiently operate on the runway, but not to fly. In short, it is almost impossible to train an LLM, and evaluating is just as challenging. Then, training is not even necessary. In this article, I dive on all these topics.
➡️ Training LLMs for the wrong tasks
Since the beginnings with Bert, training an LLM typically consists of predicting the next tokens in a sentence, or removing some tokens and then have your algorithm fill the blanks. You optimize the underlying deep neural networks to perform these supervised learning tasks as well as possible. Typically, it involves growing the list of tokens in the training set to billions or trillions, increasing the cost and time to train. However, recently, there is a tendency to work with smaller datasets, by distilling the input sources and token lists. After all, out of one trillion tokens, 99% are noise and do not contribute to improving the results for the end-user; they may even contribute to hallucinations. Keep in mind that human beings have a vocabulary of about 30,000 keywords, and that the number of potential standardized prompts on a specialized corpus (and thus the number of potential answers) is less than a million.
➡️ Read the full articles at https://mltblog.com/3CEJ9Pt, also featuring issues with evaluation metrics and the benefits of untrained LLMs.
Reacted to
reach-vb's
post
with 👍
12 days ago
What a brilliant week for Open Source AI!
Qwen 2.5 Coder by Alibaba - 0.5B / 1.5B / 3B / 7B / 14B/ 32B (Base + Instruct) Code generation LLMs, with 32B tackling giants like Gemnini 1.5 Pro, Claude Sonnet
https://huggingface.co/collections/Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f
LLM2CLIP from Microsoft - Leverage LLMs to train ultra-powerful CLIP models! Boosts performance over the previous SOTA by ~17%
https://huggingface.co/collections/microsoft/llm2clip-672323a266173cfa40b32d4c
Athene v2 Chat & Agent by NexusFlow - SoTA general LLM fine-tuned from Qwen 2.5 72B excels at Chat + Function Calling/ JSON/ Agents
https://huggingface.co/collections/Nexusflow/athene-v2-6735b85e505981a794fb02cc
Orca Agent Instruct by Microsoft - 1 million instruct pairs covering text editing, creative writing, coding, reading comprehension, etc - permissively licensed
https://huggingface.co/datasets/microsoft/orca-agentinstruct-1M-v1
Ultravox by FixieAI - 70B/ 8B model approaching GPT4o level, pick any LLM, train an adapter with Whisper as Audio Encoder
https://huggingface.co/collections/reach-vb/ultravox-audio-language-model-release-67373b602af0a52b2a88ae71
JanusFlow 1.3 by DeepSeek - Next iteration of their Unified MultiModal LLM Janus with RectifiedFlow
https://huggingface.co/deepseek-ai/JanusFlow-1.3B
Common Corpus by Pleais - 2,003,039,184,047 multilingual, commercially permissive and high quality tokens!
https://huggingface.co/datasets/PleIAs/common_corpus
I'm sure I missed a lot, can't wait for the next week!
Put down in comments what I missed! 🤗
View all activity
Organizations
None yet