Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.
- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook!
- SmolVLM can be fine-tuned on a Google Colab, or process millions of documents with a consumer GPU (see the inference sketch below)!
- SmolVLM even outperforms larger models on video benchmarks, despite not even being trained on videos!
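If you want to kick the tires locally, here is a minimal inference sketch using transformers. It assumes the checkpoint name `HuggingFaceTB/SmolVLM-Instruct` and a CUDA GPU with bfloat16 support; adapt the model id and device to your setup.

```python
# Minimal sketch: run SmolVLM locally with transformers.
# Assumptions: checkpoint id "HuggingFaceTB/SmolVLM-Instruct", CUDA GPU, bfloat16.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Any image works; this URL is just an illustrative placeholder.
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg")

# Build a chat-style prompt with one image and one text turn.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

# Generate and decode the answer.
generated_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```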
This is no Woodstock AI but will be fun nonetheless haha. I'll be hosting a live workshop with team members next week about the Enterprise Hugging Face hub.
1,000 spots available, first come, first served, with some surprises during the stream!
Maybe, like me, you've always wanted a super easy way to compare llama3.2-1B vs. llama3.2-3B, or the same model at different temperatures?
Trying and comparing warm Inference API models has never been easier! Just go to https://hf.co/playground, set your token, and you're ready to go. We'll keep improving it, feedback welcome!
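If you'd rather script the same comparison, here is a rough sketch using `InferenceClient` from huggingface_hub to query warm Inference API models at different temperatures. The model ids and prompt are illustrative assumptions, not part of the Playground itself.

```python
# Sketch: compare two Inference API models (and temperatures) from code.
# Assumptions: you have an HF token, and the listed model ids are warm on the API.
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")  # replace with your Hugging Face token

prompt = [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]

for model_id in ("meta-llama/Llama-3.2-1B-Instruct", "meta-llama/Llama-3.2-3B-Instruct"):
    for temperature in (0.2, 0.9):
        out = client.chat_completion(
            model=model_id,
            messages=prompt,
            temperature=temperature,
            max_tokens=80,
        )
        print(f"{model_id} @ T={temperature}:\n{out.choices[0].message.content}\n")
```

This prints the four completions side by side, which is essentially what the Playground does for you in the browser.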