david

quyet7779

AI & ML interests

None yet

Recent Activity

Reacted to andito's post with 🔥 4 days ago

liked a Space 16 days ago

Qwen/Qwen2.5-Coder-demo

liked a Space 16 days ago

Kwai-Kolors/Kolors-Virtual-Try-On

quyet7779's activity

reacted to andito's post with 🔥 4 days ago

Post

3041

Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! 🤯
- Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a MacBook! 🚀
- SmolVLM can be fine-tuned on Google Colab! Or process millions of documents with a consumer GPU!
- SmolVLM even outperforms larger models in video benchmarks, despite not being trained on videos!

Check out more!
Demo: https://huggingface.co/spaces/HuggingFaceTB/SmolVLM
Blog: https://huggingface.co/blog/smolvlm
Model: https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct
Fine-tuning script: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb

liked 2 Spaces 16 days ago

👁 Qwen2.5 Coder Demo • Running • 306

👕 Kolors Virtual Try-On • Running on CPU Upgrade • 5.63k

reacted to merve's post with 🔥 16 days ago

Post

4836

OmniVision-968M: a new local VLM for edge devices, fast & small but performant
💨 a new vision language model with 9x fewer image tokens, super efficient
📖 aligned with DPO for reducing hallucinations
⚡️ Apache 2.0 license 🔥

Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
Model NexaAIDev/omnivision-968M

4 replies

New activity in linhtran92/viet_bud500 2 months ago

Convert to wav file

#10 opened 2 months ago by quyet7779

updated a model 3 months ago

quyet7779/whisper-small-vi

Updated Sep 13

reacted to mrfakename's post with ❤️ 6 months ago

Post

8921

Introducing StyleTTS 2 detector, an audio classification model to detect StyleTTS 2 vs human-generated content!

Dual-licensed under MIT/Apache 2.0.

Model Weights: mrfakename/styletts2-detector
Spaces: mrfakename/styletts2-detector

2 replies

liked a model 9 months ago

ByteDance/SDXL-Lightning

Text-to-Image • Updated Apr 3 • 166k • 1.93k