vision language models: an adamelliotfields Collection

papers and models 🙈

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25 • 103
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18 • 74
mistralai/Pixtral-12B-2409

Image-Text-to-Text • Updated 5 days ago • 538
HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated 9 days ago • 56.1k • 277
showlab/ShowUI-2B

Updated 7 days ago • 9.76k • 170
microsoft/Phi-3-vision-128k-instruct

Text Generation • Updated Aug 20 • 108k • 937
mtgv/MobileVLM_V2-1.7B

Text Generation • Updated Feb 7 • 8.84k • 24
mtgv/MobileVLM_V2-3B

Text Generation • Updated Feb 7 • 360 • 7
xtuner/llava-phi-3-mini

Image-Text-to-Text • Updated Apr 25 • 16 • 24
rhymes-ai/Aria

Image-Text-to-Text • Updated about 21 hours ago • 13.3k • 592
THUDM/glm-edge-v-2b

Image-Text-to-Text • Updated 13 days ago • 2.27k • 7
THUDM/glm-edge-v-5b

Image-Text-to-Text • Updated 13 days ago • 281 • 11
h2oai/h2ovl-mississippi-2b

Text Generation • Updated 26 days ago • 15.7k • 23
google/paligemma2-3b-pt-448

Image-Text-to-Text • Updated 7 days ago • 7.81k • 28