Model Summary
MobileVLM V2 is a family of vision language models that significantly improves upon MobileVLM. It shows that a careful orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and curation of rich, high-quality datasets can substantially benefit VLM performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, the MobileVLM_V2-3B model outperforms a large variety of VLMs at the 7B+ scale.
MobileVLM_V2-3B is built on our MobileLLaMA-2.7B-Chat to facilitate off-the-shelf deployment.
Model Sources
- Repository: https://github.com/Meituan-AutoML/MobileVLM
- Paper: MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
How to Get Started with the Model
Inference examples can be found in the GitHub repository.
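As a rough illustration of what such an inference call might look like, the sketch below builds an argument object and hands it to the repository's inference helper. Several things here are assumptions and not confirmed by this card: that the cloned repository exposes `scripts.inference.inference_once`, that it accepts an object with the fields shown, and that the model is published under the hypothetical ID `mtgv/MobileVLM_V2-3B`. The image path is a placeholder. Check the repository README for the authoritative usage.

```python
from types import SimpleNamespace


def build_inference_args(model_path, image_file, prompt, max_new_tokens=512):
    """Bundle inference settings into a simple attribute-style object.

    The field names are an assumption about what the repository's
    inference script expects; adjust them to match the actual script.
    """
    return SimpleNamespace(
        model_path=model_path,          # Hub ID or local checkpoint directory
        image_file=image_file,          # path to the input image
        prompt=prompt,                  # text question about the image
        conv_mode="v1",                 # assumed conversation template name
        temperature=0,                  # 0 means greedy decoding
        top_p=None,
        num_beams=1,
        max_new_tokens=max_new_tokens,
        load_8bit=False,
        load_4bit=False,
    )


def run_inference(args):
    """Call the repository's helper (assumed name and location).

    Imported lazily so this file can be loaded without the repository
    on the Python path; run it from the root of the cloned repo.
    """
    from scripts.inference import inference_once  # assumption, see lead-in
    return inference_once(args)
```

From the root of the cloned repository, usage would then be along the lines of `run_inference(build_inference_args("mtgv/MobileVLM_V2-3B", "path/to/image.jpg", "What is shown in this image?"))`, with the model ID and image path replaced by real values.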