SOLO Model Card
Model details
Model type: SOLO is a 7B-parameter large vision-language model that uses a single Transformer architecture for unified vision-language modeling. SOLO accepts both raw image patches (in pixels) and text as inputs, without relying on a separate pre-trained vision encoder.
Model date: SOLO-7B was trained in June 2024.
Paper or resources for more information: Paper & Github
Where to send questions or comments about the model: https://github.com/Yangyi-Chen/SOLO/issues
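To make the "raw image patches (in pixels)" input described under Model type more concrete, here is a minimal sketch of how an image could be cut into non-overlapping patches and flattened into pixel vectors before being passed to a single Transformer. This is an illustration only, not SOLO's actual preprocessing: the function name `patchify`, the nested-list image layout, and the patch size used here are all assumptions; see the paper and GitHub repository for the real pipeline.

```python
def patchify(image, patch_size):
    """Split an image into flattened, non-overlapping square patches.

    `image` is a nested list indexed as image[row][col][channel]
    (a hypothetical layout chosen for this sketch). Patches are returned
    in row-major order, each as a flat list of raw pixel values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch: rows first, then columns, then channels.
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# Toy 4x4 single-channel image with pixel values 0..15.
toy_image = [[[r * 4 + c] for c in range(4)] for r in range(4)]
toy_patches = patchify(toy_image, patch_size=2)
print(len(toy_patches), toy_patches[0])  # 4 [0, 1, 4, 5]
```

In a model of this kind, each flattened patch would then be mapped to an embedding and interleaved with text token embeddings; that projection step is not shown here.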
Inference with Huggingface
Please check this script for an example of performing inference with the model.