metadata
license: apache-2.0
language:
- en
SOLO Model Card
Model details
Model type: SOLO is a 7B-parameter large vision-language model that uses a single Transformer architecture for unified vision-language modeling. SOLO accepts both raw image patches (in pixels) and text as inputs, without using a separate pre-trained vision encoder.
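To make "raw image patches (in pixels)" concrete, the sketch below splits an image into non-overlapping square patches and flattens each one into a vector of pixel values. It is only an illustration, not SOLO's actual preprocessing code: the `patchify` helper, the nested-list image layout, and the patch size of 2 are all assumptions chosen for readability. In a model of this kind, each flat vector would typically then be mapped into the Transformer's embedding space by a learned projection instead of passing through a pre-trained vision encoder.

```python
from typing import List

Pixel = List[float]          # one pixel: C channel values
Image = List[List[Pixel]]    # H x W x C nested list


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping square patches.

    Returns one flat vector of length patch_size * patch_size * C per patch,
    in row-major (left-to-right, top-to-bottom) patch order.
    Hypothetical helper for illustration; not part of the SOLO codebase.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat: List[float] = []
            # Walk the pixels of this patch row by row and append their channels.
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    flat.extend(pixel)
            patches.append(flat)
    return patches


# Toy 4 x 4 RGB image whose pixel at (r, c) is [r, c, 0].
toy_image = [[[float(r), float(c), 0.0] for c in range(4)] for r in range(4)]
toy_patches = patchify(toy_image, patch_size=2)
print(len(toy_patches), len(toy_patches[0]))  # 4 patches, each 2*2*3 = 12 values
```

The number of patches, and therefore the number of image tokens the Transformer sees, grows with image resolution: an H x W image yields (H / patch_size) * (W / patch_size) vectors.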
Model date: SOLO-7B was trained in June 2024.
Paper or resources for more information: Paper & GitHub
Where to send questions or comments about the model: https://github.com/Yangyi-Chen/SOLO/issues
Inference with Hugging Face
Please check this script for an example of performing inference with the model.