SOLO Model Card

Model details

Model type: SOLO is a 7B-parameter large vision-language model that uses a single Transformer architecture for unified vision-language modeling. SOLO accepts both raw image patches (in pixels) and text as inputs, without using a separate pre-trained vision encoder.
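
To make "raw image patches (in pixels)" concrete, here is a minimal sketch of splitting an image into flattened, non-overlapping pixel patches that a single Transformer could consume directly. This is an illustration only, not SOLO's actual preprocessing: the function name `patchify`, the default patch size of 32, and the requirement that the image dimensions be multiples of the patch size are all assumptions made for this example.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 32) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping pixel patches.

    Hypothetical helper for illustration; patch_size=32 is an assumed value.
    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row (left to right, top to bottom).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Separate the patch grid axes from the within-patch pixel axes.
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Reorder to (rows, cols, patch_h, patch_w, channels).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into a single vector of raw pixel values.
    return patches.reshape(rows * cols, patch_size * patch_size * c)


if __name__ == "__main__":
    image = np.zeros((64, 96, 3), dtype=np.float32)
    print(patchify(image).shape)  # (6, 3072)
```

In a model of this kind, each flattened patch vector would typically be mapped into the Transformer's embedding space by a learned linear projection, in place of features from a separate vision encoder; the exact projection SOLO uses is described in the paper and repository.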

Model date: SOLO-7B was trained in June 2024.

Paper or resources for more information: Paper & GitHub

Where to send questions or comments about the model: https://github.com/Yangyi-Chen/SOLO/issues

Inference with Hugging Face

Please see this script for an example of running inference with the model.