---
license: apache-2.0
language:
- en
---
# SOLO Model Card

## Model details

**Model type:**
SOLO is a 7B large vision-language model that uses a single Transformer architecture for unified vision-language modeling. SOLO accepts both raw image patches (in pixels) and text as input, without relying on a separate pre-trained vision encoder.
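Because SOLO consumes raw pixel patches directly rather than features from a pre-trained vision encoder, the image-side input is just a sequence of flattened patches. The sketch below illustrates that general idea; the 32-pixel patch size and NumPy-based preprocessing are illustrative assumptions, not SOLO's actual configuration.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 32) -> np.ndarray:
    """Split an H x W x C image into a sequence of flattened pixel patches.

    Each row of the result is one patch of patch_size * patch_size * C raw
    pixel values, ready to be linearly embedded as a vision "token".
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)          # group the two patch-grid axes
             .reshape(-1, patch_size * patch_size * c)
    )
    return patches

# A 64x96 RGB image yields a 2x3 grid of patches, i.e. 6 vision tokens.
image = np.zeros((64, 96, 3), dtype=np.uint8)
tokens = patchify(image)
print(tokens.shape)  # (6, 3072)
```

These patch rows would then be projected into the Transformer's embedding space and interleaved with text tokens in a single sequence.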

**Model date:**
SOLO-7B was trained in June 2024.

**Paper or resources for more information:**
[Paper]() & [GitHub](https://github.com/Yangyi-Chen/SOLO)

**Where to send questions or comments about the model:**
https://github.com/Yangyi-Chen/SOLO/issues