Spaces:
Runtime error
Runtime error
language: en | |
datasets: | |
- laion2b | |
# OpenFlamingo-9B | |
[Blog post]() | [Code](https://github.com/mlfoundations/open_flamingo) | [Demo](https://7164d2142d11.ngrok.app) | |
OpenFlamingo is an open source implementation of DeepMind's [Flamingo](https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model) models. | |
OpenFlamingo-9B is built off of [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14) and [LLaMA-7B](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/). | |
## Model Details | |
We freeze the pretrained vision encoder and language model, and then we train connecting Perceiver modules and cross-attention layers, following the original Flamingo paper. | |
Our training data is a mixture of [LAION 2B](https://huggingface.co/datasets/laion/laion2B-en) and a large interleaved image-text dataset called Multimodal C4, which will be released soon. | |
The current model is an early checkpoint of an ongoing effort. This checkpoint has seen 5 million interleaved image-text examples from Multimodal C4 and 10 million samples from LAION 2B. | |
## Uses | |
OpenFlamingo-9B is intended to be used **for academic research purposes only.** Commercial use is prohibited, in line with LLaMA's non-commercial license. | |
### Bias, Risks, and Limitations | |
This model may generate inaccurate or offensive outputs, reflecting biases in its training data and pretrained priors. | |
In an effort to mitigate current potential biases and harms, we have deployed a text content filter on model outputs in the OpenFlamingo demo. We continue to red-team the model to understand and improve its safety. | |
## Evaluation | |
We've evaluated this checkpoint on the validation sets for two vision-language tasks: COCO captioning and VQAv2. Results are displayed below. | |
**COCO (CIDEr)** | |
|0-shot|4-shot|8-shot|16-shot|32-shot| | |
|--|--|--|--|--| | |
|65.52|74.28|79.26|81.84|84.52| | |
**VQAv2 (VQA accuracy)** | |
|0-shot|4-shot|8-shot|16-shot|32-shot| | |
|---|---|---|---|---| | |
|43.55|44.05|47.5|48.87|50.34| | |