Update README.md
README.md CHANGED
@@ -15,18 +15,20 @@ Check out also the Google Colab demo to run Llava on a free-tier Google Colab in
 
 Or check out our Spaces demo! [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-md-dark.svg)](https://huggingface.co/spaces/llava-hf/llava-4bit)
 
-
 ## Model details
 
 **Model type:**
 LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
 It is an auto-regressive language model, based on the transformer architecture.
 
+ViP-LLaVA enhances the training protocol of LLaVA by marking images during training with natural cues such as a
+“red bounding box” or a “pointed arrow”, so that users can interact with the model through those same cues.
+
 **Model date:**
-
+ViP-LLaVA was released in December 2023.
 
 **Paper or resources for more information:**
-https://llava
+https://vip-llava.github.io/
 
 ## How to use the model
 
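The new model-type paragraph above describes interaction through visual markers drawn directly on the image. As a minimal sketch of that idea (not part of the README; the image URL and box coordinates are illustrative assumptions), one might draw the cue with Pillow before handing the image to the model:

```python
# Sketch: draw a "red bounding box" visual cue on an image with Pillow.
import requests
from PIL import Image, ImageDraw

# Illustrative image and coordinates; substitute your own region of interest.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
draw = ImageDraw.Draw(image)
draw.rectangle((60, 40, 300, 220), outline="red", width=4)  # the visual prompt
image.save("prompted_image.jpg")
```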
@@ -125,4 +127,17 @@ model = VipLlavaForConditionalGeneration.from_pretrained(
 
 ## License
 Llama 2 is licensed under the LLAMA 2 Community License,
-Copyright (c) Meta Platforms, Inc. All Rights Reserved.
+Copyright (c) Meta Platforms, Inc. All Rights Reserved.
+
+## Citation
+To cite this work, please use:
+```bibtex
+@misc{cai2023making,
+      title={Making Large Multimodal Models Understand Arbitrary Visual Prompts},
+      author={Mu Cai and Haotian Liu and Siva Karthik Mustikovela and Gregory P. Meyer and Yuning Chai and Dennis Park and Yong Jae Lee},
+      year={2023},
+      eprint={2312.00784},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```
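The context line of the second hunk shows that the unchanged usage section loads the model with `VipLlavaForConditionalGeneration`. A hedged loading-and-generation sketch, assuming the `llava-hf/vip-llava-7b` checkpoint name and a `###Human`/`###Assistant` prompt template (neither appears in this diff):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

model_id = "llava-hf/vip-llava-7b"  # assumed checkpoint name
model = VipLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# <image> marks where the image features are spliced in; the template used
# here is an assumption based on the ViP-LLaVA model family, not this diff.
prompt = "###Human: <image>\nWhat does the red bounding box highlight?###Assistant:"
image = Image.open("prompted_image.jpg")  # e.g. the cue-annotated image from above
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```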