Update README.md
README.md CHANGED
@@ -15,18 +15,20 @@ Check out also the Google Colab demo to run Llava on a free-tier Google Colab in
 
 Or check out our Spaces demo! [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-md-dark.svg)](https://huggingface.co/spaces/llava-hf/llava-4bit)
 
-
 ## Model details
 
 **Model type:**
 LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
 It is an auto-regressive language model, based on the transformer architecture.
 
+ViP-LLaVA enhances the training protocol of LLaVA by marking images during training with natural cues such as a
+“red bounding box” or a “pointed arrow”, so that users can interact with the model through those same cues.
+
 **Model date:**
-
+ViP-LLaVA was released in December 2023.
 
 **Paper or resources for more information:**
-https://llava
+https://vip-llava.github.io/
 
 ## How to use the model
 
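The new model-type paragraph above describes interaction through visual markers drawn directly on the image. As a minimal sketch of that idea (not part of the README; the image URL and box coordinates are illustrative assumptions), one might draw the cue with Pillow before handing the image to the model:

```python
# Sketch: draw a "red bounding box" visual cue on an image with Pillow.
import requests
from PIL import Image, ImageDraw

# Illustrative image and coordinates; substitute your own region of interest.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
draw = ImageDraw.Draw(image)
draw.rectangle((60, 40, 300, 220), outline="red", width=4)  # the visual prompt
image.save("prompted_image.jpg")
```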
@@ -125,4 +127,17 @@ model = VipLlavaForConditionalGeneration.from_pretrained(
 
 ## License
 Llama 2 is licensed under the LLAMA 2 Community License,
-Copyright (c) Meta Platforms, Inc. All Rights Reserved.
+Copyright (c) Meta Platforms, Inc. All Rights Reserved.
+
+## Citation
+To cite this work, please use:
+```bibtex
+@misc{cai2023making,
+      title={Making Large Multimodal Models Understand Arbitrary Visual Prompts},
+      author={Mu Cai and Haotian Liu and Siva Karthik Mustikovela and Gregory P. Meyer and Yuning Chai and Dennis Park and Yong Jae Lee},
+      year={2023},
+      eprint={2312.00784},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```
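The context line of the second hunk shows that the unchanged usage section loads the model with `VipLlavaForConditionalGeneration`. A hedged loading-and-generation sketch, assuming the `llava-hf/vip-llava-7b` checkpoint name and a `###Human`/`###Assistant` prompt template (neither appears in this diff):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

model_id = "llava-hf/vip-llava-7b"  # assumed checkpoint name
model = VipLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# <image> marks where the image features are spliced in; the template used
# here is an assumption based on the ViP-LLaVA model family, not this diff.
prompt = "###Human: <image>\nWhat does the red bounding box highlight?###Assistant:"
image = Image.open("prompted_image.jpg")  # e.g. the cue-annotated image from above
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```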