CircleRadon committed
Commit de04e4d (verified)
Parent: 6d5c0bb

Update README.md

Files changed (1)
  1. README.md +50 -3
README.md CHANGED
@@ -1,3 +1,50 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ metrics:
+ - accuracy
+ library_name: transformers
+ pipeline_tag: visual-question-answering
+ tags:
+ - multimodal large language model
+ - large video-language model
+ ---
+
+
+
+
+ <p align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/64a3fe3dde901eb01df12398/ZrZPYT0Q3wgza7Vc5BmyD.png" width="100%" style="margin-bottom: 0.2;"/>
+ </p>
+
+
+ <h3 align="center"><a href="https://arxiv.org/abs/2406.07476" style="color:#4D2B24">
+ VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM</a></h3>
+
+ <h5 align="center"> If you like our project, please give us a star ⭐ on <a href="https://github.com/DAMO-NLP-SG/VideoRefer">GitHub</a> for the latest updates. </h5>
+
+ <p align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/64a3fe3dde901eb01df12398/iGpjPujqD1OD4V1n_u70u.png" width="100%" style="margin-bottom: 0.2;"/>
+ </p>
+
+ ## 🌏 Model Zoo
+ | Model Name | Visual Encoder | Language Decoder | # Training Frames |
+ |:----------------|:----------------|:------------------|:----------------:|
+ | [VideoRefer-7B]() | [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) | [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) | 16 |
+ | [VideoRefer-7B-stage2]() | [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) | [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) | 16 |
+ | [VideoRefer-7B-stage2.5]() | [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) | [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) | 16 |
+
+
+ ## 📑 Citation
+
+ If you find VideoRefer Suite useful for your research and applications, please cite it using the following BibTeX:
+ ```bibtex
+ @article{yuan2024videorefersuite,
+   title = {VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM},
+   author = {Yuqian Yuan and Hang Zhang and Wentong Li and Zesen Cheng and Boqiang Zhang and Long Li and Xin Li and Deli Zhao and Wenqiao Zhang and Yueting Zhuang and Jianke Zhu and Lidong Bing},
+   journal = {arXiv},
+   year = {2024},
+   url = {}
+ }
+ ```