metadata
license: apache-2.0
language:
  - en
metrics:
  - accuracy
library_name: transformers
pipeline_tag: visual-question-answering
tags:
  - multimodal large language model
  - large video-language model

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.

🌏 Model Zoo

📑 Citation

If you find VideoRefer Suite useful for your research and applications, please cite using this BibTeX:

@article{yuan2024videorefersuite,
  title = {VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM},
  author = {Yuqian Yuan and Hang Zhang and Wentong Li and Zesen Cheng and Boqiang Zhang and Long Li and Xin Li and Deli Zhao and Wenqiao Zhang and Yueting Zhuang and Jianke Zhu and Lidong Bing},
  journal = {arXiv},
  year = {2024},
  url = {}
}