---
license: apache-2.0
datasets:
- Reself/AuroraCap-trainset
base_model:
- lmsys/vicuna-7b-v1.5-16k
tags:
- caption
model-index:
- name: AuroraCap-7B
  results:
  - task:
      type: video detailed caption
    dataset:
      type: VDC
      name: VDC
    metrics:
    - type: Acc
      value: 38.21
      name: VDCScore
    - type: Acc
      value: 48.33
      name: VDD
    - type: cider
      value: 9.51
    - type: bleu
      value: 30.90
      name: bleu@1
    - type: bleu
      value: 4.06
      name: bleu@4
    - type: meteor
      value: 19.09
    - type: rouge
      value: 21.58
      name: rouge-l
  - task:
      type: video caption
    dataset:
      type: MSR-VTT
      name: MSR-VTT
    metrics:
    - type: cider
      value: 33.1
    - type: bleu
      value: 58.6
      name: bleu@1
    - type: bleu
      value: 21.0
      name: bleu@4
    - type: meteor
      value: 23.9
    - type: rouge
      value: 49.5
      name: rouge-l
  - task:
      type: video caption
    dataset:
      type: VATEX
      name: VATEX
    metrics:
    - type: cider
      value: 33.8
    - type: bleu
      value: 57.1
      name: bleu@1
    - type: bleu
      value: 18.4
      name: bleu@4
    - type: meteor
      value: 19.0
    - type: rouge
      value: 40.8
      name: rouge-l
  - task:
      type: video question answering
    dataset:
      type: ActivityNet
      name: ActivityNet
    metrics:
    - type: Acc
      value: 61.8
  - task:
      type: video question answering
    dataset:
      type: MSVD
      name: MSVD
    metrics:
    - type: Acc
      value: 62.6
  - task:
      type: video question answering
    dataset:
      type: MSR-VTT
      name: MSR-VTT
    metrics:
    - type: Acc
      value: 43.5
  - task:
      type: video question answering
    dataset:
      type: iVQA
      name: iVQA
    metrics:
    - type: Acc
      value: 55.2
---
<img src="assets/teaser.png" align="center"> | |
## Resources | |
- [Website](https://rese1f.github.io/aurora-web/) | |
- [arXiv: Paper]() | |
- [GitHub: Code](https://github.com/rese1f/aurora) | |
- [Huggingface: AuroraCap Model](https://huggingface.co/collections/Reself/auroracap-66d117ffe13bedda96702013) | |
- [Huggingface: VDC Benchmark](https://huggingface.co/datasets/Reself/Video-Detailed-Caption) | |
- [Huggingface: Trainset](https://huggingface.co/datasets/Reself/AuroraCap-trainset) | |
## Features | |
<img src="assets/vdc_baseline.png" align="center"> | |
AuroraCap is a multimodal large language model for image and video captioning. | |
## Quick Start | |
See [Docs](https://github.com/rese1f/aurora/blob/main/docs/auroracap/README.md). | |
## Citation |