---
license: apache-2.0
datasets:
- Reself/AuroraCap-trainset
base_model:
- lmsys/vicuna-7b-v1.5-16k
tags:
- caption
model-index:
- name: AuroraCap-7B
  results:
  - task:
      type: video detailed caption
    dataset:
      type: VDC
      name: VDC
    metrics:
      - type: Acc
        value: 38.21
        name: VDCScore
      - type: Acc
        value: 48.33
        name: VDD
      - type: cider
        value: 9.51
      - type: bleu
        value: 30.90
        name: bleu@1
      - type: bleu
        value: 4.06
        name: bleu@4
      - type: meteor
        value: 19.09
      - type: rouge
        value: 21.58
        name: rouge-l
  - task:
      type: video caption
    dataset:
      type: MSR-VTT
      name: MSR-VTT
    metrics:
      - type: cider
        value: 33.1
      - type: bleu
        value: 58.6
        name: bleu@1
      - type: bleu
        value: 21.0
        name: bleu@4
      - type: meteor
        value: 23.9
      - type: rouge
        value: 49.5
        name: rouge-l
  - task:
      type: video caption
    dataset:
      type: VATEX
      name: VATEX
    metrics:
      - type: cider
        value: 33.8
      - type: bleu
        value: 57.1
        name: bleu@1
      - type: bleu
        value: 18.4
        name: bleu@4
      - type: meteor
        value: 19.0
      - type: rouge
        value: 40.8
        name: rouge-l
  - task:
      type: video question answering
    dataset:
      type: ActivityNet
      name: ActivityNet
    metrics:
      - type: Acc
        value: 61.8
  - task:
      type: video question answering
    dataset:
      type: MSVD
      name: MSVD
    metrics:
      - type: Acc
        value: 62.6
  - task:
      type: video question answering
    dataset:
      type: MSR-VTT
      name: MSR-VTT
    metrics:
      - type: Acc
        value: 43.5
  - task:
      type: video question answering
    dataset:
      type: iVQA
      name: iVQA
    metrics:
      - type: Acc
        value: 55.2
---

<img src="assets/teaser.png" align="center">

## Resources

- [Website](https://rese1f.github.io/aurora-web/)
- [arXiv: Paper]()
- [GitHub: Code](https://github.com/rese1f/aurora)
- [Huggingface: AuroraCap Model](https://huggingface.co/collections/Reself/auroracap-66d117ffe13bedda96702013)
- [Huggingface: VDC Benchmark](https://huggingface.co/datasets/Reself/Video-Detailed-Caption)
- [Huggingface: Trainset](https://huggingface.co/datasets/Reself/AuroraCap-trainset)
## Features

<img src="assets/vdc_baseline.png" align="center">

AuroraCap is a multimodal large language model for image and video captioning. 

## Quick Start

See [Docs](https://github.com/rese1f/aurora/blob/main/docs/auroracap/README.md).

## Citation