metadata
license: apache-2.0
datasets:
  - Reself/AuroraCap-trainset
base_model:
  - lmsys/vicuna-7b-v1.5-16k
tags:
  - caption
model-index:
  - name: AuroraCap-7B
    results:
      - task:
          type: video detailed caption
        dataset:
          type: VDC
          name: VDC
        metrics:
          - type: Acc
            value: 38.21
            name: VDCScore
          - type: Acc
            value: 48.33
            name: VDD
          - type: cider
            value: 9.51
          - type: bleu
            value: 30.9
            name: bleu@1
          - type: bleu
            value: 4.06
            name: bleu@4
          - type: meteor
            value: 19.09
          - type: rouge
            value: 21.58
            name: rouge-l
      - task:
          type: video caption
        dataset:
          type: MSR-VTT
          name: MSR-VTT
        metrics:
          - type: cider
            value: 33.1
          - type: bleu
            value: 58.6
            name: bleu@1
          - type: bleu
            value: 21
            name: bleu@4
          - type: meteor
            value: 23.9
          - type: rouge
            value: 49.5
            name: rouge-l
      - task:
          type: video caption
        dataset:
          type: VATEX
          name: VATEX
        metrics:
          - type: cider
            value: 33.8
          - type: bleu
            value: 57.1
            name: bleu@1
          - type: bleu
            value: 18.4
            name: bleu@4
          - type: meteor
            value: 19
          - type: rouge
            value: 40.8
            name: rouge-l
      - task:
          type: video question answering
        dataset:
          type: ActivityNet
          name: ActivityNet
        metrics:
          - type: Acc
            value: 61.8
      - task:
          type: video question answering
        dataset:
          type: MSVD
          name: MSVD
        metrics:
          - type: Acc
            value: 62.6
      - task:
          type: video question answering
        dataset:
          type: MSR-VTT
          name: MSR-VTT
        metrics:
          - type: Acc
            value: 43.5
      - task:
          type: video question answering
        dataset:
          type: iVQA
          name: iVQA
        metrics:
          - type: Acc
            value: 55.2
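
The metadata above reports MSR-VTT under two different tasks (captioning and question answering), so a score lookup needs both the task and the dataset. A minimal sketch of such a lookup follows, assuming the `model-index` has already been parsed into plain Python dicts and lists (for example by a YAML parser). The helper name `find_metric` is hypothetical, and the data is only a trimmed subset of the values listed above.

```python
# Trimmed subset of the model-index above, as plain dicts/lists.
# Only a few entries are copied here for illustration.
MODEL_INDEX = [
    {
        "name": "AuroraCap-7B",
        "results": [
            {
                "task": {"type": "video detailed caption"},
                "dataset": {"type": "VDC"},
                "metrics": [
                    {"type": "Acc", "value": 38.21, "name": "VDCScore"},
                    {"type": "cider", "value": 9.51},
                ],
            },
            {
                "task": {"type": "video caption"},
                "dataset": {"type": "MSR-VTT"},
                "metrics": [
                    {"type": "cider", "value": 33.1},
                    {"type": "bleu", "value": 21, "name": "bleu@4"},
                ],
            },
        ],
    }
]


def find_metric(index, task_type, dataset_type, metric):
    """Hypothetical helper: return the first matching metric value, or None.

    A metric matches on its "name" when present, otherwise on its "type"
    (some entries, such as cider, carry no name).
    """
    for model in index:
        for result in model["results"]:
            if result["task"]["type"] != task_type:
                continue
            if result["dataset"]["type"] != dataset_type:
                continue
            for m in result["metrics"]:
                if m.get("name", m["type"]) == metric:
                    return m["value"]
    return None


print(find_metric(MODEL_INDEX, "video caption", "MSR-VTT", "bleu@4"))
```

Matching on the dataset `type` together with the task `type` keeps the captioning and question-answering results for the same dataset apart.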

Resources

Features

AuroraCap is a multimodal large language model for image and video captioning.

Quick Start

See Docs.

Citation