---
license: apache-2.0
task_categories:
- summarization
language:
- en
tags:
- cross-modal-video-summarization
- video-summarization
- video-captioning
pretty_name: VideoXum
size_categories:
- 10K<n<100K
---
# VTSUM-BLIP Model Card
## Model details
**Model type:**
VTSUM-BLIP is an end-to-end cross-modal video summarization model.
**Paper or resources for more information:**
https://videoxum.github.io/
## Training dataset
- VideoXum *training* set: 8K long videos with 80K pairs of aligned video and text summaries.
## Evaluation dataset
- VideoXum *val* set: 2K long videos with 20K pairs of aligned video and text summaries.
- VideoXum *test* set: 4K long videos with 40K pairs of aligned video and text summaries.