jylins committed · Commit 048fdcf
Parent(s): c5deef6
Initial Commit

Files changed:
- README.md +28 -0
- vtsum_tt.pth +3 -0
- vtsum_tt_ca.pth +3 -0
README.md CHANGED
@@ -1,3 +1,31 @@
 ---
 license: apache-2.0
+task_categories:
+- summarization
+language:
+- en
+tags:
+- cross-modal-video-summarization
+- video-summarization
+- video-captioning
+pretty_name: VideoXum
+size_categories:
+- 10K<n<100K
 ---
+
+# VTSUM-BLIP Model Card
+
+## Model details
+
+**Model type:**
+VTSUM-BLIP is an end-to-end cross-modal video summarization model.
+
+**Paper or resources for more information:**
+https://videoxum.github.io/
+
+## Training dataset
+- VideoXum *training* set: 8K long videos with 80K pairs of aligned video and text summaries.
+
+## Evaluation dataset
+- VideoXum *val* set: 2K long videos with 20K pairs of aligned video and text summaries.
+- VideoXum *test* set: 4K long videos with 40K pairs of aligned video and text summaries.
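The two `.pth` files added in this commit are PyTorch checkpoints (~580 MB each, per the LFS pointers below). A minimal sketch for inspecting one after download, assuming only that it was written with `torch.save`; the internal key layout of these checkpoints is not documented in this card, so the code just prints whatever top-level keys are present:

```python
import torch

# Load on CPU; the checkpoint is ~580 MB, so there is no need to
# place it on a GPU just to inspect it.
ckpt = torch.load("vtsum_tt.pth", map_location="cpu")

# Checkpoints written with torch.save are usually a dict; print the
# top-level keys to see whether the weights sit at the root or under
# a nested key such as "model".
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
else:
    print(type(ckpt))
```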
vtsum_tt.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b8a39b35dfe57f6fe0af666d314a699d5d49f9b82d8a8808ba95f0a32cdeeebd
+size 581582010
vtsum_tt_ca.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:57451c4712afcee8d2f5c817c8c7399c2f03e0394d884e409afe9699b2896cdb
+size 591040593
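Both weight entries are Git LFS pointer files: the repository tracks only the spec version, a `sha256` oid, and the byte size, while the actual payload lives in LFS storage. A minimal sketch, using only the Python standard library, for checking downloaded copies against the oids and sizes recorded above (hashing is chunked so neither ~580 MB file is read into memory at once):

```python
import hashlib
import os

# oid / size pairs copied verbatim from the LFS pointers in this commit.
EXPECTED = {
    "vtsum_tt.pth": (
        "b8a39b35dfe57f6fe0af666d314a699d5d49f9b82d8a8808ba95f0a32cdeeebd",
        581582010,
    ),
    "vtsum_tt_ca.pth": (
        "57451c4712afcee8d2f5c817c8c7399c2f03e0394d884e409afe9699b2896cdb",
        591040593,
    ),
}

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

for name, (oid, size) in EXPECTED.items():
    ok = os.path.getsize(name) == size and sha256_of(name) == oid
    print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```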