Video Inference or training

by dosun - opened Apr 12, 2023

Apr 12, 2023

I have tried to run this model for video captioning. However, it only returns a caption for each frame. In the original paper, the model supports video through multiple frames. Is this support at HuggingFace as well?

nielsr

Apr 12, 2023

Hi,

For video captioning I'd recommend taking a look at the GIT checkpoints fine-tuned on video datasets, like https://huggingface.co/microsoft/git-base-vatex

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment