arxiv:2411.19941

Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark

Published on Nov 29

Authors:

Abstract

Following the successful 2023 edition, we organised the Second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024, with the goal of benchmarking state-of-the-art video models and measuring the progress since last year using the Perception Test benchmark. This year, the challenge had seven tracks (up from six last year) and covered low-level and high-level tasks, with language and non-language interfaces, across video, audio, and text modalities; the additional track covered hour-long video understanding and introduced a novel video QA benchmark 1h-walk VQA. Overall, the tasks in the different tracks were: object tracking, point tracking, temporal action localisation, temporal sound localisation, multiple-choice video question-answering, grounded video question-answering, and hour-long video question-answering. We summarise in this report the challenge tasks and results, and introduce in detail the novel hour-long video QA benchmark 1h-walk VQA.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2411.19941 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2411.19941 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2411.19941 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.