PuSH - Publication Server of Helmholtz Zentrum München

EgoCVR: An egocentric benchmark for fine-grained composed video retrieval.

In: (18th European Conference on Computer Vision, ECCV 2024, 29 September - 4 October 2024, Milan). Berlin [et al.]: Springer, 2025. 1-17 (Lect. Notes Comput. Sc. ; 15095 LNCS)
Open Access Green as soon as Postprint is submitted to ZB.
In Composed Video Retrieval, the model receives two inputs: a video and a textual description of a modification to that video's content. The aim is to retrieve, from a database of videos, the video that shows the modified content. For this challenging task, the first step is to acquire large-scale training datasets and to collect high-quality benchmarks for evaluation. In this work, we introduce EgoCVR, a new evaluation benchmark for fine-grained Composed Video Retrieval built from large-scale egocentric video datasets. EgoCVR consists of 2,295 queries that specifically focus on high-quality temporal video understanding. We find that existing Composed Video Retrieval frameworks do not achieve the high-quality temporal video understanding this task requires. To address this shortcoming, we adapt a simple training-free method, propose a generic re-ranking framework for Composed Video Retrieval, and demonstrate that this approach achieves strong results on EgoCVR. Our code and benchmark are freely available at https://github.com/ExplainableML/EgoCVR.
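To make the task definition concrete, the following is a minimal Python sketch of a generic two-stage composed retrieval pipeline. It is not the EgoCVR method and not code from the linked repository. Assumptions: embeddings are precomputed by some video and text encoder (toy 2-D vectors are used here); the video and modification-text embeddings are fused by a weighted sum (`compose_query`, `alpha`), a simple baseline chosen only for illustration; and the second-stage scorer passed to the hypothetical `retrieve_then_rerank` function stands in for whatever re-ranking signal a real system would use.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    na, nb = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(na, nb))


def compose_query(video_emb: Vector, text_emb: Vector, alpha: float = 0.5) -> List[float]:
    """Hypothetical late fusion: weighted sum of the unit-normalized video and
    modification-text embeddings, renormalized to unit length."""
    v, t = normalize(video_emb), normalize(text_emb)
    return normalize([alpha * x + (1 - alpha) * y for x, y in zip(v, t)])


def retrieve_then_rerank(
    query: Vector,
    database: Dict[str, Vector],
    rerank_score: Callable[[str], float],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Stage 1: keep the k database videos most similar to the composed query.
    Stage 2: reorder only those k candidates by a second, hypothetical scorer."""
    candidates = sorted(database, key=lambda vid: cosine(query, database[vid]), reverse=True)[:k]
    reranked = sorted(candidates, key=rerank_score, reverse=True)
    return [(vid, rerank_score(vid)) for vid in reranked]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; a real system would obtain these from video/text encoders.
    database = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    query = compose_query(video_emb=[1.0, 0.0], text_emb=[0.0, 1.0], alpha=0.8)
    # Made-up second-stage scores, standing in for a re-ranking model.
    scores = {"a": 0.9, "b": 0.2, "c": 1.0}
    result = retrieve_then_rerank(query, database, lambda vid: scores[vid], k=2)
    print([vid for vid, _ in result])  # → ['a', 'b']
```

In this design the re-ranker only sees candidates that survive the first stage: in the toy example, video `c` has the highest re-ranking score but is never returned, because it is not among the top-k videos by composed-query similarity.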
Publication type Article: Conference contribution
Keywords Fine-grained Video Understanding ; Video Retrieval
ISSN (print) / ISBN 0302-9743
e-ISSN 1611-3349
Conference Title 18th European Conference on Computer Vision, ECCV 2024
Conference Date 29 September - 4 October 2024
Conference Location Milan
Source details Volume: 15095 LNCS, Pages: 1-17
Publisher Springer
Publishing Place Berlin [et al.]