3D-CSL: SELF-SUPERVISED 3D CONTEXT SIMILARITY LEARNING FOR NEAR-DUPLICATE VIDEO RETRIEVAL
Rui Deng, Qian Wu, Yuke Li
In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicate Video Retrieval (NDVR), and explore a novel self-supervised learning strategy for video similarity learning. Most previous NDVR methods rely heavily on pairwise labeled data; they are therefore limited by the scale of available datasets and cannot optimize complex but efficient backbones, e.g., 3D transformers. To break this limitation, we explore self-supervised similarity learning for the NDVR task and propose FCS loss, a novel triplet loss, and ShotMix, a novel video-specific augmentation, both of which significantly enhance self-supervised video similarity learning. On this basis, the compact 3D pipeline we propose shows a great advantage in extracting global spatiotemporal dependencies in videos and achieves the best balance between efficiency and effectiveness. Furthermore, we propose PredMAE, which pretrains the 3D transformer with video prediction as a pretext task to boost the downstream NDVR task without any human labels. Experiments on FIVR-200K and CC_WEB_VIDEO demonstrate the superiority and reliability of our method, which achieves state-of-the-art performance on clip-level NDVR. Code is released at https://github.com/dun-research/3D-CSL.