In this paper, we introduce 3D-CSL, a compact pipeline for Near-Duplicate Video Retrieval (NDVR), and explore a novel self-supervised learning strategy for video similarity learning. Most previous NDVR methods depend heavily on pairwise labeled data, so they are limited by the scale of available datasets and cannot optimize complex but efficient backbones, e.g., 3D transformers.