
Decaying Contrast for Fine-grained Video Representation Learning

Heng Zhang (Gaoling School of Artificial Intelligence, Renmin University of China); Bing Su (Renmin University of China)

06 Jun 2023

Prior contrast-based methods for video representation learning mainly focus on clip discrimination while ignoring the temporal context and relationships among clips from the same video. As a consequence, the learned spatiotemporal representations of successive clips are inconsistent and perform poorly in fine-grained downstream tasks such as video fragment retrieval or localization. In this paper, we propose a decaying strategy that captures the gradual evolution along the temporal dimension for fine-grained spatiotemporal representation learning, which consists of two novel contrastive losses. The external decaying contrastive loss is designed to increase the relative similarity of clips from the same video, while the internal decaying contrastive loss aims to maintain the discriminability of individual clips. Experimental results show that the proposed decaying contrastive training approach achieves significant improvements in fine-grained video retrieval on multiple benchmark datasets.
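
The abstract does not give the exact formulation of the two losses, so the PyTorch sketch below is only a hypothetical illustration of the general idea of a temporally decaying contrastive objective: clips from the same video act as soft positives whose target similarity decays with their temporal distance, while clips from other videos remain negatives. The function name decaying_contrastive_loss and the hyper-parameters tau (temperature) and lam (decay rate) are assumptions for illustration, not the authors' notation.

```python
import torch
import torch.nn.functional as F

def decaying_contrastive_loss(clip_emb, video_ids, clip_times, tau=0.1, lam=1.0):
    """Hypothetical temporally decaying contrastive loss (illustrative only).

    clip_emb   : (N, D) clip embeddings from the video encoder
    video_ids  : (N,)   index of the source video for each clip
    clip_times : (N,)   temporal position of each clip within its video
    tau        : softmax temperature (assumed hyper-parameter)
    lam        : decay rate; larger values make temporally distant
                 same-video clips count less as positives (assumed)
    """
    z = F.normalize(clip_emb, dim=1)
    sim = z @ z.t() / tau                                   # pairwise logits
    same_video = (video_ids[:, None] == video_ids[None, :]).float()
    dt = (clip_times[:, None] - clip_times[None, :]).abs().float()

    # Soft positive weights: same-video clips are positives whose weight
    # decays with temporal distance; other-video clips stay pure negatives.
    weight = torch.exp(-lam * dt) * same_video
    weight.fill_diagonal_(0.0)                              # drop self-pairs
    target = weight / weight.sum(dim=1, keepdim=True).clamp(min=1e-8)

    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    log_prob = F.log_softmax(sim.masked_fill(eye, float('-inf')), dim=1)

    # Cross-entropy against the decaying soft targets: nearby same-video
    # clips are pulled together (in the spirit of the external loss), while
    # the softmax over all other clips keeps each clip discriminative
    # (in the spirit of the internal loss).
    return -(target * log_prob).sum(dim=1).mean()


# Example usage: 8 clips, 4 consecutive clips from each of two videos.
emb = torch.randn(8, 128)
vids = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
times = torch.tensor([0, 1, 2, 3, 0, 1, 2, 3])
loss = decaying_contrastive_loss(emb, vids, times)
```

The paper's actual method may separate the external and internal terms into two distinct losses with their own weighting; the single soft-target objective above is just one way to realize the decaying-similarity idea described in the abstract.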
