
Decaying Contrast for Fine-grained Video Representation Learning

Heng Zhang (Gaoling School of Artificial Intelligence, Renmin University of China); Bing Su (Renmin University of China)

06 Jun 2023

Prior contrast-based methods for video representation learning mainly focus on clip discrimination while ignoring the temporal context and relationships among clips from the same video. As a consequence, the learned spatiotemporal representations of successive clips are inconsistent and perform poorly in fine-grained downstream tasks such as video fragment retrieval or localization. In this paper, we propose a decaying strategy that captures the gradual evolution along the temporal dimension for fine-grained spatiotemporal representation learning, which consists of two novel contrastive losses. The external decaying contrastive loss is designed to increase the relative similarity of clips from the same video, while the internal decaying contrastive loss aims to maintain the discriminability of individual clips. Experimental results show that the proposed decaying contrastive training approach achieves significant improvements in fine-grained video retrieval on multiple benchmark datasets.
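
The abstract does not give the exact formulation of the two losses, so the PyTorch sketch below is only a hypothetical illustration of the general idea of a temporally decaying contrastive objective: clips from the same video act as soft positives whose target similarity decays with their temporal distance, while clips from other videos remain negatives. The function name decaying_contrastive_loss and the hyper-parameters tau (temperature) and lam (decay rate) are assumptions for illustration, not the authors' notation.

```python
import torch
import torch.nn.functional as F

def decaying_contrastive_loss(clip_emb, video_ids, clip_times, tau=0.1, lam=1.0):
    """Hypothetical temporally decaying contrastive loss (illustrative only).

    clip_emb   : (N, D) clip embeddings from the video encoder
    video_ids  : (N,)   index of the source video for each clip
    clip_times : (N,)   temporal position of each clip within its video
    tau        : softmax temperature (assumed hyper-parameter)
    lam        : decay rate; larger values make temporally distant
                 same-video clips count less as positives (assumed)
    """
    z = F.normalize(clip_emb, dim=1)
    sim = z @ z.t() / tau                                   # pairwise logits
    same_video = (video_ids[:, None] == video_ids[None, :]).float()
    dt = (clip_times[:, None] - clip_times[None, :]).abs().float()

    # Soft positive weights: same-video clips are positives whose weight
    # decays with temporal distance; other-video clips stay pure negatives.
    weight = torch.exp(-lam * dt) * same_video
    weight.fill_diagonal_(0.0)                              # drop self-pairs
    target = weight / weight.sum(dim=1, keepdim=True).clamp(min=1e-8)

    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    log_prob = F.log_softmax(sim.masked_fill(eye, float('-inf')), dim=1)

    # Cross-entropy against the decaying soft targets: nearby same-video
    # clips are pulled together (in the spirit of the external loss), while
    # the softmax over all other clips keeps each clip discriminative
    # (in the spirit of the internal loss).
    return -(target * log_prob).sum(dim=1).mean()


# Example usage: 8 clips, 4 consecutive clips from each of two videos.
emb = torch.randn(8, 128)
vids = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
times = torch.tensor([0, 1, 2, 3, 0, 1, 2, 3])
loss = decaying_contrastive_loss(emb, vids, times)
```

The paper's actual method may separate the external and internal terms into two distinct losses with their own weighting; the single soft-target objective above is just one way to realize the decaying-similarity idea described in the abstract.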
