Temporal Contrastive Learning with Curriculum
Shuvendu Roy (Queen's University); Ali Etemad (Queen's University)
SPS
We present ConCur, a contrastive video representation learning method that uses curriculum learning to impose a dynamic sampling strategy on contrastive training. Specifically, ConCur starts contrastive training with easy positive samples (temporally close and semantically similar clips) and, as training progresses, increases the temporal span, effectively sampling hard positives (temporally distant and semantically dissimilar clips). To learn better context-aware representations, we also propose an auxiliary task of predicting the temporal distance between a positive pair of clips. We conduct extensive experiments on two popular action recognition datasets, UCF101 and HMDB51, on which our proposed method achieves superior performance on video action recognition and video retrieval. Detailed ablation studies show the effectiveness of each component of our proposed method.
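The curriculum-driven sampling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear schedule, the frame-based units, and the names `curriculum_span` and `sample_positive_pair` are assumptions made for illustration only, since the abstract does not specify how the temporal span grows. The returned gap is the kind of quantity an auxiliary temporal-distance prediction task could use as its target.

```python
import random


def curriculum_span(epoch, total_epochs, min_span, max_span):
    """Allowed temporal gap (in frames) between the two positive clips.

    Assumption: the span grows linearly from min_span to max_span over
    training. The abstract only says the span increases as training progresses.
    """
    if total_epochs <= 1:
        return max_span
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return int(round(min_span + progress * (max_span - min_span)))


def sample_positive_pair(num_frames, clip_len, span, rng=random):
    """Sample start frames of two clips from one video, at most `span` apart.

    Returns (start_a, start_b, gap), where gap = |start_b - start_a| could
    serve as the label for a temporal-distance prediction head.
    """
    last_start = num_frames - clip_len
    if last_start < 0:
        raise ValueError("video is shorter than the clip length")
    start_a = rng.randint(0, last_start)
    # Restrict the second clip to the current curriculum window.
    lo = max(0, start_a - span)
    hi = min(last_start, start_a + span)
    start_b = rng.randint(lo, hi)
    return start_a, start_b, abs(start_b - start_a)


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in (0, 50, 99):
        span = curriculum_span(epoch, total_epochs=100, min_span=4, max_span=64)
        print(epoch, span, sample_positive_pair(300, 16, span, rng))
```

Early epochs draw pairs from a narrow window (easy positives); later epochs allow widely separated clips (hard positives). Other schedules, such as stepwise or exponential growth, would fit the same interface.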