
Rethink pair-wise self-supervised cross-modal retrieval from a contrastive learning perspective

Tiantian Gong (Nanjing University of Aeronautics and Astronautics); Junsheng Wang (Nanjing University of Science and Technology); Liyan Zhang (Nanjing University of Aeronautics and Astronautics)

06 Jun 2023

Cross-modal retrieval faces the challenges of eliminating the modality gap and learning representations that are both robustly modality-invariant and semantically discriminative. Existing self-supervised cross-modal approaches still suffer from faulty negative-sample selection strategies and a lack of reliable high-level semantic guidance. We therefore propose a robust self-supervised method (RCL) that co-trains instance discrimination and semantic discrimination for cross-modal retrieval. Specifically, by using k-reciprocal nearest neighbors to generate pairwise pseudo-labels, we select negative samples correctly and better filter out false negatives, pulling semantically similar instances closer in a manner akin to supervised contrastive learning. In addition, we use prototype contrastive learning to learn high-level semantically discriminative representations across semantic groups, pulling instances and their prototype vectors closer to better capture the semantic structure of multimodal data. Extensive experiments demonstrate the effectiveness of our method on cross-modal datasets.
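As a rough illustration of the two ingredients the abstract describes, the PyTorch sketch below shows (a) k-reciprocal pseudo-positive generation with a supervised-contrastive-style loss over the resulting positive sets, and (b) a generic prototype contrastive term. This is not the authors' code: the neighborhood size k, the temperature tau, the diagonal-inclusion rule, and the ProtoNCE-style formulation of the prototype loss are assumptions filled in from the abstract's description.

```python
import torch
import torch.nn.functional as F

def k_reciprocal_pseudo_positives(img_emb, txt_emb, k=10):
    """Mark cross-modal pairs (i, j) as pseudo-positive when each is
    among the other's top-k nearest neighbours (the k-reciprocal rule).
    Returns the similarity matrix and a boolean [N, N] positive mask."""
    sim = F.normalize(img_emb, dim=1) @ F.normalize(txt_emb, dim=1).T  # [N, N]
    n = sim.size(0)
    # mask_i2t[i, j]: text j is among image i's top-k neighbours.
    mask_i2t = torch.zeros_like(sim, dtype=torch.bool)
    mask_i2t.scatter_(1, sim.topk(k, dim=1).indices, True)
    # mask_t2i[i, j]: image i is among text j's top-k neighbours.
    mask_t2i = torch.zeros_like(sim, dtype=torch.bool)
    mask_t2i.scatter_(0, sim.topk(k, dim=0).indices, True)
    # Keep (i, j) only when both directions agree; always keep the
    # original ground-truth pairing on the diagonal.
    pos = (mask_i2t & mask_t2i) | torch.eye(n, dtype=torch.bool, device=sim.device)
    return sim, pos

def masked_contrastive_loss(sim, pos_mask, tau=0.07):
    """Supervised-contrastive-style loss: every pseudo-positive in a row
    is pulled together, so filtered false negatives are no longer pushed
    away as they would be in plain InfoNCE."""
    log_prob = (sim / tau) - (sim / tau).logsumexp(dim=1, keepdim=True)
    loss = -(log_prob * pos_mask.float()).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()

def prototype_contrastive_loss(emb, prototypes, assignments, tau=0.07):
    """ProtoNCE-style term: classify each instance into its own cluster
    prototype (e.g., from k-means over the embeddings), pulling the
    instance toward its prototype and away from the others."""
    logits = F.normalize(emb, dim=1) @ F.normalize(prototypes, dim=1).T / tau
    return F.cross_entropy(logits, assignments)
```

A training step under these assumptions would simply sum the two terms, e.g. `sim, pos = k_reciprocal_pseudo_positives(v, t); loss = masked_contrastive_loss(sim, pos) + prototype_contrastive_loss(v, protos, assign)`, with prototypes periodically refreshed by clustering.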
