  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:12:08
21 Sep 2021

Zero-shot cross-modal retrieval (ZS-CMR) is cross-modal retrieval in which the test categories differ from the training categories. It borrows its intuition from zero-shot learning, which aims to transfer knowledge inferred for seen classes during training to unseen classes at test time. This mimics the real-world scenario in which new object categories continuously populate a multimedia data corpus. Unlike existing ZS-CMR approaches, which use generative adversarial networks (GANs) to generate more data, we propose Inter-Modality Fusion based Attention (IMFA) and a framework, ZS_INN_FUSE (Zero-Shot cross-modal retrieval using INNer product with image-text FUSEd). The framework exploits the rich semantics of textual data as guidance to infer additional knowledge during training. It does so by fusing the image and text modalities to generate attention weights that focus on the important regions of an image. We carefully create zero-shot splits of the large-scale MS-COCO and Flickr30k datasets for our experiments. The results show that our method improves on both the ZS-CMR baseline and a self-attention mechanism, demonstrating the effectiveness of inter-modality fusion in a zero-shot scenario.
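
The abstract does not give the exact form of the fusion or of the attention scoring, so the following is only a minimal sketch of the general idea of text-guided attention over image regions, not the authors' IMFA implementation. It assumes that each image is represented by a set of region feature vectors, that fusion is an element-wise product of each region with the text embedding, and that a single hypothetical weight vector `w` turns each fused vector into a score. The function name `fused_attention`, the dimensions, and the random inputs are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fused_attention(regions, text, w):
    """Text-guided attention over image regions (illustrative sketch).

    regions: (n, d) array, one feature vector per image region
    text:    (d,) array, embedding of the paired text
    w:       (d,) array, hypothetical scoring vector (learned in practice)
    Returns the attention weights (n,) and the attended image vector (d,).
    """
    fused = regions * text        # assumed fusion: element-wise product, (n, d)
    scores = fused @ w            # one scalar score per region, (n,)
    weights = softmax(scores)     # non-negative weights that sum to 1
    attended = weights @ regions  # weighted sum of region features, (d,)
    return weights, attended

# Toy inputs: 5 regions with 8-dimensional features (random, for shape checks only).
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))
text = rng.normal(size=8)
w = rng.normal(size=8)

weights, attended = fused_attention(regions, text, w)

# Inner-product similarity between the attended image vector and the text,
# in the spirit of the "INNer product" in the framework's name.
similarity = float(attended @ text)
```

In a trained model the region features, text embedding, and scoring parameters would come from learned encoders; the point here is only that the weights depend on both modalities, in contrast to self-attention, which would score regions from image features alone.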
