Skip to main content

Exploring Entity-Level Spatial Relationships For Image-Text Matching

Yaxian Xia, Lun Huang, Wenmin Wang, Xiao-Yong Wei, Jie Chen

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 09:34
04 May 2020

Exploring the entity-level (i.e., objects in an image, words in a text) spatial relationship contributes to understanding multimedia content precisely. The ignorance of spatial information in previous works probably leads to misunderstandings of image contents. For instance, sentences `Boats are on the water' and `Boats are under the water' describe the same objects, but correspond to different sceneries. To this end, we utilize the relative position of objects to capture entity-level spatial relationships for image-text matching. Specifically, we fuse semantic and spatial relationships of image objects in a visual intra-modal relation module. The module performs promisingly to understand image contents and improve object representation learning. It contributes to capturing entity-level latent correspondence of image-text pairs. Then the query (text) plays a role of textual context to refine the interpretable alignments of image-text pairs in the inter-modal relation module. Our proposed method achieves state-of-the-art results on MS-COCO and Flickr30K datasets.

Value-Added Bundle(s) Including this Product

More Like This

  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00
  • SPS
    Members: $150.00
    IEEE Members: $250.00
    Non-members: $350.00