Exploring Entity-Level Spatial Relationships For Image-Text Matching
Yaxian Xia, Lun Huang, Wenmin Wang, Xiao-Yong Wei, Jie Chen
Exploring entity-level (i.e., objects in an image, words in a text) spatial relationships contributes to understanding multimedia content precisely. Ignoring spatial information, as previous works do, can lead to misunderstanding image content. For instance, the sentences "Boats are on the water" and "Boats are under the water" describe the same objects but correspond to different scenes. To this end, we utilize the relative positions of objects to capture entity-level spatial relationships for image-text matching. Specifically, we fuse the semantic and spatial relationships of image objects in a visual intra-modal relation module. This module proves effective for understanding image content and improving object representation learning, and it helps capture the entity-level latent correspondence of image-text pairs. The query text then serves as textual context to refine the interpretable alignments of image-text pairs in the inter-modal relation module. Our proposed method achieves state-of-the-art results on the MS-COCO and Flickr30K datasets.
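As a concrete illustration of the visual intra-modal relation module, the following is a minimal PyTorch sketch of how pairwise relative-position features of object bounding boxes could be fused with semantic attention. The log-offset geometry encoding, the additive fusion of the semantic and spatial terms, the feature dimension, and the names (relative_position_features, IntraModalRelation) are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def relative_position_features(boxes):
    """Pairwise relative-position features for N boxes in (cx, cy, w, h) format.
    Returns an (N, N, 4) tensor of log-scaled offsets and size ratios, a common
    geometry encoding (assumed here; the paper may use a variant)."""
    cx, cy, w, h = boxes.unbind(-1)                    # each of shape (N,)
    eps = 1e-3
    dx = (cx[None, :] - cx[:, None]) / w[:, None]      # normalized horizontal offset
    dy = (cy[None, :] - cy[:, None]) / h[:, None]      # normalized vertical offset
    dw = w[None, :] / w[:, None]                       # width ratio
    dh = h[None, :] / h[:, None]                       # height ratio
    return torch.stack([
        torch.log(dx.abs().clamp(min=eps)),
        torch.log(dy.abs().clamp(min=eps)),
        torch.log(dw.clamp(min=eps)),
        torch.log(dh.clamp(min=eps)),
    ], dim=-1)                                         # (N, N, 4)

class IntraModalRelation(nn.Module):
    """Refines region features by fusing semantic affinity with a spatial bias."""
    def __init__(self, dim=1024):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.spatial = nn.Linear(4, 1)                 # geometry -> scalar bias

    def forward(self, feats, boxes):
        # feats: (N, dim) region features; boxes: (N, 4) in (cx, cy, w, h)
        sem = self.q(feats) @ self.k(feats).t() / feats.size(-1) ** 0.5    # (N, N)
        geo = self.spatial(relative_position_features(boxes)).squeeze(-1)  # (N, N)
        attn = F.softmax(sem + F.relu(geo), dim=-1)    # semantic + spatial fusion
        return feats + attn @ self.v(feats)            # relation-enhanced features

# Toy usage with 36 Faster R-CNN-style region features:
regions = torch.randn(36, 1024)
boxes = torch.rand(36, 4) + 0.1                        # keep widths/heights positive
refined = IntraModalRelation()(regions, boxes)         # (36, 1024)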
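Similarly, a hedged sketch of the inter-modal step, where the text query provides context over the relation-enhanced regions: this follows a generic text-to-region cross-attention pattern, and the temperature value, pooling, and function name are assumptions; the paper's inter-modal relation module may differ in detail.

import torch
import torch.nn.functional as F

def intermodal_alignment(words, regions, temperature=9.0):
    """Text-guided attention over regions: each word attends to regions, and
    the image-sentence score pools per-word alignment similarities."""
    # words: (T, dim) word features; regions: (N, dim) relation-enhanced features
    w = F.normalize(words, dim=-1)
    r = F.normalize(regions, dim=-1)
    attn = F.softmax(temperature * (w @ r.t()), dim=-1)   # (T, N) word-to-region
    context = attn @ regions                              # (T, dim) attended visual context
    sims = F.cosine_similarity(words, context, dim=-1)    # (T,) per-word alignment
    return sims.mean()                                    # scalar matching score

score = intermodal_alignment(torch.randn(12, 1024), torch.randn(36, 1024))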