A CONTEXT-BASED NETWORK FOR REFERRING IMAGE SEGMENTATION
Xinyu Li, Yu Liu, Kaiping Xu, Zhehuan Zhao, Sipei Liu
SPS
Length: 07:15
Referring image segmentation is the task of segmenting the object referred to by a natural language expression. Existing methods usually concatenate the visual and linguistic features, which underestimates the importance of language-to-vision and object-to-object relationships when the expression mentions multiple entities. We therefore propose a new network, the Context-Based Network (CBN), to improve the accuracy of locating the correct referent. The CBN is composed of two modules: Intra Relation Selection (Intra-RS) and Inter Relation Selection (Inter-RS). Intra-RS captures object-to-object relationships in an embedded visual and linguistic feature space, while Inter-RS uses multi-scale linguistic features as a guide to match the most similar region in the image feature maps. In addition, we apply spatial pyramid pooling to obtain global information and address the limited receptive field problem. Experimental results on four public datasets show that CBN achieves performance comparable to other state-of-the-art methods.
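The abstract does not say how spatial pyramid pooling is configured in CBN, so the following is only a minimal sketch of the generic technique, not the authors' implementation. It max-pools a single-channel feature map over progressively finer grids and concatenates the results. The function name `spatial_pyramid_pool`, the `levels` parameter, the choice of max pooling, and the toy values are all assumptions made for illustration.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map over n x n grids and concatenate the results.

    Hypothetical helper for illustration; a real network would apply this
    per channel on tensors rather than on nested Python lists.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        if n > height or n > width:
            raise ValueError("pyramid level exceeds feature map size")
        for i in range(n):
            # Integer bin edges chosen so the bins tile all rows exactly once.
            top = (i * height) // n
            bottom = ((i + 1) * height) // n
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width) // n
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Toy 4 x 4 single-channel map (values are made up for illustration).
toy_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
descriptor = spatial_pyramid_pool(toy_map, levels=(1, 2))
print(descriptor)  # [16, 6, 8, 14, 16]
```

The first entry comes from the 1 x 1 level and summarizes the entire map, which is the sense in which pyramid pooling supplies global context beyond a convolution's local receptive field. The remaining entries keep coarse spatial layout from the 2 x 2 level.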