A TEXT-GUIDED GRAPH STRUCTURE FOR IMAGE CAPTIONING
Depeng Wang, Zhenzhen Hu, Yuanen Zhou, Xueliang Wang, Le Wu, Richang Hong
SPS
The image captioning task requires a comprehensive understanding of visual content and has received significant attention. Recent studies have shown that modelling the relationships between visual objects yields high-level semantic features. However, most existing relationship-modelling methods for image captioning rely heavily on object detection results and handcrafted structured labels to build the graph model. In this paper, we explore object relationships in a text-guided manner, using the descriptions of similar images as context clues. We propose a novel framework named Text-Guided Graph (TGG), which employs image-related text to help build the relationships between the objects in an image, and which incorporates both the high-level graph information and the captions associated with that image. Experiments on the MS COCO dataset demonstrate the effectiveness of our text-guided graph model under various standard evaluation metrics.
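The abstract does not specify how TGG retrieves similar images or turns their descriptions into graph edges. The sketch below is only a hypothetical illustration of the general idea of a text-guided object graph, not the authors' method. It assumes (1) similar images are found by cosine similarity over global image feature vectors, and (2) an edge between two detected object labels is weighted by how many retrieved captions mention both. The names `retrieve_captions`, `build_text_guided_graph`, and the toy `database` are all hypothetical.

```python
import math
import re
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_captions(query_feat, database, k=2):
    """Return the captions of the k database images most similar to the query.

    Assumption: `database` is a hypothetical list of (feature_vector, [captions]).
    """
    ranked = sorted(database, key=lambda item: cosine(query_feat, item[0]), reverse=True)
    captions = []
    for _, caps in ranked[:k]:
        captions.extend(caps)
    return captions


def build_text_guided_graph(object_labels, captions):
    """Weight each object pair by the number of retrieved captions mentioning both.

    Assumption: co-occurrence counting stands in for TGG's (unspecified) edge model.
    """
    token_sets = [set(re.findall(r"[a-z]+", c.lower())) for c in captions]
    edges = {}
    for a, b in combinations(sorted(set(object_labels)), 2):
        weight = sum(1 for toks in token_sets if a in toks and b in toks)
        if weight:
            edges[(a, b)] = weight
    return edges


# Toy example: two "beach/dog" images are close to the query, the food image is not.
database = [
    ([1.0, 0.0, 0.1], ["a man rides a horse on the beach", "a horse and a man near the sea"]),
    ([0.9, 0.1, 0.0], ["a man walks a dog", "a dog next to a man"]),
    ([0.0, 1.0, 0.0], ["a plate of food on a table"]),
]
captions = retrieve_captions([1.0, 0.0, 0.0], database, k=2)
graph = build_text_guided_graph(["man", "horse", "dog", "table"], captions)
print(graph)  # {('dog', 'man'): 2, ('horse', 'man'): 2}
```

Here "table" receives no edges because no retrieved caption mentions it, which shows how text from similar images can prune or weight object relations without handcrafted relationship labels. A learned model would replace the raw counts with trainable edge representations.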