
Captioning Transformer With Scene Graph Guiding

Haishun Chen, Ying Wang, Xin Yang, Jie Li

22 Sep 2021

Image captioning is a challenging task that aims to generate natural-language descriptions of images. Most existing approaches adopt an encoder-decoder architecture, where the encoder takes the image as input and the decoder predicts the corresponding word sequence. However, a common defect of these methods is that the abundant semantic relationships between relevant regions are ignored, which can mislead the decoder into generating inaccurate captions. To alleviate this issue, we propose a novel model that uses the rich semantic relationships provided by a scene graph to guide the word generation process. To some extent, the scene graph narrows the semantic gap between images and descriptions and hence improves the quality of the generated sentences. Extensive experimental results demonstrate that our model achieves superior performance on various quantitative metrics.
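To make the described architecture concrete, the sketch below shows one plausible way a transformer decoder could cross-attend jointly over image-region features and scene-graph relation embeddings, so that relationships between regions can guide word generation. It is a minimal illustration of the idea in the abstract, not the authors' implementation: the module names, feature dimensions, vocabulary sizes, and the additive encoding of (subject, predicate, object) triples are all assumptions made for this example.

```python
# Illustrative sketch only: dimensions, vocabularies, and the triple encoding
# are assumed, not taken from the paper.
import torch
import torch.nn as nn


class SceneGraphGuidedCaptioner(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=3):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, d_model)
        # Projects detector region features (assumed 2048-d) into the model space.
        self.region_proj = nn.Linear(2048, d_model)
        # Encodes a (subject, predicate, object) triple as the sum of three
        # learned embeddings over assumed object/predicate category vocabularies.
        self.node_embed = nn.Embedding(1000, d_model)
        self.pred_embed = nn.Embedding(100, d_model)
        decoder_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, words, regions, triples):
        # words:   (B, T)       token ids of the partial caption
        # regions: (B, R, 2048) detected region features
        # triples: (B, K, 3)    scene-graph triples as (subj_id, pred_id, obj_id)
        tgt = self.word_embed(words)
        region_mem = self.region_proj(regions)
        rel_mem = (self.node_embed(triples[..., 0])
                   + self.pred_embed(triples[..., 1])
                   + self.node_embed(triples[..., 2]))
        # The decoder cross-attends to both visual regions and relation embeddings.
        memory = torch.cat([region_mem, rel_mem], dim=1)
        # Causal mask so each word only attends to earlier words.
        t = words.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(hidden)


if __name__ == "__main__":
    model = SceneGraphGuidedCaptioner(vocab_size=10000)
    words = torch.randint(0, 10000, (2, 12))
    regions = torch.randn(2, 36, 2048)
    triples = torch.randint(0, 100, (2, 20, 3))
    print(model(words, regions, triples).shape)  # torch.Size([2, 12, 10000])
```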
