
Captioning Transformer With Scene Graph Guiding

Haishun Chen, Ying Wang, Xin Yang, Jie Li

22 Sep 2021

Image captioning is a challenging task that aims to generate natural-language descriptions of images. Most existing approaches adopt an encoder-decoder architecture, where the encoder takes the image as input and the decoder predicts the corresponding word sequence. However, a common defect of these methods is that the abundant semantic relationships between relevant regions are ignored, which can mislead the decoder into generating inaccurate captions. To alleviate this issue, we propose a novel model that uses the rich semantic relationships provided by a scene graph to guide the word generation process. To some extent, the scene graph narrows the semantic gap between images and descriptions and hence improves the quality of the generated sentences. Extensive experimental results demonstrate that our model achieves superior performance on various quantitative metrics.
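To make the described architecture concrete, the sketch below shows one plausible way a transformer decoder could cross-attend jointly over image-region features and scene-graph relation embeddings, so that relationships between regions can guide word generation. It is a minimal illustration of the idea in the abstract, not the authors' implementation: the module names, feature dimensions, vocabulary sizes, and the additive encoding of (subject, predicate, object) triples are all assumptions made for this example.

```python
# Illustrative sketch only: dimensions, vocabularies, and the triple encoding
# are assumed, not taken from the paper.
import torch
import torch.nn as nn


class SceneGraphGuidedCaptioner(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=3):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, d_model)
        # Projects detector region features (assumed 2048-d) into the model space.
        self.region_proj = nn.Linear(2048, d_model)
        # Encodes a (subject, predicate, object) triple as the sum of three
        # learned embeddings over assumed object/predicate category vocabularies.
        self.node_embed = nn.Embedding(1000, d_model)
        self.pred_embed = nn.Embedding(100, d_model)
        decoder_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, words, regions, triples):
        # words:   (B, T)       token ids of the partial caption
        # regions: (B, R, 2048) detected region features
        # triples: (B, K, 3)    scene-graph triples as (subj_id, pred_id, obj_id)
        tgt = self.word_embed(words)
        region_mem = self.region_proj(regions)
        rel_mem = (self.node_embed(triples[..., 0])
                   + self.pred_embed(triples[..., 1])
                   + self.node_embed(triples[..., 2]))
        # The decoder cross-attends to both visual regions and relation embeddings.
        memory = torch.cat([region_mem, rel_mem], dim=1)
        # Causal mask so each word only attends to earlier words.
        t = words.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(hidden)


if __name__ == "__main__":
    model = SceneGraphGuidedCaptioner(vocab_size=10000)
    words = torch.randint(0, 10000, (2, 12))
    regions = torch.randn(2, 36, 2048)
    triples = torch.randint(0, 100, (2, 20, 3))
    print(model(words, regions, triples).shape)  # torch.Size([2, 12, 10000])
```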
