DivCon: Learning Concept Sequences for Semantically Diverse Image Captioning

Yue Zheng (Tsinghua University); Ya-Li Li (Tsinghua University); Shengjin Wang (Tsinghua University)

06 Jun 2023

Human-generated image captions contain diverse semantic concepts, but producing such captions remains difficult for machines. The frequency distribution of semantic concepts in datasets is usually highly imbalanced, leading models to repeatedly describe frequently occurring concepts and causing a decline in semantic diversity. In this paper, we propose DivCon, a novel two-step method for diverse image captioning that generates descriptions with more diverse semantic concepts. First, a concept sequence generator auto-regressively produces concept sequences, which benefits the model by decoding in a small search space. Then a sentence generator takes each concept sequence as input and generates a corresponding description. Experiments show that DivCon generates captions containing diverse semantic concepts and pays more attention to less frequent concepts. On the diverse image captioning task, DivCon achieves state-of-the-art results on the MSCOCO dataset, with oracle CIDEr and SPICE scores of 1.684 and 0.302.
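
The abstract only outlines the two-step pipeline, so the following is a minimal, hypothetical sketch of how such a pipeline could be wired together. It assumes GRU-based decoders, precomputed image features, greedy decoding, and placeholder names (ConceptSequenceGenerator, SentenceGenerator); the paper's actual architecture and training objectives are not specified here.

    # Hypothetical sketch of a two-step "concepts then caption" pipeline,
    # not the authors' implementation.
    import torch
    import torch.nn as nn

    class ConceptSequenceGenerator(nn.Module):
        """Auto-regressively decodes a short concept sequence from image features.
        The concept vocabulary is much smaller than the word vocabulary, so the
        search space at this stage is small."""
        def __init__(self, num_concepts, feat_dim=512, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(num_concepts + 1, hidden_dim)  # +1 for a <start> token (id 0)
            self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
            self.init_h = nn.Linear(feat_dim, hidden_dim)
            self.out = nn.Linear(hidden_dim, num_concepts)

        def forward(self, img_feat, max_len=5):
            h = torch.tanh(self.init_h(img_feat))
            tok = torch.zeros(img_feat.size(0), dtype=torch.long)  # <start>
            concepts = []
            for _ in range(max_len):
                h = self.rnn(self.embed(tok), h)
                tok = self.out(h).argmax(dim=-1)  # greedy choice of next concept
                concepts.append(tok)
            return torch.stack(concepts, dim=1)  # (batch, max_len)

    class SentenceGenerator(nn.Module):
        """Generates a caption conditioned on the image and one concept sequence."""
        def __init__(self, vocab_size, num_concepts, feat_dim=512, hidden_dim=512):
            super().__init__()
            self.word_embed = nn.Embedding(vocab_size, hidden_dim)
            self.concept_embed = nn.Embedding(num_concepts + 1, hidden_dim)
            self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
            self.init_h = nn.Linear(feat_dim + hidden_dim, hidden_dim)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, img_feat, concept_seq, max_len=16):
            # Pool concept embeddings and fuse them with the image features.
            c = self.concept_embed(concept_seq).mean(dim=1)
            h = torch.tanh(self.init_h(torch.cat([img_feat, c], dim=-1)))
            tok = torch.zeros(img_feat.size(0), dtype=torch.long)  # <bos>
            words = []
            for _ in range(max_len):
                h = self.rnn(self.word_embed(tok), h)
                tok = self.out(h).argmax(dim=-1)
                words.append(tok)
            return torch.stack(words, dim=1)  # (batch, max_len) word ids

    # Usage: different sampled concept sequences would yield different captions.
    img_feat = torch.randn(2, 512)  # placeholder image features
    concepts = ConceptSequenceGenerator(num_concepts=100)(img_feat)
    caption_ids = SentenceGenerator(vocab_size=10000, num_concepts=100)(img_feat, concepts)
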
