DivCon: Learning Concept Sequences for Semantically Diverse Image Captioning
Yue Zheng (Tsinghua University); Ya-Li Li (Tsinghua University); Shengjin Wang (Tsinghua University)
Human-generated image captions contain diverse semantic concepts, yet producing such diversity remains difficult for machines. The frequency distribution of semantic concepts in datasets is usually highly imbalanced, leading models to repeatedly describe frequently occurring concepts and thereby reducing semantic diversity. In this paper, we propose DivCon, a novel two-step method for diverse image captioning that generates descriptions with more diverse semantic concepts. First, we develop a concept sequence generator that auto-regressively generates concept sequences; decoding over the concept vocabulary confines the search to a small space. Then, a sentence generator takes the concept sequences as input and produces a description for each sequence. Experiments show that DivCon generates captions containing diverse semantic concepts and pays more attention to less frequent concepts. On the diverse image captioning task, DivCon achieves state-of-the-art results on the MSCOCO dataset, with oracle CIDEr and SPICE scores of 1.684 and 0.302.
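To make the two-step pipeline concrete, below is a minimal Python sketch of the control flow the abstract describes: a concept sequence generator first proposes several concept sequences, and a sentence generator then decodes one caption per sequence. All class names, method signatures, and the placeholder decoding logic are hypothetical stand-ins for illustration; the paper's actual models may differ.

```python
# Hypothetical sketch of the DivCon two-step pipeline (not the authors' code).
from typing import List


class ConceptSequenceGenerator:
    """Step 1: auto-regressively decode concept sequences for an image.

    Decoding over a concept vocabulary (much smaller than the full word
    vocabulary) keeps the search space small, as the abstract notes.
    """

    def __init__(self, concept_vocab: List[str]):
        self.concept_vocab = concept_vocab

    def generate(self, image_features, num_sequences: int = 3) -> List[List[str]]:
        # Placeholder: a real model would sample or beam-search concept
        # tokens conditioned on image features, one token at a time.
        return [self.concept_vocab[i::num_sequences] for i in range(num_sequences)]


class SentenceGenerator:
    """Step 2: generate one caption per concept sequence."""

    def generate(self, image_features, concepts: List[str]) -> str:
        # Placeholder: a real decoder would condition on both the image
        # features and the given concept sequence.
        return "A caption mentioning " + ", ".join(concepts) + "."


def divcon_caption(image_features, concept_vocab: List[str]) -> List[str]:
    """Run both steps: diverse concept sequences -> diverse captions."""
    concept_gen = ConceptSequenceGenerator(concept_vocab)
    sentence_gen = SentenceGenerator()
    return [
        sentence_gen.generate(image_features, concepts)
        for concepts in concept_gen.generate(image_features, num_sequences=3)
    ]


if __name__ == "__main__":
    # Toy concept vocabulary; image features are stubbed out here.
    for caption in divcon_caption(None, ["dog", "frisbee", "park", "grass"]):
        print(caption)
```

Conditioning the sentence generator on an explicit concept sequence is what lets each output caption commit to a different, possibly rare, set of concepts rather than collapsing onto the most frequent ones.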