TEXT TO IMAGE SYNTHESIS WITH BIDIRECTIONAL GENERATIVE ADVERSARIAL NETWORK

Zixu Wang, Zhe Quan, Zhi-Jie Wang, Xinjian Hu, Yangyang Chen

08 Jul 2020

Generating realistic images from text descriptions is a challenging problem in computer vision. Although previous works have shown remarkable progress, guaranteeing semantic consistency between text descriptions and images remains difficult. To generate semantically consistent images, we propose two semantics-enhanced modules and a novel Textual-Visual Bidirectional Generative Adversarial Network (TVBi-GAN). Specifically, this paper proposes a semantics-enhanced attention module and a semantics-enhanced batch normalization module, which improve the consistency of synthesized images by incorporating precise semantic features. In addition, we propose an encoder network that extracts semantic features from images. During adversarial training, the encoder guides the generator to capture the features underlying the descriptions. Through extensive experiments on the CUB and COCO datasets, we demonstrate that our TVBi-GAN outperforms existing state-of-the-art methods.
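The abstract does not spell out how the semantics-enhanced batch normalization module is built, but a common way to inject text semantics into normalization layers is conditional batch normalization, where the per-channel scale and shift are predicted from a sentence embedding rather than learned as free parameters. The sketch below illustrates that general idea in PyTorch; the class name, dimensions, and exact formulation are illustrative assumptions, not the authors' implementation.

    # A minimal sketch of semantics-conditioned batch normalization,
    # one plausible reading of a "semantics-enhanced batch normalization"
    # module. All names and dimensions here are illustrative assumptions.
    import torch
    import torch.nn as nn

    class SemanticBatchNorm2d(nn.Module):
        def __init__(self, num_features: int, text_dim: int):
            super().__init__()
            # Normalize without a fixed affine transform ...
            self.bn = nn.BatchNorm2d(num_features, affine=False)
            # ... then predict per-channel gamma/beta from the text embedding.
            self.gamma = nn.Linear(text_dim, num_features)
            self.beta = nn.Linear(text_dim, num_features)

        def forward(self, x: torch.Tensor, sent_emb: torch.Tensor) -> torch.Tensor:
            # x: (B, C, H, W) feature map; sent_emb: (B, text_dim) sentence embedding.
            h = self.bn(x)
            gamma = self.gamma(sent_emb).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
            beta = self.beta(sent_emb).unsqueeze(-1).unsqueeze(-1)
            return (1.0 + gamma) * h + beta  # modulate normalized features with semantics

    # Usage: modulate a 64-channel feature map with a 256-d sentence embedding.
    x = torch.randn(4, 64, 32, 32)
    sent = torch.randn(4, 256)
    layer = SemanticBatchNorm2d(num_features=64, text_dim=256)
    out = layer(x, sent)  # same shape as x: (4, 64, 32, 32)

Conditioning the normalization on the sentence embedding lets text semantics influence every resolution stage of the generator, which is consistent with the paper's stated goal of improving semantic consistency between descriptions and synthesized images.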
