STRUCTURE-AWARE GENERATIVE ADVERSARIAL NETWORK FOR TEXT-TO-IMAGE GENERATION
Wenjie Chen, Zhangkai Ni, Hanli Wang
Text-to-image generation aims at synthesizing photo-realistic images from textual descriptions. Existing methods typically align images with the corresponding texts in a joint semantic space. However, the modality gap in the joint semantic space leads to misalignment. Meanwhile, the limited receptive field of convolutional neural networks leads to structural distortions in generated images. In this work, a structure-aware generative adversarial network (SaGAN) is proposed to (1) semantically align multimodal features in the joint semantic space in a learnable manner; and (2) improve the structure and contour of generated images with the designed content-invariant negative samples. Experimental results show that SaGAN achieves improvements of over 30.1% and 8.2% in terms of FID on the CUB and COCO datasets, respectively, compared with state-of-the-art models.
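The content-invariant negative sample idea lends itself to a contrastive formulation: a caption should score its intact paired image above a structure-perturbed copy of that same image, pushing the matching objective to attend to structure and contour rather than content alone. Below is a minimal PyTorch-style sketch of such a loss under stated assumptions; the function name, the feature shapes, and the assumption that structure-perturbed negative features are precomputed are all illustrative choices, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def structure_contrastive_loss(img_feat, txt_feat, neg_img_feat,
                               temperature=0.1):
    """Contrastive text-image alignment with content-invariant negatives.

    img_feat:     (B, D) features of the intact images
    txt_feat:     (B, D) features of the paired captions
    neg_img_feat: (B, D) features of content-invariant negatives, e.g.
                  structure-perturbed versions of the same images
                  (hypothetical construction, assumed precomputed)
    """
    img = F.normalize(img_feat, dim=-1)
    txt = F.normalize(txt_feat, dim=-1)
    neg = F.normalize(neg_img_feat, dim=-1)

    # Cosine similarity of each caption to its intact image (positive)
    # and to its structure-perturbed counterpart (negative).
    pos_sim = (img * txt).sum(dim=-1) / temperature   # (B,)
    neg_sim = (neg * txt).sum(dim=-1) / temperature   # (B,)

    # Cross-entropy over {positive, negative}: index 0 (the intact
    # image) must win, so structural distortion is penalized even
    # when image content matches the caption.
    logits = torch.stack([pos_sim, neg_sim], dim=1)   # (B, 2)
    labels = torch.zeros(img.size(0), dtype=torch.long,
                         device=img.device)
    return F.cross_entropy(logits, labels)
```

Because the negatives share the positives' content and differ only in structure, content cues cancel in the contrast and the gradient signal concentrates on structural fidelity, which is the intuition behind using such samples to sharpen object contours.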