CSTNET: Enhancing Global-To-Local Interactions For Image Captioning
Xin Yang, Ying Wang, Haishun Chen, Jie Li
Image translation from human faces to anime faces offers a low-cost, efficient way to create animation characters for the animation industry. However, due to the significant inter-domain difference between anime images and human photos, existing image-to-image translation approaches cannot address this task well. To solve this dilemma, we propose HyProGAN, an exemplar-guided image-to-image translation model that requires no paired data. The key contribution of HyProGAN is a novel hybrid and progressive training strategy that expands the unidirectional translation between the two domains into bidirectional intra-domain and inter-domain translation. To enhance the consistency between input and output, we further propose a local masking loss that aligns the facial features of the human face with those of the generated anime face. Extensive experiments demonstrate the superiority of HyProGAN over state-of-the-art models.
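As a rough illustration of how such a local masking loss might be formulated, a minimal PyTorch-style sketch is given below. The abstract does not specify the exact definition, so the function name `local_masking_loss`, the use of an L1 distance, and the binary `facial_mask` (e.g. derived from a facial-landmark detector) are all assumptions rather than the authors' actual formulation.

```python
import torch
import torch.nn.functional as F

def local_masking_loss(real_face: torch.Tensor,
                       generated_anime: torch.Tensor,
                       facial_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical local masking loss.

    Penalizes differences between the input human face and the generated
    anime face only inside facial regions (eyes, nose, mouth, ...) selected
    by a binary mask, encouraging the generator to keep those features aligned.
    """
    # Restrict both images to the masked facial regions.
    masked_real = real_face * facial_mask
    masked_fake = generated_anime * facial_mask
    # L1 distance accumulated over the masked pixels, normalized by mask area.
    return F.l1_loss(masked_fake, masked_real, reduction="sum") / (facial_mask.sum() + 1e-8)

# Example usage (shapes are illustrative): a batch of RGB images with a
# single-channel mask broadcast across the color channels.
real = torch.rand(4, 3, 256, 256)
fake = torch.rand(4, 3, 256, 256)
mask = (torch.rand(4, 1, 256, 256) > 0.7).float()
loss = local_masking_loss(real, fake, mask)
```

In this sketch the mask normalization keeps the loss scale independent of how large the facial regions are; whether HyProGAN normalizes this way, or weights individual facial parts differently, is not stated in the abstract.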