CRFAST: CLIP-BASED REFERENCE-GUIDED FACIAL IMAGE SEMANTIC TRANSFER
Ailin Li (College of Computer Science and Technology, Zhejiang University); Lei Zhao (Zhejiang University); Zhizhong Wang (Zhejiang University); Zhiwen Zuo (Zhejiang University); Wei Xing (Zhejiang University); Dongming Lu (Zhejiang University)
This paper presents a new task of reference-guided facial image semantic transfer: a source facial image is translated into an output image that carries the high-level semantic attributes of a reference image while preserving the source identity. To this end, we exploit the powerful generative capability of the StyleGAN generator and the rich semantic knowledge of the CLIP encoder. In addition, a novel contrastive loss is designed to comprehensively explore the rich semantic information that CLIP encodes for facial semantic concepts; this loss guides the semantic transfer toward the desired directions from different perspectives in the pre-defined CLIP space. Furthermore, a simple yet effective semantic-preserved modulation module is proposed to explicitly map the CLIP embeddings of the reference image into the latent space. Experiments demonstrate that our approach achieves realistic facial image semantic transfer driven by reference images with various facial semantics.
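The abstract names two ingredients, a contrastive loss computed in CLIP space and a modulation module that maps reference CLIP embeddings into the StyleGAN latent space, without spelling out either. The sketch below is a minimal illustration of how such components are commonly built, not the paper's actual implementation: the names `clip_contrastive_loss` and `ReferenceMapper`, the InfoNCE formulation, the MLP width, the temperature, and the 18x512 W+ layout are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def clip_contrastive_loss(out_emb, ref_emb, temperature=0.07):
    """InfoNCE-style contrastive loss in CLIP embedding space (assumed form).

    Each output embedding is pulled toward the CLIP embedding of its own
    reference image (positive) and pushed away from the other references
    in the batch (negatives). Shapes: (batch, clip_dim).
    """
    out_emb = F.normalize(out_emb, dim=-1)
    ref_emb = F.normalize(ref_emb, dim=-1)
    logits = out_emb @ ref_emb.t() / temperature           # (batch, batch) cosine similarities
    targets = torch.arange(out_emb.size(0), device=out_emb.device)
    return F.cross_entropy(logits, targets)                # diagonal entries are the positives


class ReferenceMapper(torch.nn.Module):
    """Hypothetical modulation module: maps a 512-d CLIP image embedding of
    the reference to per-layer offsets in StyleGAN's W+ space (18 x 512),
    which are added to the inverted latent code of the source image."""

    def __init__(self, clip_dim=512, num_ws=18, w_dim=512):
        super().__init__()
        self.num_ws, self.w_dim = num_ws, w_dim
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(clip_dim, 1024),
            torch.nn.LeakyReLU(0.2),
            torch.nn.Linear(1024, num_ws * w_dim),
        )

    def forward(self, source_wplus, ref_clip_emb):
        # source_wplus: (batch, 18, 512); ref_clip_emb: (batch, 512)
        delta = self.mlp(ref_clip_emb).view(-1, self.num_ws, self.w_dim)
        return source_wplus + delta                        # edited latent fed to the StyleGAN synthesis network


if __name__ == "__main__":
    # Toy check with random tensors standing in for CLIP embeddings and W+ codes.
    batch = 4
    out_emb, ref_emb = torch.randn(batch, 512), torch.randn(batch, 512)
    print("contrastive loss:", clip_contrastive_loss(out_emb, ref_emb).item())
    mapper = ReferenceMapper()
    w_plus = torch.randn(batch, 18, 512)
    print("edited W+ shape:", mapper(w_plus, ref_emb).shape)
```

In this reading, the mapper supplies a reference-conditioned offset to the source latent so identity is largely retained by the source W+ code, while the contrastive term encourages the generated image to share CLIP-space semantics with its reference rather than with other references in the batch.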