OPT: One-shot Pose-Controllable Talking Head Generation

Jin Liu (1. Institute of Information Engineering, Chinese Academy of Sciences. 2. School of Cyber Security, University of Chinese Academy of Sciences); Xi Wang (Institute of Information Engineering, Chinese Academy of Sciences); Xiaomeng Fu (1. Institute of Information Engineering, Chinese Academy of Sciences. 2. School of Cyber Security, University of Chinese Academy of Sciences); Chai Yesheng (Institute of Information Engineering, Chinese Academy of Sciences); Cai Yu (1. Institute of Information Engineering, Chinese Academy of Sciences. 2. School of Cyber Security, University of Chinese Academy of Sciences); Jiao Dai (Institute of Information Engineering, Chinese Academy of Sciences); Jizhong Han (Institute of Information Engineering, Chinese Academy of Sciences)


09 Jun 2023

One-shot talking head generation produces lip-synced talking heads from arbitrary audio and a single source face. To make the results look natural and realistic, recent methods pursue free pose control rather than editing only the mouth region. However, existing methods fail to preserve the identity of the source face accurately when generating head motion. To solve this identity mismatch problem and achieve high-quality free pose control, we present the One-shot Pose-controllable Talking head generation network (OPT). Specifically, the Audio Feature Disentanglement Module separates content features from the audio, eliminating the influence of speaker-specific information contained in arbitrary driving audio. Next, the mouth expression feature is extracted from the content feature and the source face; during this step, a landmark loss is designed to improve the accuracy of the facial structure and the quality of identity preservation. Finally, to achieve free pose control, controllable head pose features from reference videos are fed into the Video Generator along with the expression feature and the source face to generate new talking heads. Extensive quantitative and qualitative experiments verify that OPT generates high-quality pose-controllable talking heads without the identity mismatch problem, outperforming previous state-of-the-art (SOTA) methods.
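The abstract mentions a landmark loss but does not give its form. As a rough illustration only, the sketch below assumes one common choice: the mean Euclidean distance between predicted and reference 2D facial landmarks. The function name `landmark_loss`, the 2D point representation, and the toy coordinates are all hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

# A 2D landmark as (x, y). This representation is an assumption for the sketch.
Point = Tuple[float, float]


def landmark_loss(predicted: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance between corresponding 2D landmarks.

    Hypothetical stand-in for a landmark loss; the paper's actual
    formulation may differ.
    """
    if len(predicted) != len(target):
        raise ValueError("landmark sets must have the same number of points")
    if not predicted:
        raise ValueError("landmark sets must not be empty")
    total = 0.0
    for (px, py), (tx, ty) in zip(predicted, target):
        # Distance between one predicted landmark and its reference position.
        total += math.hypot(px - tx, py - ty)
    return total / len(predicted)


# Toy example: three made-up mouth landmarks in pixel coordinates.
target = [(10.0, 20.0), (15.0, 22.0), (20.0, 20.0)]
predicted = [(13.0, 24.0), (15.0, 22.0), (20.0, 20.0)]
loss = landmark_loss(predicted, target)
```

A loss of this kind is zero when the predicted landmarks coincide with the reference ones and grows as the facial structure drifts, which is the general intuition behind using landmarks to constrain facial structure and identity.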

Tags:

Image and video content analysis

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
