AUDIO-DRIVEN HIGH DEFINITION AND LIP-SYNCHRONIZED TALKING FACE GENERATION BASED ON FACE REENACTMENT

Xianyu Wang (Huawei Technologies Co., Ltd.); Yuhan Zhang (Peking University); Weihua He (Tsinghua University); Yaoyuan Wang (Huawei Technologies Co., Ltd.); Minglei Li (Huawei Technologies Co., Ltd.); Yuchen Wang (Huawei Technologies Co., Ltd.); Jingyi Zhang (Huawei Technologies Co., Ltd.); Shunbo Zhou (Huawei Cloud); Ziyang Zhang (Huawei Technologies Co., Ltd.)

06 Jun 2023

Generating audio-driven, photo-realistic talking faces has received intensive attention because it enables new human-computer interaction experiences. However, previous works have struggled to balance high definition, lip synchronization, and low customization cost, which degrades the user experience. In this paper, a novel audio-driven talking face generation method is proposed that converts the problem of improving video definition into a face reenactment problem, producing face video that is both lip-synchronized and high-definition. The framework is decoupled, meaning that the same trained model can be used on arbitrary characters and audio without further person-specific training, which significantly reduces costs. Experimental results show that the proposed method achieves high video definition and lip synchronization performance comparable to existing state-of-the-art methods.

Tags:

Multimedia analysis and synthesis

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
