AUDIO-DRIVEN HIGH DEFINITION AND LIP-SYNCHRONIZED TALKING FACE GENERATION BASED ON FACE REENACTMENT
Xianyu Wang (Huawei Technologies Co., Ltd.); Yuhan Zhang (Peking University); Weihua He (Tsinghua University); Yaoyuan Wang (Huawei Technologies Co., Ltd.); Minglei Li (Huawei Technologies Co., Ltd.); Yuchen Wang (Huawei Technologies Co., Ltd.); Jingyi Zhang (Huawei Technologies Co., Ltd.); Shunbo Zhou (Huawei Cloud); Ziyang Zhang (Huawei Technologies Co., Ltd.)
Audio-driven photo-realistic talking face generation has received intensive attention for its ability to enable new human-computer interaction experiences. However, previous works have struggled to balance high definition, lip synchronization, and low customization cost, which degrades the user experience. In this paper, a novel audio-driven talking face generation method is proposed, which recasts the problem of improving video definition as a face reenactment problem to produce face video that is both lip-synchronized and high-definition. The framework is decoupled: the same trained model can be applied to arbitrary characters and audio without further customized training for specific people, significantly reducing cost. Experimental results show that the proposed method achieves high video definition and lip-synchronization performance comparable to existing state-of-the-art methods.
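The abstract describes a decoupled two-stage idea: generate lip-synchronized but coarse frames from audio, then recover definition by reenacting a high-definition reference frame of the target identity with those coarse frames as the driver. The sketch below illustrates one plausible realization in PyTorch, under stated assumptions: a Wav2Lip-style coarse generator and a simple reenactment refiner. Every class, layer, and resolution here (CoarseLipSyncGenerator, FaceReenactor, the 96/256 sizes) is an illustrative placeholder, not the paper's actual architecture.

import torch
import torch.nn as nn

class CoarseLipSyncGenerator(nn.Module):
    # Stage 1 (hypothetical): per-frame audio features -> a
    # lip-synchronized but low-definition face frame.
    def __init__(self, audio_dim: int = 80, size: int = 96):
        super().__init__()
        self.size = size
        self.mlp = nn.Sequential(
            nn.Linear(audio_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, 3 * size * size),
            nn.Tanh(),
        )

    def forward(self, audio_feat: torch.Tensor) -> torch.Tensor:
        # audio_feat: (batch, audio_dim), e.g. one mel-spectrogram window
        return self.mlp(audio_feat).view(-1, 3, self.size, self.size)

class FaceReenactor(nn.Module):
    # Stage 2 (hypothetical): treat the coarse frame as a *driving* face
    # and reenact a high-definition reference frame, so the output
    # inherits the reference's definition and the driver's lip motion.
    def __init__(self, size: int = 256):
        super().__init__()
        self.size = size
        self.encoder = nn.Conv2d(6, 64, 3, padding=1)  # ref + upsampled driver
        self.decoder = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, reference: torch.Tensor, driver: torch.Tensor) -> torch.Tensor:
        driver_up = nn.functional.interpolate(
            driver, size=(self.size, self.size),
            mode="bilinear", align_corners=False)
        x = torch.cat([reference, driver_up], dim=1)
        return torch.tanh(self.decoder(torch.relu(self.encoder(x))))

# Decoupled use: one trained pair drives any identity and any audio,
# with no per-person fine-tuning.
coarse = CoarseLipSyncGenerator()
reenact = FaceReenactor()
audio_feat = torch.randn(1, 80)          # dummy audio feature window
reference = torch.randn(1, 3, 256, 256)  # HD frame of the target person
hd_frame = reenact(reference, coarse(audio_feat))
print(hd_frame.shape)  # torch.Size([1, 3, 256, 256])

Because identity comes only from the reference frame and motion only from the audio-driven driver, swapping in a new person or new speech requires no retraining, which is the cost advantage the abstract claims.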