WavSyncSwap: End-to-End Portrait-Customized Audio-Driven Talking Face Generation

Weihong Bao (Tsinghua University); Liyang Chen (Tsinghua University); Chaoyong Zhou (Ping An Technology); Sicheng Yang (Tsinghua University); Zhiyong Wu (Tsinghua University)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
07 Jun 2023

Audio-driven talking face generation with portrait customization enhances the flexibility of avatar applications in scenarios such as online meetings, mixed reality, and data generation. Existing methods typically treat audio-driven talking face generation and face swapping as separate tasks and cascade them to achieve this goal. Cascading the state-of-the-art methods Wav2Lip and SimSwap in this way exposes several issues: degraded mouth synchronization, lost texture information, and slow inference. To resolve these issues, we propose an end-to-end model that combines the advantages of both approaches. Our approach generates a highly synchronized mouth with the aid of a pre-trained lip-sync discriminator, while identity information, which correlates strongly with facial texture, is provided by ArcFace and an ID injection module. Experimental results demonstrate that our method achieves lip-sync accuracy comparable to real synced videos, preserves more texture detail than cascaded methods, alleviates the blurring of Wav2Lip, and improves inference speed.
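The pre-trained lip-sync discriminator mentioned above follows the expert-discriminator idea popularized by Wav2Lip: audio and mouth-region video embeddings are compared by cosine similarity, and the generator is penalized when they disagree. A minimal NumPy sketch of such a sync loss is shown below; the function name, embedding shapes, and the exact mapping from similarity to probability are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sync_loss(audio_emb, video_emb, eps=1e-8):
    """Illustrative SyncNet/Wav2Lip-style expert sync loss (sketch).

    audio_emb, video_emb: (batch, dim) arrays from a pre-trained
    audio encoder and a mouth-region video encoder. Cosine similarity
    is mapped to a "sync probability" in (0, 1), which is penalized
    with binary cross-entropy against the in-sync label 1.
    """
    # Per-sample cosine similarity between the two embeddings
    num = np.sum(audio_emb * video_emb, axis=1)
    den = (np.linalg.norm(audio_emb, axis=1) *
           np.linalg.norm(video_emb, axis=1) + eps)
    cos = num / den
    # Map similarity in [-1, 1] to a probability in (0, 1)
    p = np.clip((cos + 1.0) / 2.0, eps, 1.0)
    # BCE against label 1: low loss only when audio and video agree
    return float(np.mean(-np.log(p)))
```

Perfectly aligned embeddings give a loss near zero, while opposed embeddings are penalized heavily, which is what drives the generator toward tighter mouth synchronization.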
