Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram

Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 15:33

04 May 2020

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions, a combination that effectively captures the time-frequency distribution of realistic speech waveforms. Because our method does not require the density distillation used in the conventional teacher-student framework, the entire model can be easily trained. Furthermore, our model is able to generate high-fidelity speech even with its compact architecture. In particular, the proposed Parallel WaveGAN has only 1.44 M parameters and can generate 24 kHz speech waveforms 28.68 times faster than real time in a single-GPU environment. Perceptual listening test results verify that our proposed method achieves a mean opinion score of 4.16 within a Transformer-based text-to-speech framework, which is comparable to the best distillation-based Parallel WaveNet system.
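
To make the multi-resolution spectrogram loss mentioned in the abstract more concrete, the following is a minimal NumPy sketch of one common formulation. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the loss terms or the analysis settings, so the spectral-convergence and log-magnitude terms, the three (FFT size, hop, window length) settings in `RESOLUTIONS`, and all function names here are hypothetical choices.

```python
import numpy as np

# Assumed (fft_size, hop, win_len) settings; not given in the abstract.
RESOLUTIONS = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]


def stft_magnitude(x, fft_size, hop, win_len):
    """Return a (frames x bins) magnitude spectrogram using a Hann window."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    # rfft zero-pads each windowed frame from win_len up to fft_size.
    spec = np.fft.rfft(frames, n=fft_size, axis=1)
    # Floor the magnitude so the logarithm below is always defined.
    return np.maximum(np.abs(spec), 1e-7)


def stft_loss(real, fake, fft_size, hop, win_len):
    """Single-resolution loss: spectral convergence plus log-magnitude L1."""
    mag_real = stft_magnitude(real, fft_size, hop, win_len)
    mag_fake = stft_magnitude(fake, fft_size, hop, win_len)
    # Frobenius-norm ratio (assumed spectral-convergence term).
    spectral_convergence = np.linalg.norm(mag_real - mag_fake) / np.linalg.norm(mag_real)
    # Mean absolute difference of log magnitudes (assumed second term).
    log_magnitude = np.mean(np.abs(np.log(mag_real) - np.log(mag_fake)))
    return spectral_convergence + log_magnitude


def multi_resolution_stft_loss(real, fake, resolutions=RESOLUTIONS):
    """Average the single-resolution loss over several STFT settings."""
    return float(np.mean([stft_loss(real, fake, *r) for r in resolutions]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(24000) / 24000.0           # one second at 24 kHz
    real = np.sin(2 * np.pi * 220.0 * t)     # stand-in for recorded speech
    fake = real + 0.1 * rng.standard_normal(real.shape)  # stand-in for generator output
    print(multi_resolution_stft_loss(real, real))  # identical signals give zero loss
    print(multi_resolution_stft_loss(real, fake))  # a noisy copy gives a positive loss
```

Averaging over several window and FFT sizes is meant to keep the generator from fitting one fixed time-frequency trade-off. In training, a term of this kind would be combined with the adversarial loss, and an automatic-differentiation framework would replace NumPy.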

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020
