Efficient Adversarial Audio Synthesis Via Progressive Upsampling

Youngwoo Cho, Minwook Chang, Sanghyeon Lee, Hyoungwoo Lee, Gerard Jounghyun Kim, Jaegul Choo

SPS

Length: 00:11:12

09 Jun 2021

This paper proposes a novel generative model called progressive upsampling GAN (PUGAN), which progressively synthesizes high-quality raw-waveform audio. PUGAN builds on the earlier idea of progressively generating higher-resolution output by stacking multiple encoder-decoder architectures. Compared to WaveGAN, the existing state-of-the-art model, which uses a single decoder architecture, our model generates audio signals and converts them to a higher resolution in a progressive manner while using significantly fewer parameters (e.g., 3.17x fewer for 16 kHz output). Our experiments show that audio can be generated in real time with quality comparable to that of WaveGAN in terms of inception scores and human perception.
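
The progressive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 4 kHz starting rate, and the factor-of-two step per stage are assumptions made for illustration, and plain linear interpolation stands in for each learned encoder-decoder stage.

```python
def upsample_2x(signal):
    """Placeholder stage: double the length by linear interpolation.

    Assumption: in PUGAN each stage would be a learned encoder-decoder
    module; interpolation is used here only to show the data flow.
    """
    out = []
    for i, x in enumerate(signal):
        # Use the next sample if there is one, else repeat the last sample.
        nxt = signal[i + 1] if i + 1 < len(signal) else x
        out.append(x)
        out.append((x + nxt) / 2.0)
    return out


def progressive_upsample(signal, start_rate, target_rate):
    """Apply 2x stages until the target sample rate is reached.

    Assumes target_rate is start_rate times a power of two.
    Returns the final signal and the sample rate after every stage.
    """
    rate = start_rate
    rates = [rate]
    while rate < target_rate:
        signal = upsample_2x(signal)
        rate *= 2
        rates.append(rate)
    return signal, rates


# Toy low-resolution input (hypothetical 4 kHz signal with four samples).
low_res = [0.0, 1.0, 0.0, -1.0]
audio, rates = progressive_upsample(low_res, start_rate=4000, target_rate=16000)
print(rates)       # sample rate after each stage
print(len(audio))  # number of samples in the final signal
```

Each stage only has to bridge a small resolution gap, which is one plausible reason a stack of small stages could use fewer parameters than a single large decoder.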

Chairs:

Sven Shepstone

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021