Efficient Adversarial Audio Synthesis Via Progressive Upsampling
Youngwoo Cho, Minwook Chang, Sanghyeon Lee, Hyoungwoo Lee, Gerard Jounghyun Kim, Jaegul Choo
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 00:11:12
This paper proposes a novel generative model called \toolname, which progressively synthesizes high-quality audio in raw-waveform. Progressive upsampling GAN (PUGAN) leverages the previous idea of the progressive generation of higher-resolution output by stacking multiple encoder-decoder architectures. Compared to the existing state-of-the-art model called WaveGAN, which uses a single decoder architecture, our model generates audio signals and converts them to a higher resolution in a progressive manner, while using a significantly smaller number of parameters, e.g., 3.17x smaller for 16 kHz output, than the WaveGAN. Our experiments show that the audio signals can be generated in real-time with comparable quality to that of WaveGAN with respect to the inception scores and human perception.
Chairs:
Sven Shepstone