A Synthetic Corpus Generation Method for Neural Vocoder Training

Zilin Wang (Tsinghua University); Peng Liu (Transsion); Jun Chen (Tsinghua University); Sipan Li (Tsinghua University); Baijin Feng (TAL Education Group); He Gang (TAL Education Group); Zhiyong Wu (Tsinghua University); Helen Meng (The Chinese University of Hong Kong)

06 Jun 2023

Nowadays, neural vocoders are preferred for their ability to synthesize high-fidelity audio. However, training a neural vocoder requires a massive corpus of high-quality real audio, and the audio recording process is labor-intensive. In this work, we propose a synthetic corpus generation method for neural vocoder training, which can easily generate an unlimited amount of synthetic audio at nearly no cost. We explicitly model the prior characteristics of audio from multiple target domains simultaneously (e.g., speech, singing voices, and instrumental pieces) so that the generated data carries these characteristics, and we show that the resulting synthetic corpus allows a neural vocoder to achieve competitive results without any real audio in the training process. To validate the effectiveness of the proposed method, we conducted empirical experiments on both speech and music utterances, evaluated with subjective and objective metrics. The results show that a neural vocoder trained on the synthetic corpus produced by our method generalizes to multiple target scenarios and achieves excellent synthesis quality for singing voices (MOS: 4.20) and instrumental pieces (MOS: 4.00).
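The abstract does not spell out the generation procedure, but the core idea of explicitly modeling prior characteristics of speech, singing, and instrumental audio can be illustrated with a harmonic-plus-noise signal model: a randomly drifting F0 contour, harmonics with a decaying spectral envelope, and a small aperiodic component. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' actual algorithm; all function names, parameter ranges, and constants are assumptions for demonstration.

import numpy as np

def synth_utterance(sr=22050, dur=2.0, n_harmonics=20, seed=None):
    """Hypothetical sketch: one synthetic training utterance built as a
    harmonic-plus-noise signal with a randomly drifting F0 contour."""
    rng = np.random.default_rng(seed)
    n = int(sr * dur)
    t = np.arange(n) / sr

    # Random smooth F0 contour: base pitch plus low-frequency drift,
    # spanning a range wide enough to cover speech and singing registers
    # (range chosen for illustration only).
    f0_base = rng.uniform(80.0, 600.0)
    drift = np.cumsum(rng.normal(0.0, 0.5, n))
    drift -= np.linspace(drift[0], drift[-1], n)  # remove linear trend
    f0 = f0_base * (1.0 + 0.01 * drift / (np.abs(drift).max() + 1e-9))

    # Integrate the instantaneous frequency to obtain phase, then sum
    # harmonics with randomly scaled, decaying amplitudes to mimic a
    # spectral envelope shared by voiced speech and many instruments.
    phase = 2 * np.pi * np.cumsum(f0) / sr
    sig = np.zeros(n)
    for k in range(1, n_harmonics + 1):
        amp = rng.uniform(0.5, 1.0) / k  # rough 1/k spectral roll-off
        sig += amp * np.sin(k * phase)

    # Add a small aperiodic (noise) component and a fade-in/out envelope.
    sig += 0.01 * rng.normal(0.0, 1.0, n)
    env = np.minimum(1.0, np.minimum(t, dur - t) / 0.05)
    sig *= env
    return (sig / np.abs(sig).max()).astype(np.float32)

# Example: build a tiny corpus of waveforms. For vocoder training, the
# conditioning features (e.g., mel spectrograms) would be extracted from
# these waveforms with the usual analysis pipeline.
corpus = [synth_utterance(seed=i) for i in range(4)]

Because every waveform is sampled from an explicit prior rather than recorded, such a corpus can be grown without limit, which matches the paper's motivation of avoiding labor-intensive recording.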
