Noise Level Limited Sub-Modeling For Diffusion Probabilistic Vocoders
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:10:05
Although the diffusion probabilistic vocoders WaveGrad and DiffWave can realize real-time high-fidelity speech synthesis with a simple training loss function, a single model predicts the noise components over the full range of noise levels in every iteration. This paper proposes a simple but effective noise level limited sub-modeling framework for diffusion probabilistic vocoders, yielding Sub-WaveGrad and Sub-DiffWave. The proposed method also provides a DiffWave conditioned on a continuous noise level, as in WaveGrad, and spectral enhancement post-filtering. The proposed Sub-WaveGrad and Sub-DiffWave models each consist of 10 sub-models. These sub-models are trained separately, each on a different limited range of noise levels, and only the sub-models required by the noise schedule are used in inference. The results of experiments using a Japanese female speech corpus indicate that both the proposed Sub-WaveGrad and Sub-DiffWave outperform vanilla WaveGrad and DiffWave in terms of model accuracy and synthesis quality while maintaining the inference speed.
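The idea of using only the sub-models that a noise schedule requires can be illustrated with a minimal sketch. The abstract does not say how the noise level range is divided among the 10 sub-models, so the uniform split of [0, 1] below is an assumption, as are the function names and the 6-step schedule, which are hypothetical. The noise level is taken to be the square root of the cumulative product of (1 - beta), as in WaveGrad's continuous noise level conditioning.

```python
import math

NUM_SUB_MODELS = 10  # number of sub-models stated in the abstract


def noise_levels(betas):
    """Return the continuous noise level sqrt(alpha_bar_n) for each step."""
    levels = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        levels.append(math.sqrt(alpha_bar))
    return levels


def sub_model_index(level, num_sub_models=NUM_SUB_MODELS):
    """Map a noise level in [0, 1] to a sub-model index.

    Assumption: the range [0, 1] is split into equal-width intervals,
    one per sub-model. The paper may partition it differently.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("noise level must lie in [0, 1]")
    # level == 1.0 is assigned to the last interval.
    return min(int(level * num_sub_models), num_sub_models - 1)


def required_sub_models(betas):
    """Return the sorted indices of the sub-models a schedule touches."""
    return sorted({sub_model_index(level) for level in noise_levels(betas)})


# Hypothetical 6-step inference noise schedule, for illustration only.
betas = [1e-4, 1e-3, 1e-2, 5e-2, 2e-1, 5e-1]
print(required_sub_models(betas))
```

Under these assumptions, a short schedule touches only a few of the 10 intervals, so only those sub-models would need to be loaded and evaluated at inference time.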
Chairs:
Jiangyan Yi