WaveFFJORD: FFJORD-Based Vocoder for Statistical Parametric Speech Synthesis
Ning-Qian Wu, Zhen-Hua Ling
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 08:50
Free-form Jacobian of Reversible Dynamics (FFJORD) is a flow-based invertible generative model defined by ordinary differential equations (ODEs). In this paper, we propose WaveFFJORD, a neural vocoder inspired by WaveGlow that synthesizes speech waveforms from acoustic features by combining FFJORD with WaveNet. WaveFFJORD generates speech waveforms directly with black-box ODE solvers, without requiring an autoregressive structure. Our experimental results show that WaveFFJORD achieves a smaller model size, lower memory cost, and better speech quality than WaveGlow. In addition, the ODE framework lets users trade generation speed against quality by setting the error tolerance of the ODE solver.
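The tolerance knob mentioned in the abstract can be illustrated with a toy continuous flow. The following is a minimal sketch, not the authors' implementation. It makes several assumptions: a hypothetical hand-written velocity field `velocity` stands in for WaveFFJORD's WaveNet-style network, the four-element `cond` list stands in for acoustic features, and a hand-rolled adaptive Bogacki-Shampine (RK23) solver `solve_ode` replaces whatever black-box solver the paper used. The sketch integrates a fixed noise vector from t = 0 to t = 1 to produce a "sample", integrates back to check invertibility, and reports the number of function evaluations (NFE) for each tolerance.

```python
import math


def velocity(t, z, cond):
    """Hypothetical toy dynamics f(t, z; c), standing in for the learned network."""
    n = len(z)
    return [math.tanh(cond[i] - z[i]) + 0.1 * math.sin(t) * z[(i + 1) % n]
            for i in range(n)]


def solve_ode(f, z0, t0, t1, rtol, atol, cond):
    """Integrate dz/dt = f(t, z, cond) from t0 to t1 with adaptive RK23.

    Uses the Bogacki-Shampine embedded pair. Returns (state at t1, NFE).
    Works in either time direction, so it can also invert the flow.
    """
    direction = 1.0 if t1 >= t0 else -1.0
    t, z = t0, list(z0)
    h = direction * 0.1                       # initial step size guess
    k1 = f(t, z, cond)
    nfe = 1
    while (t1 - t) * direction > 1e-12:
        if (t + h - t1) * direction > 0:
            h = t1 - t                        # never step past the end point
        k2 = f(t + h / 2, [zi + h / 2 * a for zi, a in zip(z, k1)], cond)
        k3 = f(t + 3 * h / 4, [zi + 3 * h / 4 * b for zi, b in zip(z, k2)], cond)
        z_new = [zi + h * (2 / 9 * a + 1 / 3 * b + 4 / 9 * c)
                 for zi, a, b, c in zip(z, k1, k2, k3)]
        k4 = f(t + h, z_new, cond)
        nfe += 3
        # Local error estimate: 3rd-order minus embedded 2nd-order solution.
        err = [h * (-5 / 72 * a + 1 / 12 * b + 1 / 9 * c - 1 / 8 * d)
               for a, b, c, d in zip(k1, k2, k3, k4)]
        scale = [atol + rtol * max(abs(a), abs(b)) for a, b in zip(z, z_new)]
        err_norm = math.sqrt(sum((e / s) ** 2 for e, s in zip(err, scale)) / len(z))
        if err_norm <= 1.0:
            t, z, k1 = t + h, z_new, k4       # accept; k4 is the next k1 (FSAL)
        # Grow or shrink the step depending on the error relative to tolerance.
        if err_norm == 0.0:
            factor = 5.0
        else:
            factor = min(5.0, max(0.2, 0.9 * err_norm ** (-1 / 3)))
        h *= factor
    return z, nfe


cond = [0.3, -0.5, 0.8, 0.1]    # hypothetical conditioning (acoustic features)
noise = [0.5, -1.2, 0.7, 0.0]   # a fixed draw standing in for Gaussian noise

for tol in (1e-2, 1e-5, 1e-8):
    x, nfe = solve_ode(velocity, noise, 0.0, 1.0, tol, tol, cond)
    z_back, _ = solve_ode(velocity, x, 1.0, 0.0, tol, tol, cond)
    inv_err = max(abs(a - b) for a, b in zip(noise, z_back))
    print(f"tol={tol:.0e}  NFE={nfe}  inversion error={inv_err:.2e}")
```

Tighter tolerances force smaller steps. As a result, NFE (and therefore generation time) grows, while the forward-backward reconstruction error typically shrinks. In a real vocoder, each function evaluation would be a full network pass, so the tolerance acts directly as a speed/quality knob.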