Efficient Shallow Wavenet Vocoder Using Multiple Samples Output Based On Laplacian Distribution And Linear Prediction
Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda
SPS
Length: 13:16
This paper presents an efficient implementation scheme for a shallow WaveNet vocoder that outputs multiple samples (a segment) at a time, based on the Laplacian distribution and linear prediction. In our previous work, we proposed a shallow WaveNet vocoder architecture that uses only 9 dilated convolutional layers yet still generates high-quality speech by modeling speech samples with a Laplacian distribution. However, there is still considerable room to improve its computational efficiency, for example by generating segment outputs and by adopting a more compact structure. In this work, we address this issue by proposing a simple segment output modeling scheme in which the Laplacian distribution parameters of multiple samples are estimated simultaneously; the scheme can be easily extended to other neural vocoders. Further, to preserve the dependencies among samples within a segment, we propose using linear prediction (LP) to compute the distribution parameters, where data-driven LP coefficients are estimated by the WaveNet vocoder along with the locations and scales. Finally, we deploy an even shallower WaveNet vocoder with 6 layers. The experimental results demonstrate that the proposed LP-based Laplacian distribution can alleviate the quality degradation caused by segment generation.
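The abstract does not give the exact formulation, so the following is only a minimal sketch under one plausible reading. It assumes that, for each of the S samples in a segment, the network outputs a location term, a scale, and K LP coefficients, and that the final Laplacian location is that location term plus an LP prediction from the K preceding samples. Because samples drawn earlier in the segment feed the LP term of later ones, only this cheap linear recursion runs sequentially, while the network would run once per segment. The function names `generate_segment` and `laplace_nll`, the additive location form, and the LP order K are hypothetical illustrations, not the authors' code; `numpy.random.Generator.laplace` is the only library sampler used.

```python
import numpy as np


def laplace_nll(x, loc, scale):
    """Negative log-likelihood of x under Laplace(loc, scale): log(2b) + |x - mu| / b."""
    return np.log(2.0 * scale) + np.abs(x - loc) / scale


def generate_segment(history, base_locs, scales, lp_coeffs, rng):
    """Sample one segment of S values from LP-based Laplacian distributions (sketch).

    history   : 1-D array of past samples (length >= K)
    base_locs : (S,) location terms predicted by the network (assumed output head)
    scales    : (S,) positive Laplacian scales predicted by the network
    lp_coeffs : (S, K) data-driven LP coefficients predicted by the network (assumed)
    """
    num_samples, order = lp_coeffs.shape
    buf = [float(v) for v in history[-order:]]  # rolling context of the K latest samples
    out = np.empty(num_samples)
    for s in range(num_samples):
        # Most recent sample first: x[t-1], x[t-2], ..., x[t-K]
        past = np.array(buf[::-1][:order])
        # Assumed location = network term + LP prediction from previous samples,
        # which may themselves have been drawn earlier in this same segment.
        loc = base_locs[s] + lp_coeffs[s] @ past
        out[s] = rng.laplace(loc, scales[s])
        buf.append(out[s])
    return out
```

For example, `generate_segment(np.zeros(8), np.zeros(4), np.full(4, 0.01), np.full((4, 2), 0.3), np.random.default_rng(0))` draws a 4-sample segment with second-order LP. In training, `laplace_nll` would be averaged over all samples of each segment, with the LP term computed from ground-truth previous samples.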