Lightspeech: Lightweight Non-Autoregressive Multi-Speaker Text-To-Speech
Song Li, Beibei Ouyang, Lin Li, Qingyang Hong
SPS
With the development of deep learning, end-to-end neural text-to-speech systems have achieved significant improvements in speech quality. However, most of these systems are attention-based autoregressive models, which suffer from slow synthesis and large parameter counts. In this paper, we propose LightSpeech, a new lightweight non-autoregressive multi-speaker speech synthesis system that uses lightweight feedforward neural networks to accelerate synthesis and reduce the number of parameters. By conditioning on a speaker embedding, LightSpeech performs multi-speaker speech synthesis extremely quickly. Experiments on the LibriTTS dataset show that, compared with FastSpeech, our smallest LightSpeech model achieves a 9.27x mel-spectrogram generation speedup on CPU, while the model size and parameter count are compressed by 37.06x and 37.36x, respectively.
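The abstract's two key ideas — speaker-embedding conditioning and non-autoregressive (parallel) frame generation — can be illustrated with a minimal NumPy sketch. All dimensions, variable names, and the single-layer "decoder" below are hypothetical placeholders, not the paper's actual architecture: a learned per-speaker vector is broadcast-added to the encoder's hidden states, and all mel frames are then predicted at once rather than one step at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): hidden size, mel bins, speakers.
D_HIDDEN, N_MELS, N_SPEAKERS = 64, 80, 4

# Speaker embedding table: one learned vector per speaker (here random).
speaker_table = rng.normal(size=(N_SPEAKERS, D_HIDDEN)).astype(np.float32)

# A toy feedforward "decoder": a single linear layer mapping hidden -> mel bins.
W = rng.normal(scale=0.1, size=(D_HIDDEN, N_MELS)).astype(np.float32)
b = np.zeros(N_MELS, dtype=np.float32)

def synthesize_mel(encoder_out: np.ndarray, speaker_id: int) -> np.ndarray:
    """Condition encoder outputs on a speaker and predict all mel frames in parallel.

    encoder_out: (T, D_HIDDEN) hidden states for T output frames.
    Returns a (T, N_MELS) mel-spectrogram prediction.
    """
    # Broadcast-add the speaker embedding to every frame. Non-autoregressive:
    # all T frames are computed in one matrix product, with no recurrence on
    # previously generated frames.
    conditioned = encoder_out + speaker_table[speaker_id]
    return conditioned @ W + b

# Usage: 10 frames of encoder output, rendered in speaker 2's voice.
enc = rng.normal(size=(10, D_HIDDEN)).astype(np.float32)
mel = synthesize_mel(enc, speaker_id=2)
print(mel.shape)  # (10, 80)
```

Because nothing in `synthesize_mel` depends on earlier output frames, the whole utterance is produced in a single pass — the property that gives non-autoregressive models like FastSpeech and LightSpeech their speed advantage over autoregressive, attention-based synthesis.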