End-to-End Code-Switching TTS with Cross-Lingual Language Model
Xuehao Zhou, Xiaohai Tian, Grandee Lee, Rohan Kumar Das, Haizhou Li
SPS
Length: 12:13
Code-switching text-to-speech (TTS) aims to enable a system to speak two languages in a single voice within the same utterance. In this paper, we propose incorporating cross-lingual word embeddings into an end-to-end TTS system to improve voice rendering. The cross-lingual word embeddings, generated by a pre-trained cross-lingual language model, encode words of two languages in the same embedding space. This allows words across languages to share contextual information, which is useful for rendering code-switching content. To investigate the effectiveness of this idea, we conduct studies on two multi-speaker monolingual corpora, namely the THCHS30 Mandarin and LibriTTS English databases. The evaluation results show that the proposed framework outperforms the baseline systems when presented with code-switching text input and achieves state-of-the-art performance.
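The idea of feeding word embeddings from a shared space into the TTS front end can be illustrated with a minimal sketch. The abstract does not say how the embeddings are injected, so everything below is an assumption made for illustration: the character-level input units, the concatenation of word and character vectors, the dimensions, and the helper names `make_table` and `encoder_inputs`. Random lookup tables stand in for the pre-trained cross-lingual language model, whose real embeddings would also depend on sentence context.

```python
import random

def make_table(vocab, dim, seed):
    """Random stand-in for an embedding table (hypothetical, not a trained model)."""
    rng = random.Random(seed)
    # sorted() keeps the token-to-vector assignment deterministic across runs
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in sorted(vocab)}

def encoder_inputs(words, word_table, char_table):
    """Concatenate each character embedding with the embedding of its parent word."""
    rows = []
    for word in words:
        word_vec = word_table[word]  # one vector per word, regardless of language
        for ch in word:
            rows.append(char_table[ch] + word_vec)  # list concatenation
    return rows

WORD_DIM, CHAR_DIM = 8, 4  # illustrative sizes only

# A code-switched Mandarin/English sentence, already segmented into words.
sentence = ["我", "喜欢", "machine", "learning"]

# Both languages share a single word table, mimicking a shared embedding space.
word_table = make_table(set(sentence), WORD_DIM, seed=0)
char_table = make_table({ch for w in sentence for ch in w}, CHAR_DIM, seed=1)

features = encoder_inputs(sentence, word_table, char_table)
print(len(features), len(features[0]))
```

Each row of `features` would then be consumed by the TTS encoder in place of a plain character embedding, so every input unit carries word-level information drawn from a space common to both languages.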