Skip to main content

Multi-speaker Multi-lingual VQTTS System for LIMMITS 2023 Challenge

Chenpeng Du (Shanghai Jiao Tong University); Yiwei Guo (Shanghai Jiao Tong University); Feiyu Shen (Shanghai Jiao Tong University); Kai Yu (Shanghai Jiao Tong University)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
10 Jun 2023

In this paper, we describe the systems developed by the SJTU X-LANCE team for LIMMITS 2023 Challenge and we mainly focus on the winning system on naturalness for track 1. The aim of this challenge is to build a multi-speaker multi-lingual text-to-speech (TTS) system for Marathi, Hindi, and Telugu. Each of the languages has a male and a female speaker in the given dataset. In track 1, only 5 hours data from each speaker can be selected to train the TTS model. Our system is based on the recently proposed VQTTS that utilizes VQ acoustic feature rather than mel-spectrogram. We introduce additional speaker embeddings and language embeddings to VQTTS for controlling the speaker and language information. In the cross-lingual evaluations where we need to synthesize speech in a cross-lingual speaker's voice, we provide a native speaker's embedding to the acoustic model and the target speaker's embedding to the vocoder. In the subjective MOS listening test on naturalness, our system achieves 4.77 which ranks first.

More Like This

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00