A New High Quality Trajectory Tiling Based Hybrid TTS in Real Time

Feng-Long Xie, Xin-Hui Li, Wen-Chao Su, Li Lu, Frank K. Soong

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:05:37
08 Jun 2021

This study revisits a trajectory-tiling-based hybrid TTS system to improve its synthesis performance. A Transformer encoder is combined with an RNN-based decoder, and a two-level linguistic representation, at both the word level and the Chinese phonetic alphabet letter level, is exploited to generate a cogent and smooth speech parameter trajectory. A segment candidate lattice is then constructed by minimizing the log spectral distortion of the mel-spectrograms and the RMSE of F0 between the generated trajectory and the candidates. Normalized cross-correlation is used to find the best sequence of “waveform tiles” in the lattice for synthesizing the final speech waveforms. Subjective A/B preference tests show that the new hybrid system outperforms both our earlier trajectory-tiling hybrid baseline TTS (67% vs. 11%) and the state-of-the-art real-time TTS system built with Tacotron 2 and LPCNet (56% vs. 27%).
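The three measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `f0_weight` parameter, the simple weighted sum of the two distances, and the shift-search helper are all assumptions made for illustration, and the lattice search itself is not shown. Log spectral distortion is taken here as the frame-averaged RMS difference of log-mel values, which may differ from the exact definition used in the paper.

```python
import math


def log_spectral_distortion(target, candidate):
    """Mean over frames of the RMS difference between log-mel frames.

    Both arguments are lists of frames; each frame is a list of log-mel
    values. Equal shapes are assumed (hypothetical input format).
    """
    per_frame = [
        math.sqrt(sum((t - c) ** 2 for t, c in zip(tf, cf)) / len(tf))
        for tf, cf in zip(target, candidate)
    ]
    return sum(per_frame) / len(per_frame)


def f0_rmse(target_f0, candidate_f0):
    """Root-mean-square error between two equal-length F0 contours."""
    squared = [(t - c) ** 2 for t, c in zip(target_f0, candidate_f0)]
    return math.sqrt(sum(squared) / len(squared))


def target_cost(target, target_f0, candidate, candidate_f0, f0_weight=1.0):
    """Distance of one candidate segment from the generated trajectory.

    The weighted sum and f0_weight are assumptions, not taken from the paper.
    """
    return (log_spectral_distortion(target, candidate)
            + f0_weight * f0_rmse(target_f0, candidate_f0))


def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equal-length signals."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a constant signal has no defined correlation
    return sum(x * y for x, y in zip(da, db)) / denom


def best_join_shift(left_tail, right_head, max_shift):
    """Find the shift of right_head that best matches left_tail.

    right_head must hold at least len(left_tail) + max_shift samples.
    Returns (shift, score) for the highest normalized cross-correlation.
    """
    n = len(left_tail)
    scores = [
        normalized_cross_correlation(left_tail, right_head[s:s + n])
        for s in range(max_shift + 1)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

In this sketch a candidate identical to the generated trajectory has a `target_cost` of zero, so candidates with lower cost would be kept in the lattice, while `best_join_shift` scores how well two adjacent waveform tiles line up at a join.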

Chairs:
Yu Zhang
