Conversational End-to-End TTS for Voice Agents

Haohan Guo, Shaofei Zhang, Frank Soong, Lei He, Lei Xie

Length: 0:13:42

19 Jan 2021

End-to-end neural TTS has achieved excellent performance in reading-style speech synthesis. However, building a high-quality conversational TTS remains a challenge because of limitations in both the corpus and modeling capability. This study aims to build a conversational TTS for a voice agent under a sequence-to-sequence modeling framework. First, we construct a spontaneous conversational speech corpus designed for the voice agent, using a new recording scheme that ensures both recording quality and a conversational speaking style. Second, we propose a conversation context-aware end-to-end TTS approach with an auxiliary encoder and a conversational context encoder, which reinforce the information about the current utterance and its context in the conversation. Experimental results show that the proposed methods produce more natural prosody in accordance with the conversational context, with significant preference gains at both the utterance level and the conversation level. Moreover, we find that the model can express some spontaneous behaviors, such as fillers and repeated words, which makes the conversational speaking style more realistic.

Tags:

sps conference

slt 2021
