Improving Prosody with Linguistic and BERT Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS

Yujia Xiao, Lei He, Huaiping Ming, Frank K. Soong


Length: 12:27

04 May 2020

Recent advances in neural TTS have made "human parity" synthesized speech possible when a large amount of studio-quality training data from a voice talent is available. However, with only limited, casual recordings from an ordinary speaker, human-like TTS remains a major challenge, and the output can also suffer from artifacts such as incomplete sentences and repeated words. Chinese text, unlike that of Roman-letter based languages such as English, has no blank space between adjacent words, so word segmentation errors can cause serious semantic confusion and unnatural prosody. In this study, using a multi-speaker TTS model to compensate for the insufficient training data of a target speaker, we investigate linguistic features and BERT-derived information to improve the prosody of our Mandarin Chinese TTS. Three factors are studied: phone-related and prosody-related linguistic features; better predicted breaks with a refined BERT-CRF model; and a phoneme sequence augmented with character embeddings derived from a BERT model. Subjective tests on in-domain and out-of-domain tasks covering News, Chat and Audiobook show that all three factors are effective in improving the prosody of our Mandarin TTS. The model with additional character embeddings from BERT performs best, outperforming the baseline by 0.17 MOS.
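The third factor, augmenting the phoneme sequence with character embeddings, can be illustrated with a minimal sketch. The abstract does not say how the two sequences are aligned or combined, so everything below is an assumption made for illustration: each character's embedding is repeated once for every phoneme that character produces, and the result is concatenated to the phoneme vector. The function name `augment_phonemes_with_char_embeddings`, the toy vectors, and the two-phonemes-per-character alignment are all hypothetical, and real BERT embeddings would have far more dimensions.

```python
from typing import List, Sequence


def augment_phonemes_with_char_embeddings(
    phoneme_vecs: Sequence[Sequence[float]],
    char_vecs: Sequence[Sequence[float]],
    phonemes_per_char: Sequence[int],
) -> List[List[float]]:
    """Append each character's embedding to every phoneme it produces.

    Assumption (not from the abstract): alignment is by repetition and the
    combination is plain concatenation.
    """
    if len(char_vecs) != len(phonemes_per_char):
        raise ValueError("need one phoneme count per character")
    if sum(phonemes_per_char) != len(phoneme_vecs):
        raise ValueError("phoneme counts must cover the phoneme sequence")

    augmented: List[List[float]] = []
    idx = 0  # position in the phoneme sequence
    for char_vec, count in zip(char_vecs, phonemes_per_char):
        for _ in range(count):
            # Phoneme features first, then the character embedding.
            augmented.append(list(phoneme_vecs[idx]) + list(char_vec))
            idx += 1
    return augmented


# Toy example: two characters, each realized as two phonemes.
phoneme_vecs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
char_vecs = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # stand-ins for BERT outputs
phonemes_per_char = [2, 2]

augmented = augment_phonemes_with_char_embeddings(
    phoneme_vecs, char_vecs, phonemes_per_char
)
```

Under these assumptions, every phoneme position carries context from the character it belongs to, which is one plausible way for character-level information to reach a phoneme-level TTS encoder.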

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020

