Semi-Supervised Speaker Adaptation For End-To-End Speech Synthesis With Pretrained Models
Katsuki Inoue, Masanobu Abe, Sunao Hara, Tomoki Hayashi, Ryuichi Yamamoto, Shinji Watanabe
Recently, end-to-end text-to-speech (TTS) models have achieved remarkable performance; however, they require a large amount of paired text and speech data for training. On the other hand, dozens of minutes of unpaired speech recordings can easily be collected for a target speaker without corresponding text data. To make use of such accessible data, the proposed method leverages the recent success of state-of-the-art end-to-end automatic speech recognition (ASR) systems and obtains the missing transcriptions from pretrained ASR models. Although these models provide only raw text output rather than intermediate linguistic features such as phonemes, end-to-end TTS can be trained well on such raw text directly. The proposed method thus greatly simplifies the speaker adaptation pipeline by consistently employing end-to-end ASR/TTS ecosystems. Experimental results show that the proposed method achieves performance comparable to a paired-data adaptation method in terms of subjective speaker similarity and objective cepstral distance measures.
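The adaptation pipeline outlined in the abstract reduces to a pseudo-labeling loop: run a pretrained ASR model over the target speaker's unpaired recordings, pair each recording with the recognized text, and fine-tune the TTS model on those pairs. The sketch below illustrates this under stated assumptions; the `ASRModel` protocol and the `transcribe` method are hypothetical placeholders standing in for whichever pretrained end-to-end ASR system is used, and are not an API from the paper or any specific toolkit.

```python
from pathlib import Path
from typing import Protocol


class ASRModel(Protocol):
    """Hypothetical interface: any pretrained end-to-end recognizer
    that maps a waveform file to a raw text transcription."""
    def transcribe(self, wav_path: str) -> str: ...


def build_pseudo_paired_corpus(wav_dir: str, asr: ASRModel) -> list[tuple[str, str]]:
    """Transcribe unpaired target-speaker recordings with a pretrained
    ASR model, yielding (text, wav_path) pairs for TTS fine-tuning."""
    pairs = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        # The ASR model emits raw character/word text, not phonemes;
        # an end-to-end TTS model can be trained on this text directly,
        # which is what keeps the adaptation pipeline simple.
        pairs.append((asr.transcribe(str(wav)), str(wav)))
    return pairs


# Usage sketch: the resulting pseudo-paired corpus is fed to the TTS
# fine-tuning step in place of a manually transcribed adaptation set.
# pairs = build_pseudo_paired_corpus("target_speaker_wavs/", pretrained_asr)
# fine_tune(pretrained_tts, pairs)  # fine_tune is a placeholder, not shown
```

The design choice worth noting is that no grapheme-to-phoneme or forced-alignment stage appears anywhere in the loop: because both the ASR front end and the TTS back end are end-to-end models operating on raw text, the ASR output can be consumed by TTS training as-is.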