PVAE-TTS: ADAPTIVE TEXT-TO-SPEECH VIA PROGRESSIVE STYLE ADAPTATION

Ji-Hyun Lee, Sang-Hoon Lee, Ji-Hoon Kim, Seong-Whan Lee

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:05:51

08 May 2022

Adaptive text-to-speech (TTS) has attracted increasing interests for the purpose of training TTS systems without tons of high quality data. Nevertheless, existing adaptive TTS systems still show low adaptation quality for novel speakers, since it is hard to learn an extensive speaking style with limited data. To address this issue, we propose progressive variational autoencoder (PVAE) which generates data with adapting to style gradually. PVAE learns a progressively style-normalized representation, which is a key component of progressive style adaptation. We extend PVAE to PVAE-TTS, a multi-speaker adaptive TTS model which generates natural speech with high adaptation quality for novel speakers. To further improve the adaptation quality, we propose dynamic style layer normalization (DSLN) which utilizes a convolution operation. The experimental results demonstrate the superiority of PVAE-TTS in terms of both subjective and objective evaluations.

Tags:

speaker adaptation

text-to-speech

speech synthesis

adaptive tts

PVAE-TTS: ADAPTIVE TEXT-TO-SPEECH VIA PROGRESSIVE STYLE ADAPTATION

Ji-Hyun Lee, Sang-Hoon Lee, Ji-Hoon Kim, Seong-Whan Lee

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Short Course Bundle: ICASSP 2022 COURSE 5: Speech Technology for Health: From Technical Foundations to Applications (Parts 1-3)

An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning

Slides for: An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning

Join the IEEE Signal Processing Society