Bi-Level Style And Prosody Decoupling Modeling For Personalized End-To-End Speech Synthesis

Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang, Chunyu Qiang

Length: 00:11:01

10 Jun 2021

End-to-end frameworks can generate high-quality, high-similarity speech in the personalized speech synthesis task. However, generalization to out-of-domain texts remains challenging: limited target data leads to unacceptable errors and to poor prosody and similarity in the synthetic speech. In this paper, we present a bi-level function decoupling framework that enables separate modeling and control to address these problems. First, at the style representation level, unlike conventional methods that use a single embedding to model all text-dependent discrepancies, we model the speaker embedding and the prosody embedding separately, based on the reference audio and the phonetic posteriorgram (PPG), using a multi-head attention mechanism. Second, at the model structure level, the decoder is factored into an average-net and an adaptation-net, so that duration and prosody control and speaker timbre imitation are handled in relatively separate parts of the model. Experimental results on a Mandarin dataset show that the proposed methods improve robustness, naturalness, and similarity.
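The abstract does not specify the attention architecture, so the following is only an illustrative sketch of generic multi-head attention pooling, not the authors' implementation. It assumes (hypothetically) that one query attends over reference-audio frames to produce a speaker embedding and a second, separate query attends over PPG frames to produce a prosody embedding. All names, dimensions, and the random weights standing in for learned parameters are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention_pool(query, frames, w_q, w_k, w_v, num_heads):
    """Pool a (T, d_in) frame sequence into a single (d_model,) embedding.

    query: (d_q,) vector; frames: (T, d_in) features.
    w_q: (d_q, d_model); w_k, w_v: (d_in, d_model).
    Returns the embedding and the (num_heads, T) attention weights.
    """
    d_model = w_q.shape[1]
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    q = (query @ w_q).reshape(num_heads, d_head)          # (H, d_head)
    k = (frames @ w_k).reshape(-1, num_heads, d_head)     # (T, H, d_head)
    v = (frames @ w_v).reshape(-1, num_heads, d_head)     # (T, H, d_head)
    scores = np.einsum("hd,thd->ht", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)                    # each head sums to 1 over T
    heads = np.einsum("ht,thd->hd", weights, v)           # (H, d_head)
    return heads.reshape(d_model), weights

# Hypothetical sizes: 50 frames, 80-dim acoustic features, 64-dim PPG.
rng = np.random.default_rng(0)
T, d_audio, d_ppg, d_q, d_model, H = 50, 80, 64, 16, 32, 4

ref_audio = rng.normal(size=(T, d_audio))                 # stand-in reference audio features
ppg = softmax(rng.normal(size=(T, d_ppg)), axis=-1)       # stand-in per-frame phonetic posteriors

def init(d_in):
    # Random weights standing in for learned projections.
    return rng.normal(scale=0.1, size=(d_in, d_model))

# Separate queries and projections keep the two embeddings decoupled.
speaker_emb, speaker_attn = multi_head_attention_pool(
    rng.normal(size=d_q), ref_audio, init(d_q), init(d_audio), init(d_audio), H)
prosody_emb, prosody_attn = multi_head_attention_pool(
    rng.normal(size=d_q), ppg, init(d_q), init(d_ppg), init(d_ppg), H)
```

Using two independent attention modules rather than one shared embedding is the only design point this sketch is meant to show; in a trained system the queries and projections would be learned jointly with the synthesizer.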

Chairs:

Hung-yi Lee

Tags:

signal processing society

IEEE icassp 2021

virtual conference

june 6-11 2021
