Unsupervised Style And Content Separation By Minimizing Mutual Information For Speech Synthesis

Ting-Yao Hu, Ashish Shrivastava, Oncel Tuzel, Chandra Dhir

Length: 14:26

04 May 2020

We present a method to generate speech from input text and a style vector that is extracted from a reference speech signal in an unsupervised manner, i.e., no style annotation, such as speaker information, is required. During training, existing unsupervised methods compute the style vector from the corresponding ground-truth sample and use a decoder to combine it with the input text. Training the model in this way leaks content information into the style vector, so the decoder can use the leaked content and ignore some of the input text to minimize the reconstruction loss. At inference time, when the reference speech does not match the content input, the output may not contain all of the content of the input text. We refer to this problem as "content leakage" and address it by explicitly estimating and minimizing the mutual information between the style and the content through an adversarial training formulation. The main goal of the method is to preserve the input content in the synthesized speech signal. We measure this by the word error rate (WER) and show substantial improvements over state-of-the-art unsupervised speech synthesis methods.
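
The abstract uses word error rate (WER) as its measure of content preservation. As a minimal sketch, assuming the standard definition (word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words), WER could be computed as below. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not say how transcripts are obtained or normalized.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with no
    case folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One reference word ("the") is dropped from a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

In the setting the abstract describes, a dropped or altered word in the synthesized speech would show up as a deletion or substitution against the input text, so a lower WER indicates that more of the input content survived synthesis.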

Tags: SPS conference, ICASSP 2020 Virtual Conference, May 2020, ICASSP 2020
