Supervised And Unsupervised Approaches For Controlling Narrow Lexical Focus In Sequence-To-Sequence Speech Synthesis

Slava Shechtman, Raul Fernandez, David Haws

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 0:15:28

19 Jan 2021

Although Sequence-to-Sequence (S2S) architectures have become state-of-the-art in speech synthesis, capable of generating outputs that approach the perceptual quality of natural samples, they are limited by a lack of flexibility when it comes to controlling the output. In this work we present a framework capable of controlling the prosodic output via a set of concise, interpretable, disentangled parameters. We apply this framework to the realization of emphatic lexical focus, proposing a variety of architectures designed to exploit different levels of supervision based on the availability of labeled resources. We evaluate these approaches via listening tests that demonstrate we are able to successfully realize controllable focus while maintaining the same, or higher, naturalness over an established baseline, and we explore how the different approaches compare when synthesizing in a target voice with or without labeled data.

Tags:

sps conference

slt 2021

Supervised And Unsupervised Approaches For Controlling Narrow Lexical Focus In Sequence-To-Sequence Speech Synthesis

Slava Shechtman, Raul Fernandez, David Haws

Value-Added Bundle(s) Including this Product

SLT 2021 Virtual Conference - Presentation Videos Product Bundle

More Like This

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

IEEE ICASSP 2024, 1 4-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle

Join the IEEE Signal Processing Society