Stable Checkpoint Selection And Evaluation In Sequence To Sequence Speech Synthesis

Slava Shechtman, David Haws, Raul Fernandez

Length: 00:12:14

11 Jun 2021

Autoregressive attentive sequence-to-sequence (S2S) speech synthesis is considered state-of-the-art in terms of speech quality and naturalness, as evaluated on a finite set of test utterances. However, it can occasionally suffer from stability issues at inference time, such as local intelligibility problems or utterance incompletion. A model's stability frequently varies from one checkpoint to another, even after the training loss shows signs of convergence, which makes selecting a stable model a tedious and time-consuming task. In this work, we propose a novel stability metric designed for automatic checkpoint selection, based on incomplete utterance counts within a validation set. The metric relies solely on attention matrix analysis in inference mode and requires no ground-truth output targets. The proposed metric runs 125 times faster than real time on a GPU (Tesla K80), so it can be conveniently incorporated into training to filter out unstable checkpoints. We demonstrate, via objective and perceptual metrics, its effectiveness in selecting a robust model that attains a good trade-off between stability and quality.
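The abstract does not spell out how the attention matrix is analyzed, so the following is only a rough sketch of the general idea of counting incomplete utterances per checkpoint. It assumes each validation utterance yields an attention matrix with one row per decoder step and one column per input symbol, and it uses a made-up heuristic: an utterance is flagged as incomplete when the attention peak never reaches the last few input symbols. The names `is_incomplete`, `incomplete_count`, `select_checkpoint`, and the `tail` parameter are hypothetical and are not taken from the presentation.

```python
from typing import Dict, Sequence

# Assumed layout: rows are decoder steps, columns are input symbols.
Attention = Sequence[Sequence[float]]


def is_incomplete(attention: Attention, tail: int = 2) -> bool:
    """Hypothetical heuristic, not the authors' metric: flag an utterance
    when the attention peak never lands on one of the last `tail` inputs."""
    if not attention or not attention[0]:
        return True  # nothing was decoded, treat as incomplete
    num_inputs = len(attention[0])
    # Index of the strongest attention weight at each decoder step.
    peaks = [max(range(num_inputs), key=row.__getitem__) for row in attention]
    return max(peaks) < num_inputs - tail


def incomplete_count(attentions: Sequence[Attention], tail: int = 2) -> int:
    """Number of validation utterances flagged as incomplete."""
    return sum(is_incomplete(a, tail) for a in attentions)


def select_checkpoint(per_checkpoint: Dict[str, Sequence[Attention]],
                      tail: int = 2) -> str:
    """Return the checkpoint name with the fewest flagged utterances."""
    counts = {name: incomplete_count(atts, tail)
              for name, atts in per_checkpoint.items()}
    return min(counts, key=counts.get)


if __name__ == "__main__":
    # Toy data: a clean diagonal alignment and one that stalls early.
    complete = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    truncated = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]]
    runs = {"ckpt_a": [truncated, truncated], "ckpt_b": [complete, truncated]}
    print(select_checkpoint(runs))
```

Because a check like this needs only the attention weights produced at inference time, it requires no reference audio, which matches the property the abstract highlights. A practical version would also have to account for the local intelligibility problems mentioned above, which this sketch ignores.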

Chairs:

Erica Cooper

Tags: signal processing society, IEEE ICASSP 2021, virtual conference, June 6-11 2021


