A TRANSFER LEARNING APPROACH FOR PRONUNCIATION SCORING

Marcelo Sancinetti, Jazmín Vidal, Cyntia Bonomi, Luciana Ferrer

Length: 00:08:47

09 May 2022

Phone-level pronunciation scoring is a very challenging task, and automatic systems still perform far below human annotators. Standard systems generate a score for each phone in a phrase using models trained for automatic speech recognition (ASR) on native data only. Systems trained specifically for the task on non-native data have been shown to perform better. However, datasets labelled for this task are scarce and usually small. In this paper, we present a transfer learning-based approach that takes a model trained for ASR and adapts it to the task of pronunciation scoring. We analyze the effect of several hyperparameters and design choices and compare the performance with a state-of-the-art goodness of pronunciation (GOP) system. Our final model is, on average, 20% better than the GOP model on a cost function that prioritizes low rates of unnecessary corrections.
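For readers unfamiliar with the GOP baseline, the sketch below shows one common DNN-based formulation of a goodness of pronunciation score: the average, over the frames aligned to a target phone, of the log posterior of that phone minus the largest log posterior among all phones. This is an illustrative assumption, not necessarily the exact GOP variant used in the paper. The function names `gop_score` and `flag_mispronounced`, the input layout (one list of per-phone log posteriors per frame), and the decision threshold are all hypothetical. A phone is flagged for correction when its score falls below the threshold, so lowering the threshold trades missed errors for fewer unnecessary corrections.

```python
import math
from typing import Sequence


def gop_score(frame_log_posteriors: Sequence[Sequence[float]], target: int) -> float:
    """Hypothetical GOP sketch: mean log-posterior ratio of `target` over its aligned frames.

    Each inner sequence holds log P(q | o_t) for every phone q at frame t.
    The result is <= 0 and equals 0 only if `target` is the most likely
    phone in every frame of the segment.
    """
    if not frame_log_posteriors:
        raise ValueError("segment must contain at least one frame")
    total = 0.0
    for frame in frame_log_posteriors:
        # Log ratio between the target phone and the best competing phone.
        total += frame[target] - max(frame)
    return total / len(frame_log_posteriors)


def flag_mispronounced(score: float, threshold: float) -> bool:
    """Hypothetical decision rule: flag the phone when its score is below the threshold."""
    return score < threshold


# Usage: two frames, three phones, target phone index 0.
frames = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],  # target is the most likely phone
    [math.log(0.4), math.log(0.5), math.log(0.1)],  # phone 1 is slightly more likely
]
score = gop_score(frames, target=0)
print(round(score, 4))
print(flag_mispronounced(score, threshold=-0.5))
```

With these inputs the first frame contributes 0 and the second contributes log(0.4/0.5), so the score is log(0.8)/2 and the phone is not flagged at a threshold of -0.5.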

Tags:

goodness of pronunciation

transfer learning

phone-level pronunciation scoring

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

(Slides) Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

STAGE OF DECAY ESTIMATION EXPLOITING EXOGENOUS AND ENDOGENOUS IMAGE ATTRIBUTES TO MINIMIZE MANUAL LABELING EFFORTS AND MAXIMIZE CLASSIFICATION PERFORMANCE
