A TRANSFER LEARNING APPROACH FOR PRONUNCIATION SCORING

Marcelo Sancinetti, Jazmín Vidal, Cyntia Bonomi, Luciana Ferrer

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:08:47
09 May 2022

Phone-level pronunciation scoring is a very challenging task, with performance far from that of human annotators. Standard systems generate a score for each phone in a phrase using models trained for automatic speech recognition (ASR) with native data only. Better performance has been shown when using systems that are trained specifically for the task using non-native data. Yet, such systems face the challenge that datasets labelled for this task are scarce and usually small. In this paper, we present a transfer learning-based approach that leverages a model trained for ASR, adapting it for the task of pronunciation scoring. We analyze the effect of several hyperparameters and design choices and compare the performance with a state-of-the-art goodness of pronunciation (GOP) system. Our final model is, on average, 20% better than the GOP model on a cost function that prioritizes low rates of unnecessary corrections.
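To make the baseline concrete, here is a minimal sketch of how a GOP-style score can be computed per phone. It assumes one common formulation: the average log posterior of the target phone over the frames aligned to it, with posteriors coming from an ASR acoustic model. The abstract does not say which GOP variant the authors use, so the function name `gop_scores`, the input layout, and the toy numbers are all illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple


def gop_scores(
    log_posteriors: Sequence[Sequence[float]],
    alignment: Sequence[Tuple[int, int, int]],
) -> List[float]:
    """Return one GOP-style score per aligned phone.

    log_posteriors[t][k] is the log posterior of phone class k at frame t
    (assumed to come from an ASR acoustic model).
    alignment holds (phone_index, start_frame, end_frame) tuples with an
    exclusive end frame (assumed to come from a forced aligner).
    """
    scores = []
    for phone_idx, start, end in alignment:
        if end <= start:
            raise ValueError("each segment must contain at least one frame")
        # Average the target phone's log posterior over its frames.
        total = sum(log_posteriors[t][phone_idx] for t in range(start, end))
        scores.append(total / (end - start))
    return scores


# Toy input (made-up numbers): 4 frames, 2 phone classes.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]
log_post = [[math.log(p) for p in frame] for frame in posteriors]

# Phone 0 spans frames 0-1, phone 1 spans frames 2-3.
alignment = [(0, 0, 2), (1, 2, 4)]
scores = gop_scores(log_post, alignment)
```

In a system like this, a phone would be flagged as mispronounced when its score falls below a threshold. Raising or lowering that threshold trades missed errors against unnecessary corrections, which is the trade-off that the cost function mentioned in the abstract weighs.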