  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:09:44
09 May 2022

The efficacy and robustness of Ordinal Regression (OR) in assessing speech pronunciation for language learning at the phrase level have been demonstrated previously. However, assessing phoneme pronunciation requires us to (1) collect human scoring annotations for phoneme tokens of short duration (60-70 ms) and (2) train an ordinal regression model for each phoneme, with the corresponding training and inference costs. In this paper, we propose to train a Universal Ordinal Regression (UOR) model instead of multiple, separate models for different phonemes, and we evaluate its performance accordingly. A single universal binary classifier in UOR is trained to make a binary preference decision (better or worse) between a pair of tokens with the same phoneme ID. At inference, labeled anchor tokens with a given phoneme ID from the training data are paired with the test phoneme token to make binary preference decisions. By evaluating the new UOR on Speechocean762, a public speech database for pronunciation evaluation, we show the advantages of the proposed approach. Relative improvements of 16.7% in Pearson Correlation Coefficient and 25.0% in Mean Square Error are obtained over the state-of-the-art systems.
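
The inference step described above, where labeled training tokens serve as anchors for a test token, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `score_by_anchors` and `toy_prefer`, the feature vectors, and the score values are all hypothetical, and the rule for turning the binary decisions into a score (counting wins and reading off the anchor score at that rank) is an assumption, since the abstract does not say how the decisions are aggregated.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]          # hypothetical feature vector of one phoneme token
Anchor = Tuple[Features, float]     # (features, human score) for a labeled training token


def score_by_anchors(
    test: Features,
    anchors: List[Anchor],
    prefer: Callable[[Features, Features], bool],
) -> float:
    """Estimate a score for `test` from pairwise decisions against anchors.

    `anchors` must all share the test token's phoneme ID.
    `prefer(a, b)` stands in for the universal binary classifier and
    returns True when token `a` is judged better than token `b`.
    """
    if not anchors:
        raise ValueError("at least one anchor token is required")
    # One binary preference decision per (test, anchor) pair.
    wins = sum(1 for feats, _ in anchors if prefer(test, feats))
    # Assumed aggregation: place the test token at rank `wins`
    # among the sorted anchor scores (clamped to the top score).
    ranked = sorted(score for _, score in anchors)
    return ranked[min(wins, len(ranked) - 1)]


def toy_prefer(a: Features, b: Features) -> bool:
    """Toy stand-in for a trained classifier: compare the first feature."""
    return a[0] > b[0]


# Made-up anchors for a single phoneme ID.
anchors = [([0.1], 0.0), ([0.5], 1.0), ([0.9], 2.0)]
print(score_by_anchors([0.6], anchors, toy_prefer))
```

Because the same `prefer` function is shared across all phonemes, only the anchor set changes per phoneme ID, which is the property that lets one classifier replace many per-phoneme models.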
