A UNIVERSAL ORDINAL REGRESSION FOR ASSESSING PHONEME-LEVEL PRONUNCIATION
Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien
SPS
The efficacy and robustness of Ordinal Regression (OR) in assessing phrase-level speech pronunciation for language learning have been demonstrated previously. However, assessing phoneme pronunciation with OR requires us to: 1. collect human scoring annotations for phoneme tokens of short duration (60-70 ms); and 2. train a separate ordinal regression model for each phoneme, incurring the corresponding training and inference costs. In this paper, we propose training a single Universal Ordinal Regression (UOR) model instead of multiple separate models for different phonemes, and we evaluate its performance. The single universal binary classifier in UOR is trained to make a binary preference decision (better or worse) between a pair of tokens with the same phoneme ID. At inference, labeled anchor tokens with the same phoneme ID from the training data are paired with the test phoneme token to make binary preference decisions. By evaluating the new UOR on Speechocean762, a public speech database for pronunciation evaluation, we show the advantages of the proposed approach: relative to state-of-the-art systems, it achieves relative improvements of 16.7% in Pearson Correlation Coefficient and 25.0% in Mean Square Error.
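The pairwise training and anchor-based inference described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each phoneme token is assumed to be a fixed-length feature vector with an integer human score. The universal classifier is assumed to be a bias-free logistic regression on the difference of the two tokens' features. The rule that turns preference decisions into a score (pick the score level that contradicts the fewest decisions, averaging ties) is a hypothetical aggregation. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical token format: (phoneme_id, feature_vector, human_score).


def pair_features(a, b):
    """Hypothetical pair representation: element-wise difference of two feature vectors."""
    return [x - y for x, y in zip(a, b)]


def build_pairs(tokens):
    """Pair every two tokens that share a phoneme ID and have different scores.

    The label is 1 when the first token of the pair was scored higher (better), else 0.
    """
    by_phoneme = defaultdict(list)
    for phoneme_id, feats, score in tokens:
        by_phoneme[phoneme_id].append((feats, score))
    X, y = [], []
    for group in by_phoneme.values():
        for i, (fa, sa) in enumerate(group):
            for j, (fb, sb) in enumerate(group):
                if i != j and sa != sb:
                    X.append(pair_features(fa, fb))
                    y.append(1 if sa > sb else 0)
    return X, y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_universal_classifier(X, y, epochs=200, lr=0.1):
    """One logistic-regression classifier shared by all phonemes, trained with SGD.

    There is no bias term, so swapping the order of a pair flips the decision.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)))
            w = [wk - lr * (p - yi) * xk for wk, xk in zip(w, xi)]
    return w


def is_better(w, a, b):
    """Binary preference decision: True if token a is judged better than token b."""
    return sum(wk * xk for wk, xk in zip(w, pair_features(a, b))) > 0


def predict_score(w, test_feats, phoneme_id, anchors, levels):
    """Score a test token by comparing it with labeled anchors of the same phoneme ID.

    Returns the mean of the score levels that contradict the fewest preference decisions.
    """
    decisions = [(is_better(w, test_feats, f), s)
                 for pid, f, s in anchors if pid == phoneme_id]

    def errors(c):
        # "Better than an anchor scored s" is consistent with c >= s; "worse" with c <= s.
        return sum(1 for better, s in decisions
                   if (better and c < s) or (not better and c > s))

    fewest = min(errors(c) for c in levels)
    best = [c for c in levels if errors(c) == fewest]
    return sum(best) / len(best)


# Toy usage with two phonemes; the same training tokens double as inference anchors.
train = [
    ("AA", [0.0, 0.1], 0), ("AA", [1.0, 0.0], 1), ("AA", [2.0, -0.1], 2),
    ("IY", [0.2, 0.0], 0), ("IY", [1.1, 0.1], 1), ("IY", [2.1, 0.0], 2),
]
X, y = build_pairs(train)
w = train_universal_classifier(X, y)
print(predict_score(w, [1.2, 0.0], "AA", train, levels=[0, 1, 2]))  # prints 1.5
```

Here the test token beats the score-1 "AA" anchor but loses to the score-2 anchor, so levels 1 and 2 are equally consistent and the sketch returns their mean. Only anchors sharing the test token's phoneme ID are consulted, mirroring the same-phoneme pairing constraint, while a single weight vector is shared across all phonemes.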