Multi-lingual pronunciation assessment with unified phoneme set and language-specific embeddings
Binghuai Lin (MIG, Tencent Science and Technology Ltd.); Liyuan Wang (Tencent Technology Co., Ltd)
SPS
Automatic pronunciation assessment models are commonly trained and applied for a single language, which is impractical in multi-lingual or low-resource scenarios. In this paper, we propose a unified method that takes advantage of multi-lingual data for multi-lingual pronunciation assessment. First, we construct a concise unified phoneme set for multi-lingual phoneme recognition based on a pre-trained acoustic model. This lets the model share language-independent knowledge while also trying to discriminate language-specific information for pronunciation assessment. Second, we employ language-specific embeddings for different languages, which act as language-specific assessment criteria and adaptively adjust the feature weights through an attention mechanism. The whole network is optimized in a unified framework. Experimental results on multi-lingual datasets show that the proposed method outperforms different baselines in terms of Pearson correlation coefficient (PCC). We also illustrate the generalizability of the proposed method on both seen and unseen data.
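The idea of a language-specific embedding that reweights features through attention can be illustrated with a minimal sketch. This is not the authors' network: it assumes the language embedding acts as the query of a scaled dot-product attention over per-phoneme feature vectors, and the names `language_attention_pool`, `lang_table`, and `pearson`, as well as the dimensions and random data, are hypothetical. The `pearson` helper shows how the PCC used for evaluation is computed.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def language_attention_pool(features, lang_embedding):
    """Pool per-phoneme features with a language embedding as the query.

    features:       (T, D) array, one feature vector per phoneme (assumed input).
    lang_embedding: (D,) array, the embedding of the utterance's language.
    Returns the pooled (D,) vector and the (T,) attention weights.
    """
    d = features.shape[1]
    scores = features @ lang_embedding / np.sqrt(d)  # one score per phoneme
    weights = softmax(scores)                        # weights sum to 1
    pooled = weights @ features                      # weighted sum of features
    return pooled, weights


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Hypothetical data: 5 phonemes with 8-dimensional features, two languages.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))
lang_table = {"en": rng.normal(size=8), "zh": rng.normal(size=8)}

pooled_en, w_en = language_attention_pool(features, lang_table["en"])
pooled_zh, w_zh = language_attention_pool(features, lang_table["zh"])
```

With the same phoneme features, the two language embeddings yield different attention weights, which is the sense in which an embedding can behave like a language-specific assessment criterion. In a trained system the embeddings and any projection layers would be learned jointly with the scoring head; here they are random placeholders.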