DATA AUGMENTATION FOR LONG-TAILED AND IMBALANCED POLYPHONE DISAMBIGUATION IN MANDARIN
Yang Zhang, Haitong Zhang, Yue Lin
Polyphone disambiguation is an important module in Mandarin Chinese text-to-speech (TTS). Recently, neural-network-based (NN-based) models have achieved substantial improvements in polyphone disambiguation. However, the training data for polyphone disambiguation usually follow a long-tailed and imbalanced distribution, which leads to unsatisfactory performance on low-frequency polyphones in imbalanced pinyin sets, as well as on the least frequent polyphonic characters and polyphones. In this paper, we propose a simple data augmentation method based on the pre-trained masked language model BERT to mitigate the long-tailed and imbalanced distribution problem. We incorporate a weighted sampling technique into the data augmentation method to balance the data distribution, together with a filtering strategy to remove noisy augmented data. Experimental results show that the proposed data augmentation method improves prediction accuracy, especially for low-frequency polyphones in imbalanced pinyin sets and for the least frequent polyphonic characters and polyphones.
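As a rough illustration of this style of BERT-based augmentation (not a reproduction of the paper's exact implementation), the sketch below masks some of the context characters around a polyphonic character and lets a pre-trained Chinese masked language model fill them in, yielding new sentences that still contain the target character. The function name `augment_sentence`, the `keep_prob` and `top_k` parameters, and the choice of `bert-base-chinese` from the `transformers` library are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of BERT-based data augmentation for polyphone
# disambiguation. All names and hyperparameters here are illustrative.
import random
import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

def augment_sentence(sentence, target_char, keep_prob=0.85, top_k=5):
    """Randomly mask context characters (never the polyphonic target
    character) and let BERT fill them in, producing a new context."""
    masked = [
        tokenizer.mask_token if (c != target_char and random.random() > keep_prob) else c
        for c in sentence
    ]
    enc = tokenizer("".join(masked), return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]

    out = []
    for pos, tok_id in enumerate(enc["input_ids"][0].tolist()):
        if tok_id == tokenizer.mask_token_id:
            # Sample a replacement from BERT's top-k predictions at this position.
            candidates = torch.topk(logits[pos], top_k).indices.tolist()
            out.append(tokenizer.convert_ids_to_tokens(random.choice(candidates)))
        else:
            out.append(tokenizer.convert_ids_to_tokens(tok_id))
    # Drop [CLS]/[SEP]/[PAD] and rejoin into a plain sentence string.
    return "".join(t for t in out if t not in tokenizer.all_special_tokens)
```

In a full pipeline, a weighted sampler would draw target polyphonic characters and pinyin classes in inverse proportion to their training frequency before calling such an augmentation routine, and a filtering step (for example, keeping only augmented sentences on which a baseline disambiguation model confidently predicts the original pinyin) would discard noisy outputs; both of these are described in the paper only at a high level and are therefore not spelled out in the sketch.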