DATA AUGMENTATION FOR LONG-TAILED AND IMBALANCED POLYPHONE DISAMBIGUATION IN MANDARIN

Yang Zhang, Haitong Zhang, Yue Lin

SPS

Length: 00:09:38

10 May 2022

Polyphone disambiguation is an important module in Mandarin Chinese text-to-speech (TTS) systems. Recently, neural-network-based (NN-based) models have substantially improved polyphone disambiguation. However, the training data for polyphone disambiguation usually follows a long-tailed and imbalanced distribution, which results in unsatisfactory performance on the low-frequency polyphones in the imbalanced pinyin set and on the least frequent polyphonic characters and polyphones. In this paper, we propose a simple data-augmentation method based on the pre-trained masked language model BERT to mitigate the long-tailed and imbalanced distribution problem. We incorporate a weighted sampling technique into the data-augmentation method to balance the data distribution, together with a filtering strategy that removes noisy augmented data. Experimental results show that the proposed data-augmentation method improves prediction accuracy, especially for the low-frequency polyphones in the imbalanced pinyin set and for the least frequent polyphonic characters and polyphones.
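The abstract does not give the weighting formula, so the following is only a minimal sketch of how weighted sampling could favor rare pronunciations when choosing sentences to augment. It assumes inverse-frequency weights over (character, pinyin) labels; the function names, the `alpha` exponent, and the toy examples are hypothetical and are not taken from the paper. The BERT-based replacement step and the filtering strategy are not shown.

```python
import random
from collections import Counter


def sampling_weights(labels, alpha=1.0):
    """Return normalized inverse-frequency weights, one per example.

    Assumption: weight is proportional to 1 / count(label) ** alpha, so
    rarer (character, pinyin) labels are drawn more often.
    """
    counts = Counter(labels)
    raw = [1.0 / counts[label] ** alpha for label in labels]
    total = sum(raw)
    return [w / total for w in raw]


def pick_for_augmentation(examples, k, alpha=1.0, seed=0):
    """Draw k examples (with replacement) to pass to an augmentation step."""
    labels = [(ex["char"], ex["pinyin"]) for ex in examples]
    weights = sampling_weights(labels, alpha)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.choices(examples, weights=weights, k=k)


# Toy data: the character 乐 is read le4 three times and yue4 once.
examples = [
    {"text": "他很快乐", "char": "乐", "pinyin": "le4"},
    {"text": "我们都很快乐", "char": "乐", "pinyin": "le4"},
    {"text": "祝你生日快乐", "char": "乐", "pinyin": "le4"},
    {"text": "她喜欢听音乐", "char": "乐", "pinyin": "yue4"},
]

weights = sampling_weights([(ex["char"], ex["pinyin"]) for ex in examples])
picked = pick_for_augmentation(examples, k=4)
```

With `alpha=1.0`, every pronunciation receives the same total probability mass, so the single yue4 sentence is as likely to be drawn as the three le4 sentences combined. A smaller `alpha` would move the sampling back toward the original distribution.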

Tags: long-tailed and imbalanced distribution, polyphone disambiguation, data-augmentation