  • SPS
    Length: 00:09:38
10 May 2022

Polyphone disambiguation is an important module in Mandarin Chinese text-to-speech (TTS). Recently, neural-network-based (NN-based) models have substantially improved polyphone disambiguation. However, the training data usually follows a long-tailed, imbalanced distribution, which leads to unsatisfactory performance on the low-frequency polyphones in imbalanced pinyin sets and on the least frequent polyphonic characters and polyphones. In this paper, we propose a simple data-augmentation method based on the pre-trained masked language model BERT to mitigate the long-tailed and imbalanced distribution problem. We incorporate a weighted sampling technique into the data-augmentation method to balance the data distribution, and a filtering strategy to remove noisy augmented data. Experimental results show that the proposed method improves prediction accuracy, especially for the low-frequency polyphones in imbalanced pinyin sets and for the least frequent polyphonic characters and polyphones.
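
To make the weighted sampling idea concrete, here is a minimal Python sketch of one way to pick source sentences for augmentation so that rare (character, pinyin) pairs are drawn more often. The abstract does not say how the weights are computed, so the inverse-frequency scheme, the function names, the `power` parameter, and the toy data are all assumptions for illustration, not the authors' method. The BERT mask-filling and noise-filtering steps are left out.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels, power=1.0):
    """Weight each example by 1 / count(label) ** power.

    Assumed scheme: the abstract does not state the actual weighting.
    """
    counts = Counter(labels)
    return [1.0 / counts[label] ** power for label in labels]


def sample_for_augmentation(examples, k, seed=0):
    """Draw k source sentences, favouring rare (character, pinyin) pairs."""
    labels = [(ex["char"], ex["pinyin"]) for ex in examples]
    weights = inverse_frequency_weights(labels)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.choices(examples, weights=weights, k=k)


# Toy, hypothetical data: the character 行 read as xing2 nine times
# and as hang2 only once.
examples = (
    [{"text": "我们步行去学校", "char": "行", "pinyin": "xing2"}] * 9
    + [{"text": "他在银行工作", "char": "行", "pinyin": "hang2"}]
)

sampled = sample_for_augmentation(examples, k=1000)
rare_share = sum(ex["pinyin"] == "hang2" for ex in sampled) / len(sampled)
print(f"share of hang2 among sampled sentences: {rare_share:.2f}")
```

With `power=1.0`, every label receives the same total sampling mass, so the rare reading makes up roughly half of the draws here instead of one tenth. A smaller `power` would give a softer rebalancing.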
