Joint Alignment Learning-Attention Based Model For Grapheme-To-Phoneme Conversion

Yonghe Wang, Feilong Bao, Hui Zhang, Guanglai Gao

Length: 00:09:37

10 Jun 2021

Sequence-to-sequence attention-based models for grapheme-to-phoneme (G2P) conversion have attracted significant interest. The attention-based encoder-decoder framework learns the mapping from input to output tokens by selectively focusing on relevant information, and has been shown to perform well. However, the attention mechanism can produce non-monotonic alignments, which degrade G2P conversion performance. In this paper, we present a novel approach that optimizes the G2P conversion model by directly aligning the grapheme and phoneme sequences, using alignment learning (AL) as the loss function. In addition, we propose a multi-task learning method that jointly uses an alignment learning model and an attention model to predict proper alignments and thus improve the accuracy of G2P conversion. Evaluations on Mongolian and CMUDict tasks show that alignment learning as the loss function can effectively train a G2P conversion model. Further, our multi-task method significantly outperforms both the alignment learning-based model and the attention-based model.
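
To make the two ideas in the abstract concrete, the following is a minimal sketch in Python. It shows how a non-monotonic attention alignment can be detected from a toy attention matrix, and how two training objectives could be combined in a multi-task setup. The abstract does not say how the alignment learning loss and the attention loss are combined, so the weighted sum, the `weight` parameter, and all function names here are assumptions for illustration only, not the authors' formulation.

```python
from typing import List, Sequence


def attended_positions(attention: Sequence[Sequence[float]]) -> List[int]:
    """For each output (phoneme) step, return the index of the
    most strongly attended input (grapheme) position."""
    return [max(range(len(row)), key=row.__getitem__) for row in attention]


def is_monotonic(positions: Sequence[int]) -> bool:
    """An alignment is monotonic if attended positions never move backwards."""
    return all(a <= b for a, b in zip(positions, positions[1:]))


def joint_loss(al_loss: float, attention_loss: float, weight: float = 0.5) -> float:
    """Hypothetical multi-task objective: a weighted sum of the two losses.
    The interpolation form and `weight` are assumptions, not taken from the paper."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * al_loss + (1.0 - weight) * attention_loss


# Toy attention matrices: rows are output steps, columns are input positions.
monotonic_attention = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.2, 0.8],
]
non_monotonic_attention = [
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],  # attention jumps back to the first grapheme
    [0.0, 0.1, 0.9],
]

print(is_monotonic(attended_positions(monotonic_attention)))
print(is_monotonic(attended_positions(non_monotonic_attention)))
print(joint_loss(al_loss=2.0, attention_loss=4.0, weight=0.25))
```

In this sketch, a larger `weight` would push training toward the alignment objective, while a smaller one would favor the attention decoder; the value that works best in practice would have to be tuned.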

Chairs:

Eric Fosler-Lussier

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021