Investigation into phone-based subword units for multilingual end-to-end speech recognition
Saierdaer Yusuyin (Xinjiang University); Hao Huang (Xinjiang University); Junhua Liu (University of Science and Technology of China); Cong Liu (iFLYTEK Research)
Multilingual automatic speech recognition (ASR) models that use phones as modeling units have improved greatly in low-resource and similar-language scenarios, benefiting from representations shared across languages. Meanwhile, subwords have proven effective for monolingual end-to-end recognition systems. In this paper, we investigate phone-based subwords, specifically those obtained with Byte Pair Encoding (BPE), as modeling units for multilingual end-to-end speech recognition. To explore the potential of phone-based BPE (PBPE) for multilingual ASR, we first apply three multilingual BPE training methods to similar low-resource languages of Central Asia. Then, by adding three high-resource European languages to the experiments, we analyze the degree of sharing across languages in similar-language and low-resource scenarios. Finally, we propose a method that adjusts the bigram statistics in the BPE algorithm and show that the resulting PBPE representation improves accuracy in multilingual scenarios. The experiments show that PBPE outperforms phones, characters, and character-based BPE as output representation units. In particular, the best PBPE model in the multilingual experiments achieves a 25% relative improvement on a low-resource language compared to a character-based BPE system.
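To make the idea of phone-based BPE concrete, the following is a minimal sketch of standard BPE merge learning applied to phone sequences rather than character strings. It is not the paper's implementation and does not include the proposed adjustment of bigram statistics, which the abstract does not describe. The function names `learn_phone_bpe` and `merge_pair`, the `+` joiner used to keep phone boundaries visible, and the tie-breaking rule are all illustrative assumptions.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one merged unit."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            # "+" joiner is an assumption: phone symbols may be multi-character,
            # so plain concatenation could be ambiguous.
            out.append(seq[i] + "+" + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_phone_bpe(corpus, num_merges):
    """Learn BPE merges over a corpus of words given as lists of phone symbols.

    Returns the ordered list of merged pairs and the corpus re-segmented
    with those merges. Plain frequency-based BPE; no statistic adjustment.
    """
    words = [list(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        # Count adjacent unit pairs (bigrams) across the whole corpus.
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        # Most frequent pair; ties broken by the pair itself (an assumption)
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        words = [merge_pair(w, best) for w in words]
    return merges, words
```

In a multilingual setting, one plausible use is to pool the phone transcriptions of all languages into `corpus`, so that frequent phone sequences common to related languages become shared units; whether this matches any of the three training methods studied in the paper is not stated in the abstract.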