MASSIVELY MULTILINGUAL ASR: A LIFELONG LEARNING SOLUTION

Bo Li, Ruoming Pang, Yu Zhang, Tara Sainath, Trevor Strohman, Parisa Haghani, Yun Zhu, Brian Farris, Neeraj Gaur, Manasa Prasad

SPS

Length: 00:14:52

08 May 2022

The development of end-to-end models has greatly accelerated research in massively multilingual automatic speech recognition (MMASR). Previous research has demonstrated the feasibility of building high-quality MMASR models. In this work, we study the impact of adding more languages and propose a lifelong learning approach to building high-quality MMASR systems. Experiments on a 66-language Voice Search task show that we can take a model built on 15 languages and continue training it to obtain a 32-language model, and similarly extend that model further to a 67-language model. More importantly, models developed in this way achieve better quality than those trained from scratch: they maintain similar performance on old languages and achieve competitive results on new ones. This could speed up the development of universal ASR models that recognize speech from any language, any domain, and any environment by reusing previously learned knowledge.

Tags:

lifelong learning

multilingual

massive

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
