MULTILINGUAL SECOND-PASS RESCORING FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS

Neeraj Gaur, Tongzhou Chen, Ehsan Variani, Parisa Haghani, Bhuvana Ramabhadran, Pedro Moreno

SPS

Length: 00:09:08

08 May 2022

Second-pass rescoring is a well-known technique for improving the performance of Automatic Speech Recognition (ASR) systems. Neural Oracle Search (NOS), which selects the most likely hypothesis from an N-best hypothesis list by integrating information from multiple sources, has shown success in rescoring for RNN-T first-pass models. Multilingual first-pass speech recognition models often outperform their monolingual counterparts when trained on related or low-resource languages. In this paper, we investigate the use of the NOS rescoring model on a first-pass multilingual model and show that, like the first-pass model, the rescoring model can be made multilingual. Our first-pass multilingual model does not require a language-id, and we make the realistic assumption that an estimate of the language-id would be available for second-pass rescoring. We conduct comprehensive experiments on two sets of languages, one consisting of related low-resource languages and the other adding a high-resource language to the first set, to analyze the performance of the multilingual NOS rescorer under different settings. Our experimental results show that multilingual NOS improves on the first-pass multilingual model, yielding an average word error rate reduction of 9.4% on the first set and 8.4% on the second, and outperforms its monolingual counterparts.
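
To make the idea of N-best rescoring concrete, here is a minimal Python sketch. It is not the NOS model from the paper, which is a neural network. It only illustrates the general pattern of picking one hypothesis from an N-best list by combining scores from several sources. The `Hypothesis` class, the two score fields, the fixed weights, and the example sentences are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # assumed: log-probability from the first-pass model
    lm_score: float          # assumed: log-probability from an external language model


def rescore(nbest, weights=(1.0, 0.5)):
    """Return the hypothesis with the highest weighted sum of its scores.

    A simple log-linear combination stands in for a learned rescorer here.
    """
    w_first_pass, w_lm = weights
    return max(
        nbest,
        key=lambda h: w_first_pass * h.first_pass_score + w_lm * h.lm_score,
    )


# Hypothetical 2-best list: the first-pass model slightly prefers the
# second entry, but the language model strongly prefers the first.
nbest = [
    Hypothesis("recognize speech", first_pass_score=-4.0, lm_score=-3.0),
    Hypothesis("wreck a nice beach", first_pass_score=-3.5, lm_score=-9.0),
]

best = rescore(nbest)
print(best.text)
```

With the language model weight set to zero, the same function falls back to the first-pass ranking. In a multilingual setting, one could imagine the estimated language-id selecting or conditioning the rescorer, but how the paper does this is not described in the abstract.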

Tags:

speech recognition

n-best rescoring

multilingual

rnn-t