BILINGUAL END-TO-END ASR WITH BYTE-LEVEL SUBWORDS

Liuhui Deng, Roger Hsiao, Arnab Ghoshal

Length: 00:10:47

08 May 2022

In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). We study different representations, including character-level, byte-level, byte pair encoding (BPE), and byte-level byte pair encoding (BBPE) representations, and analyze their strengths and weaknesses. We focus on developing a single end-to-end model to support utterance-based bilingual ASR, where speakers do not alternate between two languages within a single utterance but may change languages across utterances. We conduct our experiments on English and Mandarin dictation tasks, and we find that BBPE with penalty schemes can improve utterance-based bilingual ASR performance by 2% to 5% relative, even with a smaller number of outputs and fewer parameters. We conclude with an analysis that indicates directions for further improving multilingual ASR.
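To make the difference between character-level and byte-level output units concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation: the function names and the sample strings are assumptions chosen for illustration. It shows that a byte-level representation draws from a fixed inventory of at most 256 symbols regardless of language, whereas a character-level inventory grows with the script, and that an arbitrary byte sequence need not decode to valid UTF-8 text.

```python
# Illustrative sketch only; function names and sample strings are hypothetical.

def char_tokens(text: str) -> list[str]:
    """Character-level units: one token per Unicode character."""
    return list(text)


def byte_tokens(text: str) -> list[int]:
    """Byte-level units: one token per UTF-8 byte (values 0..255)."""
    return list(text.encode("utf-8"))


def decode_bytes(tokens: list[int]) -> str:
    """Map byte tokens back to text; invalid sequences become U+FFFD."""
    return bytes(tokens).decode("utf-8", errors="replace")


english = "hello"
mandarin = "你好"

# English ASCII text: one byte per character.
print(len(char_tokens(english)), len(byte_tokens(english)))

# Mandarin text: each of these characters takes three UTF-8 bytes.
print(len(char_tokens(mandarin)), len(byte_tokens(mandarin)))

# A complete byte sequence round-trips to the original text.
print(decode_bytes(byte_tokens(mandarin)))

# Dropping the final byte leaves an incomplete character, so the
# decoded string contains the replacement character U+FFFD.
truncated = byte_tokens(mandarin)[:-1]
print(decode_bytes(truncated))
```

BBPE, as named in the abstract, builds subword units by merging frequent byte sequences on top of such a byte-level representation; the sketch above covers only the underlying byte and character units, not the merge procedure or the penalty schemes.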

Tags:

bilingual speech recognition

end-to-end neural network

byte-level subwords

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
