IMPROVING THE LATENCY AND QUALITY OF CASCADED ENCODERS

Tara Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han, Yonghui Wu, Yu Zhang

Length: 00:15:13

12 May 2022

In this paper, we explore reducing the computational latency of the 2-pass cascaded encoder model [1]. Specifically, we experiment with reducing the size of the causal 1st pass and adding capacity to the non-causal 2nd pass, so that overall latency is reduced without loss of quality. In addition, we explore using a confidence model to decide whether to stop 2nd-pass recognition when we are confident in the 1st-pass hypothesis. Overall, we reduce latency by a factor of 1.7x compared to the baseline cascaded encoder from [1]. Furthermore, with the added capacity in the non-causal 2nd pass, we find that we can improve WER by up to 7% relative using wav2vec and minimum word-error-rate (MWER) training.
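The confidence-based early stop mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` type, the `recognize` function, the callable signatures, and the 0.9 threshold are all hypothetical names and values chosen for illustration, and the abstract does not specify how confidence is computed or thresholded.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Hypothesis:
    """Hypothetical container for a recognition result."""
    text: str
    confidence: float  # assumed to lie in [0, 1]


def recognize(
    audio: bytes,
    first_pass: Callable[[bytes], Hypothesis],
    second_pass: Callable[[bytes, Hypothesis], Hypothesis],
    threshold: float = 0.9,  # illustrative value, not taken from the paper
) -> Tuple[str, bool]:
    """Return (transcript, ran_second_pass).

    The causal 1st pass always runs. The non-causal 2nd pass runs only
    when the 1st-pass confidence falls below the threshold.
    """
    hyp = first_pass(audio)
    if hyp.confidence >= threshold:
        # Confident enough: skip the 2nd pass and save its latency.
        return hyp.text, False
    return second_pass(audio, hyp).text, True
```

Under these assumptions, the latency saving comes from the fraction of utterances whose 1st-pass hypothesis clears the threshold, since those never incur the 2nd-pass cost.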

Tags: end-to-end asr, two-pass asr, rnnt, second-pass asr, long-form asr

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
