IMPROVING THE LATENCY AND QUALITY OF CASCADED ENCODERS
Tara Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuoyiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han, Yonghui Wu, Yu Zhang
SPS
Length: 00:15:13
In this paper, we explore reducing the computational latency of the 2-pass cascaded encoder model [1]. Specifically, we experiment with reducing the size of the causal 1st-pass and adding capacity to the non-causal 2nd-pass, so that overall latency is reduced without loss of quality. We also explore using a confidence model to decide whether to stop 2nd-pass recognition when we are confident in the 1st-pass hypothesis. Overall, we reduce latency by a factor of 1.7x compared to the baseline cascaded encoder from [1]. Furthermore, with the added capacity in the non-causal 2nd-pass, we find that we can improve word error rate (WER) by up to 7% relative using wav2vec and minimum word-error-rate (MWER) training.
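
The confidence-based early stop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` type, the `recognize` function, the callable signatures, and the threshold value of 0.9 are all hypothetical names and values chosen for illustration, and the sketch assumes the confidence model emits a single utterance-level score in [0, 1].

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed utterance-level score in [0, 1]


def recognize(
    audio: object,
    first_pass: Callable[[object], Hypothesis],
    second_pass: Callable[[object, Hypothesis], Hypothesis],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> Hypothesis:
    """Return the 1st-pass hypothesis when it is confident enough;
    otherwise fall back to the (more expensive) 2nd pass."""
    hyp = first_pass(audio)
    if hyp.confidence >= threshold:
        # Confident in the causal 1st pass: skip the non-causal 2nd pass.
        return hyp
    return second_pass(audio, hyp)
```

In a sketch like this, the latency saving comes from the utterances for which the 2nd pass is skipped, so the threshold trades latency against the quality gain of the 2nd pass.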