Transducer-Based Streaming Deliberation For Cascaded Encoders

Ke Hu, Tara Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman

Length: 00:10:12

12 May 2022

Previous research on applying deliberation networks to automatic speech recognition has achieved excellent results. The attention-decoder-based deliberation model often works as a rescorer to improve first-pass recognition results, and requires the full first-pass hypothesis for second-pass deliberation. In this work, we propose a transducer-based streaming deliberation model. The joint network of a transducer decoder typically receives inputs from the encoder and the prediction network. We propose to use attention to the first-pass text hypothesis as a third input to the joint network. The proposed transducer-based deliberation model naturally streams, making it more desirable for on-device applications. We also show that the model improves rare word recognition, with relative word error rate (WER) reductions ranging from 3.6% to 10.4% for a variety of test sets. Our model does not use any additional text data for training.
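The three-input joint network described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the dimensions, the random weights, the additive tanh combination, and the choice of the encoder frame as the attention query are all assumptions made for illustration, and every function name below is hypothetical. The sketch only shows the shape of the idea, namely that an attention context over first-pass hypothesis token embeddings enters the joint network alongside the encoder and prediction network outputs.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [dot(row, vec) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, hyp_embeddings):
    """Scaled dot-product attention over first-pass hypothesis embeddings.

    Assumption: the query is a single vector; the abstract does not say
    which signal is used as the query.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, h) / scale for h in hyp_embeddings])
    dim = len(hyp_embeddings[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, hyp_embeddings))
        for i in range(dim)
    ]
    return context, weights


def joint(enc_t, pred_u, context, params):
    """Joint network with three inputs: encoder, prediction network, attention context.

    Assumption: the three projections are summed and passed through tanh,
    as in a common two-input transducer joint, extended with a third term.
    """
    w_enc, w_pred, w_ctx, w_out = params
    hidden = [
        math.tanh(a + b + c)
        for a, b, c in zip(
            matvec(w_enc, enc_t), matvec(w_pred, pred_u), matvec(w_ctx, context)
        )
    ]
    return softmax(matvec(w_out, hidden))


# Toy sizes and random values, chosen only to make the sketch runnable.
DIM, HIDDEN, VOCAB = 4, 6, 5
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rand_vector(size):
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


params = (
    rand_matrix(HIDDEN, DIM),   # encoder projection
    rand_matrix(HIDDEN, DIM),   # prediction network projection
    rand_matrix(HIDDEN, DIM),   # attention context projection
    rand_matrix(VOCAB, HIDDEN)  # output projection over the vocabulary
)
enc_t = rand_vector(DIM)                            # encoder output at frame t
pred_u = rand_vector(DIM)                           # prediction network output at label step u
hyp_embeddings = [rand_vector(DIM) for _ in range(3)]  # first-pass hypothesis tokens

context, weights = attend(enc_t, hyp_embeddings)
probs = joint(enc_t, pred_u, context, params)
```

Because the attention context is computed per frame from whatever first-pass tokens are available, a joint network of this shape can emit output as audio arrives, rather than waiting for a complete first-pass hypothesis as an attention-decoder rescorer does.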

Tags: transducer, rare word recognition, streaming deliberation, attention

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
