Audio-Attention Discriminative Language Model for ASR Rescoring
Ankur Gandhe, Ariya Rastrow
SPS
Length: 12:51
End-to-end approaches for automatic speech recognition (ASR) benefit from directly modeling the probability of the word sequence given the input audio stream in a single neural network. However, compared to conventional ASR systems, these models typically require more data to achieve comparable results. In addition, conventional systems have already been optimized for various production environments and use cases. In this work, we propose to combine the benefits of end-to-end approaches with a conventional system using an attention-based discriminative language model that learns to rescore the output of a first-pass ASR system. We show that learning to re-rank a list of candidate ASR outputs is much simpler than learning to generate the hypothesis, and that our model achieves up to 8% improvement in word error rate (WER) even when trained on only a fraction of the data used to train the first-pass system.
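To make the rescoring setup concrete, here is a toy sketch in plain Python. It is not the authors' model: the token and audio-frame vectors, the `second_pass_score` function, the interpolation `weight`, and the n-best softmax loss are all hypothetical stand-ins chosen for illustration. Each first-pass hypothesis attends over the audio frames with scaled dot-product attention, receives a second-pass score, and the n-best list is re-ranked by interpolating that score with the first-pass score. The loss shows what "learning to re-rank" means: the model only has to score the best candidate above its competitors rather than generate it token by token.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, frames: List[Vector]) -> Vector:
    """Scaled dot-product attention of one token vector over audio frames.
    Assumes every frame has the same dimension as the query."""
    d = len(query)
    weights = softmax([dot(query, f) / math.sqrt(d) for f in frames])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(d)]


def second_pass_score(token_vecs: List[Vector], frames: List[Vector]) -> float:
    """Hypothetical score: mean agreement between each token and its
    attended audio context (a stand-in for a learned scoring network)."""
    contexts = [attend(t, frames) for t in token_vecs]
    return sum(dot(t, c) for t, c in zip(token_vecs, contexts)) / len(token_vecs)


def rescore(
    nbest: List[Tuple[str, float, List[Vector]]],
    frames: List[Vector],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank (text, first_pass_score, token_vecs) entries by interpolating
    first-pass and second-pass scores; the weight is an assumed value."""
    rescored = []
    for text, first_pass, token_vecs in nbest:
        total = (1 - weight) * first_pass + weight * second_pass_score(token_vecs, frames)
        rescored.append((text, total))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def nbest_loss(scores: Sequence[float], target: int) -> float:
    """Discriminative re-ranking loss (an assumed choice): negative log
    softmax probability of the target (e.g. lowest-WER) hypothesis."""
    return -math.log(softmax(scores)[target])


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]  # two toy audio frames
    nbest = [
        ("hello world", -1.0, [[0.0, 1.0], [0.0, 1.0]]),
        ("yellow word", -0.9, [[1.0, 0.0], [-1.0, 0.0]]),
    ]
    print(rescore(nbest, frames)[0][0])  # hello world
```

In this toy example, "yellow word" has the better first-pass score, but its tokens agree less with the attended audio, so the interpolated score moves "hello world" to the top. Keeping the first-pass score in the interpolation reflects the abstract's point that the second pass builds on an already-optimized conventional system rather than replacing it.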