Lstm-Based One-Pass Decoder For Low-Latency Streaming
Javier Jorge, Adrià Giménez, Javier Iranzo-Sánchez, Joan Albert Silvestre-Cerdà , Jorge Civera, Albert Sanchis, Alfons Juan
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 14:27
Current state-of-the-art models based on Long-Short Term Memory (LSTM) networks have been extensively used in automatic speech recognition (ASR) to improve the performance of these systems. However, using them under a streaming setup is not straightforward due to real-time constraints. In this paper we present a novel streaming decoder that includes a bidirectional LSTM acoustic model as well as an unidirectional LSTM language model to perform the decoding efficiently while keeping the performance comparable to an off-line setup. We perform a one-pass decoding using a sliding window scheme for a bidirectional LSTM acoustic model and an LSTM language model. Our approach has been implemented and assessed under a pure streaming setup, and deployed into our production systems. We report WER and latency figures for the well-known LibriSpeech and TED-LIUM tasks, obtaining competitive WER results with low-latency responses.