LSTM-Based One-Pass Decoder for Low-Latency Streaming

Javier Jorge, Adrià Giménez, Javier Iranzo-Sánchez, Joan Albert Silvestre-Cerdà, Jorge Civera, Albert Sanchis, Alfons Juan

Length: 14:27

04 May 2020

Current state-of-the-art models based on Long Short-Term Memory (LSTM) networks have been used extensively in automatic speech recognition (ASR) to improve system performance. However, using them in a streaming setup is not straightforward because of real-time constraints. In this paper we present a novel streaming decoder that combines a bidirectional LSTM acoustic model with a unidirectional LSTM language model, decoding efficiently while keeping performance comparable to an offline setup. We perform one-pass decoding using a sliding-window scheme for the bidirectional LSTM acoustic model and the LSTM language model. Our approach has been implemented and assessed under a pure streaming setup, and deployed in our production systems. We report WER and latency figures for the well-known LibriSpeech and TED-LIUM tasks, obtaining competitive WER results with low-latency responses.
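
The abstract does not give the details of the sliding-window scheme, so the following is only a generic sketch of the idea, not the authors' decoder. It assumes each frame is scored from a window holding a fixed number of past frames (`left`) and a fixed lookahead of future frames (`right`); the function name and both parameters are hypothetical. The point it illustrates is that a bidirectional model can be fed bounded context, so the delay per frame is limited by the lookahead rather than by the length of the utterance.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def stream_windows(
    frames: Iterable[T], left: int, right: int
) -> Iterator[Tuple[int, List[T]]]:
    """Yield (t, window) for every frame index t of a stream.

    Each window holds up to `left` past frames, frame t itself, and up to
    `right` future frames. `left` and `right` are illustrative parameters,
    not values taken from the paper.
    """
    # Only the most recent left + 1 + right frames are ever needed.
    buf: Deque[T] = deque(maxlen=left + 1 + right)
    seen = 0  # number of frames consumed so far
    t = 0     # index of the next frame to emit

    for frame in frames:
        buf.append(frame)
        seen += 1
        # Frame t is ready once its `right` lookahead frames have arrived.
        if seen - 1 >= t + right:
            yield t, list(buf)
            t += 1

    # End of stream: emit the remaining frames with truncated lookahead.
    first = seen - len(buf)  # stream index of buf[0]
    while t < seen:
        start = max(0, t - left) - first
        yield t, list(buf)[start:]
        t += 1


if __name__ == "__main__":
    for index, window in stream_windows(range(6), left=1, right=2):
        print(index, window)
```

In this sketch the first window is emitted only after `right` frames beyond frame 0 have been read, which is the source of the fixed lookahead delay; a real decoder would pass each window to the acoustic model instead of printing it.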

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020