FAST AND PARALLEL DECODING FOR TRANSDUCER

Wei Kang (Xiaomi Corp., Beijing, China); Liyong Guo (Xiaomi Corp.); Fangjun Kuang (Xiaomi Corp.); Long Lin (Xiaomi Corp., Beijing, China); Mingshuang Luo (Xiaomi Corp., Beijing, China); Zengwei Yao (Xiaomi Corp., Beijing, China); Xiaoyu Yang (Xiaomi Corp., Beijing, China); Piotr Żelasko (Johns Hopkins University); Daniel Povey (Johns Hopkins University)

08 Jun 2023

The transducer architecture is becoming increasingly popular in speech recognition because it is naturally streaming and highly accurate. One drawback of the transducer is that it is difficult to decode quickly and in parallel, because an unconstrained number of symbols can be emitted per time step. In this work, we introduce a constrained version of the transducer loss that learns strictly monotonic alignments between the input and output sequences. We also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step during transducer decoding, which makes parallel decoding with batches more efficient. Furthermore, we propose an FSA-based parallel beam search algorithm that runs efficiently with graphs on GPU. Experimental results show slightly better WERs along with a significant decoding speedup. Our work is open-sourced and publicly available.
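Why limiting emissions to one symbol per time step helps batching can be illustrated with a minimal pure-Python sketch of greedy search. It is not the paper's implementation. It assumes blank has id 0, and it folds the prediction and joint networks into a hypothetical `logits_fn` that scores every still-active utterance at frame `t` in a single call, given each utterance's last `context_size` emitted tokens. The names `batched_greedy_search`, `logits_fn`, and `context_size` are illustrative only. Because no utterance can emit a second symbol at the same frame, all utterances advance through time in lockstep. Each frame therefore needs exactly one batched call.

```python
from typing import Callable, List, Sequence

BLANK = 0  # assumption: the blank symbol has id 0


def batched_greedy_search(
    num_frames: Sequence[int],
    logits_fn: Callable[[int, List[int], List[List[int]]], List[List[float]]],
    context_size: int = 2,
) -> List[List[int]]:
    """Greedy transducer search emitting at most one symbol per frame.

    num_frames[b] is the number of encoder frames of utterance b.
    logits_fn(t, active, contexts) is a hypothetical stand-in for the
    prediction + joint networks: it returns one logit vector per index in
    `active`, scored at frame t given that utterance's last `context_size`
    tokens in `contexts`.
    """
    if context_size < 1:
        raise ValueError("context_size must be at least 1")
    batch = len(num_frames)
    # Seed every hypothesis with blanks so a full context always exists.
    hyps = [[BLANK] * context_size for _ in range(batch)]
    for t in range(max(num_frames, default=0)):
        # Utterances with no frame t left have already finished.
        active = [b for b in range(batch) if t < num_frames[b]]
        contexts = [hyps[b][-context_size:] for b in active]
        # One batched call per frame, whatever each utterance emits.
        logits = logits_fn(t, active, contexts)
        for b, row in zip(active, logits):
            best = max(range(len(row)), key=row.__getitem__)
            if best != BLANK:
                # Emit one symbol, then move on to frame t + 1 regardless.
                hyps[b].append(best)
    # Strip the blank padding that served as the initial context.
    return [h[context_size:] for h in hyps]


# Toy usage: fixed per-frame logits that ignore the context (illustration only).
TOY_LOGITS = [
    [[0.1, 0.9, 0.0], [0.9, 0.0, 0.1], [0.0, 0.2, 0.8]],  # utterance 0: 3 frames
    [[0.0, 0.1, 0.9], [0.2, 0.7, 0.1]],  # utterance 1: 2 frames
]


def toy_logits_fn(t, active, contexts):
    return [TOY_LOGITS[b][t] for b in active]


print(batched_greedy_search([3, 2], toy_logits_fn))  # [[1, 2], [2, 1]]
```

In unconstrained greedy search, the inner loop at each frame keeps emitting until blank wins or a cap is reached. Different utterances leave that loop after different numbers of steps, which is what makes batching awkward. With the one-symbol limit, the per-frame loop body is the same for every utterance. A real implementation would run it as tensor operations over the active batch.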