Alignment-Length Synchronous Decoding for RNN Transducer

George Saon, Zoltan Tuske, Kartik Audhkhasi

SPS

Length: 16:41

04 May 2020

We present a beam decoding strategy for recurrent neural network transducers which has the characteristic that all competing hypotheses within the beam have the same alignment length (number of output symbols plus BLANK symbols). We contrast the proposed technique with time-synchronous decoding, where the competing hypotheses within the beam correspond to the same input frames (but can have output sequences of different lengths). Experiments on the Switchboard 2000 hours corpus show that alignment-length synchronous decoding (ALSD) is 25% faster than time-synchronous decoding (TSD) for the same accuracy because ALSD performs 42% fewer joint network evaluations and hypothesis expansions during the search. Additionally, we discuss the benefit of caching and batching the prediction and joint network evaluations, of using prefix trees instead of full output vocabulary expansions, and of performing hypothesis recombination after pruning. With open beam decoding, we reach a 6.2% / 10.9% word error rate on the Switchboard and CallHome Hub5 2000 evaluation test sets, which compares favorably to other published single-model results on this corpus.
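
The core idea in the abstract, that every hypothesis in the beam has the same alignment length at each step, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `alsd_decode`, `toy_joint`, `LABELS`, and `max_symbols` are hypothetical, the joint network is replaced by a hand-written toy distribution, and hypotheses with the same prefix are merged before pruning for brevity (the abstract discusses recombination after pruning). Caching, batching, and prefix trees are omitted.

```python
import math
from typing import Callable, Dict, List, Tuple

BLANK = 0          # assumed index of the BLANK symbol
LABELS = [1, 2, 1]  # toy target sequence used only by toy_joint (hypothetical)

Prefix = Tuple[int, ...]


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def toy_joint(t: int, prefix: Prefix) -> List[float]:
    """Stand-in for the joint network: log-probs over {BLANK, 1, 2}
    given the frame index t and the symbols emitted so far."""
    u = len(prefix)
    favored = LABELS[u] if (u <= t and u < len(LABELS)) else BLANK
    return [math.log(0.9 if k == favored else 0.05) for k in range(3)]


def alsd_decode(
    joint_log_probs: Callable[[int, Prefix], List[float]],
    num_frames: int,
    max_symbols: int,
    beam_size: int,
) -> Tuple[Prefix, float]:
    beam: Dict[Prefix, float] = {(): 0.0}
    finished: Dict[Prefix, float] = {}
    # Step i: every hypothesis in the beam has alignment length i
    # (output symbols plus BLANKs), so its frame index is i - len(prefix).
    for i in range(num_frames + max_symbols):
        candidates: Dict[Prefix, float] = {}
        for prefix, score in beam.items():
            t = i - len(prefix)
            if t >= num_frames:
                continue  # this hypothesis has already consumed all frames
            for k, lp in enumerate(joint_log_probs(t, prefix)):
                new_score = score + lp
                if k == BLANK:
                    if t == num_frames - 1:
                        # BLANK on the last frame completes the hypothesis.
                        finished[prefix] = new_score
                    else:
                        # BLANK moves to the next frame, prefix unchanged.
                        candidates[prefix] = log_add(
                            candidates.get(prefix, -math.inf), new_score)
                elif len(prefix) < max_symbols:
                    # A non-BLANK symbol extends the prefix on the same frame.
                    new_prefix = prefix + (k,)
                    candidates[new_prefix] = log_add(
                        candidates.get(new_prefix, -math.inf), new_score)
        # Prune to the beam_size best candidates by log-score.
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        beam = dict(ranked[:beam_size])
        if not beam:
            break
    if not finished:
        raise ValueError("no hypothesis reached the last frame")
    return max(finished.items(), key=lambda kv: kv[1])


best_prefix, best_score = alsd_decode(
    toy_joint, num_frames=3, max_symbols=4, beam_size=4)
print(best_prefix, best_score)
```

Because all hypotheses at step `i` share the same alignment length, two candidates with the same prefix also share the same frame index, which is why their scores can be merged with `log_add`. In a time-synchronous search the outer loop would instead run over frames, with hypotheses of different lengths competing on each frame.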

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020