
Exploring a Zero-Order Direct HMM Based on Latent Attention for Automatic Speech Recognition

Parnia Bahar, Nikita Makarov, Albert Zeyer, Ralf Schlüter, Hermann Ney

Length: 13:23
04 May 2020

In this paper, we study a simple yet elegant latent-variable attention model for automatic speech recognition (ASR) that integrates attention-based sequence modeling into the direct hidden Markov model (HMM) framework. A sequence of hidden variables establishes a mapping from output labels to input frames. Inspired by the direct HMM, we decompose the label-sequence posterior into emission and transition probabilities under a zero-order assumption and incorporate both Transformer and LSTM attention models into this decomposition. The method keeps the explicit alignment as part of the stochastic model, combining the ease of end-to-end training of attention models with a simple and efficient beam search. To study the effect of the latent model, we qualitatively analyze the alignment behavior of the different approaches. Our experiments on three ASR tasks show promising WER results, with more focused alignments than standard attention models.
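The zero-order decomposition described in the abstract can be sketched as follows. This is a minimal illustrative reading, not the paper's implementation: under a zero-order assumption, the latent alignment position for each output label is independent of the previous one, so the marginal label distribution is an attention-weighted mixture of per-frame emission distributions. The function name and tensor shapes below are hypothetical.

```python
import numpy as np

def zero_order_label_posterior(att_weights, emission_logits):
    """Sketch of a zero-order latent-attention label posterior.

    att_weights: (T,) alignment distribution over the T input frames for one
        output position -- playing the role of the transition probability
        p(t_n | ...) under the zero-order assumption.
    emission_logits: (T, V) per-frame scores over the label vocabulary,
        normalized here into emission probabilities p(a_n | t_n, ...).
    Returns p(a_n | ...) = sum_t p(t_n = t | ...) * p(a_n | t_n = t, ...).
    """
    # Softmax over the vocabulary axis gives per-frame emission probabilities.
    e = np.exp(emission_logits - emission_logits.max(axis=1, keepdims=True))
    emissions = e / e.sum(axis=1, keepdims=True)  # (T, V)
    # Marginalize the latent frame position; zero-order means this sum
    # factorizes per output position (no dependence on t_{n-1}).
    return att_weights @ emissions  # (V,)

# Toy example: 4 input frames, vocabulary of 3 labels.
rng = np.random.default_rng(0)
alpha = np.array([0.1, 0.6, 0.2, 0.1])   # attention weights over frames
logits = rng.normal(size=(4, 3))         # hypothetical emission scores
posterior = zero_order_label_posterior(alpha, logits)
print(posterior)  # a valid distribution over the 3 labels
```

Because both the alignment weights and each per-frame emission row are proper distributions, the resulting mixture is itself a distribution, which is what makes the alignment an explicit part of the stochastic model rather than a soft feature-weighting as in standard attention.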
