
Adaptive End-pointing with Deep Contextual Multi-armed Bandits

Do June Min (University of Michigan); Andreas Stolcke (Amazon); Anirudh Raju (Amazon Alexa); Colin Vaz (Amazon); Di He (Amazon Alexa); Venkatesh Ravichandran (Amazon); Viet Anh Trinh (Amazon)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
06 Jun 2023

Current endpointing (EP) solutions learn in a supervised framework, which does not allow the model to incorporate feedback and improve in an online setting. Also, it is common practice to use a costly grid search to find the best configuration for an endpointing model. In this paper, we provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid search. Our method requires no ground-truth labels and learns online from reward signals alone. Specifically, we propose a deep contextual multi-armed bandit-based approach, combining the representational power of neural networks with the action-exploration behavior of Thompson sampling. We compare our approach to several baselines and show that our deep bandit models succeed in reducing early cutoff errors while maintaining low latency.
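The core loop described in the abstract — choosing an endpointing configuration (an arm) from utterance-level features (the context) via Thompson sampling, then updating from a reward signal alone — can be sketched with a simplified linear-posterior bandit. This is a minimal stand-in, not the paper's method: the paper uses a neural network for representation, and all dimensions, class names, and the synthetic reward here are illustrative assumptions.

```python
import numpy as np

class LinearThompsonBandit:
    """Hedged sketch: per-arm Bayesian linear regression with Thompson sampling.

    Each arm keeps a Gaussian posterior over reward weights; selection samples
    one weight vector per arm and plays the argmax, which trades off
    exploration and exploitation without any ground-truth labels.
    """

    def __init__(self, n_arms, dim, noise_var=1.0, prior_var=1.0):
        self.n_arms = n_arms
        self.dim = dim
        self.noise_var = noise_var
        # Posterior precision A and vector b per arm; posterior mean is A^{-1} b.
        self.A = [np.eye(dim) / prior_var for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, context, rng):
        # Sample weights from each arm's posterior and pick the highest score.
        scores = []
        for a in range(self.n_arms):
            cov = self.noise_var * np.linalg.inv(self.A[a])
            mu = np.linalg.solve(self.A[a], self.b[a])
            w = rng.multivariate_normal(mu, cov)
            scores.append(context @ w)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        # Online Bayesian update from the observed reward only.
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# Synthetic online loop (illustrative): 3 hypothetical endpointing configs,
# 4-dimensional audio features, and a linear reward standing in for the
# real latency / early-cutoff trade-off signal.
rng = np.random.default_rng(0)
bandit = LinearThompsonBandit(n_arms=3, dim=4)
true_w = [rng.normal(size=4) for _ in range(3)]
for _ in range(500):
    x = rng.normal(size=4)
    arm = bandit.select(x, rng)
    reward = x @ true_w[arm] + 0.1 * rng.normal()
    bandit.update(arm, x, reward)
```

The paper's deep variant would replace the per-arm linear posterior with a shared neural feature extractor, keeping the same select-then-update loop.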
