DYNAMIC SPEECH ENDPOINT DETECTION WITH REGRESSION TARGETS

Dawei Liang (UT Austin); Hang Su (Meta Platforms Inc); Tarun Singh (Meta Platforms Inc); Jay Mahadeokar (Meta Platforms Inc); Shanil Puri (Meta Platforms Inc); Jiedan Zhu (Meta Platforms Inc); Edison Thomaz (The University of Texas at Austin); Mike Seltzer (Meta Platforms Inc)

06 Jun 2023

Interactive voice assistants are widely used as input interfaces in various scenarios, e.g., on smart home devices, wearables, and AR devices. Detecting the end of a speech query, i.e., speech end-pointing, is an important task for voice assistants interacting with users. Traditionally, speech end-pointing has relied on pure classification methods with arbitrary binary targets. In this paper, we propose a novel regression-based speech end-pointing model, which enables an end-pointer to adjust its detection behavior based on the context of user queries. Specifically, we present a pause modeling method and show its effectiveness for dynamic end-pointing. In our experiments with vendor-collected smartphone and wearables speech queries, our strategy shows a better trade-off between end-pointing latency and accuracy than the traditional classification-based method. We further discuss the benefits of this model and the generalization of the framework in the paper.
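
The contrast between binary classification targets and regression targets can be illustrated with a small sketch. The abstract does not define the regression targets or the pause modeling method, so everything below is an assumption made for illustration only: the linear ramp, the function names (`binary_targets`, `ramp_targets`, `first_trigger`), and the parameter `ramp_frames` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def binary_targets(num_frames: int, end_frame: int) -> List[float]:
    """Classification-style targets: 0 before the query ends, 1 from then on."""
    return [0.0 if t < end_frame else 1.0 for t in range(num_frames)]


def ramp_targets(num_frames: int, end_frame: int, ramp_frames: int) -> List[float]:
    """Hypothetical regression targets (an assumption, not the paper's definition).

    The target rises linearly from 0 to 1 over `ramp_frames` frames of
    trailing silence after `end_frame`, then stays at 1.
    """
    if ramp_frames <= 0:
        raise ValueError("ramp_frames must be positive")
    return [
        min(1.0, max(0.0, (t - end_frame) / ramp_frames))
        for t in range(num_frames)
    ]


def first_trigger(scores: List[float], threshold: float) -> Optional[int]:
    """Return the first frame whose score reaches the threshold, or None."""
    for t, score in enumerate(scores):
        if score >= threshold:
            return t
    return None


if __name__ == "__main__":
    hard = binary_targets(num_frames=8, end_frame=3)
    soft = ramp_targets(num_frames=8, end_frame=3, ramp_frames=4)
    print(first_trigger(hard, 0.5), first_trigger(soft, 0.5))
```

Under these assumptions, a binary target fires at the same frame for any threshold between 0 and 1, whereas a continuous target lets the threshold, or a per-query ramp length, translate into a different waiting time after speech ends. This is one plausible reading of how a regression output could make end-pointing behavior adjustable; the paper itself should be consulted for the actual formulation.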

Tags:

Segmentation, tagging, and parsing

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
