Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting

Beltrán Labrador (Audias - Universidad Autónoma de Madrid); Guanlong Zhao (Google); Ignacio Lopez Moreno (Google); Angelo Scorza Scarpati (Google); Liam Fowl (Google); Quan Wang (Google)

07 Jun 2023

In this paper, we present a novel approach to adapt a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task. We achieve this by replacing the keyword in the text transcription with a special token and training the system to detect that token in an audio stream. At inference time, we apply a decision function inspired by conventional KWS approaches, which makes the system better suited to the KWS task. Furthermore, we introduce a KWS-specific loss by adapting the sequence-discriminative Minimum Bayes-Risk training technique. We find that our approach significantly outperforms ASR-based KWS systems. Compared with a conventional keyword spotting system, our proposal achieves similar performance while bringing the advantages and flexibility of sequence-to-sequence training. Additionally, when combined with the conventional KWS system, our approach can improve performance at any operating point.
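The label-preparation step described in the abstract (replacing the keyword in the text transcription with a special token) can be illustrated with a minimal sketch. This is not the authors' implementation: the token string `<kw>`, the function name `replace_keyword`, the example keyword, and the case-insensitive whole-phrase matching are all assumptions made for illustration only.

```python
import re

# Hypothetical special token; the abstract does not say what token is used.
KEYWORD_TOKEN = "<kw>"


def replace_keyword(transcript: str, keyword: str, token: str = KEYWORD_TOKEN) -> str:
    """Replace every whole-phrase occurrence of `keyword` in `transcript` with `token`.

    Matching is case-insensitive and bounded by non-word characters, so the
    keyword is not replaced when it appears inside a longer word.
    """
    pattern = re.compile(
        r"(?<!\w)" + re.escape(keyword) + r"(?!\w)",
        flags=re.IGNORECASE,
    )
    # A function replacement avoids any backslash interpretation in `token`.
    return pattern.sub(lambda _match: token, transcript)


if __name__ == "__main__":
    # Hypothetical keyword and transcript, chosen only for the example.
    print(replace_keyword("hey assistant what is the weather", "hey assistant"))
```

A model trained on transcripts rewritten this way would then be asked to emit the special token whenever the keyword is spoken; how the token's output is turned into a detection decision is not specified in the abstract and is not sketched here.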

Tags:

Word spotting, VAD, and other topics in speech recognition

Conference: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference)
