Lightweight feature encoder for wake-up word detection based on self-supervised speech representation

Hyungjun Lim (LG AI Research); Younggwan Kim (LG AI Research); Kiho Yeom (LG AI Research); Eunjoo Seo (LG AI Research); Hoodong Lee (LG AI Research); Stanley Jungkyu Choi (LG AI Research); Honglak Lee (LG AI Research)

07 Jun 2023

Self-supervised learning methods that provide generalized speech representations have recently received increasing attention. Wav2vec 2.0 is the best-known example, showing remarkable performance in numerous downstream speech processing tasks. Despite its success, it is difficult to use directly for wake-up word detection on mobile devices because of its high computational cost. In this work, we propose LiteFEW, a lightweight feature encoder for wake-up word detection that preserves the inherent ability of wav2vec 2.0 at a minimal scale. In the proposed method, the knowledge of the pre-trained wav2vec 2.0 is compressed with an auto-encoder-based dimensionality reduction technique and then distilled into LiteFEW. Experimental results on the open-source "Hey Snips" dataset show that the proposed method, applied to various model structures, significantly improves performance, achieving relative improvements of over 20% with only 64k parameters.
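
The abstract does not specify the auto-encoder architecture or the distillation loss, so the following is only a minimal sketch of the general idea under stated assumptions: a linear auto-encoder compresses teacher features to a small bottleneck, and a student encoder is trained to match the compressed features. The class and function names, the mean-squared-error losses, the 64-dimensional bottleneck, and the random arrays standing in for wav2vec 2.0 and student outputs are all hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


class LinearAutoEncoder:
    """Hypothetical linear auto-encoder (assumption: the paper may use another form)."""

    def __init__(self, in_dim, bottleneck_dim, rng):
        scale = 1.0 / np.sqrt(in_dim)
        # Untrained random weights; a real setup would fit these by
        # minimizing reconstruction_loss on teacher features.
        self.w_enc = rng.normal(0.0, scale, size=(in_dim, bottleneck_dim))
        self.w_dec = rng.normal(0.0, scale, size=(bottleneck_dim, in_dim))

    def encode(self, x):
        """Project (frames, in_dim) features down to (frames, bottleneck_dim)."""
        return x @ self.w_enc

    def decode(self, z):
        """Map bottleneck features back to the original dimensionality."""
        return z @ self.w_dec

    def reconstruction_loss(self, x):
        """Objective that would be used to train the auto-encoder itself."""
        return mse(self.decode(self.encode(x)), x)


def distillation_loss(student_features, teacher_features, auto_encoder):
    """Assumed objective: the student matches the compressed teacher features."""
    target = auto_encoder.encode(teacher_features)
    return mse(student_features, target)


rng = np.random.default_rng(0)

# Stand-in for teacher outputs: 50 frames of 768-dim features
# (768 is the hidden size of the wav2vec 2.0 base model).
teacher = rng.normal(size=(50, 768))

# 64 is an illustrative bottleneck size, not a value from the paper.
ae = LinearAutoEncoder(in_dim=768, bottleneck_dim=64, rng=rng)

# Stand-in for the output of a small student encoder.
student = rng.normal(size=(50, 64))

print("reconstruction loss:", ae.reconstruction_loss(teacher))
print("distillation loss:", distillation_loss(student, teacher, ae))
```

In a sketch like this, the student only needs to produce low-dimensional outputs, which is one plausible way a compressed target keeps the student small; the actual training procedure is described in the paper itself.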

Tags:

Word spotting, VAD, and other topics in speech recognition

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
