Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting

Iván López-Espejo (Aalborg University); RAM CHARAN M CHANDRA SHEKAR (University of Texas at Dallas); Zheng-Hua Tan (Aalborg University); Jesper Jensen (Aalborg University); John H Hansen (University of Texas at Dallas)

06 Jun 2023

In keyword spotting (KWS), replacing handcrafted speech features with learnable features has not yielded superior performance. In this study, we demonstrate that filterbank learning outperforms handcrafted speech features for KWS when the number of filterbank channels is severely reduced. Reducing the number of channels may cause some drop in KWS performance, but it also substantially reduces energy consumption, which is key when deploying always-on KWS on low-resource devices. Experimental results on a noisy version of the Google Speech Commands Dataset show that filterbank learning adapts to noise characteristics and provides greater robustness to noise, especially when dropout is integrated. As a result, switching from the typically used 40-channel log-Mel features to 8-channel learned features leads to a relative KWS accuracy loss of only 3.5% while simultaneously reducing energy consumption by a factor of 6.3.
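To make the channel-count trade-off concrete, the following is a minimal NumPy sketch of log filterbank features computed with 40 and with 8 channels. It is an illustration under stated assumptions and not the authors' implementation: the function names, the 512-point FFT, the 16 kHz sample rate, the triangular Mel construction, and the random placeholder spectrogram are all assumed here. In a filterbank-learning setup, a weight matrix like the one returned by the hypothetical `mel_filterbank` would be treated as trainable parameters (possibly initialised this way) instead of being kept fixed.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale (common 2595*log10 formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_channels, n_fft=512, sample_rate=16000):
    """Triangular Mel filters, shape (n_channels, n_fft // 2 + 1).

    n_fft and sample_rate are assumed values, not taken from the source.
    """
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_channels + 2 edge frequencies, equally spaced on the Mel scale.
    edges = mel_to_hz(
        np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_channels + 2)
    )
    fb = np.zeros((n_channels, n_bins))
    for k in range(n_channels):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        # Triangle: minimum of the two slopes, clipped at zero.
        fb[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_filterbank_features(power_spec, weights, eps=1e-6):
    """power_spec: (frames, bins); weights: (channels, bins)."""
    return np.log(power_spec @ weights.T + eps)


# Placeholder power spectrogram (random values, for shape illustration only).
rng = np.random.default_rng(0)
power_spec = rng.random((98, 257))

feats_40 = log_filterbank_features(power_spec, mel_filterbank(40))
feats_8 = log_filterbank_features(power_spec, mel_filterbank(8))
print(feats_40.shape, feats_8.shape)
```

Going from 40 to 8 channels means each frame carries five times fewer feature values for the downstream classifier to process. The sketch shows only this change in feature dimensionality; it does not model the energy measurements reported in the abstract.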

Tags:

Robust speech recognition and adaptation

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
