METRIC LEARNING FOR USER-DEFINED KEYWORD SPOTTING

Jaemin Jung (KAIST); Youkyum Kim (KAIST); Jihwan Park (42dot Inc.); Youshin Lim (42dot); Byeong-Yeol Kim (42dot); Youngjoon Jang (KAIST); Joon Son Chung (KAIST)

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

06 Jun 2023

The goal of this work is to detect new spoken terms defined by users. While most previous works address Keyword Spotting (KWS) as a closed-set classification problem, this limits their transferability to unseen terms. The ability to define custom keywords has advantages in terms of user experience. In this paper, we propose a metric learning-based training strategy for user-defined keyword spotting. In particular, we make the following contributions: (1) we construct a large-scale keyword dataset with an existing speech corpus and propose a filtering method to remove data that degrade model training; (2) we propose a metric learning-based two-stage training strategy, and demonstrate that the proposed method improves the performance on the user-defined keyword spotting task by enriching their representations; (3) to facilitate the fair comparison in the user-defined KWS field, we propose unified evaluation protocol and metrics. Our proposed system does not require an incremental training on the user-defined keywords, and outperforms previous works by a significant margin on the Google Speech Commands dataset using the proposed as well as the existing metrics.

Tags:

Word spotting, VAD, and other topics in speech recognition