METRIC LEARNING FOR USER-DEFINED KEYWORD SPOTTING
Jaemin Jung (KAIST); Youkyum Kim (KAIST); Jihwan Park (42dot Inc.); Youshin Lim (42dot); Byeong-Yeol Kim (42dot); Youngjoon Jang (KAIST); Joon Son Chung (KAIST)
The goal of this work is to detect new spoken terms defined by users. Most previous works address Keyword Spotting (KWS) as a closed-set classification problem, which limits their transferability to unseen terms. The ability to define custom keywords, by contrast, has advantages in terms of user experience. In this paper, we propose a metric learning-based training strategy for user-defined keyword spotting. In particular, we make the following contributions: (1) we construct a large-scale keyword dataset from an existing speech corpus and propose a filtering method to remove data that degrade model training; (2) we propose a metric learning-based two-stage training strategy, and demonstrate that it improves performance on the user-defined keyword spotting task by enriching the keyword representations; (3) to facilitate fair comparison in the user-defined KWS field, we propose a unified evaluation protocol and metrics.
Our proposed system does not require incremental training on the user-defined keywords, and it outperforms previous works by a significant margin on the Google Speech Commands dataset under both the proposed and the existing metrics.
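To illustrate how a metric-learning system can handle user-defined keywords without incremental training, the following is a minimal sketch and not the authors' implementation. It assumes that a trained encoder (not shown) maps each utterance to a fixed-length embedding, that a keyword is enrolled by averaging a few such embeddings, and that detection uses cosine similarity against a threshold. The function names, the toy three-dimensional vectors, and the threshold of 0.7 are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(embeddings):
    """Average a few enrollment embeddings into one keyword centroid.

    Assumption: the embeddings come from a fixed, already-trained encoder,
    so enrolling a new keyword needs no further model training.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def detect(query, centroids, threshold=0.7):
    """Return (keyword, score) for the closest centroid, or (None, score)
    when the best score falls below the (hypothetical) threshold."""
    best_label, best_score = None, -1.0
    for label, centroid in centroids.items():
        score = cosine_similarity(query, centroid)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy embeddings standing in for encoder outputs (hypothetical values).
centroids = {
    "lights on": enroll([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]),
    "play music": enroll([[0.0, 0.9, 0.2], [0.1, 1.0, 0.0]]),
}

match, _ = detect([0.95, 0.05, 0.05], centroids)
reject, _ = detect([0.0, 0.0, 1.0], centroids)
```

In this sketch, adding a keyword only adds an entry to `centroids`; the threshold trades off false alarms against misses and would need tuning on held-out data.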