Mining Effective Negative Training Samples For Keyword Spotting
Jingyong Hou, Yangyang Shi, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie
Max-pooling neural network architectures have proven useful for keyword spotting (KWS), but standard training methods suffer from a class-imbalance problem when all frames from negative utterances are used. To address this problem, we propose Regional Hard-Example (RHE) mining, an algorithm that finds effective negative training samples and thereby controls the ratio of negative to positive data. To maintain the diversity of the negative samples, multiple non-contiguous difficult frames per negative training utterance are selected dynamically during training, based on the model statistics at each training epoch. Further, to improve model learning, we introduce a weakly constrained max-pooling method for positive training utterances, which constrains max-pooling to the keyword ending frames only at early stages of training. Finally, data augmentation is applied for further improvement. We assess the algorithms on wake-up word detection tasks with two different neural network architectures. The experiments consistently show that the proposed methods provide significant improvements over a strong baseline: at a false alarm rate of once per hour, they achieve a 45-58% relative reduction in false rejection rate.
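To make the idea of selecting multiple non-contiguous hard frames per negative utterance concrete, the following is a minimal sketch of such a selection step, not the paper's implementation. The function name `mine_regional_hard_negatives` and the hyperparameters `num_regions` (frames kept per utterance) and `min_gap` (minimum spacing between kept frames) are assumptions introduced for illustration; the paper's actual selection criteria and values may differ.

```python
import numpy as np


def mine_regional_hard_negatives(frame_scores, num_regions=5, min_gap=10):
    """Select hard negative frames from one negative utterance.

    frame_scores: per-frame keyword scores from the current model
                  (higher = harder negative), recomputed each epoch.
    num_regions:  assumed number of frames to keep per utterance.
    min_gap:      assumed minimum distance (in frames) between kept frames,
                  enforcing non-contiguity so the negatives stay diverse.
    Returns the sorted indices of the selected frames.
    """
    order = np.argsort(frame_scores)[::-1]  # hardest frames first
    selected = []
    for t in order:
        # Skip frames too close to an already-selected hard frame.
        if all(abs(int(t) - s) >= min_gap for s in selected):
            selected.append(int(t))
        if len(selected) == num_regions:
            break
    return sorted(selected)


# Usage sketch: at each epoch, score a negative utterance with the current
# model, keep only the mined frames as negative training samples, and thereby
# control the negative-to-positive frame ratio.
scores = np.random.rand(300)  # placeholder per-frame scores for one utterance
hard_frames = mine_regional_hard_negatives(scores)
print(hard_frames)
```

Because the selection depends on the current model's scores, the mined frames change from epoch to epoch, which is what keeps the negative set both hard and diverse as training progresses.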