Fast Lattice-Free Keyword Filtering For Accelerated Spoken Term Detection
Jonathan Wintrode, Jenny Wilkes
SPS
Length: 11:50
We present a novel set of keyword detection techniques that accelerate spoken term detection for known queries with minimal loss in accuracy. Using only ASR frame-level acoustic posteriors, we train multiple models to detect non-target segments, for which full lattice decoding can be skipped. We estimate phone n-gram soft counts for each segment in a single pass over the frame-level output. From these counts we can efficiently detect a fixed set of keywords with both linear and DNN-based classifiers. Furthermore, the linear classifiers can be trained on a small number of labeled examples. Experiments on the PSC and VAST English subset of NIST's 2019 OpenSAT evaluation demonstrate that we can filter out half of the test audio segments while increasing the keyword miss rate by less than 3%.
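The following Python sketch illustrates the general idea of scoring a segment from phone n-gram soft counts with a linear classifier. It is not the authors' implementation: the abstract does not specify how the soft counts are estimated, so the sketch assumes frames are independent and takes the soft count of an n-gram to be the sum, over consecutive frame windows, of the product of the corresponding phone posteriors. The function names, weights, bias, and threshold are all hypothetical and chosen only for illustration.

```python
from itertools import product


def soft_ngram_counts(posteriors, n=2):
    """Accumulate soft phone n-gram counts in one pass over the frames.

    posteriors: list of frames, each a list of per-phone probabilities.
    Returns a dict mapping tuples of phone indices to soft counts.
    Assumption (not from the paper): frames are independent, so each
    window of n consecutive frames contributes the product of its
    phone posteriors to the matching n-gram.
    """
    counts = {}
    num_phones = len(posteriors[0]) if posteriors else 0
    for t in range(len(posteriors) - n + 1):
        window = posteriors[t:t + n]
        for gram in product(range(num_phones), repeat=n):
            p = 1.0
            for frame, phone in zip(window, gram):
                p *= frame[phone]
            if p > 0.0:
                counts[gram] = counts.get(gram, 0.0) + p
    return counts


def linear_score(counts, weights, bias=0.0):
    """Linear classifier score: bias plus weighted sum of soft counts."""
    return bias + sum(weights.get(g, 0.0) * c for g, c in counts.items())


def keep_segment(counts, weights, bias=0.0, threshold=0.0):
    """Return True if the segment should go on to full lattice decoding."""
    return linear_score(counts, weights, bias) >= threshold


# Toy segment: 3 frames over a hypothetical 2-phone inventory.
posteriors = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
counts = soft_ngram_counts(posteriors, n=2)

# Hypothetical weights for a keyword whose evidence is the bigram (0, 1).
weights = {(0, 1): 1.0}
decode = keep_segment(counts, weights, bias=-0.5)
```

Segments for which `keep_segment` returns False would be filtered out before lattice decoding. The exhaustive enumeration over all n-grams is only practical for a toy phone inventory; a real system would presumably restrict the counts to the n-grams that occur in the keyword set.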