  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 11:50
04 May 2020

We present a novel set of keyword detection techniques that accelerate spoken term detection for known queries with minimal loss in accuracy. Using only ASR frame-level acoustic posteriors, we train multiple models to detect non-target segments, for which full lattice decoding can be skipped. We estimate phone n-gram soft counts for each segment in a single pass over the frame-level output. From these counts, we can efficiently detect a fixed set of keywords with both linear and DNN-based classifiers. Furthermore, the linear classifiers can be trained on a small number of labeled examples. Experiments on the PSC and VAST English subset of NIST's 2019 OpenSAT evaluation demonstrate that we can filter out half of the test audio segments while increasing the keyword miss rate by less than 3%.
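
The abstract does not give the exact soft-count estimator or classifier, so the following is only a minimal sketch of the general idea, under stated assumptions. It approximates bigram soft counts by treating adjacent frames as independent and summing the products of their posteriors, then scores a segment with a linear classifier to decide whether it should go on to full lattice decoding. The function names, the per-frame normalization, the weights, and the threshold are all hypothetical. Because the counts are taken over frames rather than phone segments, self-transitions will dominate on real audio; the method in the talk may handle this differently.

```python
import numpy as np


def bigram_soft_counts(posteriors):
    """Approximate expected bigram counts from frame-level posteriors.

    posteriors: array of shape (T, P), one row per frame, rows sum to 1.
    Returns a (P, P) matrix whose [a, b] entry is the sum over t of
    posteriors[t, a] * posteriors[t + 1, b].

    ASSUMPTION: adjacent frames are treated as independent. This is an
    illustrative approximation, not necessarily the estimator in the talk.
    """
    post = np.asarray(posteriors, dtype=float)
    # One matrix product accumulates all outer products of adjacent frames,
    # so the counts come from a single pass over the frame-level output.
    return post[:-1].T @ post[1:]


def keep_segment(posteriors, weights, bias=0.0, threshold=0.0):
    """Return True if the segment should be sent on to full decoding.

    weights: hypothetical linear-classifier weights of length P * P.
    """
    counts = bigram_soft_counts(posteriors)
    # Normalize by the number of frame pairs so long and short segments
    # are comparable (a design choice made for this sketch only).
    feats = counts.ravel() / max(len(posteriors) - 1, 1)
    score = float(feats @ np.asarray(weights, dtype=float) + bias)
    return score >= threshold
```

In use, segments for which `keep_segment` returns False would be filtered out before lattice decoding, and the threshold would be tuned to trade the fraction of audio filtered against the keyword miss rate.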
