  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 14:54
04 May 2020

We propose an environment adaptation approach that improves deep speech enhancement models without the noisy/clean paired target waveforms required by conventional DNN-based spectral regression. The method transfers an existing deep neural network (DNN) speech enhancer to specific noisy environments by minimizing the Kullback-Leibler divergence between the posterior probabilities produced by a multi-condition senone classifier (teacher) fed with noisy speech features and those produced by a clean-condition senone classifier (student) fed with enhanced speech features. Our solution not only improves the listening quality of the enhanced speech but, when employed as a pre-processing step before speech feature extraction, also boosts the noise robustness of existing automatic speech recognition (ASR) systems trained on clean data. Experimental results show steady gains in objective quality measures when the teacher network produces adaptation targets that allow the student enhancement model to adjust its parameters in unseen noise conditions. The proposed technique is particularly advantageous in environments that the unadapted DNN-based enhancer does not handle effectively, and we find that only very little data from a specific operating condition is needed to yield good improvements. Finally, higher gains in speech quality translate directly into larger improvements in ASR accuracy.
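The adaptation objective described above (KL divergence between teacher and student senone posteriors) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function and variable names are hypothetical, the classifier outputs are stood in for by raw logit arrays, and in the actual system this loss would be backpropagated only into the enhancer's parameters while both senone classifiers stay frozen.

```python
import numpy as np

def softmax(logits):
    """Convert per-frame logits to senone posterior probabilities."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_adaptation_loss(teacher_logits, student_logits, eps=1e-12):
    """Mean frame-level KL(teacher || student).

    teacher_logits: outputs of the multi-condition classifier on noisy features
    student_logits: outputs of the clean-condition classifier on enhanced features
    Both arrays have shape (num_frames, num_senones).
    """
    p = softmax(teacher_logits)  # adaptation targets from the teacher
    q = softmax(student_logits)  # posteriors the student should match
    kl_per_frame = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl_per_frame.mean())

# Illustrative usage with random logits for 4 frames and 8 senone classes.
rng = np.random.default_rng(0)
t = rng.normal(size=(4, 8))
s = rng.normal(size=(4, 8))
print(kl_adaptation_loss(t, t))  # identical posteriors -> loss near 0
print(kl_adaptation_loss(t, s))  # mismatched posteriors -> positive loss
```

Driving this loss toward zero pulls the enhanced-feature posteriors toward the noisy-feature targets, which is how the enhancer adapts to an unseen environment without paired clean references.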
