  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:12:57
08 May 2022

Single-channel speech enhancement for automatic speech recognition (ASR) has been widely studied. However, most speech enhancement methods over-suppress the signal and introduce distortion, which limits the performance gain or even degrades back-end recognition performance. The key to solving this problem is preserving the integrity of the speech while suppressing background noise. We therefore propose a time domain progressive learning (TDPL) approach for speech enhancement and ASR. The TDPL model consists of an encoder, a progressive enhancer, and a decoder. The model learns two targets: an intermediate target with increased SNR and less speech distortion, and a clean target with better listening quality and intelligibility. The former is provided for ASR pre-processing and the latter for speech communication. We also present an SNR constriction loss suited to TDPL that further improves ASR performance. We evaluate the proposed methods on the CHiME-4 real evaluation set. The results show that TDPL significantly outperforms time domain speech enhancement methods and frequency domain progressive learning methods on the ASR task, and that the intermediate output of TDPL achieves a 36.3% relative word error rate reduction with a powerful ASR back-end without retraining. Moreover, the estimated clean output yields some improvement in PESQ and STOI on the CHiME-4 simulated evaluation set.
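For readers unfamiliar with the metric, the "relative word error rate reduction" quoted above is conventionally the drop in WER expressed as a fraction of the baseline WER. The sketch below shows only that arithmetic. The function name and the two WER values are hypothetical, chosen to illustrate the calculation; they are not the figures reported for CHiME-4.

```python
def relative_wer_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical WERs in percent, used only to illustrate the arithmetic
# (not results from the paper).
example = relative_wer_reduction(baseline_wer=10.0, enhanced_wer=6.37)
print(f"{example:.1f}% relative WER reduction")
```

Because the reduction is relative, the same percentage can correspond to very different absolute WER changes depending on how strong the baseline back-end already is.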
