A TIME DOMAIN PROGRESSIVE LEARNING APPROACH WITH SNR CONSTRICTION FOR SINGLE-CHANNEL SPEECH ENHANCEMENT AND RECOGNITION

Zhaoxu Nian, Jun Du, Yu Ting Yeung, Renyu Wang

Length: 00:12:57

08 May 2022

Single-channel speech enhancement for automatic speech recognition (ASR) has been widely studied. However, most speech enhancement methods over-suppress the signal and introduce distortion, which limits performance gains or even degrades back-end performance. The key to solving this problem is preserving the integrity of speech while suppressing background noise. We therefore propose a time domain progressive learning (TDPL) approach for speech enhancement and ASR. The TDPL model consists of an encoder, a progressive enhancer, and a decoder. It learns both an SNR-increased intermediate target with less speech distortion and a clean target with better listening quality and intelligibility; the former is provided for ASR pre-processing and the latter for speech communication. Additionally, we present an SNR constriction loss suited to TDPL that further improves ASR performance. We evaluate the proposed methods on the CHiME-4 real evaluation set. The results show that the TDPL method significantly outperforms time domain speech enhancement methods and frequency domain progressive learning methods on the ASR task, and the intermediate output of TDPL achieves a 36.3% relative word error rate reduction with a powerful ASR back-end without retraining. Moreover, the estimated clean output achieves some improvement on the CHiME-4 simulation evaluation set in terms of PESQ and STOI measures.
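
The abstract does not say how the SNR-increased intermediate target is built or how large the SNR step is. As a rough illustration only, the sketch below uses one common construction from progressive learning work: keep the clean speech unchanged and attenuate the additive noise so that the SNR rises by a fixed number of dB. The function names, the 10 dB gain, and the synthetic sine-plus-Gaussian signals are all assumptions made for this example; they are not taken from the paper, and the sketch does not implement the TDPL network or the SNR constriction loss.

```python
import math
import random


def power(signal):
    """Mean power of a signal given as a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB between a speech and a noise component."""
    return 10.0 * math.log10(power(speech) / power(noise))


def intermediate_target(speech, noise, snr_gain_db):
    """Hypothetical helper: clean speech plus noise attenuated so that
    the SNR increases by snr_gain_db relative to the noisy input."""
    scale = 10.0 ** (-snr_gain_db / 20.0)  # amplitude scale for the noise
    return [s + scale * n for s, n in zip(speech, noise)]


# Synthetic stand-ins for real data (assumption: 1 s at 16 kHz).
random.seed(0)
sample_rate = 16000
speech = [math.sin(2.0 * math.pi * 220.0 * t / sample_rate)
          for t in range(sample_rate)]
noise = [0.5 * random.gauss(0.0, 1.0) for _ in range(sample_rate)]
noisy = [s + n for s, n in zip(speech, noise)]  # network input

# Intermediate target with an assumed 10 dB SNR gain; the final target
# in a progressive scheme would be the clean speech itself.
target = intermediate_target(speech, noise, snr_gain_db=10.0)

# Check the gain by measuring the noise that remains in the target.
residual = [t - s for t, s in zip(target, speech)]
input_snr = snr_db(speech, noise)
target_snr = snr_db(speech, residual)
print(f"input SNR:  {input_snr:.2f} dB")
print(f"target SNR: {target_snr:.2f} dB")
```

Because the speech component is left untouched, a target built this way contains no speech distortion by construction, which matches the abstract's motivation for feeding the intermediate output, rather than the fully cleaned output, to the ASR back-end.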

Tags:

progressive learning

automatic speech recognition

speech enhancement

time domain

snr constriction

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
