Masking and Inpainting: A Two-Stage Speech Enhancement Approach for Low SNR and Non-Stationary Noise
Xiang Hao, Xiangdong Su, Shixue Wen, Wei Chen, Zhiyu Wang, Feilong Bao, Yiqian Pan
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 13:31
Low signal-to-noise ratio (SNR) and non-stationary noise currently cause severe performance degradation in most speech enhancement models. To improve speech enhancement in these scenarios, this paper proposes a two-stage approach consisting of binary masking and spectrogram inpainting. In the binary masking stage, we first obtain a binary mask by hardening a soft mask, then use it to remove the time-frequency points that are dominated by severe noise. In the spectrogram inpainting stage, we use a CNN with partial convolution to inpaint the masked spectrogram produced by the previous stage. We compared our approach with two strong baselines, Wave-U-Net and CRN, on a low-SNR dataset containing a large amount of non-stationary noise. The experimental results show that our approach outperformed both baselines and achieved state-of-the-art performance.
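The two stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `harden_mask` and `partial_conv2d`, the 0.5 threshold, and the single-channel 3x3 kernel are assumptions made for illustration, and the partial convolution follows the general formulation (renormalize by the fraction of valid inputs in each window, then mark the output position as valid), not the specific network used in the paper.

```python
import numpy as np

def harden_mask(soft_mask, threshold=0.5):
    """Turn a soft mask in [0, 1] into a binary mask.
    The 0.5 threshold is an assumed value, not taken from the paper."""
    return (soft_mask >= threshold).astype(np.float32)

def partial_conv2d(x, mask, kernel, bias=0.0):
    """One single-channel partial convolution step with zero padding.

    x      : 2-D magnitude spectrogram (frequency x time)
    mask   : 2-D binary mask, 1 = kept point, 0 = removed point
    kernel : 2-D filter with odd height and width
    Returns the filtered spectrogram and the updated mask.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x * mask, ((ph, ph), (pw, pw)))  # removed points contribute 0
    mp = np.pad(mask, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=np.float32)
    new_mask = np.zeros(mask.shape, dtype=np.float32)
    window_size = kh * kw
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            valid = mp[i:i + kh, j:j + kw].sum()
            if valid > 0:
                # Rescale by window_size / valid to compensate for missing inputs.
                response = (kernel * xp[i:i + kh, j:j + kw]).sum()
                out[i, j] = response * (window_size / valid) + bias
                new_mask[i, j] = 1.0  # this position now holds a valid value
    return out, new_mask

# Stage 1: harden a soft mask and remove noise-dominated points.
spectrogram = np.ones((4, 4), dtype=np.float32)
soft_mask = np.full((4, 4), 0.9, dtype=np.float32)
soft_mask[1, 1] = 0.2                       # one noise-dominated point
binary_mask = harden_mask(soft_mask)
masked_spectrogram = spectrogram * binary_mask

# Stage 2: fill the hole with one partial convolution (averaging kernel).
kernel = np.full((3, 3), 1.0 / 9.0)
inpainted, updated_mask = partial_conv2d(masked_spectrogram, binary_mask, kernel)
```

In a trained model the kernel weights would be learned and many such layers would be stacked, so that holes larger than one kernel window shrink layer by layer as the mask is updated.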