Two-stage UNet with multi-axis gated multilayer perceptron for monaural noisy-reverberant speech enhancement
Zehua Zhang (Harbin Institute of Technology (Shenzhen)); Shiyun Xu (Harbin Institute of Technology (Shenzhen)); Xuyi Zhuang (Harbin Institute of Technology (Shenzhen)); Lianyu Zhou (Harbin Institute of Technology (Shenzhen)); Heng Li (Harbin Institute of Technology (Shenzhen)); Mingjiang Wang (Harbin Institute of Technology (Shenzhen))
SPS
In denoising and de-reverberation tasks, the dominant methods are complex spectral masking and complex spectral mapping. To combine the advantages of both and improve speech enhancement performance, we propose a two-stage UNet (TSUNet) that estimates both a complex spectral mask and a complex spectral mapping. We use a multi-axis gated multilayer perceptron to build global and local attention modules of linear complexity for extracting speech features. In addition, we use a residual channel attention block to further select the important speech features. On the blind test dataset of the Deep Noise Suppression Challenge, the proposed TSUNet substantially outperforms other state-of-the-art models. TSUNet also performs significantly better than the most recent models on noisy-reverberant speech enhancement.
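The difference between the two estimation targets named above can be illustrated with a minimal sketch. It is not the TSUNet implementation: the abstract does not say how the two stages are ordered or combined, so the function names, the oracle mask, and the toy spectra below are all hypothetical. Masking multiplies each noisy time-frequency bin by a complex mask, while mapping predicts the real and imaginary parts of the clean spectrum directly.

```python
# Illustrative sketch only, not the authors' model. All names are hypothetical.

def apply_complex_mask(noisy, mask):
    """Masking: scale and rotate each noisy time-frequency bin by a complex mask."""
    return [[y * m for y, m in zip(y_row, m_row)]
            for y_row, m_row in zip(noisy, mask)]


def oracle_complex_mask(clean, noisy, eps=1e-8):
    """Oracle mask M = S / Y, the kind of target a masking model approximates."""
    return [[s / y if abs(y) > eps else 0j for s, y in zip(s_row, y_row)]
            for s_row, y_row in zip(clean, noisy)]


def spectrum_from_mapping(real, imag):
    """Mapping: rebuild the spectrum from directly predicted real/imaginary parts."""
    return [[complex(r, i) for r, i in zip(r_row, i_row)]
            for r_row, i_row in zip(real, imag)]


# Toy spectra with 2 frames x 2 frequency bins (made-up values).
noisy = [[1 + 1j, 2 - 1j], [0.5j, -1 + 0j]]
clean = [[0.5 + 0.5j, 1 + 0j], [0.25j, -0.5 + 0.5j]]

# Masking path: estimate a mask, then apply it to the noisy spectrum.
mask = oracle_complex_mask(clean, noisy)
masked = apply_complex_mask(noisy, mask)

# Mapping path: here the "prediction" is simply the clean real/imag parts.
mapped = spectrum_from_mapping(
    [[s.real for s in row] for row in clean],
    [[s.imag for s in row] for row in clean],
)
```

With an oracle mask and perfect predictions, both paths recover the clean spectrum. In practice a network replaces both oracles, and the two targets tend to make different kinds of errors, which is the motivation the abstract gives for combining them.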