Half-temporal and half-frequency attention U2Net for speech signal improvement
Zehua Zhang (Harbin Institute of Technology (Shenzhen)); Shiyun Xu (Harbin Institute of Technology (Shenzhen)); Xuyi Zhuang (Harbin Institute of Technology (Shenzhen)); Yukun Qian (Harbin Institute of Technology (Shenzhen)); Lianyu Zhou (Harbin Institute of Technology (Shenzhen)); Mingjiang Wang (Harbin Institute of Technology (Shenzhen))
SPS
During communication, volume changes, noise, and reverberation can distort speech signals and significantly degrade speech quality and intelligibility. As part of the ICASSP 2023 Signal Processing Grand Challenge, the first Speech Signal Improvement Grand Challenge (SIG) was organized to improve the quality of speech signals during communication. This paper proposes a half-temporal and half-frequency attention U$^2$Net for improving full-band speech signals. Channel-spectrum attention is proposed for the skip connections between the encoder and decoder. In the SIG subjective test, the proposed model achieves improvements of 0.353, 1.289, 0.604, 0.625, and 0.924 in signal, noise, overall, reverberation, and loudness, respectively. The proposed model placed fourth in the SIG real-time track, demonstrating strong denoising and dereverberation performance.