A LIGHTWEIGHT FOURIER CONVOLUTIONAL ATTENTION ENCODER FOR MULTI-CHANNEL SPEECH ENHANCEMENT

Siyu Sun (Wuhan University); Jian Jin (RTC Lab, ByteDance); Zhe Han (RTC Lab, ByteDance); Xianjun Xia (RTC Lab, ByteDance); Li Chen (ByteDance); Yijian Xiao (RTC Lab, ByteDance); Piao Ding (RTC Lab, ByteDance); Shenyi Song (RTC Engineering, ByteDance); Roberto Togneri (The University of Western Australia); Haijian Zhang (Wuhan University)

06 Jun 2023

Predicting beamforming weights with deep neural networks has become one of the main approaches to multi-channel speech enhancement. Spectral-spatial cues are crucial for estimating these weights; however, many existing methods predict them suboptimally because they do not learn enough spectral-spatial information. To tackle this challenge, we propose a Fourier convolutional attention encoder (FCAE), which provides a global receptive field over the frequency axis and boosts the learning of spectral contexts and cross-channel features. We also propose a new convolutional recurrent encoder-decoder (CRED) structure that combines FCAEs, attention blocks with skip connections, and a deep feedforward sequential memory network (DFSMN) serving as the recurrent module. The CRED structure captures joint spectral-spatial information to estimate the beamforming weights accurately. Experimental results on the ConferencingSpeech2021 challenge development test set demonstrate the superiority of the proposed approach, which uses only 0.74M parameters and improves PESQ from 2.225 to 2.359.
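To illustrate what a global receptive field over the frequency axis can mean in practice, here is a minimal NumPy sketch of a Fourier-domain mixing step. It is not the FCAE from the paper, whose details are not given in the abstract. The function name `fourier_unit`, the tensor layout (channels, time, frequency), the pointwise weight matrix `w`, and the ReLU are all assumptions made for illustration. The sketch takes a real FFT along the frequency axis, mixes the real and imaginary parts across channels, and transforms back, so each output bin can depend on every input bin of the same time frame.

```python
import numpy as np


def fourier_unit(x, w):
    """Hypothetical Fourier-domain mixing along the last (frequency) axis.

    x: real array of shape (channels, time, freq).
    w: real array of shape (2 * c_out, 2 * channels), an assumed pointwise
       (1x1) mixing matrix applied to stacked real/imaginary parts.
    Returns a real array of shape (c_out, time, freq).
    """
    n_freq = x.shape[-1]
    # Real FFT over the frequency axis: each coefficient sums over all bins.
    spec = np.fft.rfft(x, axis=-1)
    # Stack real and imaginary parts along the channel axis.
    stacked = np.concatenate([spec.real, spec.imag], axis=0)
    # Pointwise channel mixing, applied independently per frame and coefficient.
    mixed = np.einsum("oc,ctk->otk", w, stacked)
    mixed = np.maximum(mixed, 0.0)  # ReLU (assumed nonlinearity)
    # Split back into real and imaginary halves and invert the transform.
    c_out = mixed.shape[0] // 2
    out_spec = mixed[:c_out] + 1j * mixed[c_out:]
    return np.fft.irfft(out_spec, n=n_freq, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 3, 16))   # 4 channels, 3 frames, 16 bins
    w = rng.standard_normal((8, 8))       # maps 2*4 inputs to 2*4 outputs
    y = fourier_unit(x, w)

    # Perturb a single frequency bin in one frame and see how far it spreads.
    x_pert = x.copy()
    x_pert[0, 1, 5] += 1.0
    diff = np.abs(fourier_unit(x_pert, w) - y)
    print("output shape:", y.shape)
    print("bins affected in perturbed frame:", int((diff[:, 1, :].max(axis=0) > 1e-9).sum()))
```

In this sketch a change to one input bin reaches the whole frequency axis of that frame in a single layer, whereas a small convolution kernel would need many stacked layers to cover the same span. Other time frames are untouched because the transform is applied per frame.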

Tags:

Audio signal enhancement and restoration

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
