EXPLORE RELATIVE AND CONTEXT INFORMATION WITH TRANSFORMER FOR JOINT ACOUSTIC ECHO CANCELLATION AND SPEECH ENHANCEMENT
Xingwei Sun, Chenbin Cao, Qinglong Li, Linzhang Wang, Fei Xiang
This paper proposes a joint acoustic echo cancellation (AEC) and speech enhancement method that combines an adaptive filter with a deep neural network (DNN) model. A partitioned-block adaptive filter performs linear AEC, followed by a convolutional neural network and transformer based model that suppresses the residual echo, noise, and reverberation. The DNN model consists of three modules: an encoder, a dual-path transformer (DPT), and a decoder. The encoder explores the potential relationships between the far-end and near-end signals with the attention mechanism of the transformer. The DPT module further explores context information in both the time and frequency dimensions. An attention mask is applied in the transformer to enable real-time processing. Finally, the decoder estimates a complex spectral mask to recover the target speech. The proposed DNN model is trained on the ICASSP 2022 AEC Challenge datasets and ranked fourth in the challenge, with satisfactory performance on both the subjective and word accuracy rate evaluations.
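The abstract notes that an attention mask in the transformer is what allows real-time processing, and that the DPT attends over both the time and frequency dimensions. The sketch below is a minimal illustration of how such a dual-path block with a causal time-axis mask might look in PyTorch; the module name `DualPathTransformerBlock`, the layer sizes, and the use of the off-the-shelf `nn.TransformerEncoderLayer` are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's code) of a dual-path transformer
# block: full attention across frequency bins within each frame, and
# causally masked attention across time frames so each frame only sees
# the past, which is what permits frame-by-frame (real-time) inference.
import torch
import torch.nn as nn


def causal_mask(n_frames: int) -> torch.Tensor:
    # Boolean mask where True entries are blocked: frame t attends only to frames <= t.
    return torch.triu(torch.ones(n_frames, n_frames), diagonal=1).bool()


class DualPathTransformerBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Frequency path: non-causal attention over bins within a frame.
        self.freq_attn = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # Time path: attention over frames for each bin, causally masked.
        self.time_attn = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, freq, channels)
        b, t, f, c = x.shape

        # Attend over frequency bins, treating every frame independently.
        xf = self.freq_attn(x.reshape(b * t, f, c)).reshape(b, t, f, c)

        # Attend over past time frames, treating every frequency bin independently.
        xt = xf.permute(0, 2, 1, 3).reshape(b * f, t, c)
        xt = self.time_attn(xt, src_mask=causal_mask(t).to(x.device))
        return xt.reshape(b, f, t, c).permute(0, 2, 1, 3)


if __name__ == "__main__":
    block = DualPathTransformerBlock()
    feats = torch.randn(2, 100, 32, 64)  # (batch, frames, freq bins, channels)
    print(block(feats).shape)            # torch.Size([2, 100, 32, 64])
```

In a full system along the lines described above, a stack of such blocks would sit between the encoder (fed with far-end and near-end features) and the decoder that outputs the complex spectral mask.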