END-TO-END COMPLEX-VALUED MULTIDILATED CONVOLUTIONAL NEURAL NETWORK FOR JOINT ACOUSTIC ECHO CANCELLATION AND NOISE SUPPRESSION
Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma
SPS
Length: 00:10:42
Echo and noise suppression is an integral part of a full-duplex communication system. Many recent acoustic echo cancellation (AEC) systems rely on a separate adaptive filtering module for linear echo suppression and a neural module for residual echo suppression. In practice, however, adaptive filtering modules require time to converge and remain susceptible to changes in the acoustic environment. This introduces unnecessary delays to AEC systems built on the two-stage framework, even though neural modules can already suppress both linear and nonlinear echo components. In this paper, we exploit the offset-compensating property of complex time-frequency masks and propose an end-to-end complex-valued neural network architecture. The building block of the proposed model is a pseudocomplex extension of the densely connected multidilated DenseNet (D3Net), resulting in a very small network of only 354K parameters. The architecture utilizes the multi-resolution nature of the D3Net to eliminate the need for pooling, allowing feature extraction with large receptive fields without any loss of output resolution. We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement. Evaluation on both synthetic and real test sets demonstrated promising results across multiple energy-based metrics and perceptual proxies.
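The offset-compensating property of complex time-frequency masks mentioned in the abstract can be illustrated with a small NumPy sketch. It is not the paper's model: it assumes a single DFT frame and an integer circular delay, for which the shift theorem holds exactly (in a windowed STFT the compensation is only approximate). The helper names `apply_complex_mask` and `delay_mask` are hypothetical and chosen for this illustration only.

```python
import numpy as np


def apply_complex_mask(spec, mask):
    """Element-wise complex product: |mask| scales the magnitude,
    angle(mask) rotates the phase of each frequency bin."""
    return spec * mask


def delay_mask(n_fft, delay):
    """Unit-magnitude mask whose linear phase ramp corresponds to a
    circular delay of `delay` samples (DFT shift theorem)."""
    k = np.arange(n_fft)
    return np.exp(-2j * np.pi * k * delay / n_fft)


# Assumed toy setup: one random frame and a 5-sample circular offset.
rng = np.random.default_rng(0)
n_fft, delay = 64, 5
frame = rng.standard_normal(n_fft)
delayed = np.roll(frame, delay)  # delayed[n] = frame[n - delay], circularly

spec = np.fft.fft(frame)
mask = delay_mask(n_fft, delay)

# A complex mask can realise the time offset through phase rotation alone.
recovered = np.fft.ifft(apply_complex_mask(spec, mask)).real
max_err = np.max(np.abs(recovered - delayed))

# A real-valued (magnitude-only) mask with the same |mask| = 1 cannot:
# it leaves the frame unchanged.
magnitude_only = np.fft.ifft(apply_complex_mask(spec, np.abs(mask))).real
```

Here `max_err` stays at the level of floating-point rounding, while `magnitude_only` simply reproduces `frame`. This is the sense in which a complex mask can absorb a time misalignment between signals that a magnitude mask cannot.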