IMPROVING ACOUSTIC ECHO CANCELLATION BY MIXING SPEECH LOCAL AND GLOBAL FEATURES WITH TRANSFORMER
Yajie Liu (School of Computer Science, Wuhan University); Xinmeng Xu (Wuhan University); Weiping Tu (Wuhan University); Yuhong Yang (Wuhan University); Li Xiao (School of Computer Science, Wuhan University)
We propose MiT-Net, a novel mix-transformer neural network with a pyramid encoder that operates in the time domain, for acoustic echo cancellation (AEC). MiT-Net formulates AEC as a supervised speech separation problem in which the near-end speech is separated from a single microphone recording and sent to the far end. The network consists of two key components. First, a pyramid encoder with a coarse-to-fine structure extracts the latent correlations between the double-end signals and fuses them in a multiscale manner. Second, a mix-transformer, which combines local and global attention in parallel, leverages both local and global speech information for separation. Experimental results show that the proposed method outperforms recent AEC methods on objective evaluation metrics. In addition, exploring the correlation between local and global speech features with the mix-transformer significantly improves system performance and is more robust than the conventional transformer.
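To make the idea of combining local and global attention in parallel concrete, the following is a minimal sketch and not the MiT-Net implementation. It assumes single-head dot-product self-attention without learned projections, a fixed local window, and fusion of the two branches by simple averaging; the abstract specifies none of these details, and the names `attend`, `mix_attention`, and `window` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frames, window=None):
    """Single-head dot-product self-attention over a list of feature vectors.

    window=None: every frame attends to all frames (global branch).
    window=w:    frame i attends only to frames j with |i - j| <= w (local branch).
    """
    dim = len(frames[0])
    out = []
    for i, query in enumerate(frames):
        # Indices of the frames this query is allowed to look at.
        keys = [j for j in range(len(frames))
                if window is None or abs(i - j) <= window]
        # Scaled dot-product scores against the allowed frames only.
        scores = [sum(q * k for q, k in zip(query, frames[j])) / math.sqrt(dim)
                  for j in keys]
        weights = softmax(scores)
        # Weighted sum of the attended frames, one output channel at a time.
        out.append([sum(w * frames[j][c] for w, j in zip(weights, keys))
                    for c in range(dim)])
    return out


def mix_attention(frames, window=2):
    """Run the local and global branches on the same input and fuse them.

    Assumption: the branches are fused by averaging; the actual fusion
    used in MiT-Net is not stated in the abstract.
    """
    local = attend(frames, window=window)
    global_ = attend(frames, window=None)
    return [[0.5 * (a + b) for a, b in zip(l_row, g_row)]
            for l_row, g_row in zip(local, global_)]
```

Calling `mix_attention(frames, window=2)` on a list of equal-length feature vectors returns a sequence of the same shape. In this sketch, a small window limits the local branch to neighbouring frames, while the global branch sees the whole sequence.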