TFCNET: TIME-FREQUENCY DOMAIN CORRECTOR FOR SPEECH SEPARATION
Weinan Tong (Tsinghua University); Jiaxu Zhu (Tsinghua University); Jun Chen (Tsinghua University); Zhiyong Wu (Tsinghua University); Shiyin Kang (XVerse Inc.); Helen Meng (The Chinese University of Hong Kong)
Deep learning-based methods have made significant progress in speech separation, and time-domain separation methods in particular have achieved the best performance in recent years. However, the waveform transformation in time-domain methods is unstable and prone to amplitude and phase errors. Drawing on the robustness of time-frequency (T-F) domain methods, we propose a network architecture called the Time-Frequency Domain Corrector Network (TFCNet), which consists of a time-domain separator and a specially designed T-F domain corrector. The corrector module is placed after the time-domain separation step to correct the real and imaginary components in the T-F domain. The proposed model achieves state-of-the-art performance, with a scale-invariant signal-to-distortion ratio improvement (SI-SDRi) of 22.2 dB on the WSJ0-2mix dataset and 19.4 dB on the Libri-2mix dataset.
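For readers unfamiliar with the reported metric, the following is a minimal sketch of how SI-SDRi is commonly computed: the SI-SDR of the separated estimate minus the SI-SDR of the unprocessed mixture, both measured against the clean target. It is not taken from the paper; the function names, the mean-removal step, the `eps` constant, and the toy sine signals are all illustrative assumptions.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Assumption: both signals are mean-centred first, a common convention.
    """
    n = len(target)
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to get the optimal scaling factor.
    alpha = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    scaled = [alpha * b for b in t]
    # Whatever is left after removing the scaled target counts as distortion.
    noise = [a - s for a, s in zip(e, scaled)]
    signal_energy = sum(s * s for s in scaled)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: gain of the estimate over the raw mixture, in dB."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Toy signals (hypothetical): two sinusoids standing in for two speakers.
N = 800
target = [math.sin(2 * math.pi * 5 * i / N) for i in range(N)]
interferer = [math.sin(2 * math.pi * 13 * i / N) for i in range(N)]
mixture = [t + v for t, v in zip(target, interferer)]
# An imperfect "separation" that leaves 10% of the interferer's amplitude.
estimate = [t + 0.1 * v for t, v in zip(target, interferer)]

improvement = si_sdr_improvement(estimate, mixture, target)
```

In this toy case the interferer's amplitude is attenuated by a factor of 10, so the improvement comes out close to 20 dB. Because the metric is scale-invariant, multiplying `estimate` by a constant gain leaves the result unchanged.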