TFCNET: TIME-FREQUENCY DOMAIN CORRECTOR FOR SPEECH SEPARATION

Weinan Tong (Tsinghua University); Jiaxu Zhu (Tsinghua University); Jun Chen (Tsinghua University); Zhiyong Wu (Tsinghua University); Shiyin Kang (XVerse Inc.); Helen Meng (The Chinese University of Hong Kong)

07 Jun 2023

Deep learning-based methods have achieved significant progress in speech separation. In particular, time-domain separation methods have achieved the best performance in recent years. However, time-domain methods are unstable in waveform transformation and prone to amplitude and phase errors. Considering the robustness of time-frequency (T-F) domain methods, we propose a novel network architecture called the Time-Frequency Domain Corrector Network (TFCNet), which consists of a time-domain separator and a specially designed T-F domain corrector. The corrector module is placed after the time-domain separation step to correct the real and imaginary parts of the separated signals in the T-F domain. The proposed model achieves state-of-the-art performance, with a scale-invariant signal-to-distortion ratio improvement (SI-SDRi) of 22.2 dB on the WSJ0-2mix dataset and 19.4 dB on the Libri-2mix dataset.
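To make the reported SI-SDRi figures concrete, the following is a minimal sketch of the standard SI-SDR and SI-SDRi computation in NumPy. It is not the authors' evaluation code. It assumes single-channel 1-D arrays and a known pairing between each estimate and its reference source, so it omits the permutation search that two-speaker benchmarks such as WSJ0-2mix typically apply. The function names `si_sdr` and `si_sdr_improvement` are illustrative.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between a 1-D estimate and its reference."""
    # Remove the mean so a constant offset does not affect the score
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Project the estimate onto the reference: alpha * reference is the "target" part
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything not explained by the scaled reference counts as distortion
    distortion = estimate - target
    return float(10.0 * np.log10((np.dot(target, target) + eps) /
                                 (np.dot(distortion, distortion) + eps)))


def si_sdr_improvement(estimate: np.ndarray, reference: np.ndarray,
                       mixture: np.ndarray) -> float:
    """SI-SDRi: SI-SDR of the separated output minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


if __name__ == "__main__":
    # Hypothetical toy data: two random "speakers" and an imperfect separation
    rng = np.random.default_rng(0)
    s1 = rng.standard_normal(16000)
    s2 = rng.standard_normal(16000)
    mixture = s1 + s2
    estimate = s1 + 0.1 * s2  # residual leakage of the second speaker
    print(f"SI-SDRi: {si_sdr_improvement(estimate, s1, mixture):.2f} dB")
```

In this toy case the mixture scores close to 0 dB against `s1` (equal-energy interferer). The estimate scores close to 20 dB (leakage at 1/100 of the target energy), so the printed improvement should be near 20 dB. Because of the projection step, rescaling the estimate by any nonzero gain leaves the score unchanged. This is why the metric isolates separation quality from overall loudness.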

Tags: Speech enhancement and separation

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference.

More Like This

Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing Audio-Visual Speech Enhancement

Fast and Efficient Speech Enhancement with Variational Autoencoders

SINGLE-CHANNEL SPEECH ENHANCEMENT WITH DEEP COMPLEX U-NETWORKS AND PROBABILISTIC LATENT SPACE MODELS