TFPSNET: TIME-FREQUENCY DOMAIN PATH SCANNING NETWORK FOR SPEECH SEPARATION

Lei Yang, Wei Liu, Weiqin Wang

SPS

Length: 00:09:11

09 May 2022

Deep learning techniques have made speech separation highly successful. In this paper, we propose a time-frequency (T-F) domain path scanning network (TFPSNet) for the speech separation task. A transformer models the connections between frequency bins along the frequency path, the time path, and the T-F path. We also introduce a T-F path loss function to further improve performance. The proposed TFPSNet can learn more details of the frequency structure and separates the features in the T-F domain. Experiments show that the proposed model achieves state-of-the-art (SOTA) performance on the public WSJ0-2mix dataset. It reaches 21.1 dB SI-SDRi on WSJ0-2mix and 19.7 dB SI-SDRi on Libri-2mix. Furthermore, our approach generalizes well: the model trained on the WSJ0-2mix dataset achieves 18.7 dB SI-SDRi on the Libri-2mix test set without any fine-tuning. This result is even 0.5 dB higher than that of DPTNet trained on the Libri-2mix dataset.
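The results above are reported as SI-SDRi, the improvement in scale-invariant signal-to-distortion ratio of the separated signal over the unprocessed mixture. The following is a minimal sketch of the commonly used definition of that metric, not the authors' evaluation code; the function names `si_sdr` and `si_sdr_improvement`, the `eps` stabilizer, and the toy sinusoidal signals are assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean of each signal, as is common practice for this metric.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to obtain the scaled target.
    ref_energy = sum(r * r for r in ref)
    scale = sum(e * r for e, r in zip(est, ref)) / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything in the estimate that is not the scaled target counts as distortion.
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(est, target))
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy example (hypothetical signals): two sinusoids stand in for two speakers.
num_samples = 1000
s1 = [math.sin(2 * math.pi * 10 * t / num_samples) for t in range(num_samples)]
s2 = [math.sin(2 * math.pi * 37 * t / num_samples) for t in range(num_samples)]
mixture = [a + b for a, b in zip(s1, s2)]
# An imperfect separation that still leaks 10% of the interfering source.
estimate = [a + 0.1 * b for a, b in zip(s1, s2)]

gain = si_sdr_improvement(estimate, mixture, s1)
print(f"SI-SDRi for the toy estimate: {gain:.2f} dB")
```

Because the estimate is projected onto the reference before the energies are compared, rescaling the estimate leaves the score unchanged, so the metric rewards separation quality rather than output level.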

Tags:

speech separation

source separation

t-f domain

deep learning

transformer

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
