An Online Speaker-Aware Speech Separation Approach Based On Time-Domain Representation

Hui Wang, Yan Song, Zeng-Xi Li, Li-Rong Dai, Ian McLoughlin

Length: 13:49

04 May 2020

Despite significant progress in deep-learning-based speech separation, extracting and tracking the speech of target speakers remains challenging, especially in single-channel, multi-speaker conditions. In previous work, the authors proposed a source-aware context network that exploits the temporal context in both the mixtures and the estimated sources for online speech separation. In this paper, we propose a speaker-aware approach built on the source-aware context network structure, in which speaker information is explicitly modeled by an auxiliary speaker identification branch. Speech separation and speaker tracking can then be jointly optimized through multi-task learning. Furthermore, we study the effectiveness of time-domain representation by proposing a raw sparse waveform encoder that preserves discriminative information. Experimental results on the WSJ0-2mix benchmark show that the proposed system significantly improves Signal-to-Distortion Ratio (SDR) performance.
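To make the joint objective more concrete, the following is a minimal sketch of how a separation term and a speaker identification term might be combined into one multi-task loss. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weight of 0.1, and the use of negative SDR as the separation term are all hypothetical, and the SDR here is the simplified energy-ratio form rather than the BSS Eval variant commonly reported on WSJ0-2mix.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-8):
    """Simplified SDR in dB: reference energy over residual energy.

    Assumption: this is the plain energy-ratio form, not BSS Eval SDR.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    residual = reference - estimate
    ratio = (np.sum(reference ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return 10.0 * np.log10(ratio)


def cross_entropy(logits, label):
    """Cross-entropy of one speaker-ID prediction from unnormalized logits."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]


def multitask_loss(reference, estimate, speaker_logits, speaker_label, weight=0.1):
    """Hypothetical joint loss: negative SDR plus a weighted speaker-ID term."""
    separation_term = -sdr_db(reference, estimate)
    speaker_term = cross_entropy(speaker_logits, speaker_label)
    return separation_term + weight * speaker_term


# Usage: a 440 Hz tone as the target source and a slightly attenuated estimate.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
target = np.sin(2.0 * np.pi * 440.0 * t)
loss = multitask_loss(target, 0.9 * target, speaker_logits=[2.0, 0.0], speaker_label=0)
```

In this sketch a better waveform estimate or a more confident correct speaker prediction both lower the loss, which is the sense in which the two tasks are optimized together; how the two terms are actually weighted in the paper is not stated in the abstract.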

Tags: sps conference, icassp 2020 virtual conference, May 2020, icassp 2020
