SLOGD: Speaker Location Guided Deflation Approach to Speech Separation

Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr


SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 11:53

04 May 2020

Speech separation is the process of separating the speech of multiple speakers from an audio recording. In this work, we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach, wherein we estimate the sources iteratively. In each iteration, we first estimate the location of a speaker and use it to estimate a mask corresponding to the localized speaker. The estimated source is then removed from the mixture before the location and mask of the next source are estimated. Experiments are conducted on a reverberated, noisy multichannel version of the well-studied WSJ-2MIX dataset, using word error rate (WER) as the metric. The proposed method achieves a WER of 44.2%, a 34% relative improvement over the system without separation and a 17% relative improvement over Conv-TasNet.
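The localize, mask, remove loop described above can be sketched as follows. This is a minimal toy sketch under stated assumptions, not the authors' implementation. In SLOGD the localization and mask-estimation stages are learned models. Here `localize_srp` (a delay-and-sum steered-response-power search over a grid of angles) and `mask_from_doa` (a binary mask that keeps bins whose inter-channel ratio matches the steering vector) are hypothetical stand-ins that only work on a noise-free, anechoic two-microphone toy mixture. Removing a source by multiplying the residual with `(1 - mask)` is also an assumption, as are the array geometry and all parameter values.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s


def steering(theta_deg, freqs_hz, mic_spacing):
    """Far-field transfer ratio of mic 1 relative to mic 0 (toy 2-mic array), shape (freqs, 1)."""
    tau = mic_spacing * np.sin(np.deg2rad(theta_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freqs_hz * tau)[:, None]


def localize_srp(residual, freqs_hz, mic_spacing, grid_deg):
    """Hypothetical localizer: pick the grid angle with maximal delay-and-sum output power."""
    powers = [
        np.sum(np.abs(residual[0] + residual[1] * np.conj(steering(t, freqs_hz, mic_spacing))) ** 2)
        for t in grid_deg
    ]
    return float(grid_deg[int(np.argmax(powers))])


def mask_from_doa(residual, doa_deg, freqs_hz, mic_spacing, tol=1e-3):
    """Hypothetical mask estimator: 1 where mic1 / mic0 matches the steering vector for doa_deg."""
    a = steering(doa_deg, freqs_hz, mic_spacing)
    mismatch = np.abs(residual[1] - residual[0] * a)
    return (mismatch < tol * np.abs(residual[0])).astype(float)


def deflation_separate(mixture, n_sources, localize, estimate_mask):
    """Estimate sources one at a time; mixture is a complex STFT of shape (channels, freqs, frames)."""
    residual = mixture.copy()
    sources, doas = [], []
    for _ in range(n_sources):
        doa = localize(residual)                  # 1. locate the next speaker
        mask = estimate_mask(residual, doa)       # 2. mask for that speaker, shape (freqs, frames)
        sources.append(mask * residual[0])        # masked reference channel = source estimate
        doas.append(doa)
        residual = (1.0 - mask)[None] * residual  # 3. deflation: remove the estimate from all channels
    return sources, doas


# Toy demo: two time-frequency-disjoint sources (source 1 louder, on even bins;
# source 2 on odd bins; DC bin empty) at 30 and -45 degrees.
rng = np.random.default_rng(0)
n_freqs, n_frames, spacing = 129, 4, 0.02
freqs = np.linspace(0.0, 8000.0, n_freqs)
grid = np.arange(-90, 91, 5)

phases = np.exp(2j * np.pi * rng.random((2, n_freqs, n_frames)))
s1 = np.zeros((n_freqs, n_frames), dtype=complex)
s2 = np.zeros_like(s1)
s1[2::2] = 10.0 * phases[0, 2::2]
s2[1::2] = phases[1, 1::2]
mic0 = s1 + s2
mic1 = s1 * steering(30.0, freqs, spacing) + s2 * steering(-45.0, freqs, spacing)
mixture = np.stack([mic0, mic1])

est, doas = deflation_separate(
    mixture,
    2,
    localize=lambda r: localize_srp(r, freqs, spacing, grid),
    estimate_mask=lambda r, d: mask_from_doa(r, d, freqs, spacing),
)
print(doas)
```

In this toy setting the louder speaker dominates the first steered-response search, so it is located and masked first. The second speaker is then found in the residual, which illustrates why removing each estimate helps the next localization step. Real reverberant, noisy mixtures such as those in the experiments would require learned estimators in place of these hand-written stand-ins.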

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020


Value-Added Bundle(s) Including this Product

ICASSP 2020 Virtual Conference - Presentation Videos Product Bundle

More Like This

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

IEEE ICASSP 2024, 14-19 April 2024, Seoul, Korea. Conference Presentation Videos Bundle

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle
