Speakerfilter: Deep Learning-Based Target Speaker Extraction Using Anchor Speech
ShuLin He, Hao Li, Xueliang Zhang
Speaker extraction aims to separate the target speaker's voice from a multi-talker mixture, which is useful for applications such as teleconferencing. In many practical cases, a short piece of the target speaker's voice can be obtained in advance, which provides useful information for speaker extraction. This paper addresses the problem of extracting the target speaker from a mixture using a short piece of anchor speech. To utilize the anchor speech effectively, we propose a multi-level feature extraction scheme and seamlessly integrate the extracted features into a speech separation model. Experiments are conducted on the two-speaker dataset (WSJ0-2mix), which is widely used for speaker extraction. The systematic evaluation shows that the proposed method significantly outperforms previous methods, achieving a signal-to-distortion ratio (SDR) improvement of 11.3 dB over the unprocessed mixture.
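
The abstract describes the approach only at a high level: features derived from the anchor speech condition a separation network so that it extracts the matching speaker from the mixture. The sketch below illustrates that general anchor-conditioning pattern only; it is not the authors' Speakerfilter architecture, and all module choices (a GRU speaker encoder with average pooling, a mask-based separator, the dimensions and names) are assumptions made for illustration.

```python
# Minimal sketch of anchor-conditioned target speaker extraction.
# NOT the authors' Speakerfilter model; architecture and sizes are illustrative.
import torch
import torch.nn as nn


class SpeakerEncoder(nn.Module):
    """Encodes anchor speech (magnitude spectrogram frames) into one embedding."""

    def __init__(self, n_freq=257, emb_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_freq, emb_dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * emb_dim, emb_dim)

    def forward(self, anchor_spec):            # (B, T_anchor, F)
        out, _ = self.rnn(anchor_spec)          # (B, T_anchor, 2*emb_dim)
        return self.proj(out.mean(dim=1))       # average pooling -> (B, emb_dim)


class ConditionedSeparator(nn.Module):
    """Predicts a target-speaker mask given the mixture and the speaker embedding."""

    def __init__(self, n_freq=257, emb_dim=128, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq + emb_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.mask = nn.Sequential(nn.Linear(2 * hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_spec, spk_emb):       # (B, T, F), (B, emb_dim)
        # Broadcast the speaker embedding to every time frame and concatenate.
        emb = spk_emb.unsqueeze(1).expand(-1, mix_spec.size(1), -1)
        out, _ = self.rnn(torch.cat([mix_spec, emb], dim=-1))
        return self.mask(out) * mix_spec        # masked magnitude of target speaker


if __name__ == "__main__":
    B, T_mix, T_anchor, F = 2, 200, 100, 257
    mixture = torch.rand(B, T_mix, F)
    anchor = torch.rand(B, T_anchor, F)
    target_est = ConditionedSeparator()(mixture, SpeakerEncoder()(anchor))
    print(target_est.shape)                     # torch.Size([2, 200, 257])
```

In this illustrative setup the anchor speech is summarized into a single time-invariant embedding; the paper's multi-level feature extraction goes further by integrating anchor-derived features at several stages of the separation model.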