Time-Domain Speech Extraction With Spatial Information And Multi Speaker Conditioning Mechanism

Jisi Zhang, Cătălin Zorilă, Rama Doddipatla, Jon Barker

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:12:53

09 Jun 2021

In this paper, we present a novel multi-channel speech extraction system to simultaneously extract multiple clean individual sources from a mixture in noisy and reverberant environments. The proposed method is built on an improved multi-channel time-domain speech separation network which employs speaker embeddings to identify and extract multiple targets without label permutation ambiguity. To efficiently inform the speaker information to the extraction model, we propose a new speaker conditioning mechanism by designing an additional speaker branch for receiving external speaker embedding. Experiments on 2-channel WHAMR! data show that the proposed system improves by 9% relative the source separation performance over a strong multi-channel baseline, and it increases the speech recognition accuracy by more than 16% relative over the same baseline.

Chairs:

Dorothea Kolossa

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021

Time-Domain Speech Extraction With Spatial Information And Multi Speaker Conditioning Mechanism

Jisi Zhang, Cătălin Zorilă, Rama Doddipatla, Jon Barker

Value-Added Bundle(s) Including this Product

ICASSP 2021 Virtual Conference - Presentation Videos Product Bundle

More Like This

Welcome and Opening Remarks for the IEEE SustainTech Leadership Forum

Panel: Building Sustainable Cities for Tomorrow

Panel: Unleashing the Potential of Virtual Power Plants for Sustainable Energy Solutions

Join the IEEE Signal Processing Society