Beamformer-Guided Target Speaker Extraction

Mohamed Elminshawi (International Audio Laboratories Erlangen); Srikanth Raj Chetupalli (Fraunhofer IIS); Emanuel Habets (AudioLabs Erlangen)

07 Jun 2023

We propose a Beamformer-guided Target Speaker Extraction (BG-TSE) method to extract a target speaker’s voice from a multi-channel recording informed by the direction of arrival of the target. The proposed method employs a front-end beamformer steered towards the target speaker to provide an auxiliary signal to a single-channel TSE system. By allowing for time-varying embeddings in the single-channel TSE block, the proposed method fully exploits the correspondence between the front-end beamformer output and the target speech in the microphone signal. Experimental evaluation on simulated multi-channel 2-speaker mixtures, in both anechoic and reverberant conditions, demonstrates the advantage of the proposed method compared to recent single-channel and multi-channel baselines.
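To make the front-end step concrete, the following is a minimal sketch of a beamformer steered towards a known direction of arrival. The abstract does not say which beamformer design is used, so the delay-and-sum approach here is an assumption chosen for simplicity, as are the linear array geometry, the far-field plane-wave model, the speed of sound, and the function names `steering_delays` and `delay_and_sum`. Under those assumptions, the output is the kind of signal that could serve as the auxiliary input to a single-channel TSE system.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the source


def steering_delays(mic_positions, doa_deg):
    """Relative arrival delays (seconds) of a far-field plane wave.

    mic_positions: microphone coordinates along the array axis, in metres.
    doa_deg: angle between the source direction and the array axis,
             in degrees (90 = broadside, 0 = endfire).
    """
    positions = np.asarray(mic_positions, dtype=float)
    # A microphone further along the axis towards the source hears it earlier.
    return -positions * np.cos(np.deg2rad(doa_deg)) / SPEED_OF_SOUND


def delay_and_sum(mics, fs, mic_positions, doa_deg):
    """Delay-and-sum beamformer steered to doa_deg.

    mics: array of shape (num_mics, num_samples).
    Returns a single-channel signal of shape (num_samples,).
    """
    mics = np.asarray(mics, dtype=float)
    num_samples = mics.shape[1]
    delays = steering_delays(mic_positions, doa_deg)

    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    spectra = np.fft.rfft(mics, axis=1)
    # Undo each channel's delay tau by multiplying with exp(+j 2 pi f tau).
    # This is a circular shift, which is acceptable for a short sketch.
    phase = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = spectra * phase
    # Averaging the aligned channels reinforces the steered direction.
    return np.fft.irfft(aligned.mean(axis=0), n=num_samples)
```

Signals arriving from the steered direction add coherently, while signals from other directions are partially cancelled. That is why the beamformer output correlates strongly with the target speech in the microphone signal.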

Tags:

Applications of machine learning

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
