Neural Target Speech and Sound Extraction: An Overview

Marc Delcroix

SPS

Length: 01:20:22

06 Jun 2024

Humans can listen to a desired sound within a complex acoustic scene made up of a mixture of various sounds. This phenomenon, called the cocktail party effect or selective hearing, lets us listen to an interlocutor in a noisy cafe, focus on a particular instrument in a song, or notice a siren on the road. One of the long-term goals of speech and audio processing research is to reproduce this selective hearing ability computationally. In this webinar, the presenter will discuss target speech/sound extraction (TSE), one approach toward achieving this goal. TSE isolates the speech signal of a target speaker, or a target sound, from a mixture of several speakers or sounds by using clues that identify the target in the mixture. Such clues might be a spatial clue indicating the direction of the target, a video of the target, or a prerecorded enrollment recording from which the characteristics of the speaker’s voice or of the target sound can be derived.

TSE is an emerging field of research that has received increasing attention in recent years, because it offers a practical approach to the cocktail party problem and draws on several areas: audio, visual, and array signal processing, as well as deep learning. The presenter will introduce the foundations of neural TSE for speech and arbitrary sounds and present recent research in this area. He will guide the audience through the major approaches, emphasizing the similarities among frameworks, and discuss potential future directions.

Tags:

SPS Webinar 2024

neural target speech

sound extraction

Speech and Language Processing
