EXPERTS VERSUS ALL-ROUNDERS: TARGET LANGUAGE EXTRACTION FOR MULTIPLE TARGET LANGUAGES

Marvin Borsdorf, Kevin Scheck, Tanja Schultz, Haizhou Li

SPS

Length: 00:15:07

12 May 2022

Target language extraction (TLE) is a novel task in the field of selective auditory attention that seeks to extract all speech signals spoken in a target language from the other sources in a multilingual cocktail party. In our prior studies, a TLE model was trained to extract a single, predefined target language, referred to as Single-TLE. In this paper, we extend the Single-TLE framework to Multi-TLE. Multi-TLE models also extract all speech signals of one specific target language, but they are optimized on a set of multiple target languages during training. As such, they learn the characteristics of several target languages and can replace multiple Single-TLE models without retraining. We perform experiments on the GlobalPhoneMCP database and incorporate a dynamic language mixing scheme for training. The Multi-TLE model not only outperforms Single-TLE models but, when given a language ID as additional input, can also extract the speech of a specific target language from a mixture that contains multiple learned target languages.
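
The abstract does not spell out the dynamic language mixing scheme, so the following is only a minimal sketch of one plausible reading: each training mixture is assembled on the fly from one utterance in a randomly chosen target language plus interfering utterances in other languages. The function name `mix_on_the_fly`, the `pool` layout, and the use of plain Python lists as waveforms are illustrative assumptions, not the authors' implementation, and the sketch simplifies the task to a single target-language source per mixture.

```python
import random


def mix_on_the_fly(pool, target_langs, num_sources=2, rng=None):
    """Draw one training mixture (hypothetical helper, not from the paper).

    pool: dict mapping a language name to a list of waveforms,
          each waveform being a list of float samples.
    target_langs: languages the model is trained to extract.
    Returns (mixture, target, target_lang).
    """
    rng = rng or random.Random()
    # Pick the target language for this training example.
    target_lang = rng.choice(sorted(target_langs))
    # Assumption: interfering sources are drawn from distinct other languages.
    others = [lang for lang in sorted(pool) if lang != target_lang]
    interferer_langs = rng.sample(others, num_sources - 1)

    target = list(rng.choice(pool[target_lang]))
    interferers = [rng.choice(pool[lang]) for lang in interferer_langs]

    # Truncate every source to the shortest one so samples align.
    length = min([len(target)] + [len(w) for w in interferers])
    target = target[:length]

    # The mixture is the sample-wise sum of all sources.
    mixture = list(target)
    for waveform in interferers:
        for i in range(length):
            mixture[i] += waveform[i]
    return mixture, target, target_lang
```

Returning `target_lang` alongside the mixture mirrors the abstract's point that a language ID can be supplied as an additional input; how that ID is encoded and fed to the model is not stated in the abstract and is left open here.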

Tags:

cocktail party problem

selective auditory attention

globalphone

target language extraction

multilingual

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
