EXPERTS VERSUS ALL-ROUNDERS: TARGET LANGUAGE EXTRACTION FOR MULTIPLE TARGET LANGUAGES
Marvin Borsdorf, Kevin Scheck, Tanja Schultz, Haizhou Li
SPS
Length: 00:15:07
Target language extraction (TLE) is a novel task in the field of selective auditory attention that seeks to extract all speech signals spoken in a target language from the other sources in a multilingual cocktail party. In our prior studies, a TLE model was trained to extract a single, predefined target language, referred to as Single-TLE. In this paper, we extend the Single-TLE framework to Multi-TLE. Multi-TLE models likewise extract all speech signals of one specific target language, but they are optimized on a set of multiple target languages during training. As such, they learn the characteristics of several target languages and can replace multiple Single-TLE models without retraining. We perform experiments on the GlobalPhoneMCP database and incorporate a dynamic language mixing scheme for training. The Multi-TLE model not only outperforms Single-TLE models but, when given a language ID as additional input, can also extract the speech of a specific target language from a mixture that contains multiple learned target languages.
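To make the task definition concrete, the following is a minimal sketch of how one training pair for a Multi-TLE setup might be assembled on the fly. It is an illustration under stated assumptions, not the authors' dynamic language mixing scheme: the function name `sample_multi_tle_example`, its parameters, and the toy waveform representation (plain lists of floats) are all hypothetical. The only parts taken from the abstract are that the target language is one of several learned target languages and that the extraction target consists of all speech in that language.

```python
import random


def sample_multi_tle_example(utterances, target_languages, rng,
                             num_target=1, num_interferers=1):
    """Assemble one (language ID, mixture, target) triple on the fly.

    Hypothetical helper for illustration only.
    utterances: dict mapping a language name to a list of waveforms,
                each waveform being a list of float samples.
    target_languages: languages the model is trained to extract.
    rng: a random.Random instance, so sampling is reproducible.
    """
    # Pick which learned target language this example asks for.
    target_language = rng.choice(target_languages)

    # Every other language in the pool may act as interference.
    other_languages = [lang for lang in utterances if lang != target_language]

    # Draw target-language sources and interfering sources.
    sources = [(target_language, rng.choice(utterances[target_language]))
               for _ in range(num_target)]
    sources += [(lang, rng.choice(utterances[lang]))
                for lang in (rng.choice(other_languages)
                             for _ in range(num_interferers))]

    # Truncate to the shortest source so all signals align sample by sample.
    length = min(len(wave) for _, wave in sources)

    # The mixture is the sum of all sources; the target is the sum of only
    # those sources spoken in the target language.
    mixture = [sum(wave[t] for _, wave in sources) for t in range(length)]
    target = [sum(wave[t] for lang, wave in sources if lang == target_language)
              for t in range(length)]
    return target_language, mixture, target
```

Calling the function repeatedly with the same pool yields a different mixture each time, which is the general idea behind creating mixtures dynamically during training rather than from a fixed list. The returned language name could serve as the language ID input mentioned in the abstract, although how that ID is encoded and fed to the model is not specified here.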