A Study Of Child Speech Extraction Using Joint Speech Enhancement And Separation In Realistic Conditions
Xin Wang, Jun Du, Lei Sun, Alejandrina Cristia, Chin-Hui Lee
SPS
In this paper, we design a novel joint framework of speech enhancement and speech separation for child speech extraction in realistic conditions, targeting the problem of extracting child speech from daily conversations in the BabyTrain mega corpus. To the best of our knowledge, this is the first discussion of a feasible method for child speech extraction in realistic conditions. First, we present a detailed analysis of the BabyTrain mega corpus, which was recorded in adverse environments. We observe background noise, reverberation, and child speech that is partially obscured by adult speech (for instance, due to speaker overlap but also to imitation by the adult). Motivated by these observations, we build a joint framework of speech enhancement and speech separation for child speech extraction. To measure the extraction results in realistic conditions, we propose several objective measures for evaluating the performance of our system, which differ from those commonly used for simulated data. Compared with the unprocessed and classification approaches, our proposed approach yields the best performance on all subsets of BabyTrain.