In-Domain And Out-Of-Domain Data Augmentation To Improve Children's Speaker Verification System In Limited Data Scenario
Syed Shahnawazuddin, Waquar Ahmad, Nagaraj Adiga, Avinash Kumar
SPS
Length: 12:50
In this paper, we present our efforts towards developing a robust automatic speaker verification (ASV) system for children when domain-specific data is limited. To that end, we study the effect of in-domain and out-of-domain data augmentation in several different combinations. For in-domain augmentation, speed and pitch perturbation of children's speech are used to synthetically create additional data. For out-of-domain augmentation, adults' speech is pooled with children's speech. In addition, voice conversion (VC) is applied to adults' speech to alter its acoustic attributes so that it becomes perceptually similar to children's speech, and the converted adults' data is then used for augmentation. The ASV systems developed in this study employ x-vectors derived using a time-delay deep neural network, and probabilistic linear discriminant analysis (PLDA) is used for scoring. The explored data augmentation methods are noted to reduce the equal error rate as well as the minimum decision cost function by a large margin.
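As an illustration of the in-domain augmentation idea, the following is a minimal sketch of speed perturbation by resampling. It is not the authors' implementation: the function name `speed_perturb`, the factors 0.9 and 1.1, the 16 kHz sampling rate, and the use of plain linear interpolation (with no anti-aliasing filter) are all assumptions made for this example.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Resample `signal` so that it plays `factor` times faster.

    Hypothetical helper for illustration only. A factor above 1 shortens
    the signal and, at the original sampling rate, also raises its pitch;
    a factor below 1 lengthens it and lowers its pitch.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(signal)
    n_out = int(round(n_in / factor))
    # Positions in the original signal at which the output is sampled.
    old_idx = np.arange(n_in)
    new_idx = np.linspace(0, n_in - 1, n_out)
    # Linear interpolation; a production resampler would low-pass filter first.
    return np.interp(new_idx, old_idx, signal)


# Usage: one second of a 220 Hz tone at an assumed 16 kHz sampling rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 220.0 * t)

# Assumed perturbation factors; each copy would be added to the training pool.
augmented = {factor: speed_perturb(tone, factor) for factor in (0.9, 1.0, 1.1)}
for factor, audio in augmented.items():
    print(factor, len(audio))
```

Each perturbed copy can be treated as an additional training utterance from the same speaker or as a new pseudo-speaker; which of the two the paper uses is not stated in the abstract.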