  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 12:50
04 May 2020

In this paper, we present our efforts towards developing a robust automatic speaker verification (ASV) system for children when domain-specific data is limited. To that end, we study the effect of in-domain and out-of-domain data augmentation in several different combinations. For in-domain augmentation, speed and pitch perturbation of children's speech are used to create synthetic data. For out-of-domain augmentation, adults' speech is pooled with children's speech. In addition, voice conversion (VC) is applied to adults' speech to alter its acoustic attributes so that it becomes perceptually similar to children's speech, and the converted adults' data is then used for augmentation. The ASV systems developed in this study employ x-vectors extracted with a time-delay deep neural network, and probabilistic linear discriminant analysis (PLDA) is used for scoring. The explored data augmentation methods are observed to reduce both the equal error rate and the minimum decision cost function by a large margin.
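
The speed perturbation mentioned above can be illustrated with a minimal sketch. The abstract does not state which tools or perturbation factors were used, so everything here is an assumption for illustration: the function name `speed_perturb`, the factors 0.9 and 1.1, the 16 kHz sampling rate, and the use of plain linear interpolation. Resampling in this way changes duration and shifts pitch together; it is not a reconstruction of the authors' separate pitch perturbation step.

```python
import numpy as np


def speed_perturb(signal, factor):
    """Return `signal` played `factor` times faster at the same sample rate.

    Hypothetical helper: resamples by linear interpolation, so duration
    scales by 1/factor and pitch scales by factor.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    signal = np.asarray(signal, dtype=float)
    n_out = int(round(len(signal) / factor))
    # Position in the original signal that each output sample reads from.
    src_pos = np.arange(n_out) * factor
    return np.interp(src_pos, np.arange(len(signal)), signal)


# Illustrative input: a 1 s, 200 Hz tone at an assumed 16 kHz sampling rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 200.0 * t)

# Assumed factors; each copy would be added to the training pool
# alongside the original utterance.
augmented = {f: speed_perturb(tone, f) for f in (0.9, 1.0, 1.1)}
for f, y in augmented.items():
    print(f, len(y) / sample_rate)  # factor and resulting duration in seconds
```

A factor below 1 lengthens the utterance and lowers its pitch, while a factor above 1 does the opposite, which is why this kind of perturbation yields additional speaker-like variation from the same recordings.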
