Knowledge Distillation And Random Erasing Data Augmentation For Text-Dependent Speaker Verification
Victoria Mingote, Antonio Miguel, Dayana Ribas, Alfonso Ortega, Eduardo Lleida
This paper explores the Knowledge Distillation (KD) approach and a data augmentation technique to improve the generalization ability and robustness of text-dependent speaker verification (SV) systems. The KD method involves two neural networks, known as Teacher and Student, where the student is trained to replicate the teacher's predictions, so it learns the teacher's variability during the training process. To provide robustness to the distillation process, we apply Random Erasing (RE), a data augmentation technique created to improve the generalization ability of neural networks. We have developed two alternative combinations of KD and RE, which produce a more robust system with better performance, since the student network can learn from teacher predictions on data not present in the original dataset. All the alternatives were tested on the RSR2015-Part I database, where the proposed variants outperform a reference system based on a single network using RE.
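To illustrate the two ideas the abstract combines, the following is a minimal sketch, assuming a PyTorch setup with log-mel spectrogram inputs; the function names, network sizes, temperature, and loss weighting are illustrative assumptions and are not taken from the paper. It shows Random Erasing applied to a batch of spectrograms and a single teacher-student distillation step in which the student matches the teacher's softened predictions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_erasing(spec, p=0.5, max_area=0.3):
    """Randomly erase a rectangular time-frequency patch (Random Erasing sketch)."""
    if torch.rand(1).item() > p:
        return spec
    freq_bins, frames = spec.shape[-2], spec.shape[-1]
    # Pick a random rectangle covering at most `max_area` of each dimension.
    h = torch.randint(1, max(2, int(freq_bins * max_area)), (1,)).item()
    w = torch.randint(1, max(2, int(frames * max_area)), (1,)).item()
    top = torch.randint(0, freq_bins - h + 1, (1,)).item()
    left = torch.randint(0, frames - w + 1, (1,)).item()
    erased = spec.clone()
    erased[..., top:top + h, left:left + w] = 0.0
    return erased


def distillation_step(teacher, student, optimizer, spec, label, T=2.0, alpha=0.5):
    """One teacher-student step: the student mimics the teacher's softened
    predictions on the (possibly erased) input, plus a hard-label term."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(spec)
    s_logits = student(spec)
    # Soft targets from the teacher, temperature-scaled (standard KD loss form).
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(s_logits, label)
    loss = alpha * kd + (1.0 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy networks and data, purely for demonstration (not the paper's architecture).
    n_phrases = 30
    make_net = lambda: nn.Sequential(nn.Flatten(), nn.Linear(40 * 100, 64),
                                     nn.ReLU(), nn.Linear(64, n_phrases))
    teacher, student = make_net(), make_net()
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    spec = random_erasing(torch.randn(8, 1, 40, 100))   # batch of 40x100 spectrograms
    labels = torch.randint(0, n_phrases, (8,))
    print(distillation_step(teacher, student, opt, spec, labels))
```

In this sketch, applying Random Erasing before the distillation step is what lets the student learn from teacher predictions on inputs that do not exist in the original dataset, which is the combination the abstract describes.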