Text-Independent Speaker Verification With Adversarial Learning On Short Utterances
Kai Liu, Huan Zhou
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 11:16
A text-independent speaker verification system suffers severe performance degradation under short-utterance conditions. To address this problem, we propose an adversarially learned embedding mapping model that directly maps a short-utterance embedding to an enhanced embedding with greater discriminability. In particular, we propose a Wasserstein GAN together with several alternative loss functions. These loss functions have distinct optimization objectives, and some of them are uncommon in speaker verification research. Unlike most prior studies, our main objective is to investigate the effectiveness of these loss functions through numerous ablation studies. Experiments on the VoxCeleb dataset verify that some of the loss functions are beneficial. Additionally, some compelling findings on the uncommon loss functions confirm the potential of our study. Lastly, our proposed system, even without any fine-tuning, achieves meaningful improvements over the baseline, with relative improvements of 4% in EER and 7% in minDCF on the challenging 2sec-2sec speaker verification task.
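To make the adversarial setup concrete, the following is a minimal sketch of the standard Wasserstein GAN losses applied to embeddings. It is not the authors' implementation: the linear critic, the identity "mapper", the random embeddings, and all names (`critic_scores`, `wgan_losses`, `long_emb`, `short_emb`) are hypothetical stand-ins. The sketch also assumes that the critic's "real" samples are reference embeddings from longer utterances, which the abstract does not state, and it omits the Lipschitz constraint (weight clipping or a gradient penalty) and the additional loss functions the abstract mentions.

```python
import numpy as np


def critic_scores(embeddings, w, b):
    """Hypothetical linear critic: returns one scalar score per embedding."""
    return embeddings @ w + b


def wgan_losses(real_scores, fake_scores):
    """Standard Wasserstein GAN losses; both are quantities to minimize.

    The critic tries to score real samples higher than generated ones,
    while the generator (here, the embedding mapper) tries to raise the
    critic's score on its outputs.
    """
    critic_loss = fake_scores.mean() - real_scores.mean()
    generator_loss = -fake_scores.mean()
    return critic_loss, generator_loss


# Toy data: random vectors standing in for speaker embeddings (assumption).
rng = np.random.default_rng(0)
dim = 8
long_emb = rng.normal(size=(4, dim))   # assumed "real" reference embeddings
short_emb = rng.normal(size=(4, dim))  # short-utterance embeddings

# Placeholder generator: an identity mapping instead of a trained network.
mapper = np.eye(dim)
enhanced_emb = short_emb @ mapper

# Random critic parameters, for illustration only.
w, b = rng.normal(size=dim), 0.0

c_loss, g_loss = wgan_losses(
    critic_scores(long_emb, w, b),
    critic_scores(enhanced_emb, w, b),
)
print(f"critic loss: {c_loss:.4f}, generator loss: {g_loss:.4f}")
```

In a real training loop the two losses would be minimized alternately, with the mapper and critic replaced by trainable networks.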