Contrastive-Mixup Learning for Improved Speaker Verification

Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li, Eunjung Han, Andreas Stolcke


SPS


Length: 00:13:30

11 May 2022

This paper proposes a novel formulation of prototypical loss with mixup for speaker verification. Mixup is a simple yet efficient data augmentation technique that fabricates weighted combinations of random pairs of data points and labels for deep neural network training. Mixup has attracted increasing attention due to its ability to improve the robustness and generalization of deep neural networks. Although mixup has shown success in diverse domains, most applications have centered on closed-set classification tasks. In this work, we propose contrastive-mixup, a novel augmentation strategy that learns discriminative representations based on a distance metric. During training, mixup operations generate convex interpolations of both inputs and virtual labels. Moreover, we reformulate the prototypical loss function so that mixup can be applied to metric learning objectives. To demonstrate its ability to generalize from limited training data, we conduct experiments that vary the number of available utterances per speaker in the VoxCeleb database. Experimental results show that contrastive-mixup outperforms the existing baseline, yielding up to 16% error reduction, especially when the number of training utterances per speaker is limited.
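The abstract does not give the exact loss, so the following is only a minimal NumPy sketch under stated assumptions. It combines standard mixup (a weight λ drawn from Beta(α, α), with inputs and one-hot labels interpolated against a shuffled copy of the batch) with a prototypical loss whose cross-entropy is taken against the mixed soft labels. The function names `mixup` and `soft_prototypical_loss`, the choice to mix only query utterances while building speaker prototypes from clean support utterances, and the default α are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def mixup(x, y, alpha=0.2, rng=None):
    """Standard mixup on a batch (hypothetical helper, not the paper's code).

    x: (B, ...) input features; y: (B, C) one-hot or soft labels.
    Returns the mixed inputs, the mixed (virtual) labels, and lambda.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing weight lambda ~ Beta(alpha, alpha)
    perm = rng.permutation(len(x))      # random partner for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]   # convex interpolation of inputs
    y_mix = lam * y + (1.0 - lam) * y[perm]   # same interpolation of labels
    return x_mix, y_mix, lam

def soft_prototypical_loss(query_emb, support_emb, support_labels, soft_targets):
    """Prototypical loss against soft targets (assumed formulation).

    query_emb: (Q, D) embeddings of (possibly mixed) query utterances.
    support_emb: (S, D) embeddings of clean support utterances.
    support_labels: (S,) integer speaker ids in [0, C); every id needs >= 1 example.
    soft_targets: (Q, C) rows summing to 1, e.g. the labels returned by mixup.
    """
    num_classes = soft_targets.shape[1]
    # Prototype of each speaker = mean of that speaker's support embeddings.
    protos = np.stack([support_emb[support_labels == c].mean(axis=0)
                       for c in range(num_classes)])
    # Logits are negative squared Euclidean distances to each prototype.
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    # Numerically stable log-softmax over speakers.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy between the virtual labels and the predicted distribution.
    return float(-(soft_targets * log_probs).sum(axis=1).mean())
```

With one-hot targets (λ equal to 0 or 1) this reduces to an ordinary prototypical loss; intermediate λ values ask the model to place a mixed query proportionally between two speaker prototypes, which is one plausible way to make mixup compatible with a distance-based objective.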

Tags:

speaker verification

mixup

prototypical loss

metric learning


Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

ADAPTIVE SEMI-SUPERVISED MIXUP WITH IMPLICIT LABEL LEARNING AND SAMPLE RATIO BALANCING

PROGRESSIVE MIXUP AUGMENTED TEACHER-STUDENT LEARNING FOR UNSUPERVISED DOMAIN ADAPTATION

Product Image Representation Learning on Large Scale Noisy Datasets
