PUSHING THE LIMITS OF SELF-SUPERVISED SPEAKER VERIFICATION USING REGULARIZED DISTILLATION FRAMEWORK

Yafeng Chen (Speech Lab, Alibaba Group); Siqi Zheng (Alibaba Group); Hui Wang (Speech Lab, Alibaba Group); Luyao Cheng (Speech Lab, Alibaba Group); Qian Chen (Speech Lab, DAMO Academy, Alibaba Group)

06 Jun 2023

Training robust speaker verification systems without speaker labels has long been a challenging task. Previous studies observed a large performance gap between self-supervised and fully supervised methods. In this paper, we apply a non-contrastive self-supervised learning framework called DIstillation with NO labels (DINO) and propose two regularization terms applied to embeddings in DINO. One regularization term guarantees the diversity of the embeddings, while the other decorrelates the variables of each embedding. The effectiveness of various data augmentation techniques is explored in both the time and frequency domains. A range of experiments conducted on the VoxCeleb datasets demonstrates the superiority of the regularized DINO framework in speaker verification. Our method achieves state-of-the-art speaker verification performance under a single-stage self-supervised setting on VoxCeleb.
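The abstract names the two regularization terms but does not give their formulas. As a rough illustration only, the sketch below assumes VICReg-style penalties: a hinge on each dimension's standard deviation across the batch (diversity) and the squared off-diagonal entries of the batch covariance matrix (decorrelation). The function names, the target_std and eps parameters, and the exact form of both terms are assumptions made for this example, not the authors' definitions.

```python
from statistics import fmean, pvariance


def diversity_penalty(batch, target_std=1.0, eps=1e-4):
    """Assumed diversity term: hinge on each dimension's std across the batch.

    batch is a list of embeddings (one list of floats per utterance).
    The penalty is large when all embeddings collapse to the same point.
    """
    columns = list(zip(*batch))  # one tuple of values per embedding dimension
    stds = [(pvariance(col) + eps) ** 0.5 for col in columns]
    return fmean([max(0.0, target_std - s) for s in stds])


def decorrelation_penalty(batch):
    """Assumed decorrelation term: squared off-diagonal covariances / dimension."""
    n = len(batch)
    columns = [list(col) for col in zip(*batch)]
    d = len(columns)
    # Center every dimension so products below are covariance contributions.
    centered = []
    for col in columns:
        mean = fmean(col)
        centered.append([v - mean for v in col])
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov_ij = sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
                total += cov_ij ** 2
    return total / d


# Toy batches: collapsed embeddings, and a well-spread, uncorrelated batch.
collapsed = [[1.0, 2.0]] * 4
spread = [[-2.0, 2.0], [2.0, -2.0], [2.0, 2.0], [-2.0, -2.0]]
print(diversity_penalty(collapsed), diversity_penalty(spread))
print(decorrelation_penalty(spread))
```

In a training loop, terms like these would typically be added to the DINO loss with tunable weights; the weights and the way the terms are combined are not stated in the abstract.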

Tags:

Speaker verification and anti-spoofing

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
