Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization

Weiwei Lin, Na Li, Dan Su, Dong Yu, Man-Wai Mak

SPS

Length: 12:37

04 May 2020

Adapting speaker verification (SV) systems to a new environment is a challenging task. Current adaptation methods in SV mainly focus on the backend, i.e., adaptation is carried out after the speaker embeddings have been extracted. In this paper, we present a DNN-based adaptation method using maximum mean discrepancy (MMD). Our method exploits two important aspects neglected by previous research. First, instead of minimizing domain discrepancy at the utterance level alone, our method minimizes domain discrepancy at both the frame level and the utterance level, which we believe makes the adaptation more robust to the duration discrepancy between training and test data. Second, we introduce a consistency regularization for unlabelled target-domain data, which encourages the target speaker embeddings to be robust to adverse perturbations. Experiments on NIST SRE 2016 and 2018 show that our DNN adaptation works significantly better than previously proposed DNN adaptation methods. Moreover, our method works well with backend adaptation: by combining the proposed method with backend adaptation, we achieve a 9% improvement over backend adaptation alone on SRE18.
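As a rough illustration of the two training signals described above, the following plain-Python sketch computes a biased estimate of squared MMD with a Gaussian (RBF) kernel, applies it at both the frame level and the utterance level, and adds a consistency term on unlabelled target data. It is not the authors' implementation: the kernel and its fixed bandwidth `sigma`, the loss weights `w_frame` and `w_utt`, mean pooling (`mean_pool`) as a stand-in for the network's utterance-level embedding, and additive Gaussian noise as the perturbation are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / (len(xs) * len(xs))
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / (len(ys) * len(ys))
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def mean_pool(frames):
    """Hypothetical stand-in for the network: average frames into one utterance vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def multi_level_mmd(src_utts, tgt_utts, sigma=1.0, w_frame=1.0, w_utt=1.0):
    """Weighted sum of frame-level and utterance-level MMD^2 (weights are hypothetical).

    Each utterance is a list of frames; each frame is a list of floats.
    """
    # Frame level: pool every frame of every utterance in each domain.
    src_frames = [f for utt in src_utts for f in utt]
    tgt_frames = [f for utt in tgt_utts for f in utt]
    frame_term = mmd2(src_frames, tgt_frames, sigma)
    # Utterance level: one pooled vector per utterance.
    src_vecs = [mean_pool(utt) for utt in src_utts]
    tgt_vecs = [mean_pool(utt) for utt in tgt_utts]
    utt_term = mmd2(src_vecs, tgt_vecs, sigma)
    return w_frame * frame_term + w_utt * utt_term


def consistency_loss(embed, utt, noise_std=0.1, rng=None):
    """Squared distance between the embeddings of an unlabelled utterance and a
    noise-perturbed copy (additive Gaussian noise is an assumed perturbation)."""
    rng = rng or random.Random(0)
    noisy = [[v + rng.gauss(0.0, noise_std) for v in frame] for frame in utt]
    clean_emb, noisy_emb = embed(utt), embed(noisy)
    return sum((a - b) ** 2 for a, b in zip(clean_emb, noisy_emb))


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 4 utterances x 5 frames x 3 features; the target domain is shifted.
    src = [[[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)] for _ in range(4)]
    tgt = [[[rng.gauss(2.0, 1.0) for _ in range(3)] for _ in range(5)] for _ in range(4)]
    print(multi_level_mmd(src, src))  # identical samples -> 0.0
    print(multi_level_mmd(src, tgt))  # shifted target domain -> positive
    print(consistency_loss(mean_pool, tgt[0], noise_std=0.1))
```

In an actual system these terms would presumably be computed on intermediate DNN activations within a minibatch and weighted against the supervised speaker loss on labelled source data; the abstract does not specify those details. Note that the frame-level term compares far more samples than the utterance-level term, so its pairwise kernel cost grows quadratically with the number of frames.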

Tags: sps conference, icassp 2020 virtual conference, May 2020, icassp 2020