DISENTANGLED SPEAKER EMBEDDING FOR ROBUST SPEAKER VERIFICATION

Lu Yi, Man-Wai Mak

Length: 00:12:59

11 May 2022

When speaker features are entangled with redundant features, speaker verification systems may perform poorly on an unseen domain. To address this issue, we propose an InfoMax domain separation and adaptation network (InfoMax-DSAN), which uses domain adaptation techniques to disentangle domain-specific features from domain-invariant speaker features. We also propose a frame-based mutual information neural estimator that maximizes the mutual information between the frame-level features and the input acoustic features, which helps retain more useful information. Furthermore, we adopt a triplet loss based on the idea of self-supervised learning to overcome the label mismatch problem. Experimental results on VOiCES Challenge 2019 demonstrate that the proposed method helps learn more discriminative and robust speaker embeddings.
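The two objectives named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean distance, the margin value, and the use of the Donsker-Varadhan bound (the bound commonly used by mutual information neural estimators) are all assumptions made for illustration. In practice the scores passed to `dv_lower_bound` would come from a trained critic network, and the triplets would be built from embeddings, for example taking the anchor and positive from the same unlabeled utterance.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 0.5 is an arbitrary illustrative value.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def dv_lower_bound(joint_scores, marginal_scores):
    """Donsker-Varadhan lower bound on mutual information.

    joint_scores:    critic outputs T(x, z) on matched (feature, input) pairs.
    marginal_scores: critic outputs T(x, z') on mismatched (shuffled) pairs.
    Returns mean(joint) - log(mean(exp(marginal))).
    """
    joint_term = sum(joint_scores) / len(joint_scores)
    mean_exp = sum(math.exp(t) for t in marginal_scores) / len(marginal_scores)
    return joint_term - math.log(mean_exp)


if __name__ == "__main__":
    # Toy 2-D "embeddings": the positive sits near the anchor, the negative far away.
    print(triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 4.0]))
    # Toy critic scores: higher on matched pairs than on shuffled pairs.
    print(dv_lower_bound([1.0, 1.0], [0.0, 0.0]))
```

Maximizing the bound with respect to both the critic and the feature extractor pushes the frame-level features to keep information about the input, while minimizing the triplet loss pulls the anchor and positive together and pushes the negative away without requiring speaker labels.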

Tags: self-supervised learning, mutual information, domain adaptation, speaker verification

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
