Discriminative Speaker Representation via Contrastive Learning with Class-Aware Attention in Angular Space
Zhe LI (The Hong Kong Polytechnic University); Man-Wai MAK (The Hong Kong Polytechnic University); Helen Meng (The Chinese University of Hong Kong)
SPS
Applying contrastive learning to speaker verification (SV) faces two challenges: the softmax-based contrastive loss lacks discriminative power, and hard negative pairs can unduly influence learning. To address the first challenge, we propose a contrastive learning SV framework that incorporates an additive angular margin into the supervised contrastive loss; the margin improves the discriminative ability of the speaker representation. To address the second, we introduce a class-aware attention mechanism through which hard negative samples contribute less to the supervised contrastive loss. We also employ gradient-based multi-objective optimization to balance the classification and contrastive losses. Experimental results on CN-Celeb and VoxCeleb1 show that the new learning objective leads the encoder to an embedding space with strong speaker discrimination across languages.
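To make the first idea concrete, the following is a minimal NumPy sketch of a supervised contrastive loss in which an additive angular margin is applied to the angle between same-speaker pairs. It is an illustration under stated assumptions, not the authors' formulation: the function name `aam_supcon_loss`, the `margin` and `temperature` values, and the choice to apply the margin to positive pairs only are all hypothetical. The class-aware attention and the gradient-based multi-objective balancing are not shown, since the abstract does not specify them.

```python
import numpy as np

def aam_supcon_loss(embeddings, labels, margin=0.2, temperature=0.1):
    """Sketch: supervised contrastive loss with an additive angular margin.

    Assumption (not from the paper): the margin is added to the angle of
    same-speaker (positive) pairs only, as in AAM-softmax-style losses.
    """
    # Project embeddings onto the unit hypersphere so dot products are cosines.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cos = np.clip(z @ z.T, -1.0, 1.0)
    theta = np.arccos(cos)

    labels = np.asarray(labels)
    not_self = ~np.eye(len(labels), dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & not_self

    # Penalize positive pairs by widening their angle; negatives are unchanged.
    # Note: this sketch does not handle theta + margin > pi.
    logits = np.where(pos, np.cos(theta + margin), cos) / temperature
    # Exclude each anchor's similarity with itself from the softmax.
    logits = np.where(not_self, logits, -np.inf)

    # Numerically stable log-softmax over each row.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average the log-probability over the positives of each anchor,
    # skipping anchors that have no positive in the batch.
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())

# Toy batch: two speakers, two utterance embeddings each (made-up values).
toy_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
toy_lab = [0, 0, 1, 1]
loss_plain = aam_supcon_loss(toy_emb, toy_lab, margin=0.0)
loss_margin = aam_supcon_loss(toy_emb, toy_lab, margin=0.2)
```

With `margin=0.0` the function reduces to a standard supervised contrastive loss. A positive margin lowers the positive-pair logits, so the encoder must pull same-speaker embeddings closer together in angle to reach the same loss value, which is the intuition behind the margin's discriminative effect.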