
Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances

Chang Zeng, Junichi Yamagishi, Xin Wang, Erica Cooper, Xiaoxiao Miao

Length: 00:15:04
09 May 2022

Probabilistic linear discriminant analysis (PLDA) and cosine similarity have been widely used as back-end techniques in traditional speaker verification systems to measure pairwise similarities. To make better use of multiple enrollment utterances, we propose a novel attention back-end model that operates on utterance-level features. Specifically, we use scaled dot-product self-attention and feed-forward self-attention networks to learn the intra-relationships among the enrollment utterances. To verify the proposed model, we conduct a series of experiments on the CNCeleb and VoxCeleb datasets, combining it with several state-of-the-art speaker encoders, including TDNN and ResNet. Experimental results with multiple enrollment utterances on CNCeleb show that the proposed attention back-end model achieves lower EER and minDCF scores than its PLDA and cosine similarity counterparts for every speaker encoder, and an experiment on VoxCeleb demonstrates that the model remains effective even in the single-enrollment case.
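To make the scoring mechanism concrete, below is a minimal PyTorch sketch of an attention back-end in this spirit: self-attention models the intra-relationships among the enrollment embeddings, and a feed-forward attention head pools them into a single speaker representation that is scored against the test embedding. The class name AttentionBackend, the embedding dimension of 192, the single attention layer, and the cosine-similarity scoring head are illustrative assumptions for this sketch; the paper's actual architecture and training objective may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionBackend(nn.Module):
    """Hypothetical sketch of an attention back-end for multi-enrollment
    speaker verification. Layer sizes, pooling, and the scoring head are
    illustrative assumptions, not the paper's exact configuration."""

    def __init__(self, emb_dim: int = 192, n_heads: int = 4):
        super().__init__()
        # Scaled dot-product self-attention over the enrollment set,
        # learning intra-relationships among enrollment utterances.
        self.self_attn = nn.MultiheadAttention(emb_dim, n_heads, batch_first=True)
        # Feed-forward attention: scores each attended embedding so the
        # pooled enrollment representation is a learned weighted average.
        self.ff_attn = nn.Sequential(
            nn.Linear(emb_dim, emb_dim), nn.Tanh(), nn.Linear(emb_dim, 1)
        )

    def forward(self, enroll: torch.Tensor, test: torch.Tensor) -> torch.Tensor:
        """enroll: (batch, n_enroll, emb_dim) utterance-level embeddings
        from a speaker encoder (e.g. TDNN or ResNet);
        test: (batch, emb_dim). Returns one similarity score per trial."""
        attended, _ = self.self_attn(enroll, enroll, enroll)
        weights = torch.softmax(self.ff_attn(attended), dim=1)  # (B, N, 1)
        pooled = (weights * attended).sum(dim=1)                # (B, D)
        # Final verification score; cosine similarity is used here for
        # simplicity, whereas the paper trains the back-end end to end.
        return F.cosine_similarity(pooled, test, dim=-1)

# Usage: score one trial with 5 enrollment utterances.
backend = AttentionBackend()
enroll = torch.randn(1, 5, 192)  # embeddings from a pretrained encoder
test = torch.randn(1, 192)
score = backend(enroll, test)    # higher score -> more likely same speaker
```

Unlike PLDA or plain cosine scoring on an averaged enrollment embedding, the learned attention weights let informative enrollment utterances dominate the pooled representation, which is where the multi-enrollment gains come from.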
