
Multi-Resolution Multi-Head Attention In Deep Speaker Embedding

Zhiming Wang, Shuo Fang, Kaisheng Yao, Xiaolong Li

04 May 2020

Pooling is an essential component for capturing long-term speaker characteristics in speaker recognition. This paper proposes simple but effective pooling methods that compute attentive weights for better temporal aggregation over variable-length input speech, improving the end-to-end neural network's ability to discriminate among speakers. In particular, we observe that using multiple heads for attentive pooling over the entire encoded sequence, a method we term global multi-head attention, significantly improves performance compared to various pooling methods, including the recently proposed multi-head attention [1]. To improve the diversity of the attention heads, we further propose multi-resolution multi-head attention for pooling, which adds a temperature hyperparameter to each head. This yields a further performance gain on top of that achieved by using multiple heads. On the benchmark VoxCeleb1 dataset, the proposed method achieves a state-of-the-art Equal Error Rate (EER) of 3.966%. Our analysis shows that using multiple heads, and giving these heads multiple resolutions via different temperatures, leads to more certain attentive weights in the new state-of-the-art system.
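The sketch below illustrates the pooling idea described in the abstract: multiple attention heads scoring every frame of the encoded sequence (global multi-head attention), with a fixed temperature per head so that each head attends at a different resolution. It is a minimal illustration, not the paper's implementation; the layer sizes, the single linear scoring layer, the choice of temperatures, and the concatenation of per-head means are assumptions.

```python
# Minimal PyTorch sketch of multi-resolution multi-head attentive pooling.
# Assumed details (not from the paper): one linear scoring layer shared
# across frames, fixed per-head temperatures, and concatenation of the
# per-head weighted means into the final utterance-level embedding.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiResolutionMultiHeadPooling(nn.Module):
    def __init__(self, feat_dim, num_heads=4, temperatures=(0.5, 1.0, 2.0, 4.0)):
        super().__init__()
        assert len(temperatures) == num_heads
        self.num_heads = num_heads
        # One scoring vector per head, applied to every frame.
        self.score = nn.Linear(feat_dim, num_heads, bias=False)
        # Fixed per-head temperatures controlling how sharp or flat each
        # head's attention distribution is (its "resolution").
        self.register_buffer("temperatures", torch.tensor(temperatures))

    def forward(self, x):
        # x: (batch, time, feat_dim) frame-level encoder outputs
        scores = self.score(x)                             # (B, T, H)
        scores = scores / self.temperatures                # per-head temperature scaling
        weights = F.softmax(scores, dim=1)                 # attend over the time axis
        # Weighted mean per head, then concatenate heads into one embedding.
        pooled = torch.einsum("bth,btd->bhd", weights, x)  # (B, H, D)
        return pooled.reshape(x.size(0), -1)               # (B, H * D)


if __name__ == "__main__":
    frames = torch.randn(8, 300, 256)   # 8 utterances, 300 frames, 256-dim features
    pooling = MultiResolutionMultiHeadPooling(feat_dim=256)
    embedding = pooling(frames)
    print(embedding.shape)              # torch.Size([8, 1024])
```

With this layout, a low temperature sharpens a head's softmax so it concentrates on a few frames, while a high temperature flattens it toward an average over the utterance; using different temperatures across heads is one way to encourage the head diversity the abstract refers to.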
