Statistics Pooling Time Delay Neural Network Based On X-Vector For Speaker Verification

Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang, Chien-Lin Huang

Length: 15:36

04 May 2020

This paper aims to improve the x-vector speaker embedding so that it captures more detailed information for speaker verification. We propose a statistics pooling time delay neural network (TDNN), in which statistics pooling is integrated into each layer of the TDNN structure to account for the variation of temporal context in the frame-level transformation. The proposed feature vector, named the stats-vector, is compared with baseline x-vector features on the VoxCeleb and Speakers in the Wild (SITW) datasets for speaker verification. The experimental results show that the proposed stats-vector with score fusion achieved the best performance on the VoxCeleb1 dataset. Furthermore, for recordings that contain interference from other speakers, we found that the proposed stats-vector effectively reduced the interference and improved speaker verification performance on the SITW dataset.
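To make the idea of pooling statistics at each layer concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which statistics are used or how the per-layer results are combined, so the sketch assumes the mean and standard deviation over time (as in standard x-vector statistics pooling) and simple concatenation across layers. The names `stats_pool` and `pool_every_layer` are hypothetical.

```python
import math


def stats_pool(frames):
    """Collapse frame-level features into one fixed-length vector.

    frames: list of frames, each a list of `dim` floats.
    Returns the per-dimension means followed by the per-dimension
    (population) standard deviations, i.e. 2 * dim values.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds


def pool_every_layer(layer_outputs):
    """Assumption: pool each layer's frame-level output, then concatenate."""
    pooled = []
    for frames in layer_outputs:
        pooled.extend(stats_pool(frames))
    return pooled


# Toy stand-ins for the frame-level outputs of two TDNN layers.
# Layers may see different numbers of frames because of temporal context.
layer1 = [[1.0, 2.0], [3.0, 6.0]]
layer2 = [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0]]

utterance_vector = pool_every_layer([layer1, layer2])
print(utterance_vector)
```

Because every layer is reduced to a fixed number of values regardless of how many frames it produced, the combined vector has the same length for utterances of any duration.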

Tags: sps conference, icassp 2020 virtual conference, May 2020, icassp 2020
