VoiceAI Systems to NIST SRE19 Evaluation: Robust Speaker Recognition on Conversational Telephone Speech

Rongjin Li, Dongpeng Chen, Weibin Zhang

Length: 14:56

04 May 2020

In this study, we present the VoiceAI (VAI) submissions to the NIST SRE 2019 challenge on the task of speaker recognition using conversational telephone speech. Domain mismatch remains a challenging problem in SRE19. However, participants are free to use any public or proprietary data to mitigate this problem. Using larger-scale and more diverse training data can effectively improve the performance of the front-end neural networks (NN). In our experiments, we focus on constructing robust systems using x-vectors. Different input acoustic features are also investigated. In addition, we propose hybrid neural architectures to utilize the strengths of different neural networks such as long short-term memory (LSTM), extended time delay neural network (ETDNN) and factorized TDNN (FTDNN). We also explore many back-end strategies to make full use of the development data and to relieve the domain mismatch problem. Our best network topology, an FTDNN with two LSTMP layers, significantly outperforms the baseline on the NIST SRE18 evaluation set (SRE18Eval). The final system, a fusion of five systems, yields an EER of 2.59%, a minimum cost of 0.153, and an actual cost of 0.155 on the progress data set, ranking second on the leaderboard among all participants.
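
For readers unfamiliar with the EER figure quoted above, the following is a minimal sketch of how an equal error rate can be approximated from trial scores. It is not the authors' or NIST's scoring tool; the function name `equal_error_rate` and the toy scores are hypothetical, and the threshold sweep is a simple approximation that takes the operating point where the miss rate and the false-alarm rate are closest.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper for illustration only; higher scores are assumed
    to mean "more likely the same speaker".
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # Miss: a target (same-speaker) trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer


# Toy scores (made up): one target trial scores below one non-target trial.
targets = [2.0, 1.5, 0.3, 1.0]
nontargets = [0.5, -1.0, -0.2, 0.1]
print(equal_error_rate(targets, nontargets))
```

With real evaluation data the trial lists are far larger, and the minimum and actual costs reported above additionally depend on the cost parameters defined in the evaluation plan, which this sketch does not model.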

Tags: SPS conference, ICASSP 2020 virtual conference