VoiceAI Systems to NIST SRE19 Evaluation: Robust Speaker Recognition on Conversational Telephone Speech
Rongjin Li, Dongpeng Chen, Weibin Zhang
SPS
Length: 14:56
In this study, we present the VoiceAI (VAI) submissions to the NIST SRE 2019 challenge on speaker recognition using conversational telephone speech. Domain mismatch remains a challenging problem in SRE19; however, participants are free to use any public or proprietary data to mitigate it. Using larger-scale and more diverse training data can effectively improve the performance of the front-end neural networks (NN). In our experiments, we focus on building robust systems based on x-vectors, and we also investigate different input acoustic features. In addition, we propose hybrid neural architectures that exploit the strengths of different networks, such as long short-term memory (LSTM), the extended time delay neural network (ETDNN), and the factorized TDNN (FTDNN). We also explore many back-end strategies to make full use of the development data and to alleviate domain mismatch. Our best network topology, an FTDNN with two LSTMP layers, significantly outperforms the baseline on the NIST SRE18 evaluation set (SRE18Eval). The final system, a fusion of five systems, yields an EER of 2.59%, a minimum cost of 0.153, and an actual cost of 0.155 on the progress data set, ranking second on the leaderboard among all participants.
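The reported figures (EER, minimum cost) are computed from verification trial scores. The sketch below is not the authors' or NIST's scoring tool. It shows how EER and a minimum normalized detection cost can be computed from lists of target (same-speaker) and non-target scores. It makes several assumptions. A trial is accepted when its score is at or above the threshold. EER is approximated at the swept threshold where the miss and false-alarm rates are closest. The cost parameters (`p_target=0.01`, `c_miss=c_fa=1`) are illustrative defaults. NIST's official SRE metric specifies its own operating points, and for SRE evaluations it averages costs over more than one of them, so consult the evaluation plan for exact values. The "actual cost" differs from the minimum cost because it is measured at a fixed decision threshold rather than the best one found by sweeping.

```python
from typing import List, Sequence, Tuple


def error_rates(target: Sequence[float],
                nontarget: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (P_miss, P_fa) at every candidate threshold.

    Assumption: a trial is accepted (same speaker) when score >= threshold.
    The extra +inf threshold rejects every trial.
    """
    thresholds = sorted(set(target) | set(nontarget)) + [float("inf")]
    points = []
    for t in thresholds:
        p_miss = sum(s < t for s in target) / len(target)        # rejected targets
        p_fa = sum(s >= t for s in nontarget) / len(nontarget)   # accepted non-targets
        points.append((p_miss, p_fa))
    return points


def eer(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER: average of P_miss and P_fa where they are closest."""
    p_miss, p_fa = min(error_rates(target, nontarget),
                       key=lambda pt: abs(pt[0] - pt[1]))
    return (p_miss + p_fa) / 2


def min_dcf(target: Sequence[float], nontarget: Sequence[float],
            p_target: float = 0.01, c_miss: float = 1.0,
            c_fa: float = 1.0) -> float:
    """Minimum normalized detection cost over all swept thresholds.

    C_det = c_miss * p_target * P_miss + c_fa * (1 - p_target) * P_fa,
    normalized by the cost of the best trivial (accept-all/reject-all) system.
    """
    norm = min(c_miss * p_target, c_fa * (1 - p_target))
    return min((c_miss * p_target * pm + c_fa * (1 - p_target) * pf) / norm
               for pm, pf in error_rates(target, nontarget))


if __name__ == "__main__":
    tar = [0.9, 0.8, 0.7, 0.4]   # toy same-speaker trial scores
    non = [0.6, 0.3, 0.2, 0.1]   # toy different-speaker trial scores
    print("EER:", eer(tar, non))
    print("minDCF:", min_dcf(tar, non))
```

The exhaustive threshold sweep is quadratic in the number of trials. That keeps the logic easy to check, but a real scorer would sort scores once and accumulate the counts instead.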