Length: 14:56
04 May 2020

In this study, we present the VoiceAI (VAI) submissions to the NIST SRE 2019 challenge on the task of speaker recognition using conversational telephone speech. Domain mismatch remains a challenging problem in SRE19. However, participants are free to use any public or proprietary data to mitigate this problem. Larger and more diverse training data can effectively improve the performance of the front-end neural networks (NNs). In our experiments, we focus on constructing robust systems using x-vectors. Different input acoustic features are also investigated. In addition, we propose hybrid neural architectures that combine the strengths of different neural networks such as long short-term memory (LSTM), extended time delay neural network (ETDNN) and factorized TDNN (FTDNN). We also explore many back-end strategies to make full use of the development data and to relieve the domain mismatch problem. Our best network topology, an FTDNN with two LSTMP layers, significantly outperforms the baseline on the NIST SRE18 evaluation set (SRE18Eval). The final system, a fusion of five systems, yields an EER of 2.59%, a minimum cost of 0.153, and an actual cost of 0.155 on the progress data set, ranking second on the leaderboard among all participants.
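For readers unfamiliar with the equal error rate (EER) reported above, the following is a minimal sketch of how an EER can be estimated from trial scores. It is not the NIST scoring tool and not the authors' code. The function name `compute_eer` and the toy scores are hypothetical, chosen only for illustration. The sketch assumes that higher scores indicate a same-speaker (target) trial.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Estimate the equal error rate from two sets of trial scores.

    Hypothetical helper for illustration only. Returns (eer, threshold),
    where the threshold is the candidate at which the miss rate and the
    false-alarm rate are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([target, nontarget]))

    # Miss rate: fraction of target trials scored below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above it.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])

    # The EER is the operating point where the two error rates meet;
    # with finite data, take the closest pair and average them.
    idx = int(np.argmin(np.abs(miss - false_alarm)))
    eer = (miss[idx] + false_alarm[idx]) / 2.0
    return eer, thresholds[idx]


# Toy scores (made up, not from the paper).
toy_target = [2.0, 1.5, 0.2, 1.0]
toy_nontarget = [-1.0, 0.5, -0.3, 0.1]
eer, threshold = compute_eer(toy_target, toy_nontarget)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On the toy scores, one target trial falls below and one non-target trial falls at the chosen threshold, so both error rates equal one in four. Production scoring tools typically interpolate along the error curve rather than picking the nearest threshold, so their values can differ slightly from this estimate.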
