JHU-HLTCOE System for the VoxSRC Speaker Recognition Challenge
Daniel Garcia-Romero, Alan McCree, David Snyder, Gregory Sell
SPS
Length: 14:55
The VoxSRC speaker recognition challenge comprises data obtained from YouTube videos of celebrity interviews in a wide range of recording environments. The challenge provides FIXED and OPEN training conditions to allow cross-system comparisons and to characterize the effect of additional training data on system performance. This paper describes our submission to the challenge, in which we explored x-vector extractor topologies, classification head alternatives, data augmentation, and angular margin penalties. Our final entry to the FIXED condition (which achieved second place) is the score average of four diverse systems. We find that this fused system outperforms a single large DNN with a similar number of parameters.
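The score averaging mentioned in the abstract can be illustrated with a minimal sketch. It assumes every system scores the same ordered trial list and that the scores are on comparable scales. The abstract does not say whether any normalization or weighting was applied, so the plain equal-weight mean here is an assumption. The function name `average_scores`, the system labels, and the score values are all hypothetical.

```python
from typing import Dict, List


def average_scores(system_scores: Dict[str, List[float]]) -> List[float]:
    """Equal-weight average of per-trial scores across systems.

    Assumption: every system scored the same trials in the same order,
    and the scores are on comparable scales.
    """
    if not system_scores:
        raise ValueError("need at least one system")
    lengths = {len(s) for s in system_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all systems must score the same trial list")
    n_systems = len(system_scores)
    # zip(*...) groups the i-th score of every system into one tuple.
    return [sum(trial) / n_systems for trial in zip(*system_scores.values())]


# Made-up verification scores for three trials from four hypothetical systems.
scores = {
    "sys_a": [2.0, -1.0, 0.5],
    "sys_b": [1.0, -2.0, 0.5],
    "sys_c": [3.0, -1.0, 1.5],
    "sys_d": [2.0, 0.0, -0.5],
}
fused = average_scores(scores)
print(fused)  # [2.0, -1.0, 0.5]
```

Averaging at the score level keeps the systems independent: each can be trained and tuned separately, and only the final per-trial scores need to be aligned.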