SPEAKER EMBEDDING CONVERSION FOR BACKWARD AND CROSS-CHANNEL COMPATIBILITY
Tianxiang Chen, Elie Khoury
SPS
Length: 00:11:47
The accuracy of automatic speaker verification (ASV) systems has improved substantially thanks to recent breakthroughs in low-rank speaker representations and deep learning techniques, leading to the success of ASV in real-world applications ranging from call centers to mobile applications and smart devices. In particular, some voice biometric providers have been shifting their systems from the traditional GMM-based i-vector paradigm to the deep-learning-based x-vector paradigm. Additionally, some of them need to deploy different systems for different sampling rates (e.g., 8 kHz over the phone channel and 16 kHz on virtual assistants). In either case, the speaker embeddings extracted from one ASV system are often not compatible with another ASV system, which makes switching between systems cumbersome and costly. In this paper, we address this issue by proposing a highly efficient speaker embedding converter that transforms a speaker embedding extracted from system A into a speaker embedding that can be used by system B. We evaluate the performance of the embedding converter in two scenarios: an i-vector to x-vector upgrade and cross-channel compatibility. In both scenarios, we show that the proposed system achieves low equal error rates.
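To make the idea of converting an embedding from system A into the space of system B concrete, here is a minimal sketch in Python. The abstract does not describe the converter's architecture, so the least-squares linear map below is only an illustrative stand-in and not the authors' method. The function names, the embedding dimensions (40 and 32), and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def fit_linear_converter(emb_a, emb_b):
    """Fit weights and bias so that emb_a @ weights + bias approximates emb_b.

    Assumption: a plain least-squares linear map, used here only as a
    stand-in for whatever converter the paper actually proposes.
    """
    # Append a constant column so the bias is learned jointly with the weights.
    a_aug = np.hstack([emb_a, np.ones((emb_a.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(a_aug, emb_b, rcond=None)
    return solution[:-1], solution[-1]


def convert(emb_a, weights, bias):
    """Map system-A embeddings into the (approximate) system-B space."""
    return emb_a @ weights + bias


def cosine_score(x, y):
    """Cosine similarity, a common scoring function for speaker embeddings."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


# Synthetic paired embeddings (hypothetical sizes): the same utterances are
# assumed to have been processed by both system A and system B.
rng = np.random.default_rng(0)
dim_a, dim_b, n_utts = 40, 32, 500
emb_a = rng.standard_normal((n_utts, dim_a))
true_map = rng.standard_normal((dim_a, dim_b))
emb_b = emb_a @ true_map + 0.01 * rng.standard_normal((n_utts, dim_b))

# Fit on the first 400 pairs, then convert the 100 held-out embeddings.
weights, bias = fit_linear_converter(emb_a[:400], emb_b[:400])
converted = convert(emb_a[400:], weights, bias)

# Compare each converted embedding with its true system-B counterpart.
mean_cos = np.mean(
    [cosine_score(c, t) for c, t in zip(converted, emb_b[400:])]
)
```

In a real deployment the paired embeddings would come from running both ASV systems on the same audio, and the converted embeddings would then be scored by system B's own back end. A nonlinear model may well be needed when the two embedding spaces are not linearly related.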