Spoken language recognition with cluster-based modeling
Stanisław Kacprzak, Magdalena Rybicka, Konrad Kowalczyk
SPS
Length: 00:07:49
In this study, we analyze the incorporation of cluster-based modeling into language recognition systems in which a single utterance is represented as an embedding, using the widely adopted i-vectors and x-vectors. We compare the results obtained with Cosine Distance Scoring, a Gaussian Mixture Model, Logistic Regression, and a Mixture of von Mises-Fisher distributions against classifiers based on the proposed approach, which incorporates cluster-based sub-models. Experimental evaluation is performed on the i-vector embeddings from the NIST 2015 Language Recognition i-Vector Machine Learning Challenge and the x-vector embeddings from the Oriental Language Recognition 2020 Challenge (AP20-OLR). The experimental results show that the proposed approach, combined with a discriminatively trained Logistic Regression classifier, achieves notable improvements over the baseline systems, i.e., those without language sub-models, and that our approach is competitive with other systems reported in the literature.
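
The abstract does not spell out how the language sub-models are built, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that each language is split into a few clusters with spherical k-means on length-normalized embeddings, and that an utterance is scored against a language by its best cosine similarity to any of that language's cluster centroids. The function names (`length_normalize`, `spherical_kmeans`, `fit_submodels`, `predict`), the number of clusters, and the max-over-clusters scoring rule are all assumptions made for this example.

```python
import numpy as np


def length_normalize(x):
    """Scale each row to unit Euclidean norm so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def spherical_kmeans(x, n_clusters, n_iter=20, seed=0):
    """Cluster unit-norm rows of x by cosine similarity; returns unit-norm centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen training embeddings.
    centroids = x[rng.choice(len(x), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every embedding to the centroid with the highest cosine similarity.
        assign = np.argmax(x @ centroids.T, axis=1)
        for k in range(n_clusters):
            members = x[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
        centroids = length_normalize(centroids)
    return centroids


def fit_submodels(embeddings, labels, n_clusters=2):
    """Fit n_clusters centroids (hypothetical sub-models) for each language label."""
    x = length_normalize(embeddings)
    return {lang: spherical_kmeans(x[labels == lang], n_clusters)
            for lang in np.unique(labels)}


def predict(submodels, embeddings):
    """Score each language by its best-matching sub-model and return the top language."""
    x = length_normalize(embeddings)
    langs = sorted(submodels)
    # For each language, take the maximum cosine similarity over its centroids.
    scores = np.stack([(x @ submodels[lang].T).max(axis=1) for lang in langs], axis=1)
    return np.array(langs)[scores.argmax(axis=1)]
```

With embeddings stored as a 2-D array and labels as a 1-D array, `fit_submodels(train_x, train_y)` would build the per-language centroids and `predict(submodels, test_x)` would return one label per test utterance. A discriminative back end such as Logistic Regression, as evaluated in the study, could instead be trained on the cluster-level classes; that variant is not shown here.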