Unsupervised Voice Type Discrimination Score Adaptation Using X-vector Clusters
Mark R Lindsey (Carnegie Mellon University); Tyler Vuong (Carnegie Mellon University); Richard M Stern (Carnegie Mellon University)
SPS
Voice type discrimination (VTD) is the task of automatically detecting speech produced in the same room as a recording device ("live speech") among other speech and non-speech noises, such as traffic noises or radio broadcasts ("distractor audio"). Existing work has described methods for performing the VTD task. This paper presents a method for adapting the output of these existing methods in an unsupervised manner via x-vector clustering and correlation. This adaptation method can be applied to the output of any VTD algorithm, requires no additional training data, and has been shown to yield a relative decrease in decision cost function (DCF) score of up to 47% on a standardized database collected for the task.
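The abstract does not say how the x-vector clusters and the baseline scores are combined, so the following is only a minimal illustrative sketch of the general idea of cluster-based score adaptation, not the authors' method. It assumes each audio segment already has a baseline VTD score and an embedding (standing in for an x-vector), groups the embeddings with a small cosine k-means, and pulls each score toward the mean score of its cluster. The function names, the `weight` parameter, and the blending rule are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans_cosine(vectors, k, iters=20):
    """Tiny k-means using cosine similarity (hypothetical helper).

    Centroids are initialised from the first k vectors so the result
    is deterministic; a real system would use a more robust clustering.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its most similar centroid.
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adapt_scores(scores, labels, weight=0.5):
    """Blend each score with its cluster's mean score (assumed rule)."""
    sums, counts = {}, {}
    for s, lab in zip(scores, labels):
        sums[lab] = sums.get(lab, 0.0) + s
        counts[lab] = counts.get(lab, 0) + 1
    means = {lab: sums[lab] / counts[lab] for lab in sums}
    return [(1 - weight) * s + weight * means[lab] for s, lab in zip(scores, labels)]


# Toy data: two well-separated groups of 2-D "embeddings".
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
baseline_scores = [0.9, 0.5, 0.1, 0.3]

labels = kmeans_cosine(embeddings, k=2)
adapted = adapt_scores(baseline_scores, labels, weight=0.5)
```

Like the adaptation described in the abstract, this sketch operates only on the scores and embeddings of the recording being processed, so it needs no labels or extra training data and is agnostic to which VTD system produced the scores.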