MINIMIZING RESIDUALS FOR NATIVE-NONNATIVE VOICE CONVERSION IN A SPARSE, ANCHOR-BASED REPRESENTATION OF SPEECH
Christopher Liberatore, Ricardo Gutierrez-Osuna
SPS
Length: 00:07:28
We present a dictionary-learning algorithm for reducing the sparse-coding residual of an exemplar-based method for native-to-nonnative voice conversion (VC). The proposed algorithm iteratively updates the source and target speaker dictionaries to reduce both the residual and the VC error, thereby improving synthesis quality. We evaluate the method on speech from the ARCTIC and L2-ARCTIC corpora and compare it to a baseline exemplar-based VC algorithm. The proposed algorithm significantly improves synthesis quality, more than doubling that of the baseline system, while using two orders of magnitude fewer atoms. It also significantly reduces both the VC error and the residual magnitude. We discuss the implications of the algorithm for exemplar-based VC systems more broadly.
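The abstract does not give the update rules, so the following is only a rough illustration of the general idea, not the authors' algorithm. It assumes a common exemplar-based setup: nonnegative, frame-aligned source and target feature matrices (dimensions x frames), a single activation matrix `H` shared by both speakers, and Lee-Seung-style multiplicative NMF updates on a squared-error cost with an L1 sparsity penalty `lam`. The cost function, the update rules, and the names `learn_coupled_dictionaries`, `convert`, `K`, and `lam` are all hypothetical choices made for this sketch. Both dictionaries start from `K` paired exemplars; each iteration sparse-codes the source frames, then updates the source atoms (to shrink the residual) and the target atoms (to shrink the conversion error).

```python
import random

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def matmul(A, B):
    """Multiply matrices stored as lists of rows: (m x k) @ (k x n)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sq_error(X, Y):
    """Squared Frobenius norm of X - Y."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def update_activations(X, A, H, lam):
    """One multiplicative step on H for 0.5*||X - A H||^2 + lam*sum(H), H >= 0."""
    num = matmul(transpose(A), X)
    den = matmul(matmul(transpose(A), A), H)
    return [[h * n / (d + lam + EPS) for h, n, d in zip(rh, rn, rd)]
            for rh, rn, rd in zip(H, num, den)]


def update_dictionary(X, A, H):
    """One multiplicative step on A for 0.5*||X - A H||^2, A >= 0."""
    num = matmul(X, transpose(H))
    den = matmul(A, matmul(H, transpose(H)))
    return [[a * n / (d + EPS) for a, n, d in zip(ra, rn, rd)]
            for ra, rn, rd in zip(A, num, den)]


def learn_coupled_dictionaries(Xs, Xt, K, iters=200, lam=0.01, seed=0):
    """Hypothetical coupled NMF; Xs, Xt are frame-aligned (dims x frames)."""
    rng = random.Random(seed)
    N = len(Xs[0])
    # Start from K paired exemplars, as a baseline exemplar-based system would.
    picks = rng.sample(range(N), K)
    As = [[row[j] + EPS for j in picks] for row in Xs]
    At = [[row[j] + EPS for j in picks] for row in Xt]
    H = [[rng.random() for _ in range(N)] for _ in range(K)]
    # history[i] = (source residual, conversion error) after i iterations
    history = [(sq_error(Xs, matmul(As, H)), sq_error(Xt, matmul(At, H)))]
    for _ in range(iters):
        H = update_activations(Xs, As, H, lam)  # sparse-code the source frames
        As = update_dictionary(Xs, As, H)       # shrink the source residual
        At = update_dictionary(Xt, At, H)       # shrink the conversion error
        history.append((sq_error(Xs, matmul(As, H)),
                        sq_error(Xt, matmul(At, H))))
    return As, At, history


def convert(X, As, At, iters=100, lam=0.01):
    """Encode new source frames with the source atoms, decode with target atoms."""
    H = [[1.0] * len(X[0]) for _ in range(len(As[0]))]
    for _ in range(iters):
        H = update_activations(X, As, H, lam)
    return matmul(At, H)
```

The shared activation matrix is what couples the two dictionaries: a frame is encoded with source atoms and the same weights are reused with target atoms at conversion time. Learning the atoms, rather than keeping raw exemplars, is what lets a small `K` stand in for a much larger exemplar dictionary in this kind of setup.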