Learning Speech Representations with Flexible Hidden Feature Dimensions

Huaizhen Tang (University of Science and Technology of China); Xulong Zhang (Ping An Technology (Shenzhen) Co., Ltd.); Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd); Ning Cheng (Ping An Technology (Shenzhen) Co., Ltd); Jing Xiao (Ping An Insurance (Group) Company of China)

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

07 Jun 2023

Non-parallel many-to-many voice conversion is a kind of style transfer task in speech. Recently, AutoVC has been applied in this field as a popular solution, as it can achieve distribution-matching style transfer by training only the reconstruction loss. However, in order to strike a good balance between timbre disentanglement and sound quality, AutoVC requires imposing very strict constraints on the dimensionality of the latent representation. This constraint affects the quality of the converted speech while making it challenging to apply to other datasets directly. This paper proposes a new voice conversion framework that uses only one encoder to obtain timbre and content information by partitioning the latent space in the channel dimension. Furthermore, two different types of classifiers and two additional reconstruction losses are proposed to ensure that different parts of the latent space contain only separated content and timbre information, respectively. Experiments on the VCTK dataset show that the proposed model achieves state-of-the-art results in terms of the naturalness and similarity of converted speech. In addition, we experimentally show that for different division proportions of latent space, the content and timbre information will always be well separated.

Tags:

adversarial machine learning

Learning Speech Representations with Flexible Hidden Feature Dimensions

Huaizhen Tang (University of Science and Technology of China); Xulong Zhang (Ping An Technology (Shenzhen) Co., Ltd.); Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd); Ning Cheng (Ping An Technology (Shenzhen) Co., Ltd); Jing Xiao (Ping An Insurance (Group) Company of China)

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

GNP ATTACK: TRANSFERABLE ADVERSARIAL EXAMPLES VIA GRADIENT NORM PENALTY

Measuring the Transferability of L-infty Attacks by the L-2 Norm

SC-NET: SALIENT POINT AND CURVATURE BASED ADVERSARIAL POINT CLOUD GENERATION NETWORK

Join the IEEE Signal Processing Society