AVQVC: One-shot Voice Conversion by Vector Quantization with Applying Contrastive Learning

Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Length: 00:08:44

13 May 2022

Voice conversion (VC) refers to changing the timbre of a speech signal while retaining its linguistic content. Recently, many works have focused on disentanglement-based learning techniques that separate timbre from linguistic content in a speech signal. Once this separation succeeds, voice conversion becomes feasible and straightforward. This paper proposes a novel one-shot voice conversion framework, called AVQVC, based on vector quantization voice conversion (VQVC) and AutoVC. A new training method is applied to VQVC to separate content and timbre information from speech more effectively. The results show that this approach separates content and timbre better than VQVC, which improves the sound quality of the generated speech.

Tags:

contrastive learning

speech synthesis

voice conversion

vector quantization

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
