U-GAT-VC: Unsupervised Generative Attentional Networks for Non-parallel Voice Conversion

Sheng Shi, Yangzhou Du, Jianping Fan, Jiahao Shao, Yifei Hao

Length: 00:13:11

10 May 2022

Non-parallel voice conversion (VC) is a technique for transferring voice from one style to another without using a parallel corpus in model training. Various methods have been proposed to approach non-parallel VC using deep neural networks. Among them, CycleGAN-VC and its variants have been widely accepted as benchmark methods. However, a gap remains between the real target voice and the converted voice, and the increased number of parameters leads to slow convergence during training. Inspired by recent advances in unsupervised image translation, we propose a new end-to-end unsupervised framework, U-GAT-VC, that adopts a novel inter- and intra-attention mechanism to guide the voice conversion to focus on the more important regions of spectrograms. We also introduce a disentangle perceptual loss in our model to capture high-level spectral features. Subjective and objective evaluations show that our proposed model outperforms CycleGAN-VC2/3 in terms of conversion quality and voice naturalness.

Tags:

intra attention mechanism

inter attention mechanism

non-parallel voice conversion

perceptual loss

generative adversarial network

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
