A FEW SHOT LEARNING OF SINGING TECHNIQUE CONVERSION BASED ON CYCLE CONSISTENCY GENERATIVE ADVERSARIAL NETWORKS

Po-Wei Chen (National Tsing Hua University); Von-Wun Soo (nthu)


06 Jun 2023

We adopt a recent cycle-consistent generative adversarial network (MaskCycleGAN-VC), which allows a specific singing technique to be converted using only a few articulations of singing voice as examples. Because the model often fails to preserve the content information of the singing voice, owing to distortion and noise introduced during conversion, a self-supervised learning module is proposed as the basic framework for enforcing content consistency without additional annotations. We evaluate the proposed methods on three datasets covering singing techniques commonly used in pop songs: breathy voice, vibrato, and vocal fry. Experiments show that the proposed methods outperform the baseline in audio quality and in content preservation, including melody and the singer's timbral identity, without affecting the perception of the singing techniques. Samples can be found at https://powei-c.github.io/STC/
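As a rough illustration of the cycle-consistency idea the abstract refers to, the sketch below computes an L1 cycle loss between two domains (for example, plain singing and a target technique). It is not the authors' implementation: the function names, the toy generators, and the spectrogram shape are all hypothetical, and the paper's self-supervised content module is not modelled here.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


def cycle_consistency_loss(x, y, g_xy, g_yx):
    """L1 cycle loss: mapping x -> Y -> X should recover x, and y -> X -> Y should recover y."""
    x_cycled = g_yx(g_xy(x))
    y_cycled = g_xy(g_yx(y))
    return l1(x_cycled, x) + l1(y_cycled, y)


# Hypothetical stand-ins for mel-spectrograms (80 bins, 64 frames).
rng = np.random.default_rng(0)
x = rng.standard_normal((80, 64))
y = rng.standard_normal((80, 64))

# Toy "generators" (assumptions for illustration only): simple offsets.
# An exactly inverse pair gives a loss near zero.
perfect_loss = cycle_consistency_loss(x, y, lambda m: m + 1.0, lambda m: m - 1.0)

# A mismatched pair leaves a residual offset of 0.5 in each direction.
lossy_loss = cycle_consistency_loss(x, y, lambda m: m + 1.0, lambda m: m - 0.5)

print(perfect_loss, lossy_loss)
```

In a real system the two generators would be neural networks trained jointly with adversarial losses; the cycle term only penalises round trips that fail to return to the input.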

Tags:

Speech and singing voice synthesis/conversion/coding


Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
