A FEW SHOT LEARNING OF SINGING TECHNIQUE CONVERSION BASED ON CYCLE CONSISTENCY GENERATIVE ADVERSARIAL NETWORKS
Po-Wei Chen (National Tsing Hua University); Von-Wun Soo (National Tsing Hua University)
SPS
We adopt a recent cycle-consistent generative adversarial network (MaskCycleGAN-VC) to convert a specific singing technique using only a few articulations of singing voice as examples. Because distortion and noise introduced during conversion often cause the model to lose the content information of the singing voice, we propose a self-supervised learning module, built into the basic framework, that enforces content consistency without additional annotations. We evaluate the proposed methods on three datasets covering singing techniques commonly used in pop songs: breathy voice, vibrato, and vocal fry. Experiments show that the proposed methods outperform the baseline in audio quality and in content preservation, including melody and the singer's timbral identity, without affecting the perception of the singing techniques. Samples can be found at https://powei-c.github.io/STC/
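As a rough illustration of the cycle-consistency idea the abstract relies on, the sketch below computes the generic CycleGAN-style round-trip loss: a feature matrix is mapped to the target technique and back, and the mean absolute difference from the original is penalized. This is not the authors' objective or their self-supervised module. The function names, the toy "generators", and the 80 x 64 mel-spectrogram shape are all assumptions made for illustration.

```python
import numpy as np


def cycle_consistency_loss(x, g_xy, g_yx):
    """Mean absolute error between x and its round trip g_yx(g_xy(x))."""
    x_cycled = g_yx(g_xy(x))
    return float(np.mean(np.abs(x - x_cycled)))


# Hypothetical stand-ins for the two generators (real ones are neural networks).
def to_target(mel):
    return mel * 2.0 + 1.0


def to_source(mel):
    # Exact inverse of to_target, so the round trip recovers the input.
    return (mel - 1.0) / 2.0


def lossy_to_source(mel):
    # Imperfect inverse: leaves a constant offset of 0.5 after the round trip.
    return mel / 2.0


rng = np.random.default_rng(0)
mel = rng.standard_normal((80, 64))  # assumed shape: 80 mel bins x 64 frames

perfect = cycle_consistency_loss(mel, to_target, to_source)
lossy = cycle_consistency_loss(mel, to_target, lossy_to_source)
```

With the exact inverse the loss is essentially zero, while the imperfect inverse yields a loss near 0.5, which is the kind of round-trip error a cycle-consistency term penalizes during training.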