Towards Low-Resource StarGAN Voice Conversion Using Weight Adaptive Instance Normalization
Mingjie Chen, Yanpei Shi, Thomas Hain
Many-to-many voice conversion with non-parallel training data has seen significant progress in recent years. It remains challenging because no ground-truth parallel data are available. StarGAN-based models have gained attention because of their efficiency and effectiveness. However, most StarGAN-based works focus on a small number of speakers and a large amount of training data. In this work, we aim to improve the data efficiency of the model and to achieve many-to-many non-parallel StarGAN-based voice conversion for a relatively large number of speakers with limited training samples. To improve data efficiency, the proposed model uses a speaker encoder to extract speaker embeddings together with weight adaptive instance normalization (W-AdaIN) layers. Experiments are conducted with 109 speakers under two low-resource conditions, with 20 and 5 training samples per speaker. An objective evaluation shows that the proposed model significantly outperforms the baseline methods. Furthermore, a subjective evaluation shows that the proposed model outperforms the baseline method in both naturalness and similarity.
Chairs: Tomoki Toda
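To give a rough intuition for speaker-conditioned instance normalization as mentioned in the abstract, below is a minimal PyTorch sketch of a standard AdaIN layer whose per-channel scale and shift are predicted from a speaker embedding. This is not the paper's W-AdaIN implementation: the exact way the "weight adaptive" variant modulates the network, and all layer names, dimensions, and the `SpeakerConditionedAdaIN` class itself, are assumptions introduced here for illustration only.

```python
import torch
import torch.nn as nn


class SpeakerConditionedAdaIN(nn.Module):
    """Hypothetical sketch: instance normalization whose affine parameters
    (scale gamma, shift beta) are predicted from a speaker embedding.
    Not taken from the paper; dimensions and naming are assumptions."""

    def __init__(self, num_channels: int, speaker_dim: int):
        super().__init__()
        # Normalize each channel over time without learned affine parameters.
        self.norm = nn.InstanceNorm1d(num_channels, affine=False)
        # Predict gamma and beta jointly from the speaker embedding.
        self.affine = nn.Linear(speaker_dim, 2 * num_channels)

    def forward(self, x: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) acoustic features, e.g. mel-spectrogram frames
        # spk_emb: (batch, speaker_dim) embedding from a speaker encoder
        gamma, beta = self.affine(spk_emb).chunk(2, dim=-1)
        x = self.norm(x)
        return gamma.unsqueeze(-1) * x + beta.unsqueeze(-1)


if __name__ == "__main__":
    layer = SpeakerConditionedAdaIN(num_channels=256, speaker_dim=128)
    feats = torch.randn(4, 256, 100)   # batch of feature sequences
    spk = torch.randn(4, 128)          # batch of speaker embeddings
    out = layer(feats, spk)
    print(out.shape)                   # torch.Size([4, 256, 100])
```

In this kind of design, the normalization strips speaker-dependent channel statistics from the content features, and the embedding-driven scale and shift re-impose the target speaker's characteristics, which is one plausible reason such conditioning can help when training samples per speaker are scarce.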