
LIMI-VC: A LIGHT WEIGHT VOICE CONVERSION MODEL WITH MUTUAL INFORMATION DISENTANGLEMENT

Liangjie Huang (Beijing Language and Culture University); Tian Yuan (Baidu (China) Co., Ltd); Yunming Liang (Baidu (China) Co., Ltd); Zeyu Chen (Baidu, Inc.); Can Wen (Baidu (China) Co., Ltd); Yanlu Xie (Beijing Language and Culture University); Jinsong Zhang (Beijing Language and Culture University); Dengfeng Ke (blcu.edu.cn)

06 Jun 2023

Voice conversion (VC) aims to convert the timbre of a source speaker into that of a target speaker. Many recent VC models use pre-trained models to improve performance and achieve good results. However, pre-trained models cannot fully disentangle timbre from linguistic information, which leaves redundancy between the two that may hurt conversion performance. In this paper, we propose LIMI-VC, which reduces the redundancy between linguistic content and timbre information through mutual information disentanglement. Because pre-trained models are now in common use, we design the model to be lightweight for parameter and computational efficiency. Experiments show that the proposed model still improves performance over the baseline while being 15 times smaller. An out-of-domain cross-lingual inference experiment also shows that our model greatly outperforms the baseline. Our source code and audio examples will be available at: https://github.com/WongLaw/LIMI-VC.
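The abstract does not say which mutual information (MI) estimator LIMI-VC uses. Purely as an illustration of the "redundancy" it describes, the sketch below uses a swapped-in technique: a closed-form MI estimate that assumes the content and timbre embeddings are jointly Gaussian. The function name `gaussian_mutual_information` and the synthetic "content" and "speaker" embeddings are hypothetical. A trainable VC model would instead minimize a differentiable MI estimate as an extra loss term, so this is a diagnostic sketch, not the authors' method.

```python
import numpy as np


def gaussian_mutual_information(content: np.ndarray, timbre: np.ndarray) -> float:
    """Estimate I(content; timbre) in nats, ASSUMING the embeddings are jointly Gaussian.

    content: shape (n_samples, d_c), hypothetical content (linguistic) embeddings
    timbre:  shape (n_samples, d_t), hypothetical speaker (timbre) embeddings
    """
    if content.shape[0] != timbre.shape[0]:
        raise ValueError("content and timbre must have the same number of samples")
    joint = np.concatenate([content, timbre], axis=1)
    # Sample covariance of the joint vector; rowvar=False treats columns as variables.
    cov = np.cov(joint, rowvar=False)
    d_c = content.shape[1]
    cov_c = cov[:d_c, :d_c]
    cov_t = cov[d_c:, d_c:]
    # Jointly Gaussian X, Y: I(X; Y) = 0.5 * (log|S_x| + log|S_y| - log|S_xy|).
    # slogdet avoids overflow/underflow compared with log(det(...)).
    _, logdet_c = np.linalg.slogdet(cov_c)
    _, logdet_t = np.linalg.slogdet(cov_t)
    _, logdet_joint = np.linalg.slogdet(cov)
    return float(0.5 * (logdet_c + logdet_t - logdet_joint))


# Synthetic demo: a "leaky" content embedding that partly encodes the speaker.
rng = np.random.default_rng(0)
n = 20000
speaker = rng.standard_normal((n, 1))
leaky_content = 0.8 * speaker + 0.6 * rng.standard_normal((n, 1))  # correlation 0.8
clean_content = rng.standard_normal((n, 1))  # independent of the speaker

mi_leaky = gaussian_mutual_information(leaky_content, speaker)  # roughly 0.51 nats
mi_clean = gaussian_mutual_information(clean_content, speaker)  # close to 0
print(mi_leaky, mi_clean)
```

For a single pair of variables with correlation ρ, this estimate reduces to -0.5 · ln(1 - ρ²), so the leaky embedding (ρ = 0.8) scores about 0.51 nats while the independent one scores near zero. A disentanglement objective aims to push the learned content representation toward the second case.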