NVC-NET: END-TO-END ADVERSARIAL VOICE CONVERSION

Bac Nguyen, Fabien Cardinaux

Length: 00:09:18

10 May 2022

Voice conversion (VC) has gained increasing popularity in many speech synthesis applications. The idea is to change the voice identity from one speaker to another while keeping the linguistic content unchanged. Many VC approaches rely on a vocoder to reconstruct the speech from acoustic features, and as a consequence, the speech quality depends heavily on that vocoder. In this paper, we propose NVC-Net, an end-to-end adversarial network that performs VC directly on the raw audio waveform. By disentangling the speaker identity from the speech content, NVC-Net is able to perform traditional non-parallel many-to-many VC as well as zero-shot VC from a short utterance of an unseen target speaker. Importantly, NVC-Net is non-autoregressive and fully convolutional, which makes inference fast. Objective and subjective evaluations on VC tasks show that NVC-Net obtains competitive results with significantly fewer parameters.
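To make the disentanglement idea concrete, here is a toy sketch in plain Python. It is not the NVC-Net architecture and involves no adversarial training. All names (`content_encoder`, `speaker_encoder`, `generator`, `convert`, `STRIDE`) and the fixed smoothing kernel are assumptions chosen for illustration, and hand-written statistics stand in for learned encoders. It only shows the data flow the abstract describes: content codes come from the source waveform, a fixed-size speaker embedding comes from a target utterance of any length, and the output waveform is produced in a single non-autoregressive pass.

```python
import math
import random

STRIDE = 4  # hypothetical downsampling factor of the content encoder


def content_encoder(wave, kernel):
    """Strided 1-D convolution: waveform -> shorter sequence of content codes."""
    k = len(kernel)
    padded = wave + [0.0] * (k - 1)  # right-pad so every frame sees k samples
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(wave), STRIDE)
    ]


def speaker_encoder(wave):
    """Pool over time into a fixed-size embedding, whatever the utterance length."""
    n = len(wave)
    mean = sum(wave) / n
    spread = math.sqrt(sum((x - mean) ** 2 for x in wave) / n)
    return [mean, spread]


def generator(content, speaker):
    """Upsample content codes and condition every output sample on the speaker."""
    shift, scale = speaker
    out = []
    for c in content:
        # Nearest-neighbour upsampling; each sample depends only on the
        # content code and speaker embedding, never on earlier outputs.
        out.extend([math.tanh(scale * c + shift)] * STRIDE)
    return out


def convert(source_wave, target_wave, kernel):
    """Content from the source speaker, identity from the target speaker."""
    content = content_encoder(source_wave, kernel)
    speaker = speaker_encoder(target_wave)
    return generator(content, speaker)


rng = random.Random(0)
source = [rng.uniform(-1, 1) for _ in range(16)]  # source utterance
target = [rng.uniform(-1, 1) for _ in range(10)]  # short target utterance
kernel = [0.25, 0.5, 0.25]                        # hypothetical fixed filter
converted = convert(source, target, kernel)
print(len(converted), len(speaker_encoder(target)))
```

Because the speaker embedding has a fixed size regardless of the target utterance length, swapping in an utterance from a speaker never seen before requires no change to the pipeline, which is the property zero-shot conversion relies on. In a real system both encoders and the generator would be learned convolutional networks.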

Tags: disentangled representation, end-to-end training, voice conversion, adversarial training

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
