Optimal Transport in Diffusion Modeling for Conversion Tasks in Audio Domain

Vadim Popov (Huawei Noah's Ark Lab); Amantur Amatov (Huawei); Mikhail Kudinov (Huawei Noah's Ark Lab); Vladimir Gogoryan (Huawei Noah's Ark Lab); Tasnima Sadekova (Huawei Noah's Ark Lab); Ivan Vovk (Huawei Noah's Ark Lab)

06 Jun 2023

Diffusion models have recently become a popular generative modeling framework in various domains because of their high-quality sampling capabilities. It has also been hypothesized that optimally trained diffusion models, combined with specific differential equation solvers, provide a solution to the optimal transport problem between the data distribution and the prior distribution. In this paper, we empirically show that applying the optimal transport point of view to diffusion modeling helps choose a good noise sample from which reverse diffusion starts generating. We consider two audio-related tasks: voice conversion and timbre transfer. For voice conversion, we improve upon a recent state-of-the-art model and demonstrate that optimal transport helps preserve the prosody of the source utterances significantly better than the vanilla diffusion-based model does. For timbre transfer, we propose a novel diffusion model capable of many-to-many timbre transfer that performs on par with common algorithms in terms of overall music quality.
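The optimal transport view can be illustrated on a toy problem where everything is computable in closed form. The sketch below is a hypothetical illustration, not the authors' voice conversion or timbre transfer model: it assumes 1-D Gaussian data, a variance-preserving diffusion with a linear noise schedule (the values of `BETA_MIN` and `BETA_MAX` are arbitrary choices), and an exact score in place of a trained network. Running the deterministic probability-flow ODE forward encodes a data point into a noise sample. Starting reverse diffusion for a different target distribution from that sample, rather than from fresh random noise, then approximately reproduces the monotone map between the two Gaussians, which in 1-D is the optimal transport map for quadratic cost.

```python
import math

# Hypothetical toy setting (not the paper's model): 1-D data x ~ N(m, s^2), a
# variance-preserving (VP) diffusion with a linear schedule beta(t) on [0, 1], and
# prior N(0, 1). With Gaussian data the score is known in closed form.
BETA_MIN, BETA_MAX = 0.1, 40.0


def beta(t):
    """Linear noise schedule beta(t) for t in [0, 1]."""
    return BETA_MIN + (BETA_MAX - BETA_MIN) * t


def alpha(t):
    """Signal coefficient a_t = exp(-0.5 * integral_0^t beta(u) du)."""
    return math.exp(-0.5 * (BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t))


def score(x, t, m, s):
    """Exact score of the marginal p_t = N(a_t m, a_t^2 s^2 + 1 - a_t^2)."""
    a = alpha(t)
    var = a * a * s * s + 1.0 - a * a
    return -(x - a * m) / var


def flow(x, m, s, t0, t1, steps=2000):
    """Integrate the probability-flow ODE dx/dt = -0.5 beta(t) (x + score) with RK4.

    t0 < t1 maps data towards noise (encoding); t0 > t1 runs in reverse (generation).
    """
    def f(y, t):
        return -0.5 * beta(t) * (y + score(y, t, m, s))

    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(x, t)
        k2 = f(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(x + h * k3, t + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x


# Encoding: the deterministic flow sends x to approximately (x - m) / s, the
# monotone (optimal, for quadratic cost) transport map from N(m, s^2) to N(0, 1).
m_src, s_src = 2.0, 0.5
x_src = 2.5  # one standard deviation above the source mean
z = flow(x_src, m_src, s_src, 0.0, 1.0)
print(f"encoded noise z = {z:.4f}")

# Conversion: start reverse diffusion for a target distribution N(m_tgt, s_tgt^2)
# from this z instead of a fresh random sample. The output approximately keeps the
# source's standardized position: m_tgt + s_tgt * (x_src - m_src) / s_src.
m_tgt, s_tgt = -1.0, 2.0
x_tgt = flow(z, m_tgt, s_tgt, 1.0, 0.0)
print(f"converted sample = {x_tgt:.4f}")
```

In this Gaussian toy the flow map is exactly affine: the standardized coordinate (x_t - a_t m) / sqrt(a_t^2 s^2 + 1 - a_t^2) stays constant along the ODE, so the only deviation from the ideal map comes from a_1 not being exactly zero. With real audio the score is learned, and the link to optimal transport is the hypothesis discussed above rather than a guarantee.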

Tags:

Music signal analysis, processing and synthesis

Conference: IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and in-person conference.

More Like This

Stay in the Middle: A Semi-Supervised Model for CT Metal Artifact Reduction

Deep Self-Supervised Hierarchical Metrical Structure Modeling

Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond