CONDITIONING AND SAMPLING IN VARIATIONAL DIFFUSION MODELS FOR SPEECH SUPER-RESOLUTION

Chin-Yun Yu (Queen Mary University of London); Sung-Lin Yeh (University of Edinburgh); George Fazekas (Queen Mary University of London); Hao Tang (University of Edinburgh)

06 Jun 2023

Diffusion models (DMs) are increasingly used in audio processing tasks, including speech super-resolution (SR), which aims to restore high-frequency content from low-resolution speech utterances. This is commonly achieved by conditioning the noise-prediction network on the low-resolution audio. In this paper, we propose a novel sampling algorithm that instead injects the information in the low-resolution audio through the reverse sampling process of the DM. The proposed method can serve as a drop-in replacement for the vanilla sampling process and can significantly improve the performance of existing methods. Moreover, by coupling the proposed sampling method with an unconditional DM, i.e., a DM whose noise predictor takes no auxiliary inputs, we can generalize it to a wide range of SR setups. With this formulation, we also attain state-of-the-art results on the VCTK Multi-Speaker benchmark.
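The abstract does not spell out the sampling algorithm, so the following is only an illustrative sketch of the general idea of passing low-resolution information through the reverse process rather than through the network inputs. It assumes a simple low-frequency replacement scheme: after each reverse step, the low band of the current sample is overwritten with the low band of the observation diffused to the same noise level. The function names (`lowpass`, `replace_low_band`, `sample`), the ideal FFT filter, the schedule arrays `alphas`/`sigmas`, and the `denoise_step` callable are all hypothetical and are not taken from the paper.

```python
import numpy as np


def lowpass(x, keep_bins):
    """Ideal low-pass filter: keep the first `keep_bins` rFFT bins, zero the rest."""
    spec = np.fft.rfft(x)
    spec[keep_bins:] = 0.0
    return np.fft.irfft(spec, n=len(x))


def replace_low_band(z, y_noisy, keep_bins):
    """Keep the high band of `z` and take the low band from `y_noisy`."""
    return z - lowpass(z, keep_bins) + lowpass(y_noisy, keep_bins)


def sample(denoise_step, y_low, alphas, sigmas, keep_bins, rng):
    """Hypothetical reverse loop with low-band replacement after every step.

    `denoise_step(z, t)` stands in for one reverse step of an unconditional DM.
    `alphas[t]` and `sigmas[t]` are the assumed signal and noise scales of the
    sample *after* step t, ordered from noisiest to cleanest.
    """
    z = rng.standard_normal(len(y_low))
    for t in range(len(alphas)):
        z = denoise_step(z, t)
        # Diffuse the observation to the current noise level (assumed schedule).
        y_noisy = alphas[t] * y_low + sigmas[t] * rng.standard_normal(len(y_low))
        # Enforce agreement with the observation in the low band only.
        z = replace_low_band(z, y_noisy, keep_bins)
    return z
```

Because the conditioning happens only inside the loop, the stand-in `denoise_step` needs no auxiliary input, which is the property the abstract attributes to pairing the sampler with an unconditional DM. If the schedule ends with a signal scale of 1 and a noise scale of 0, the low band of the output matches the observation.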

Tags:

Deep generative models

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
