Solving audio inverse problems with a diffusion model

Eloi Moliner (Aalto University); Jaakko Lehtinen (NVIDIA & Aalto University); Vesa Välimäki (Aalto University)


07 Jun 2023

This paper presents CQT-Diff, a data-driven generative audio model that, once trained, can solve a variety of audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model whose architecture is designed to exploit pitch-equivariant symmetries in music. This is achieved by preconditioning the model with an invertible Constant-Q Transform (CQT), whose logarithmically spaced frequency axis represents pitch equivariance as translation equivariance. The proposed method is evaluated on solo piano music, using objective and subjective metrics, in three tasks: audio bandwidth extension, inpainting, and declipping. The results show that CQT-Diff outperforms the compared baselines and ablations in audio bandwidth extension and, without retraining, performs competitively with modern baselines in audio inpainting and declipping. This work represents the first diffusion-based general framework for solving inverse problems in audio processing.
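The link between pitch and translation on a logarithmic frequency axis can be illustrated with a small sketch. It is not taken from the CQT-Diff implementation: the helper names, the lowest frequency `F_MIN`, and the resolution of 12 bins per octave are all assumptions chosen for illustration. The code places the harmonics of a note on a CQT-style bin grid, transposes the note by a few semitones, and checks that every harmonic moves by the same number of bins.

```python
import numpy as np

# Assumed, illustrative settings (not from the paper).
F_MIN = 32.70          # lowest center frequency in Hz (roughly the note C1)
BINS_PER_OCTAVE = 12   # one bin per semitone
N_BINS = 84            # seven octaves


def cqt_center_frequencies(f_min=F_MIN, n_bins=N_BINS,
                           bins_per_octave=BINS_PER_OCTAVE):
    """Geometrically spaced center frequencies, as in a constant-Q transform."""
    k = np.arange(n_bins)
    return f_min * 2.0 ** (k / bins_per_octave)


def nearest_bin(freq_hz, center_freqs):
    """Index of the bin closest to freq_hz, measured in log-frequency."""
    distance = np.abs(np.log2(center_freqs) - np.log2(freq_hz))
    return int(np.argmin(distance))


def harmonic_bins(f0, center_freqs, n_harmonics=4):
    """Bin indices of the first few harmonics of a note with fundamental f0."""
    return np.array([nearest_bin(h * f0, center_freqs)
                     for h in range(1, n_harmonics + 1)])


center_freqs = cqt_center_frequencies()

f0 = 220.0       # fundamental of the original note in Hz
semitones = 5    # transposition interval

bins_original = harmonic_bins(f0, center_freqs)
bins_shifted = harmonic_bins(f0 * 2.0 ** (semitones / 12), center_freqs)

# On the log-frequency axis, transposition is a pure translation:
# every harmonic moves by the same number of bins.
bin_offsets = bins_shifted - bins_original
print("original bins:", bins_original)
print("shifted bins: ", bins_shifted)
print("offsets:      ", bin_offsets)
```

On a linearly spaced frequency axis the same transposition would stretch the spacing between harmonics instead of shifting them rigidly, which is why a log-frequency representation suits architectures that are translation-equivariant along the frequency axis.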

Tags:

Music signal analysis, processing and synthesis

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
