Unsupervised vocal dereverberation with diffusion-based generative models
Koichi Saito (Sony Group Corporation); Naoki Murata (Sony Group Corporation); Toshimitsu Uesaka (Sony Group Corporation); Chieh-Hsin Lai (Sony Group Corporation); Yuhta Takida (Sony Group Corporation); Takao Fukui (Sony Group Corporation); Yuki Mitsufuji (Sony Group Corporation)
SPS
Removing reverb from reverberant music is a necessary technique for cleaning up audio for downstream music manipulations. Musical reverberation falls into two categories: natural reverb and artificial reverb. Artificial reverb is more diverse than natural reverb because of its many possible parameter setups and reverberation types. Recent supervised dereverberation methods may therefore fail: to generalize to unseen observations at inference time, they rely on sufficiently diverse and numerous pairs of reverberant observations and retrieved data for training. To resolve these problems, we propose an unsupervised method that removes a general class of artificial reverb from music without requiring paired data for training. The proposed method is based on diffusion models: it initializes the unknown reverberation operator with a conventional signal processing technique and simultaneously refines the estimate with the help of the diffusion models. We show through objective and perceptual evaluations that our method outperforms the current leading vocal dereverberation benchmarks.
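To make the abstract's core idea concrete, here is a heavily simplified toy sketch of alternating between estimating an unknown reverberation operator and refining the dry-signal estimate under a prior. This is an illustrative assumption, not the paper's implementation: reverb is modeled as a short FIR convolution, the operator is re-estimated by least squares (standing in for the paper's conventional signal processing initialization), and a mild smoothing step stands in for the diffusion-model prior. All function names (`estimate_operator`, `denoise`, `dereverberate`) are hypothetical.

```python
import numpy as np

def estimate_operator(y, x, taps=8):
    """Least-squares estimate of a length-`taps` reverb filter h with y ≈ h * x."""
    # Build a convolution matrix whose k-th column is x delayed by k samples.
    X = np.column_stack(
        [np.concatenate([np.zeros(k), x[: len(y) - k]]) for k in range(taps)]
    )
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h

def denoise(x):
    """Stand-in prior: mild smoothing. The paper uses a diffusion model here."""
    kernel = np.array([0.25, 0.5, 0.25])
    return np.convolve(x, kernel, mode="same")

def dereverberate(y, taps=8, iters=10):
    """Alternate operator estimation and prior-guided signal refinement."""
    x = y.copy()  # initialize the dry estimate with the reverberant observation
    for _ in range(iters):
        h = estimate_operator(y, x, taps)          # refine the unknown operator
        resid = y - np.convolve(x, h)[: len(y)]    # data-consistency residual
        x = denoise(x + 0.5 * resid)               # prior step (diffusion in the paper)
    return x, h
```

The alternation mirrors the abstract's claim that the method "initializes the unknown reverberation operator ... and simultaneously refines the estimate": each pass tightens the operator given the current signal, then pulls the signal toward both the observation and the prior.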