Upmixing via style transfer: a variational autoencoder for disentangling spatial images and musical content

Haici Yang, Minje Kim, Sanna Wager, Spencer Russell, Mike Luo, Wontak Kim

Length: 00:13:27

10 May 2022

In the stereo-to-multichannel upmixing problem for music, one of the main tasks is to set the directionality of the instrument sources in the multichannel rendering. In this paper, we propose a modified variational autoencoder model that learns a latent space describing the spatial images in multichannel music. We seek to disentangle the spatial images from the music content, so that the learned latent variables are invariant to the music. At test time, we use the latent variables to control the panning of sources. We propose two upmixing use cases: transferring the spatial images from one song to another, and blind panning based on the generative model. We report objective and subjective evaluation results showing empirically that our model captures spatial images separately from music content and achieves transfer-based interactive panning. This work can be extended to spatial mapping tasks.
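
To make the first use case (transferring the spatial image of one song onto another) more concrete, the sketch below uses a deliberately simple, hand-built stand-in rather than the proposed variational autoencoder. It assumes the "spatial image" can be summarized as one gain per output channel and the "music content" as a mono downmix; the paper instead learns both representations. All function names (`estimate_spatial_image`, `apply_spatial_image`, `transfer_spatial_image`) and the five-channel layout are hypothetical choices made for illustration only.

```python
import numpy as np


def estimate_spatial_image(multichannel):
    """Toy stand-in for a spatial encoder (an assumption, not the paper's model).

    multichannel: array of shape (channels, samples), assumed non-silent.
    Returns a unit-norm vector of per-channel RMS levels.
    """
    rms = np.sqrt(np.mean(multichannel ** 2, axis=1))
    return rms / np.linalg.norm(rms)


def apply_spatial_image(stereo, gains):
    """Toy stand-in for a decoder: pan the mono downmix with fixed gains.

    stereo: array of shape (2, samples); gains: array of shape (channels,).
    Returns an array of shape (channels, samples).
    """
    mono = stereo.mean(axis=0)  # crude "content" representation
    return gains[:, None] * mono[None, :]


def transfer_spatial_image(reference_multichannel, target_stereo):
    """Take the spatial image from the reference and the content from the target."""
    gains = estimate_spatial_image(reference_multichannel)
    return apply_spatial_image(target_stereo, gains)


# Synthetic demo: a 5-channel reference (one panned sine) and a stereo noise target.
rng = np.random.default_rng(0)
t = np.arange(48000) / 48000.0
reference_gains = np.array([0.1, 0.2, 0.9, 0.3, 0.2])  # hypothetical panning
reference = reference_gains[:, None] * np.sin(2 * np.pi * 440 * t)[None, :]
target = rng.standard_normal((2, 48000))

upmixed = transfer_spatial_image(reference, target)
print(upmixed.shape)  # (5, 48000)
```

In this toy setting the gain vector is trivially independent of the music because it is computed from channel levels alone. A learned model has no such guarantee, which is why the abstract stresses disentangling the spatial latent variables from the music content.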

Tags: stereo-to-multichannel upmixing, variational autoencoders, panning, information disentanglement

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
