
Controllable music inpainting with mixed-level and disentangled representation

Shiqi Wei (Fudan University); Ziyu Wang (NYU Shanghai); Weiguo Gao (Fudan University); Gus Xia (New York University Shanghai)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
06 Jun 2023

Music inpainting, the task of completing the missing part of a piece given its surrounding context, is an important problem in automated music generation. In this study, we contribute a controllable inpainting model that combines the high expressivity of mixed-length, disentangled music representations with the strong predictive power of masked language modeling. The model gives users flexible control over both the time scope (the length and location of the inpainted region) and the semantic features that composers often consider during composition, including pitch contour, rhythm pattern, and chords. The key design is to simultaneously predict disentangled representations over different time ranges, mirroring the thought process of a professional composer, who tracks the flow of various semantic features at different hierarchies in parallel. Subjective evaluation shows that our model generates much better results than the baseline and can produce melodies comparable to human compositions.
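To make the masked-modeling setup concrete, the following is a minimal illustrative sketch (not the authors' code) of how a masked-inpainting training pair can be constructed: a melody is treated as a token sequence, the span to be inpainted is replaced by mask tokens, and a model would be trained to predict the original tokens at the masked positions. All names and the token vocabulary here are hypothetical.

```python
# Toy sketch of masked-inpainting data preparation, assuming a melody
# is encoded as a flat list of note tokens. A real system would use a
# richer, disentangled representation and condition on controls such
# as chords or rhythm patterns.

MASK = "<mask>"

def make_inpainting_pair(tokens, start, length):
    """Mask tokens[start:start+length]; return (masked input, target map)."""
    masked = list(tokens)
    targets = {}
    for i in range(start, start + length):
        targets[i] = masked[i]  # remember the ground-truth token
        masked[i] = MASK        # hide it from the model
    return masked, targets

melody = ["C4", "D4", "E4", "G4", "E4", "D4", "C4", "C4"]
inp, tgt = make_inpainting_pair(melody, 2, 3)
print(inp)  # ['C4', 'D4', '<mask>', '<mask>', '<mask>', 'D4', 'C4', 'C4']
print(tgt)  # {2: 'E4', 3: 'G4', 4: 'E4'}
```

The masked positions define the user-chosen time scope (location and length of the inpainted region); the model fills them in conditioned on the surrounding context.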
