Transformer VAE: A Hierarchical Model for Structure-Aware and Interpretable Music Representation Learning
Junyan Jiang, Gus Xia, Dave Carlton, Chris Anderson, Ryan Miyakawa
Structure awareness and interpretability are two of the most desired properties of music generation algorithms. Structure-aware models generate more natural and coherent music with long-term dependencies, while interpretable models are friendlier for human-computer interaction and co-creation. To achieve these two goals simultaneously, we designed the Transformer Variational AutoEncoder (Transformer VAE), a hierarchical model that unifies the efforts of two recent breakthroughs in deep music generation: 1) the Music Transformer and 2) Deep Music Analogy. The former learns long-term dependencies using the attention mechanism, and the latter learns interpretable latent representations using a disentangled conditional VAE. We show that the Transformer VAE is essentially capable of learning a context-sensitive hierarchical representation, regarding local representations as the context and the dependencies among local representations as the global structure. By interacting with the model, we can achieve context transfer, realizing the imaginary situation of "what if" a piece were developed following the music flow of another piece.
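As a rough illustration of the hierarchy described above, the following PyTorch sketch encodes each segment (e.g., a bar) into a local latent with a small VAE encoder, passes the sequence of local latents through a Transformer encoder to model their dependencies (the global structure), and decodes each segment from its contextualized latent. All module names, hyperparameters, and the teacher-forced decoder are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class TransformerVAESketch(nn.Module):
    """Minimal sketch: per-segment VAE latents contextualized by a Transformer.

    Hyperparameters and module names are illustrative, not taken from the paper.
    """

    def __init__(self, vocab_size=128, seg_len=16, d_model=256, z_dim=128,
                 n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Local encoder: one latent vector per segment (e.g., per bar).
        self.local_enc = nn.GRU(d_model, d_model, batch_first=True)
        self.to_mu = nn.Linear(d_model, z_dim)
        self.to_logvar = nn.Linear(d_model, z_dim)
        # Global structure: Transformer over the sequence of local latents.
        layer = nn.TransformerEncoderLayer(z_dim, n_heads,
                                           dim_feedforward=4 * z_dim,
                                           batch_first=True)
        self.global_tf = nn.TransformerEncoder(layer, n_layers)
        # Local decoder: reconstruct each segment from its contextualized latent.
        self.z_to_h = nn.Linear(z_dim, d_model)
        self.local_dec = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)
        self.seg_len = seg_len

    def forward(self, tokens):
        # tokens: (batch, n_segments, seg_len) integer event ids
        b, s, t = tokens.shape
        x = self.embed(tokens.view(b * s, t))            # (b*s, t, d_model)
        _, h = self.local_enc(x)                         # h: (1, b*s, d_model)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        z = z.view(b, s, -1)
        z_ctx = self.global_tf(z)                        # context-sensitive latents
        h0 = self.z_to_h(z_ctx.reshape(b * s, -1)).unsqueeze(0)
        dec_in = self.embed(tokens.view(b * s, t))       # teacher forcing (sketch)
        y, _ = self.local_dec(dec_in, h0)
        logits = self.out(y).view(b, s, t, -1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return logits, kl


# Usage: a batch of 2 pieces, each with 8 segments of 16 events
model = TransformerVAESketch()
tokens = torch.randint(0, 128, (2, 8, 16))
logits, kl = model(tokens)
print(logits.shape, kl.item())  # torch.Size([2, 8, 16, 128])
```

Context transfer, as described in the abstract, would correspond to keeping one piece's local latents while borrowing the contextualization (the Transformer pass over another piece's latent sequence); the sketch above only shows the encode-contextualize-decode path.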