MUSIC PHRASE INPAINTING USING LONG-TERM REPRESENTATION AND CONTRASTIVE LOSS
Shiqi Wei, Weiguo Gao, Gus Xia, Liwei Lin, Yixiao Zhang
SPS
Length: 00:11:51
Deep generative modeling has become the leading technique for music automation. However, long-term generation remains challenging, as most methods fail to preserve a natural structure and overall musicality once the generation scope exceeds several beats. In this study, we tackle the problem of long-term, phrase-level symbolic melody inpainting by equipping a sequence prediction model with a phrase-level representation (as an extra condition) and a contrastive loss (as an extra optimization term). The underlying ideas are twofold. First, to predict phrase-level music, we need phrase-level representations as a better context. Second, we should predict tokens and their high-level representations simultaneously, with the contrastive loss serving as a better target for abstract representations. Experimental results show that our method significantly outperforms the baselines. In particular, the contrastive loss plays a critical role in generation quality, and the phrase-level representation further enhances the structure of long-term generation.
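The abstract does not give the exact form of the objective, so the following is only a minimal sketch of how a token-prediction loss might be combined with a contrastive term over phrase-level representations. It assumes an InfoNCE-style contrastive loss with in-batch negatives; the function names, the `temperature` and `weight` parameters, and the array shapes are hypothetical and are not taken from the paper.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def token_cross_entropy(logits, targets):
    # logits: (T, V) scores over a token vocabulary; targets: (T,) token ids.
    logp = log_softmax(logits, axis=-1)
    return -logp[np.arange(len(targets)), targets].mean()

def info_nce(pred_repr, true_repr, temperature=0.1):
    # Assumed InfoNCE-style loss: row i of pred_repr should match row i of
    # true_repr; the other rows in the batch act as negatives.
    p = pred_repr / np.linalg.norm(pred_repr, axis=1, keepdims=True)
    t = true_repr / np.linalg.norm(true_repr, axis=1, keepdims=True)
    sim = p @ t.T / temperature          # (B, B) scaled cosine similarities
    logp = log_softmax(sim, axis=1)
    return -np.diag(logp).mean()

def total_loss(logits, targets, pred_repr, true_repr, weight=1.0):
    # Token loss plus the contrastive term as an extra optimization term.
    # `weight` is a hypothetical balancing coefficient.
    return token_cross_entropy(logits, targets) + weight * info_nce(pred_repr, true_repr)
```

In this sketch, `pred_repr` would stand for the phrase-level representations predicted for the missing phrases and `true_repr` for the representations of the ground-truth phrases; how those representations are obtained is outside what the abstract specifies.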