Skip to main content
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:11:51
08 May 2022

Deep generative modeling has already become the leading technique for music automation. However, long-term generation remains a challenging task as most methods fall short in preserving a natural structure and the overall musicality when the generation scope exceeds several beats. In this study, we tackle the problem of long-term, phrase-level symbolic melody inpainting by equipping a sequence prediction model with phrase-level representation (as an extra condition) and contrastive loss (as an extra optimization term). The underlying ideas are twofold. First, to predict phrase-level music, we need phrase-level representations as a better context. Second, we should predict tokens and their high-level representations simultaneously, while contrastive loss serves as a better target for abstract representations. Experimental results show that our method significantly outperforms the baselines. In particular, contrastive loss plays a critical role in the generation quality, and the phase-level representation further enhances the structure of long-term generation.

More Like This

  • SPS
    Members: $10.00
    IEEE Members: $22.00
    Non-members: $30.00
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00