09 Jul 2020

Recently, deep learning models have been proposed for cover song identification, designed to learn fixed-length feature vectors for music recordings. However, the temporal progression of music, which is important for measuring the melody similarity between two recordings, is not well exploited in these models. In this paper, we propose a new Siamese architecture that learns deep representations for cover song identification, using Dilated Temporal Pyramid Convolution to exploit the local temporal context and Temporal Self-Attention to exploit the global temporal context of music recordings. In addition to the traditional block that calculates the similarity between a pair of recordings, we add a classification block that classifies each recording into its clique. By combining the regression loss and the classification loss, our model learns more robust and discriminative latent representations. The representations extracted by our model substantially outperform existing hand-crafted features and learned deep features, and experimental results show that our approach far outperforms state-of-the-art methods on several public datasets.
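To make the two temporal-context mechanisms concrete, the following is a minimal NumPy sketch of the general ideas named in the abstract: a pyramid of 1-D convolutions with increasing dilation rates (local context at several temporal scales) followed by dot-product self-attention over time (global context). All function names, shapes, and the choice of dilation rates here are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Valid 1-D convolution along time with a given dilation rate.
    x: (T, C_in) feature sequence, w: (K, C_in, C_out) kernel.
    Returns (T - (K-1)*dilation, C_out)."""
    K = w.shape[0]
    span = (K - 1) * dilation
    t_out = x.shape[0] - span
    out = np.zeros((t_out, w.shape[2]))
    for t in range(t_out):
        # gather the K dilated taps of the receptive field starting at t
        taps = x[t : t + span + 1 : dilation]           # (K, C_in)
        out[t] = np.einsum('kc,kco->o', taps, w)
    return out

def temporal_pyramid(x, weights, dilations):
    """Pyramid of dilated convolutions: each branch sees a different
    temporal scale; branch outputs are cropped to a common length and
    concatenated along the channel axis (local context)."""
    outs = [dilated_conv1d(x, w, d) for w, d in zip(weights, dilations)]
    t_min = min(o.shape[0] for o in outs)
    return np.concatenate([o[:t_min] for o in outs], axis=1)

def temporal_self_attention(x):
    """Plain scaled dot-product self-attention over the time axis, so
    every frame can aggregate information from every other frame
    (global context)."""
    scores = x @ x.T / np.sqrt(x.shape[1])              # (T, T)
    a = np.exp(scores - scores.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)                   # row-wise softmax
    return a @ x

# Illustrative usage on random features (shapes are assumptions):
rng = np.random.default_rng(0)
seq = rng.normal(size=(32, 4))                          # 32 frames, 4 channels
kernels = [rng.normal(size=(3, 4, 8)) for _ in range(3)]
local = temporal_pyramid(seq, kernels, dilations=[1, 2, 4])
rep = temporal_self_attention(local)
```

In a Siamese setting, the sequence `rep` would then be pooled to a fixed-length vector for each recording, with the pair-similarity (regression) branch and the clique-classification branch both trained on top of it.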
