Pre-training strategies using contrastive learning and playlist information for music classification and similarity

Pablo Alonso-Jiménez (Universitat Pompeu Fabra); Xavier Favory (Utopia Music); Hadrien Foroughmand (Utopia Music); Grigoris Bourdalas (Utopia Music); Xavier Serra (Universitat Pompeu Fabra); Thomas Lidy (Utopia Music); Dmitry Bogdanov (Universitat Pompeu Fabra)


07 Jun 2023

In this work, we investigate an approach that relies on contrastive learning and music metadata as a weak source of supervision to train music representation models. Recent studies show that contrastive learning can be used with editorial metadata (e.g., artist or album name) to learn audio representations that are useful for different classification tasks. In this paper, we extend this idea to using playlist data as a source of music similarity information and investigate three approaches to generate anchor and positive track pairs. We evaluate these approaches by fine-tuning the pre-trained models for music multi-label classification tasks (genre, mood, and instrument tagging) and music similarity. We find that creating anchor and positive track pairs by relying on co-occurrences in playlists provides better music similarity and competitive classification results compared to choosing tracks from the same artist as in previous works. Additionally, our best pre-training approach based on playlists provides superior classification performance for most datasets.
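As an illustration of how playlist co-occurrence could yield anchor and positive track pairs, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the three pairing strategies, so the weighting scheme here (sampling a positive in proportion to how many playlists it shares with the anchor) is an assumption, as are the function names and the toy track identifiers.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(playlists):
    """Count how often each pair of tracks appears in the same playlist."""
    counts = defaultdict(Counter)
    for playlist in playlists:
        # Deduplicate so a track repeated within one playlist counts once.
        for a, b in combinations(sorted(set(playlist)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def sample_positive(anchor, counts, rng):
    """Pick a positive for `anchor`, weighted by co-occurrence count.

    Assumed strategy for illustration only. Returns None when the anchor
    never shares a playlist with another track.
    """
    neighbours = counts.get(anchor)
    if not neighbours:
        return None
    tracks = sorted(neighbours)
    weights = [neighbours[t] for t in tracks]
    return rng.choices(tracks, weights=weights, k=1)[0]


# Toy data: the track identifiers are hypothetical.
playlists = [
    ["t1", "t2", "t3"],
    ["t2", "t3", "t4"],
    ["t5"],
]
counts = cooccurrence_counts(playlists)
rng = random.Random(0)
pairs = [(a, sample_positive(a, counts, rng)) for a in ["t1", "t2", "t5"]]
```

In a contrastive setup, each (anchor, positive) pair would be encoded by the audio model and pulled together in the embedding space, while other tracks in the batch act as negatives. A same-artist baseline could be sketched the same way by grouping tracks by artist instead of by playlist.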

Tags: Music signal analysis, processing and synthesis


Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
