Improving Music Genre Classification from Multi-Modal Properties of Music and Genre Correlations Perspective
Ganghui Ru (Fudan University); Xulong Zhang (Ping An Technology (Shenzhen) Co., Ltd.); Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd.); Ning Cheng (Ping An Technology (Shenzhen) Co., Ltd.); Jing Xiao (Ping An Insurance (Group) Company of China)
Music genre classification has been widely studied in recent years due to its various applications in music retrieval and recommendation. Previous works tend to perform unsatisfactorily because they rely on audio content alone or combine audio and lyrics content inefficiently. In addition, since multiple genres normally co-occur in a music track, it is desirable to capture and model genre correlations to improve multi-label music genre classification performance. To address these issues, we propose a novel multi-modal method that leverages an audio-lyrics contrastive loss and two symmetric cross-modal attention modules to align and fuse features from audio and lyrics. Furthermore, based on the nature of the multi-label classification problem, a genre correlations extraction module is presented to capture and model potential genre correlations. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on the Music4All dataset.
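The components named in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch rendering, not the authors' released code: the module names, dimensions, mean-pooling fusion, the InfoNCE-style form of the contrastive loss, and the learned correlation matrix are all assumptions made for illustration.

```python
# Illustrative sketch of audio-lyrics fusion and genre-correlation modeling.
# All names, shapes, and design choices here are assumptions, not the paper's
# exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalFusion(nn.Module):
    """Two symmetric cross-modal attention blocks: audio attends to lyrics
    and lyrics attends to audio; the attended features are concatenated."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.audio_to_lyrics = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lyrics_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, audio, lyrics):
        # audio: (B, Ta, dim) frame-level embeddings
        # lyrics: (B, Tl, dim) token-level embeddings
        a, _ = self.audio_to_lyrics(audio, lyrics, lyrics)
        l, _ = self.lyrics_to_audio(lyrics, audio, audio)
        # pool each attended sequence and fuse by concatenation
        return torch.cat([a.mean(dim=1), l.mean(dim=1)], dim=-1)  # (B, 2*dim)


class GenreCorrelationHead(nn.Module):
    """Multi-label classifier with a learnable genre-genre correlation matrix
    that re-weights the raw logits (a simplified stand-in for the paper's
    genre correlations extraction module)."""

    def __init__(self, in_dim, num_genres):
        super().__init__()
        self.classifier = nn.Linear(in_dim, num_genres)
        self.corr = nn.Parameter(torch.eye(num_genres))  # pairwise correlations

    def forward(self, fused):
        logits = self.classifier(fused)  # (B, num_genres)
        return logits @ self.corr        # propagate genre co-occurrence


def contrastive_loss(audio_emb, lyrics_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss pulling matched audio/lyrics pairs
    together and pushing mismatched pairs apart (one plausible reading of
    the audio-lyrics contrastive loss)."""
    a = F.normalize(audio_emb, dim=-1)
    l = F.normalize(lyrics_emb, dim=-1)
    logits = a @ l.t() / temperature  # (B, B) pairwise similarities
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Under these assumptions, the fused representation would feed the correlation-aware head, trained with a standard multi-label binary cross-entropy loss jointly with the contrastive alignment term.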