PSEUDO-LABEL TRANSFER FROM FRAME-LEVEL TO NOTE-LEVEL IN A TEACHER-STUDENT FRAMEWORK FOR SINGING TRANSCRIPTION FROM POLYPHONIC MUSIC

Sangeun Kum, Jongpil Lee, Keunhyoung Luke Kim, Taehyoung Kim, Juhan Nam


Length: 00:10:07

12 May 2022

The lack of large-scale note-level labeled data is the major obstacle to singing transcription from polyphonic music. We address this issue by using pseudo-labels from vocal pitch estimation models. The proposed method first converts the frame-level pseudo-labels to note-level pseudo-labels through pitch and rhythm quantization steps. It then further improves label quality through self-training in a teacher-student framework. To validate the method, we compare two vocal pitch estimation models as pseudo-label generators, explore two teacher-student setups with different data augmentation settings, and investigate the number of self-training iterations. The results show that the proposed method can effectively leverage large-scale unlabeled audio data, and that self-training with the noisy student model helps to improve performance. Finally, we show that the model trained with only unlabeled data performs reasonably compared to previous work, and that the model trained with additional labeled data achieves higher accuracy than the model trained with only labeled data.
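
As a rough illustration of the frame-level to note-level conversion described in the abstract, the sketch below rounds each voiced frame to the nearest MIDI semitone (pitch quantization), groups runs of identical pitch into segments, and snaps segment boundaries to a fixed time grid (rhythm quantization). It is not the authors' procedure: the function names, the fixed grid (the paper may quantize relative to tempo or beats), and the parameter values `hop_sec`, `grid_frames`, and `min_frames` are all assumptions made for this example.

```python
import math


def hz_to_midi(f0_hz):
    """Pitch quantization: round a frequency in Hz to the nearest MIDI note."""
    return int(round(69 + 12 * math.log2(f0_hz / 440.0)))


def snap(frame, grid_frames):
    """Snap a frame index to the nearest multiple of grid_frames (ties round up)."""
    return (frame + grid_frames // 2) // grid_frames * grid_frames


def frames_to_notes(f0_hz, hop_sec=0.01, grid_frames=5, min_frames=10):
    """Convert frame-level f0 (0 = unvoiced) to (onset_sec, offset_sec, midi) notes.

    All default values are illustrative assumptions, not taken from the paper.
    """
    # One quantized pitch per voiced frame, None for unvoiced frames.
    pitches = [hz_to_midi(f) if f > 0 else None for f in f0_hz]

    # Group consecutive frames with the same quantized pitch into segments.
    segments = []
    start = 0
    for i in range(1, len(pitches) + 1):
        if i == len(pitches) or pitches[i] != pitches[start]:
            if pitches[start] is not None:
                segments.append((start, i, pitches[start]))
            start = i

    # Rhythm quantization: snap boundaries to the grid, drop too-short notes.
    notes = []
    for onset, offset, midi in segments:
        q_on = snap(onset, grid_frames)
        q_off = snap(offset, grid_frames)
        if q_off - q_on >= min_frames:
            notes.append((round(q_on * hop_sec, 3), round(q_off * hop_sec, 3), midi))
    return notes
```

Calling `frames_to_notes` on the per-frame f0 output of a vocal pitch estimator yields a list of note tuples that could serve as note-level pseudo-labels. Short pitch fluctuations that collapse to fewer than `min_frames` after snapping are discarded, which is one simple way to suppress frame-level estimation noise.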

Tags: music information retrieval, pseudo label, singing transcription from polyphonic music, teacher-student framework


Venue: ICASSP 2022 (May 2022, virtual and in-person conference), presentation video
