A study of audio mixing methods for piano transcription in violin-piano ensembles
Hyemi Kim (KAIST / ETRI); Jiyun Park (KAIST); Taegyun Kwon (KAIST); Dasaem Jeong (Sogang University); Juhan Nam (KAIST)
SPS
Piano transcription models perform particularly well on solo piano recordings, but their performance degrades on ensemble recordings. In this paper, we analyze how piano transcription performance in violin-piano ensembles depends on the audio mixing method used for data augmentation. We apply mixing methods that take the harmonic and temporal characteristics of the audio into account. We prepare two datasets for piano transcription in violin-piano ensembles. First, we create the PFVN-synth dataset, which contains 7 hours of violin-piano ensemble audio rendered from MIDI files, together with the corresponding labels. Second, we collect unaccompanied violin recordings and use them together with the large-scale MAESTRO dataset. We evaluate the transcription results on both synthesized audio and a dataset of real audio recordings.
To the best of our knowledge, this is the first work on data augmentation for automatic music transcription that uses harmonically and temporally controlled mixing.
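As a rough illustration of mixing-based augmentation in general, the sketch below adds a violin signal to a piano signal at a chosen piano-to-violin power ratio and sample offset. It is not the mixing procedure of this paper: the function name `mix_at_snr`, the parameters `snr_db` and `offset`, and the power-ratio gain rule are all assumptions made for the example, and the harmonic control described in the abstract is not modeled here.

```python
import numpy as np

def mix_at_snr(piano, violin, snr_db, offset=0):
    """Mix violin into piano at a target piano-to-violin power ratio.

    Hypothetical helper for illustration only; not the authors' method.
    snr_db: desired 10*log10(piano_power / violin_power) in the mix.
    offset: non-negative number of samples by which the violin is delayed.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    piano = np.asarray(piano, dtype=np.float64)
    violin = np.asarray(violin, dtype=np.float64)

    # Place the violin on a silent track of the same length as the piano.
    shifted = np.zeros_like(piano)
    n = min(len(violin), len(piano) - offset)
    if n > 0:
        shifted[offset:offset + n] = violin[:n]

    piano_power = np.mean(piano ** 2)
    violin_power = np.mean(shifted ** 2)
    if piano_power == 0 or violin_power == 0:
        # Nothing to balance against; return the piano unchanged.
        return piano.copy()

    # Scale the violin so the power ratio matches snr_db.
    gain = np.sqrt(piano_power / (violin_power * 10 ** (snr_db / 10)))
    return piano + gain * shifted
```

In an augmentation setting of this kind, the piano labels would stay unchanged while `snr_db` and `offset` are varied per training example, so the model sees the same piano part under different amounts and alignments of violin interference.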