Improving Music Transcription By Pre-Stacking A U-Net
Fabrizio Pedersoli, George Tzanetakis, Kwang Moo Yi
SPS
We propose pre-stacking a U-Net as a way of improving the polyphonic music transcription performance of various baseline Convolutional Neural Networks (CNNs). The U-Net, a network architecture based on skip connections between layers, acts as a transformation network that precedes the transcription network. Notably, we do not introduce any additional loss terms specific to the transformation network; instead, we jointly train the entire combined model with the original loss function designed for the back-end transcription network. We argue that the U-Net transforms the input signal into a representation that is more effective for transcription, which enables the observed improvements in accuracy. Through several experiments on the MusicNet dataset, we empirically confirm that the proposed configuration consistently improves the accuracy of transcription networks, an enhancement that cannot be achieved by simply adding more neurons or more layers to the baseline CNNs. Moreover, we show that with the proposed architecture we can go beyond general music transcription and transcribe in an instrument-specific fashion; doing so also improves general transcription performance.
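The abstract describes composing a transformation network (the U-Net) in front of a transcription network and training the whole stack with only the transcription loss. A minimal numpy sketch of that composition pattern is below; the network stand-ins, shapes, and parameter names are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
T, F, N = 8, 16, 4  # assumed toy sizes: time frames, frequency bins, note classes

def unet_like_transform(x, w):
    # Stand-in for the U-Net front end: a learned residual (skip-connection)
    # mapping x + f(x) that preserves the input shape.
    return x + np.tanh(x @ w)

def transcription_net(h, v):
    # Stand-in for the baseline transcription CNN: frame-wise note logits.
    return h @ v

def bce_loss(logits, targets):
    # The original transcription loss; note there is NO extra loss term
    # for the transformation network, matching the joint-training setup.
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(targets * np.log(p + 1e-9)
                    + (1 - targets) * np.log(1 - p + 1e-9))

x = rng.standard_normal((T, F))                  # toy input spectrogram
y = (rng.random((T, N)) > 0.5).astype(float)     # toy multi-label note targets
w = rng.standard_normal((F, F)) * 0.1            # transformation parameters
v = rng.standard_normal((F, N)) * 0.1            # transcription parameters

h = unet_like_transform(x, w)    # transformed representation
logits = transcription_net(h, v) # note predictions from the back end
loss = bce_loss(logits, y)       # single loss drives both sub-networks
```

The key design point the sketch mirrors is that gradients from the one transcription loss would flow through both sub-networks, so the front end learns whatever representation best serves the back-end transcriber.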