Improving Music Transcription By Pre-Stacking A U-Net

Fabrizio Pedersoli, George Tzanetakis, Kwang Moo Yi

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 13:09
04 May 2020

We propose pre-stacking a U-Net as a way of improving the polyphonic music transcription performance of various baseline Convolutional Neural Networks (CNNs). The U-Net, a network architecture based on skip connections between layers, acts as a transformation network that precedes the transcription network. Notably, we do not introduce any additional loss terms specific to the transformation network; instead, we jointly train the entire combined model with the original loss function designed for the back-end transcription network. We argue that the U-Net transforms the input signal into a representation that is more effective for transcription, which enables the observed improvements in accuracy. Through several experiments on the MusicNet dataset, we empirically confirm that the proposed configuration consistently improves the accuracy of transcription networks. This enhancement cannot be achieved by simply adding more neurons or more layers to the baseline CNN. Moreover, we show that with the proposed architecture we can go beyond general music transcription and transcribe in an instrument-specific fashion; doing so also improves the original general transcription performance.
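To illustrate the pre-stacking idea described above, the following is a minimal sketch: a U-Net front-end transforms the input, a baseline CNN transcribes the result, and the combined model is scored with only the back-end's original transcription loss (no loss term specific to the transformation network). All class and function names here are illustrative stand-ins, not the authors' code; the real networks are convolutional models trained end-to-end.

```python
class UNet:
    """Stand-in for the transformation network (real model: conv layers with skip connections)."""
    def __call__(self, spectrogram):
        # Identity placeholder for the learned signal transformation.
        return spectrogram


class TranscriptionCNN:
    """Stand-in for the baseline transcription network."""
    def __call__(self, representation):
        # Placeholder: threshold each frequency bin to a binary note activation.
        return [1 if v > 0.5 else 0 for v in representation]


class PreStacked:
    """U-Net front-end followed by the transcription back-end, trained jointly."""
    def __init__(self, front, back):
        self.front = front
        self.back = back

    def __call__(self, x):
        return self.back(self.front(x))


def transcription_loss(pred, target):
    # Stands in for the back-end's original loss (e.g. per-note cross-entropy);
    # here simply the count of mismatched note activations. The key point is
    # that the combined model uses ONLY this loss, nothing U-Net-specific.
    return sum(p != t for p, t in zip(pred, target))


model = PreStacked(UNet(), TranscriptionCNN())
pred = model([0.9, 0.2, 0.7])          # [1, 0, 1]
loss = transcription_loss(pred, [1, 0, 1])  # 0
```

In a real implementation both sub-networks would be differentiable modules so that gradients from the single transcription loss flow back through the U-Net, which is what lets it learn a transcription-friendly representation without its own objective.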
