Simultaneous Separation And Transcription Of Mixtures With Multiple Polyphonic And Percussive Instruments
Ethan Manilow, Bryan Pardo, Prem Seetharaman
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 15:07
We present a single deep learning architecture that can both separate an audio recording of a musical mixture into constituent single-instrument recordings and transcribe these instruments into a human-readable format at the same time, learning a shared musical representation for both tasks. This novel architecture, which we call Cerberus, builds on the Chimera network for source separation by adding a third “head” for transcription. By training each head with different losses, we are able to jointly learn how to separate and transcribe up to five instruments with a single network. We show that separation and transcription are highly complementary to one another and, when learned jointly, lead to Cerberus networks that are better at both separation and transcription and that generalize better to unseen mixtures.
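The idea of one shared representation feeding three heads, each trained with its own loss, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the two separation heads are an embedding head and a mask head, and it replaces the recurrent layers a real model would likely use with a single dense layer. All names, sizes, loss choices, and loss weights below (`ThreeHeadSketch`, `joint_loss`, `EMB_DIM`, `N_PITCHES`, `weights`) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T, F = 8, 16       # time frames and frequency bins (toy sizes)
HIDDEN = 32        # width of the shared representation (assumed)
N_SOURCES = 5      # the abstract mentions up to five instruments
EMB_DIM = 4        # hypothetical embedding size per time-frequency bin
N_PITCHES = 12     # hypothetical number of pitches in the piano roll


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


class ThreeHeadSketch:
    """Shared trunk with three output heads (illustrative stand-in only)."""

    def __init__(self):
        self.w_shared = rng.normal(scale=0.1, size=(F, HIDDEN))
        self.w_embed = rng.normal(scale=0.1, size=(HIDDEN, F * EMB_DIM))
        self.w_mask = rng.normal(scale=0.1, size=(HIDDEN, F * N_SOURCES))
        self.w_trans = rng.normal(scale=0.1, size=(HIDDEN, N_SOURCES * N_PITCHES))

    def forward(self, mix_spec):
        # Shared representation used by every head: (T, HIDDEN)
        h = np.tanh(mix_spec @ self.w_shared)
        # Head 1: unit-norm embedding for each time-frequency bin
        emb = (h @ self.w_embed).reshape(T, F, EMB_DIM)
        emb = emb / (np.linalg.norm(emb, axis=-1, keepdims=True) + 1e-8)
        # Head 2: one soft mask per source, summing to 1 across sources
        masks = softmax((h @ self.w_mask).reshape(T, F, N_SOURCES), axis=-1)
        # Head 3: per-source piano-roll note probabilities
        roll = sigmoid((h @ self.w_trans).reshape(T, N_SOURCES, N_PITCHES))
        return emb, masks, roll


def joint_loss(emb, masks, roll, mix_spec, src_specs, target_roll,
               weights=(1.0, 1.0, 1.0)):
    """Weighted sum of one loss per head (weights are hypothetical)."""
    # Embedding loss: ||VV^T - YY^T||_F^2, expanded to avoid (TF x TF) matrices
    V = emb.reshape(T * F, EMB_DIM)
    dominant = src_specs.argmax(axis=-1).reshape(-1)
    Y = np.eye(N_SOURCES)[dominant]  # one-hot dominant source per bin
    l_emb = (np.linalg.norm(V.T @ V) ** 2
             - 2.0 * np.linalg.norm(V.T @ Y) ** 2
             + np.linalg.norm(Y.T @ Y) ** 2) / (T * F) ** 2
    # Mask loss: L1 distance between masked mixture and each true source
    l_mask = np.abs(masks * mix_spec[..., None] - src_specs).mean()
    # Transcription loss: binary cross-entropy against the target piano roll
    p = np.clip(roll, 1e-7, 1.0 - 1e-7)
    l_trans = -(target_roll * np.log(p)
                + (1.0 - target_roll) * np.log(1.0 - p)).mean()
    a, b, c = weights
    return a * l_emb + b * l_mask + c * l_trans


# Toy data: random source spectrograms, their sum as the mixture, random notes
src_specs = rng.random((T, F, N_SOURCES))
mix_spec = src_specs.sum(axis=-1)
target_roll = (rng.random((T, N_SOURCES, N_PITCHES)) > 0.8).astype(float)

model = ThreeHeadSketch()
emb, masks, roll = model.forward(mix_spec)
loss = joint_loss(emb, masks, roll, mix_spec, src_specs, target_roll)
print("joint loss:", float(loss))
```

Because all three loss terms depend on the same shared weights (`w_shared` here), a gradient step on the summed loss updates the shared representation using both separation and transcription errors. That coupling is the mechanism the abstract credits for the improvement on both tasks.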