Multichannel Singing Voice Separation by Deep Neural Network Informed DOA Constrained CMNMF
Antonio Jesús Muñoz-Montoro, Archontis Politis, Konstantinos Drossos, Julio José Carabias-Orti
This work addresses the problem of multichannel source separation by combining two powerful approaches: multichannel spectral factorization and recent monophonic deep learning (DL)-based spectrum inference. Individual source spectra at the different channels are estimated with a Masker-Denoiser twin network capable of modeling the long-term temporal patterns of a musical piece. The monophonic source spectrograms are then used within a spatial covariance mixing model, based on complex-valued multichannel non-negative matrix factorization (CMNMF), that estimates the spatial characteristics of each source. The proposed framework is evaluated on the task of singing voice separation using a large multichannel dataset. Experimental results show that the joint DL+CMNMF method outperforms both the monophonic DL-based separation and the multichannel CMNMF baseline.
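To illustrate how the two stages fit together, below is a minimal NumPy sketch. A placeholder array `v` stands in for the monophonic power spectrograms that the Masker-Denoiser network would produce, and a simplified spatial-covariance EM loop (in the spirit of multichannel Wiener filtering) plays the role of the spatial model; it omits the paper's DOA constraints and the exact CMNMF updates. All array names and the random input are illustrative assumptions, not the authors' implementation.

```python
# Sketch: fix per-source spectra from a monophonic DL model, then estimate
# spatial covariances with an EM loop (simplified spatial model, NOT the
# paper's exact DOA-constrained CMNMF). All names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

F, T, C, J = 257, 100, 2, 2   # freq bins, time frames, channels, sources

# Multichannel mixture STFT X[f, t, c] (random placeholder data here).
X = rng.standard_normal((F, T, C)) + 1j * rng.standard_normal((F, T, C))

# Stand-in for the Masker-Denoiser network output: per-source monophonic
# power spectrograms v[j, f, t], treated as fixed during the EM loop.
v = rng.random((J, F, T)) + 1e-3

# Spatial covariance matrices R[j, f] (C x C), initialised to identity.
R = np.tile(np.eye(C, dtype=complex), (J, F, 1, 1))

eps = 1e-8
for _ in range(10):                                   # EM iterations
    # E-step: per-source covariances and multichannel Wiener filtering.
    Cj = v[..., None, None] * R[:, :, None]           # (J, F, T, C, C)
    Cx = Cj.sum(axis=0) + eps * np.eye(C)             # mixture covariance
    W = Cj @ np.linalg.inv(Cx)                        # Wiener gains
    S = np.einsum('jftcd,ftd->jftc', W, X)            # source images
    # Posterior second-order moment: S S^H + (I - W) Cj.
    SS = S[..., :, None] * S[..., None, :].conj()
    post = SS + Cj - W @ Cj
    # M-step: re-estimate each R[j, f] from the posterior statistics,
    # averaging over time with the fixed DL spectra as weights.
    R = (post / (v[..., None, None] + eps)).mean(axis=2)

# S[j] now holds the separated multichannel image of source j.
print(S.shape)  # (J, F, T, C)
```

Keeping `v` fixed while re-estimating only the spatial parameters mirrors the division of labour described in the abstract: the DL front end supplies the spectral content, and the spatial model accounts for how each source reaches the microphones.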