MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation

Chang-Bin Jeon (Seoul National University); Hyeongi Moon (Gaudio Lab); Keunwoo Choi (Gaudio Lab); Ben Sangbae Chon (Gaudio Lab); Kyogu Lee (Seoul National University)

07 Jun 2023

Separating a mixture of multiple singing voices into the individual voices is a rarely studied area in music source separation research, and the absence of a benchmark dataset has hindered its progress. In this paper, we present an evaluation dataset and provide baseline studies for multiple singing voices separation. First, we introduce MedleyVox, an evaluation dataset for multiple singing voices separation, and specify the problem definition by categorizing it into i) unison, ii) duet, iii) main vs. rest, and iv) N-singing separation. Second, to overcome the absence of existing multi-singing datasets for training, we present a strategy for constructing multiple singing mixtures from various single-singing datasets. Third, we propose the improved super-resolution network (iSRNet), which greatly enhances the initial estimates of separation networks. Jointly trained with Conv-TasNet using the multi-singing mixture construction strategy, the proposed iSRNet achieved performance comparable to ideal time-frequency masks on the duet and unison subsets of MedleyVox. Audio samples, the dataset, and code are available on our website (https://github.com/jeonchangbin49/MedleyVox).
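The abstract does not spell out how the multi-singing training mixtures are built. As a rough illustration of the general idea only, here is a minimal sketch that forms one training example by summing several time-aligned single-singer recordings. The function name `make_multi_singing_mixture`, the independent random per-singer gain (default ±6 dB), and the peak-normalization step are all hypothetical assumptions for illustration, not the authors' strategy; the sketch also assumes mono waveforms of equal length at a common sample rate.

```python
import random
from typing import List, Optional, Sequence, Tuple


def make_multi_singing_mixture(
    sources: Sequence[Sequence[float]],
    gain_db_range: Tuple[float, float] = (-6.0, 6.0),
    seed: Optional[int] = None,
) -> Tuple[List[float], List[List[float]]]:
    """Hypothetical helper: mix single-singer tracks into one multi-singing example.

    Assumes every source is a mono waveform (floats in [-1, 1]) at the same
    sample rate and of the same length. Returns the mixture and the gain-scaled
    sources, which sum sample-by-sample to the mixture.
    """
    if len(sources) < 2:
        raise ValueError("need at least two singing voices to form a mixture")
    length = len(sources[0])
    if any(len(s) != length for s in sources):
        raise ValueError("all sources must have the same length")

    rng = random.Random(seed)
    low_db, high_db = gain_db_range

    # Assumption: apply an independent random gain (in dB) to each singer.
    scaled = []
    for src in sources:
        gain = 10.0 ** (rng.uniform(low_db, high_db) / 20.0)
        scaled.append([gain * x for x in src])

    # The mixture is the sample-wise sum of the scaled sources.
    mixture = [sum(samples) for samples in zip(*scaled)]

    # Assumption: if the mixture would clip, rescale the mixture and the sources
    # by the same factor so the separation targets still add up to the input.
    peak = max((abs(x) for x in mixture), default=0.0)
    if peak > 1.0:
        scale = 1.0 / peak
        mixture = [scale * x for x in mixture]
        scaled = [[scale * x for x in src] for src in scaled]

    return mixture, scaled


if __name__ == "__main__":
    alto = [0.5, -0.2, 0.9, 0.0]
    tenor = [0.1, 0.4, -0.8, 0.3]
    mix, targets = make_multi_singing_mixture([alto, tenor], seed=0)
    print(len(mix), len(targets))
```

Returning the rescaled sources rather than the raw inputs keeps the mixture exactly equal to the sum of the targets, which is the relationship a separation network's training loss relies on.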

Tags: Applications in music and audio processing

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and in-person conference.