MUSICYOLO: A SIGHT-SINGING ONSET/OFFSET DETECTION FRAMEWORK BASED ON OBJECT DETECTION INSTEAD OF SPECTRUM FRAMES
Xianke Wang, Wei Xu, Weiming Yang, Wenqing Cheng
SPS
Length: 00:13:50
In this paper, we propose MusicYOLO, the first framework to apply object detection to onset and offset detection in singing. Vocal onsets are less stable and less clear than those of musical instruments, so frame-based onset/offset detection methods often perform poorly on singing. Unlike previous methods, which rely on transient frame features around the onset or offset, MusicYOLO detects each whole note as an object in the spectrogram image, which significantly improves onset/offset detection performance. Experimental results show that MusicYOLO achieves an F1 score of 94.16% for onset detection and 91.35% for offset detection on the ISMIR2014 dataset, demonstrating that it is the state-of-the-art onset/offset detection framework for singing.
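To illustrate the idea of reading onsets and offsets from detected note objects, here is a minimal sketch of how a bounding box in a spectrogram image could be mapped to note times. It is not the authors' implementation: the box format `(x_min, y_min, x_max, y_max)`, the function name `boxes_to_onsets_offsets`, and the assumption that the image's horizontal axis maps linearly onto the clip duration are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in image pixels.
Box = Tuple[float, float, float, float]


def boxes_to_onsets_offsets(
    boxes: List[Box], image_width: int, clip_duration: float
) -> List[Tuple[float, float]]:
    """Map note bounding boxes to (onset, offset) times in seconds.

    Assumes the horizontal axis of the spectrogram image covers the
    whole clip linearly, so the left edge of a box marks the onset
    and the right edge marks the offset.
    """
    if image_width <= 0 or clip_duration <= 0:
        raise ValueError("image_width and clip_duration must be positive")

    notes = []
    for x_min, _y_min, x_max, _y_max in boxes:
        onset = x_min / image_width * clip_duration
        offset = x_max / image_width * clip_duration
        notes.append((onset, offset))

    # Return notes in chronological order of onset.
    return sorted(notes)


# Two made-up detections on a 1000-pixel-wide image of a 10-second clip.
example_boxes = [(300.0, 40.0, 450.0, 90.0), (100.0, 60.0, 250.0, 110.0)]
notes = boxes_to_onsets_offsets(example_boxes, image_width=1000, clip_duration=10.0)
```

Under these assumptions the vertical box coordinates are unused for timing; they would only matter if pitch were also read from the box position.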