TONET: TONE-OCTAVE NETWORK FOR SINGING MELODY EXTRACTION FROM POLYPHONIC MUSIC

Ke Chen, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Shuai Yu, Wei Li, Cheng-i Wang

Length: 00:10:54

11 May 2022

Singing melody extraction is an important problem in music information retrieval. Existing methods typically rely on frequency-domain representations to estimate the sung frequencies. However, this design falls short of human-level performance in perceiving the two components of melody: tone (pitch class) and octave. In this paper, we propose TONet, a plug-and-play model that improves both tone and octave perception by leveraging a novel input representation and a novel network architecture. First, we present an improved input representation, the Tone-CFP, that explicitly groups harmonics by rearranging frequency bins. Second, we introduce an encoder-decoder architecture designed to produce a salience feature map, a tone feature map, and an octave feature map. Third, we propose a tone-octave fusion mechanism to improve the final salience feature map. We conduct experiments to verify the capability of TONet with various baseline backbone models. Our results show that tone-octave fusion with Tone-CFP significantly improves singing melody extraction performance across various datasets, with substantial gains in octave and tone accuracy.
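The abstract does not spell out how the frequency bins are rearranged, so the following is only a minimal sketch of one plausible reading: on a log-frequency axis with a fixed number of bins per octave, bins sharing a tone (pitch class) are moved next to each other across octaves. The function names `tone_octave_rearrange` and `split_tone_octave`, the octave-major input layout, and the `bins_per_octave` parameter are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def tone_octave_rearrange(spec, bins_per_octave):
    """Reorder log-frequency bins from octave-major to tone-major order.

    Assumption (hypothetical layout): spec has shape (n_bins, n_frames) and
    input bin index = octave * bins_per_octave + tone.
    After the call, output bin index = tone * n_octaves + octave, so all
    octaves of the same tone sit in adjacent rows.
    """
    n_bins, n_frames = spec.shape
    if n_bins % bins_per_octave != 0:
        raise ValueError("n_bins must be a multiple of bins_per_octave")
    n_octaves = n_bins // bins_per_octave
    # View the frequency axis as an (octave, tone) grid, then swap the two axes.
    grid = spec.reshape(n_octaves, bins_per_octave, n_frames)
    return grid.transpose(1, 0, 2).reshape(n_bins, n_frames)


def split_tone_octave(bin_index, bins_per_octave):
    """Decompose an octave-major bin index into (tone, octave)."""
    return bin_index % bins_per_octave, bin_index // bins_per_octave


if __name__ == "__main__":
    # Toy input: 2 octaves x 3 bins per octave, a single frame.
    toy = np.arange(6, dtype=float).reshape(6, 1)
    print(tone_octave_rearrange(toy, bins_per_octave=3)[:, 0])
```

In this toy layout, each pair of adjacent output rows holds the same tone in two different octaves, which is the kind of grouping a tone branch and an octave branch could each read along one axis.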

Tags: melody extraction, self-attention, tone-octave information fusion, Tone-CFP

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
