Data-Driven Harmonic Filters For Audio Representation Learning
Minz Won, Sanghyuk Chun, Oriol Nieto, Xavier Serra
SPS
Length: 15:00
We introduce a trainable front-end module for audio representation learning that exploits the inherent harmonic structure of audio signals. The proposed architecture, composed of a set of harmonic filters, compels the subsequent network to capture harmonic relations while preserving spectro-temporal locality. Since harmonic structure is known to play a key role in human auditory perception, one can expect these harmonic filters to make audio representation learning more efficient. Experimental results show that a simple convolutional neural network back-end with the proposed front-end outperforms state-of-the-art baseline methods in automatic music tagging, keyword spotting, and sound event tagging tasks.
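To make the idea of a harmonic front-end concrete, here is a minimal NumPy sketch of one way such a filter bank could be built. It is not the authors' implementation. It assumes triangular band-pass filters centred at integer multiples h·f0 of a set of fundamental frequencies, an ERB-style bandwidth (0.1079·f + 24.7)/q, and MIDI-spaced fundamentals. The names `harmonic_filterbank` and `harmonic_tensor` are hypothetical. Applying the bank to a magnitude spectrogram yields a tensor of shape (harmonics, fundamentals, time). Harmonics become channels, so a 2-D convolution over (fundamental, time) sees harmonically related energy stacked at the same position, and spectro-temporal locality is preserved. In a trainable ("data-driven") front-end, quantities such as `f0s` and `q` would be learnable parameters in a deep-learning framework. They are fixed here for clarity.

```python
import numpy as np

def harmonic_filterbank(f0s, n_harmonics, fft_freqs, q=1.0):
    """Hypothetical harmonic filter bank (illustrative assumptions, see lead-in).

    Returns an array of shape (n_harmonics, len(f0s), len(fft_freqs)) whose
    entry [h-1, i, k] is the response of a triangular filter centred at
    h * f0s[i], evaluated at frequency fft_freqs[k].
    """
    h = np.arange(1, n_harmonics + 1)[:, None]                # (H, 1): harmonic indices 1..H
    centers = h * np.asarray(f0s, dtype=float)[None, :]       # (H, F): centre frequencies h * f0
    bw = (0.1079 * centers + 24.7) / q                        # ERB-style half-width (assumption)
    freqs = np.asarray(fft_freqs, dtype=float)[None, None, :]  # (1, 1, K)
    dist = np.abs(freqs - centers[..., None])                 # (H, F, K) distance to each centre
    return np.clip(1.0 - dist / bw[..., None], 0.0, None)     # triangular response in [0, 1]

def harmonic_tensor(mag_spec, fb):
    """Project a (K, T) magnitude spectrogram through the (H, F, K) bank -> (H, F, T)."""
    return np.einsum("hfk,kt->hft", fb, mag_spec)

# Usage: 16 kHz audio, 512-point STFT, fundamentals on MIDI notes 36..83 (assumed range).
sr, n_fft = 16000, 512
fft_freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)       # STFT bin centre frequencies
f0s = 440.0 * 2.0 ** ((np.arange(36, 84) - 69) / 12)     # 48 equal-tempered fundamentals
fb = harmonic_filterbank(f0s, n_harmonics=6, fft_freqs=fft_freqs)
spec = np.abs(np.random.default_rng(0).standard_normal((fft_freqs.size, 100)))
print(harmonic_tensor(spec, fb).shape)  # (6, 48, 100)
```

A pure 440 Hz component shows the intended behaviour. It peaks at f0 = 440 Hz in the first (fundamental) channel and at f0 = 220 Hz in the second-harmonic channel. The back-end can therefore relate the two through its channel dimension.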