Learning Separable Time-Frequency Filterbanks For Audio Classification

Jie Pu, Yannis Panagakis, Maja Pantic


Length: 00:08:27

08 Jun 2021

State-of-the-art audio classification systems often apply deep neural networks to hand-crafted features (e.g., spectrogram-based representations) instead of learning features directly from raw audio. Moreover, these audio networks have millions of parameters that need to be learned, which creates a great demand for computational resources and training data. In this paper, we aim to learn audio representations directly from raw audio while mitigating the training burden by employing a lightweight architecture. In particular, we propose to learn separable filters parametrized by only a few variables, namely center frequency and bandwidth, which facilitates training and offers interpretability of the learned representations. The generality of the proposed method is demonstrated by applying it to two tasks: 1) speaker identification and 2) acoustic event recognition. Experimental results indicate its effectiveness on these tasks, especially when only a small amount of training data is available.
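To make the idea of a filter defined by center frequency and bandwidth concrete, here is a minimal sketch. It is not the authors' implementation: it assumes a cosine-modulated Gaussian (Gabor-style) filter shape, a 16 kHz sample rate, and a 401-tap length, none of which are stated in the abstract, and it does not attempt to reproduce the separable time-frequency structure of the proposed filterbank. The function names `gabor_filter` and `response_magnitude` are hypothetical.

```python
import math


def gabor_filter(center_hz, bandwidth_hz, sample_rate=16000, length=401):
    """Build a cosine-modulated Gaussian FIR filter as a list of taps.

    Assumed filter shape for illustration only. Two numbers
    (center_hz, bandwidth_hz) determine every tap. Use an odd `length`
    so the filter is symmetric about its center tap.
    """
    # Gaussian envelope width in seconds: a wider bandwidth gives a
    # shorter envelope in time.
    sigma = 1.0 / (2.0 * math.pi * bandwidth_hz)
    half = length // 2
    taps = []
    for n in range(-half, half + 1):
        t = n / sample_rate
        envelope = math.exp(-0.5 * (t / sigma) ** 2)
        taps.append(envelope * math.cos(2.0 * math.pi * center_hz * t))
    # Normalize to unit energy so filters with different bandwidths
    # are comparable.
    norm = math.sqrt(sum(x * x for x in taps))
    return [x / norm for x in taps]


def response_magnitude(taps, freq_hz, sample_rate=16000):
    """Magnitude of the filter's frequency response at freq_hz (direct DFT sum)."""
    omega = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(h * math.cos(omega * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(omega * n) for n, h in enumerate(taps))
    return math.hypot(re, im)


# Example: a filter centered at 1 kHz with a 100 Hz bandwidth.
taps = gabor_filter(center_hz=1000.0, bandwidth_hz=100.0)
at_center = response_magnitude(taps, 1000.0)
far_away = response_magnitude(taps, 3000.0)
```

Under these assumptions, a bank of 40 such filters with 401 taps each would be described by 80 numbers instead of 16,040 free taps, which illustrates why this kind of parametrization can ease training and why each learned filter can be read off directly as a passband.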

Chairs:

Ritwik Giri

Tags:

signal processing society

IEEE ICASSP 2021

virtual conference

June 6-11, 2021
