AN INVESTIGATION OF THE EFFECTIVENESS OF PHASE FOR AUDIO CLASSIFICATION
Shunsuke Hidaka, Kohei Wakamiya, Tokihiko Kaburagi
IEEE SPS
While the log-amplitude mel-spectrogram has been widely used as a feature representation for deep-learning-based speech processing, the effectiveness of another aspect of the speech spectrum, namely phase information, has recently been shown for tasks such as speech enhancement and source separation. In this study, we extensively investigated the effectiveness of including the phase information of signals for eight audio classification tasks. We constructed a learnable front-end that computes the phase and its derivatives from a time-frequency representation with a mel-like frequency axis. Experimental results showed significant performance improvements for musical pitch detection, musical instrument detection, language identification, speaker identification, and birdsong detection. On the other hand, overfitting to the recording conditions was observed for some tasks when the instantaneous frequency was used. These results imply that, for audio classification, the relationship between the phase values of adjacent time-frequency elements is more important than the phase itself.
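The quantities discussed in the abstract can be illustrated with a minimal sketch. The code below is NOT the paper's learnable front-end (which uses a mel-like frequency axis); it only shows, under assumed parameter values, how the log-magnitude, the wrapped phase, and a phase derivative over time (a discrete instantaneous-frequency estimate) can be extracted from a plain STFT with NumPy:

```python
# Hedged sketch (not the authors' front-end): extract log-magnitude, phase,
# and a discrete instantaneous-frequency (IF) estimate from an STFT.
# All parameter values (n_fft, hop, window) are illustrative assumptions.
import numpy as np

def stft_phase_features(x, n_fft=512, hop=128):
    """Return (log_mag, phase, inst_freq), each shaped (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)       # complex spectrum
    log_mag = np.log(np.abs(spec) + 1e-8)    # log-amplitude channel
    phase = np.angle(spec)                   # wrapped to (-pi, pi]
    # IF estimate: phase difference between adjacent frames,
    # re-wrapped to [-pi, pi) to avoid 2*pi discontinuities.
    dphase = np.diff(phase, axis=0, prepend=phase[:1])
    inst_freq = (dphase + np.pi) % (2 * np.pi) - np.pi
    return log_mag, phase, inst_freq

# Example: a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)
log_mag, phase, inst_freq = stft_phase_features(x)
print(log_mag.shape, phase.shape, inst_freq.shape)
```

The re-wrapping step mirrors the abstract's point: the classifier-relevant signal lives in relations between adjacent phase values (here, frame-to-frame differences), not in the raw wrapped phase.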