Time-Frequency Feature Decomposition Based On Sound Duration For Acoustic Scene Classification
Yuzhong Wu, Tan Lee
SPS
Acoustic scene classification is the task of identifying the type of acoustic environment in which a given audio signal was recorded. The signal is a mixture of sound events with diverse characteristics, so in-depth, focused analysis is needed to identify the sound patterns most representative of each scene. In this paper, we propose a feature decomposition method based on temporal median filtering and use convolutional neural networks to model long-duration background sounds and transient sounds separately. Experiments on log-mel and wavelet-based time-frequency features show that the proposed method improves classification accuracy. Analysis of the detailed experimental results reveals that (1) long-duration sounds are generally the most informative for acoustic scene classification, and (2) the most relevant sound durations may differ across types of acoustic scenes.
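To illustrate the idea behind the decomposition, the sketch below applies a median filter along the time axis of a spectrogram-like array: slowly varying background energy survives the median, while short transients are removed and recovered as the residual. The function name and the filter length are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from scipy.ndimage import median_filter

def decompose_spectrogram(spec, kernel_frames=21):
    """Split a time-frequency representation (freq x time) into a
    long-duration background component and a transient residual
    via temporal median filtering.

    kernel_frames is an assumed filter length, not the setting
    used in the paper.
    """
    # Median-filter each frequency bin independently along time.
    background = median_filter(spec, size=(1, kernel_frames))
    # Transients are what the temporal median removes.
    transient = spec - background
    return background, transient

# Toy example: a steady tone plus a single short click.
spec = np.zeros((4, 100))
spec[1, :] = 1.0      # steady background tone in bin 1
spec[2, 50] = 5.0     # transient event in bin 2
bg, tr = decompose_spectrogram(spec)
# The steady tone stays in bg; the click appears only in tr.
```

In the full system, the two components would each be fed to a convolutional neural network, so that models for background sounds and transient sounds are learned separately.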