High-Resolution Attention Network With Acoustic Segment Model For Acoustic Scene Classification
Xue Bai, Jun Du, Jia Pan, Heng-Shun Zhou, Chin-Hui Lee, Yan-Hui Tu
SPS
Length: 14:40
The spectral information of acoustic scenes is diverse and complex, which poses challenges for acoustic scene tasks. To improve classification performance, a variety of convolutional neural networks (CNNs) have been proposed to extract richer semantic information from scene utterances. However, different regions of the features extracted by a CNN-based encoder differ in importance. In this paper, we propose a novel strategy for acoustic scene classification, namely a high-resolution attention network with acoustic segment model (HRAN-ASM). In this approach, we use a fully convolutional neural network to obtain high-level semantic information and then adopt a two-stage attention strategy to select the relevant acoustic scene segments. In addition, the acoustic segment model (ASM) proposed in our recent work provides the embedding vectors for this attention mechanism. The performance is evaluated on DCASE 2018 Task 1a, where the approach achieves a classification accuracy of 70.5% with a single system and no data expansion, which is superior to a CNN-based self-attention mechanism and highly competitive.
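The idea of weighting feature regions by importance can be illustrated with a minimal sketch. This is not the HRAN-ASM method itself: it shows only a single stage of generic scaled dot-product attention pooling, and the function names, the scaling factor, the toy feature values, and the use of one query vector as a stand-in for an ASM-derived embedding are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, query):
    """Pool region features into one vector using attention weights.

    features: list of regions, each a list of `dim` floats
              (hypothetical stand-in for CNN encoder output).
    query:    list of `dim` floats
              (hypothetical stand-in for an embedding vector).
    Returns (pooled_vector, weights).
    """
    dim = len(query)
    # Score each region by its scaled dot product with the query.
    scores = [
        sum(f * q for f, q in zip(row, query)) / math.sqrt(dim)
        for row in features
    ]
    weights = softmax(scores)
    # Weighted sum of the region features, one value per dimension.
    pooled = [
        sum(w * row[d] for w, row in zip(weights, features))
        for d in range(dim)
    ]
    return pooled, weights

# Toy input: three regions with two channels each (illustrative values only).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
pooled, weights = attention_pool(features, query)
```

In this toy input the third region aligns best with the query, so it receives the largest weight and contributes most to the pooled vector. A two-stage scheme such as the one described above would apply a selection of this kind more than once.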