SEMANTICAC: SEMANTICS-ASSISTED FRAMEWORK FOR AUDIO CLASSIFICATION

Yicheng Xiao (Tsinghua Shenzhen International Graduate School, Tsinghua University); Yue Ma (Tsinghua University); Shuyan Li (University of Cambridge); Hantao Zhou (Tsinghua Shenzhen International Graduate School, Tsinghua University); Ran Liao (Tsinghua Shenzhen International Graduate School, Tsinghua University); Xiu Li (Tsinghua University)

07 Jun 2023

In this paper, we propose SemanticAC, a semantics-assisted framework for audio classification that better leverages the semantic information carried by class labels. Unlike conventional audio classification methods, which treat class labels as discrete vectors, we employ a language model to extract rich semantics from the labels and optimize the semantic consistency between audio signals and their labels. We verify that simple textual information derived from labels, combined with advanced pretrained models, provides richer semantic supervision and improves performance. Specifically, we design a text encoder that captures the semantic information in textual extensions of the labels. We then use an audio encoder and a similarity calculation module to map audio signals into alignment with the semantics of their corresponding class labels, thereby enforcing semantic consistency. Extensive experiments on two audio datasets, ESC-50 and US8K, demonstrate that the proposed method consistently outperforms the compared audio classification methods.
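The alignment step described in the abstract can be illustrated with a short plain-Python sketch. This is not the authors' implementation: the abstract does not specify the similarity function or the training loss, so the code assumes cosine similarity scaled by a temperature, followed by a softmax cross-entropy over all class labels (a CLIP-style formulation). The label template in the hypothetical `extend_label` helper and the toy 3-dimensional embeddings, which stand in for the outputs of the text and audio encoders, are also illustrative assumptions.

```python
import math
from typing import List, Sequence


def extend_label(label: str) -> str:
    # Hypothetical "text extension": wrap the raw class name in a short sentence
    # before passing it to a text encoder. The template is an assumption.
    return f"This is the sound of {label.replace('_', ' ')}."


def l2_normalize(v: Sequence[float]) -> List[float]:
    # Scale a vector to unit length so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_logits(audio_emb: Sequence[float],
                  label_embs: Sequence[Sequence[float]],
                  temperature: float = 0.07) -> List[float]:
    # Assumed similarity module: cosine similarity between one audio embedding
    # and every label embedding, divided by a temperature.
    a = l2_normalize(audio_emb)
    return [sum(x * y for x, y in zip(a, l2_normalize(t))) / temperature
            for t in label_embs]


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax (subtract the max logit before exponentiating).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def consistency_loss(audio_emb: Sequence[float],
                     label_embs: Sequence[Sequence[float]],
                     target: int,
                     temperature: float = 0.07) -> float:
    # Assumed semantic-consistency objective: cross-entropy that pushes the audio
    # embedding toward its own label's text embedding and away from the others.
    probs = softmax(cosine_logits(audio_emb, label_embs, temperature))
    return -math.log(probs[target])


def predict(audio_emb: Sequence[float],
            label_embs: Sequence[Sequence[float]]) -> int:
    # Inference: choose the label whose text embedding is most similar to the audio.
    logits = cosine_logits(audio_emb, label_embs)
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    labels = ["dog", "rain", "chainsaw"]
    print([extend_label(l) for l in labels])
    # Toy vectors standing in for text-encoder outputs of the extended labels.
    label_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Toy vector standing in for an audio-encoder output.
    audio_emb = [0.9, 0.1, 0.0]
    print(labels[predict(audio_emb, label_embs)])  # → dog
    print(consistency_loss(audio_emb, label_embs, target=0))
```

Under these assumptions, classification reduces to a nearest-label search in a shared embedding space, so a new class only requires a new label text rather than a new output unit in a classifier head.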

Tags: Modeling, analysis and synthesis of acoustic environments

Presented at IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference).

More Like This

Neural Fourier Shift for Binaural Speech Rendering

Lightweight Annotation and Class Weight Training for Automatic Estimation of Alarm Audibility in Noise

Self-supervised learning of audio representations using angular contrastive loss
