NAS-DYMC: NAS-based Dynamic Multi-Scale Convolutional Neural Network for Sound Event Detection

Wang Jun (Kuaishou Technology); Peng Yao (Kuaishou Inc.); Feng Deng (Kuaishou); Jianchao Tan (Kwai Inc.); Chengru Song (Kuaishou); Xiaorui Wang (Kwai)

SPS

07 Jun 2023

CNN+RNN models have become the mainstream approach to semi-supervised sound event detection. The CNN part is typically a stack of 2D convolutional layers that learn representations from time-frequency features. However, conventional 2D convolution has limited ability to capture detailed information about acoustic events. In this paper, to enhance the representation ability of the CNN, we propose NAS-DYMC, a NAS-based dynamic multi-scale convolutional neural network that extracts a more effective acoustic representation. Specifically, multi-scale convolution captures the characteristics of sound events with different time-frequency distributions, and dynamic convolution enhances the representation capability of conventional convolution by applying input-dependent attention weights to a set of basis kernels. Furthermore, a neural architecture search (NAS) method is adopted to find the optimal network architecture for the DCASE 2021 Task 4 dataset from a search space consisting of various dynamic multi-scale convolutions. Experimental results demonstrate the superiority of the proposed method.
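The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a single input channel, and a deliberately simplified gate (global average pooling followed by one hypothetical scalar weight per basis kernel, then a softmax). The function names, `gate_weights`, and the branch layout are assumptions made for illustration. Dynamic convolution is shown as an attention-weighted sum of basis kernels, and multi-scale convolution as parallel branches with different kernel shapes applied to the same input.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_kernel(basis_kernels, attention):
    """Attention-weighted sum of same-shaped basis kernels."""
    rows, cols = len(basis_kernels[0]), len(basis_kernels[0][0])
    return [
        [sum(a * k[i][j] for a, k in zip(attention, basis_kernels))
         for j in range(cols)]
        for i in range(rows)
    ]


def conv2d_same(x, kernel):
    """Stride-1, zero-padded 2D cross-correlation (odd kernel sizes assumed)."""
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    r, c = i + u - ph, j + v - pw
                    if 0 <= r < h and 0 <= c < w:
                        acc += x[r][c] * kernel[u][v]
            out[i][j] = acc
    return out


def dynamic_conv2d(x, basis_kernels, gate_weights):
    """Dynamic convolution: the kernel is recomputed from the input itself.

    Assumed gate (hypothetical): global mean of x, scaled by one weight per
    basis kernel, passed through a softmax to obtain attention weights.
    """
    mean = sum(sum(row) for row in x) / (len(x) * len(x[0]))
    attention = softmax([wgt * mean for wgt in gate_weights])
    kernel = aggregate_kernel(basis_kernels, attention)
    return conv2d_same(x, kernel), attention


def dynamic_multiscale_conv(x, branches):
    """Run one dynamic convolution per kernel shape; return one map per branch."""
    return [dynamic_conv2d(x, kernels, gate)[0] for kernels, gate in branches]


if __name__ == "__main__":
    # Toy 4x6 "spectrogram" patch (rows: frequency bins, columns: time frames).
    x = [[float(i + j) for j in range(6)] for i in range(4)]
    square = [[[0, 0, 0], [0, 1, 0], [0, 0, 0]],
              [[0, 1, 0], [1, 1, 1], [0, 1, 0]]]
    wide = [[[0.2] * 5], [[0.0, 0.0, 1.0, 0.0, 0.0]]]  # 1x5: spans time only
    maps = dynamic_multiscale_conv(x, [(square, [0.3, -0.3]), (wide, [0.1, 0.2])])
    print(len(maps), len(maps[0]), len(maps[0][0]))
```

Each branch returns a feature map with the same size as the input, so the maps can be stacked as channels. In the setting the abstract describes, the choice of kernel shapes per layer would be selected by the architecture search rather than fixed by hand as it is here.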

Tags:

Audio for multimedia and audio processing systems