A New DCASE 2017 Rare Sound Event Detection Benchmark Under Equal Training Data: CRNN With Multi-Width Kernels
Jan Baumann, Patrick Meyer, Timo Lohrenz, Alexander Roy, Michael Papendieck, Tim Fingscheidt
SPS
Length: 00:13:01
Rare sound event detection (rare SED) deals with obtaining valuable information from data consisting mostly of acoustic background noise. It has a long research history and was part of the DCASE 2017 Challenge. State-of-the-art performance is currently reached using a stacked combination of a CNN and an RNN, dubbed CRNN, which has also been applied successfully in other domains, such as hybrid automatic speech recognition. In this work, we propose a new CRNN model for rare SED. The new model uses a set of parallel convolutions with multiple kernel widths in the CRNN and is based on an extended feature representation of the log-mel spectrogram. Furthermore, we apply and optimize different evaluation postprocessing methods and analyze the modifications in an ablation study. Using the same training material for all methods, the proposed model outperforms the previously top-scoring networks of the DCASE Challenge by 6.13% absolute in error rate and by 4.39% absolute in F1 score on the test set, and under these conditions achieves a new benchmark result on the DCASE 2017 Rare SED data set.
Chairs:
Mark Cartwright