Sound Event Detection By Consistency Training And Pseudo-Labeling With Feature-Pyramid Convolutional Recurrent Neural Networks

Chih-Yuan Koh, You-Siang Chen, Yi-Wen Liu, Mingsian Bai

Length: 00:14:25

09 Jun 2021

Due to the high cost of large-scale strong labeling, sound event detection (SED) using only weakly labeled and unlabeled data has drawn increasing attention in recent years. To exploit large amounts of unlabeled in-domain data efficiently, we apply three semi-supervised learning strategies: interpolation consistency training (ICT), shift consistency training (SCT), and weakly pseudo-labeling. In addition, we propose FP-CRNN, a convolutional recurrent neural network (CRNN) that contains feature-pyramid (FP) components, to leverage temporal information by utilizing features at different scales. Experiments were conducted on DCASE 2020 Task 4. In terms of event-based F-measure, these approaches outperform the official baseline system, which scores 34.8%; the highest F-measure, 48.0%, was achieved by an FP-CRNN trained with the combination of all three strategies.
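
To make the idea of interpolation consistency training concrete, here is a minimal sketch of the consistency term as ICT is commonly formulated: the student's prediction on a mix of two unlabeled inputs is pulled toward the same mix of the teacher's predictions on each input. This is not the authors' implementation. The names `toy_model`, `W`, and `ict_consistency_loss`, the input shapes, the mean-squared-error penalty, and the Beta(1, 1) mixing distribution are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ict_consistency_loss(student, teacher, u1, u2, lam):
    """Mean-squared error between student(mix(u1, u2)) and
    mix(teacher(u1), teacher(u2)), with mixing weight lam in [0, 1]."""
    mixed_input = lam * u1 + (1.0 - lam) * u2
    target = lam * teacher(u1) + (1.0 - lam) * teacher(u2)
    pred = student(mixed_input)
    return float(np.mean((pred - target) ** 2))


# Hypothetical stand-in for a detector: 64 feature bins -> 10 event classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 10))


def toy_model(x):
    # x: (batch, frames, 64) -> (batch, frames, 10) frame-level probabilities
    return sigmoid(x @ W)


# Two batches of unlabeled clips (random placeholders, not real audio features).
u1 = rng.normal(size=(4, 100, 64))
u2 = rng.normal(size=(4, 100, 64))
lam = rng.beta(1.0, 1.0)  # assumed mixing distribution

loss = ict_consistency_loss(toy_model, toy_model, u1, u2, lam)
```

In a real training loop the teacher would typically be a separate, more slowly updated copy of the student, and this term would be added to the supervised loss on the labeled clips. Here the same toy function plays both roles only to keep the sketch self-contained.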

Chairs:

Romain Serizel

Tags:

signal processing society

IEEE icassp 2021

virtual conference

2021

sps

virtual conference icassp 2021

june 6-11 2021

icassp 2021