PEER COLLABORATIVE LEARNING FOR POLYPHONIC SOUND EVENT DETECTION

Hayato Endo, Hiromitsu Nishizaki

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:13:44

12 May 2022

This paper describes how a semi-supervised learning method called peer collaborative learning (PCL) can be applied to polyphonic sound event detection (PSED), one of the tasks in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. Many deep learning models have been studied for determining which sound events occur in a given audio clip, when they occur, and for how long. The PCL variant used in this paper combines ensemble-based knowledge distillation into sub-networks with student-teacher knowledge distillation, which allows a robust PSED model to be trained from a small amount of strongly labeled data, weakly labeled data, and a large amount of unlabeled data. We evaluated the proposed PCL model on the DCASE 2019 Task 4 dataset and achieved an F1-score improvement of about 8.2 points over the baseline model.
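The two distillation ingredients named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the mean-squared-error consistency loss, the toy array shapes, and the decay value are all assumptions chosen for illustration. It only shows the general idea of (a) averaging the predictions of several peer sub-networks into an ensemble target and (b) maintaining a teacher as an exponential moving average of student weights.

```python
import numpy as np

def ensemble_targets(peer_probs):
    """Average per-peer probabilities into one ensemble prediction.

    peer_probs has shape (n_peers, n_frames, n_classes).
    """
    return peer_probs.mean(axis=0)

def distillation_loss(peer_probs, target_probs):
    """Mean squared error between every peer and a fixed target.

    The target is treated as a constant (no gradient would flow through it).
    MSE is an assumed choice of consistency loss for this sketch.
    """
    return float(np.mean((peer_probs - target_probs[None, ...]) ** 2))

def ema_update(teacher_w, student_w, decay=0.999):
    """Student-teacher update: the teacher tracks an exponential
    moving average of the student weights."""
    return decay * teacher_w + (1.0 - decay) * student_w

# Toy data: 3 hypothetical peers, 5 frames, 10 event classes.
rng = np.random.default_rng(0)
peer_probs = rng.uniform(size=(3, 5, 10))

ensemble = ensemble_targets(peer_probs)        # shape (5, 10)
loss = distillation_loss(peer_probs, ensemble)  # non-negative scalar

# One EMA step on a toy weight vector.
teacher = ema_update(np.zeros(4), np.ones(4), decay=0.9)
```

In a real semi-supervised setup, a term like `distillation_loss` could be computed on unlabeled clips as well, since it needs no ground-truth labels, while a supervised loss would cover the strongly and weakly labeled clips.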

Tags:

ensemble training

knowledge distillation

semi-supervised training

student-teacher model

sound event detection

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

