TIME-BALANCED FOCAL LOSS FOR AUDIO EVENT DETECTION

Sangwook Park, Mounya Elhilali

Length: 00:11:12

09 May 2022

Sound Event Detection (SED) tackles the challenge of identifying sound events in an audio recording by delimiting both their temporal boundaries and their sound category. With recent advances in deep learning, current systems can leverage the availability of large datasets to train sophisticated and highly effective SED models. Nonetheless, sound sources and the acoustic characteristics of different classes vary greatly in their prevalence and in their representation in labeled datasets. In SED, data imbalance stems not only from the representation (number of samples) across classes but also from the natural asymmetry in duration across events, which range from short transients such as the clacking of dishes to sustained events such as vacuuming. This variability results in an inherently disproportionate representation of effective training samples. To address this compounded imbalance, this work proposes a balanced focal loss function that introduces a novel time-sensitive classwise weight. The proposed loss is applied to SED in the context of the DCASE2021 challenge and yields a notable improvement over the baseline, particularly for shorter sound events.
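The abstract does not give the exact form of the time-sensitive weight, so the following is only a minimal sketch of the general idea, not the authors' formulation. It combines the standard binary focal loss with a hypothetical classwise weight that is inversely proportional to the number of active frames per class, so classes with short events are not swamped by sustained ones. The function names, the normalisation, and the default gamma=2.0 are assumptions made for illustration.

```python
import numpy as np


def frame_balanced_weights(labels, eps=1e-8):
    """Hypothetical class weights from total active duration (assumption, not the paper's rule).

    labels: 0/1 array of shape (clips, frames, classes).
    Returns one weight per class, normalised so the weights average to 1.
    """
    active = labels.sum(axis=(0, 1)).astype(float)  # active frames per class
    inv = 1.0 / (active + eps)                      # rarer in time -> larger weight
    return inv * len(active) / inv.sum()


def weighted_focal_loss(probs, labels, class_weights, gamma=2.0, eps=1e-7):
    """Standard binary focal loss per frame and class, scaled by class weights.

    probs: predicted probabilities, same shape as labels.
    """
    p = np.clip(probs, eps, 1.0 - eps)
    # p_t is the probability the model assigns to the true label
    p_t = np.where(labels == 1, p, 1.0 - p)
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    # class_weights has shape (classes,) and broadcasts over clips and frames
    return float((loss * class_weights).mean())


# Toy example: class 0 is active for 8 of 10 frames, class 1 for only 2.
labels = np.zeros((1, 10, 2), dtype=int)
labels[0, :8, 0] = 1
labels[0, :2, 1] = 1
weights = frame_balanced_weights(labels)
probs = np.full(labels.shape, 0.5)
print(weights, weighted_focal_loss(probs, labels, weights))
```

In this toy case the short-duration class receives the larger weight, so its frames contribute more to the averaged loss; setting gamma to 0 reduces the function to a class-weighted cross-entropy.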

Tags:

dcase challenge

weighted loss

imbalanced data

sound event detection

focal loss

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle
