BYOL for Audio: Exploring Pre-Trained General-Purpose Audio Representations

Daisuke Niizumi


SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:46:20

13 Jun 2024

Self-supervised learning (SSL) models perform remarkably well in various domains, including audio. In this webinar, the presenter will introduce an audio SSL method, Bootstrap Your Own Latent (BYOL) for Audio (BYOL-A), which learns a general-purpose audio representation that is effective across a variety of audio tasks. The presenter hypothesizes that such representations should provide multi-aspect information to serve the differing needs of diverse tasks. BYOL-A learns a representation that is robust to sound changes, such as changes in pitch and background noise, and it combines features from multiple layers. As a result, BYOL-A demonstrates generalizability in the experiments, achieving the best average result of 72.4% across nine tasks and the best accuracy of 57.6% on the VoxCeleb1 speaker identification task. The presenter will investigate how each BYOL-A component contributes to performance, and will also introduce use cases from other studies, such as video understanding, showing how those studies incorporated BYOL-A into their deep learning frameworks.
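The ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard BYOL setup (an online network whose predictions regress onto a target network's projections, with the target updated as an exponential moving average), a mixup-style augmentation that blends a background sound into a log-mel spectrogram to simulate background noise, and a multi-layer feature combination done by concatenating per-frame features from several layers and pooling over time with mean plus max. The helper names `byol_loss`, `ema_update`, `log_mixup`, and `pool_multilayer` are hypothetical, plain arrays stand in for real network outputs, and this is not the presenter's implementation; pitch-related augmentation is not covered here.

```python
import numpy as np


def byol_loss(p: np.ndarray, z: np.ndarray) -> np.ndarray:
    """BYOL-style regression loss per example: 2 - 2 * cosine similarity.

    p: online-network predictions, shape (batch, dim)
    z: target-network projections, shape (batch, dim); no gradient flows here
    """
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return 2.0 - 2.0 * np.sum(p * z, axis=-1)


def ema_update(target: dict, online: dict, tau: float = 0.99) -> dict:
    """Move each target weight toward the online weight (exponential moving average)."""
    return {k: tau * target[k] + (1.0 - tau) * online[k] for k in target}


def log_mixup(x: np.ndarray, background: np.ndarray, ratio: float) -> np.ndarray:
    """Mix a background sound into a log-mel spectrogram (assumed augmentation).

    Mixing happens in the linear power domain, then the result is returned
    to the log domain. x and background must have the same shape.
    """
    return np.log((1.0 - ratio) * np.exp(x) + ratio * np.exp(background))


def pool_multilayer(features: list) -> np.ndarray:
    """Combine per-frame features from several layers into one clip embedding.

    features: list of arrays, each shaped (time, dim_i).
    Concatenates along the feature axis, then sums temporal mean and max pooling.
    """
    h = np.concatenate(features, axis=-1)  # (time, sum of dim_i)
    return h.mean(axis=0) + h.max(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.normal(size=(64, 96))  # stand-in log-mel spectrogram (mel bins, frames)
    # Two augmented "views" of the same clip, each with a different background.
    view1 = log_mixup(spec, rng.normal(size=spec.shape), ratio=0.2)
    view2 = log_mixup(spec, rng.normal(size=spec.shape), ratio=0.3)
    print(view1.shape, view2.shape)
```

Because the loss only asks the two views' embeddings to agree in direction, identical embeddings give a loss of 0 and opposite embeddings give the maximum of 4; the augmentation is what forces the learned representation to ignore the injected background sound.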

Tags:

SPS Webinar 2024

self-supervised learning

BYOL

Audio Representation

data augmentation
