SPS
IEEE Members: $11.00
Non-members: $15.00
Pages/Slides: 56
Self-supervised learning (SSL) models perform remarkably well in various domains, including audio. In this webinar, the presenter will introduce an audio SSL method, Bootstrap Your Own Latent (BYOL) for Audio (BYOL-A), that learns a general-purpose audio representation effective for a range of audio tasks. The presenter hypothesizes that such representations should provide multi-aspect information to serve the differing needs of diverse tasks. BYOL-A learns a representation that is robust to sound changes, such as changes in pitch and background noise, and combines multi-layer features. As a result, in the experiments BYOL-A demonstrates generalizability, with the best average result of 72.4% across nine tasks and the best accuracy of 57.6% on the VoxCeleb1 speaker identification task. The presenter will examine how each BYOL-A component contributes to performance. The presenter will also introduce use cases from other studies, such as video understanding, and show how those studies used BYOL-A in their deep learning frameworks.