The Changing Landscape of Speech Foundation Models

Shinji Watanabe, Abdelrahman Mohamed, Karen Livescu, Hung-yi Lee, Tara Sainath, Katrin Kirchhoff, Shang-Wen Li

SPS Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 01:59:49

06 Aug 2024

The 2022 review paper "Self-Supervised Speech Representation Learning: A Review" examined how representation learning transformed speech perception models and AI applications. In the two years since its publication, numerous developments in building "Foundation Models" have blurred the boundaries between domains. Generative models have drawn the largest share of research innovation because of their impressive performance across many modalities and their applicability to a wider set of scenarios. In this talk, the presenters will connect their 2022 review of self-supervised approaches to current developments in foundation perception and generative models. They will highlight active research directions in foundation models, methods for analyzing them, and how they compare with other approaches across a wide range of speech applications.

Tags:

SPS Webinar 2024

Hidden Markov models

data models

representation learning

training

speech processing

self-supervised learning