Unifying Speech Processing Applications with Speech Foundation Models

Shinji Watanabe

SPS Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 01:01:22

Keynote Speech 18 Dec 2023

Following the success of large language models in natural language processing, the speech processing field is exploring how to combine the speech and language modalities into a single foundation model. Such a unified model could perform multiple speech processing applications, such as speech recognition, synthesis, translation, and spoken language processing. Our group is pursuing this goal through the development of speech foundation models, including speech/text decoder-only models, Whisper-style multitasking, universal spoken language understanding, and multilingual SUPERB projects. In addition to presenting these research outcomes, this talk describes the engineering effort involved in building such a large foundation model from scratch at an academic computing scale, with reproducibility in mind.

Tags: IEEE ASRU 2023, automatic speech recognition, speech processing
