End-to-End Automatic Speech Recognition

Jinyu Li

SPS Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 01:34:10

10 May 2024

The field of automatic speech recognition (ASR) is now dominated by end-to-end (E2E) models that directly map speech to text. In this talk, we give an overview of E2E ASR models and introduce recent progress from an industry perspective. To design an E2E model that has both high accuracy and low latency, a masking strategy is applied to the Transformer Transducer. We then discuss technologies that use text-only data, both for general model training through pretraining and for adaptation to a new domain through augmentation and factorization. We also discuss how to build multilingual ASR models that serve all users. Next, we extend E2E modeling to streaming multi-speaker ASR. Finally, we close the talk with some new research opportunities to explore.

Tags: SPS Webinar 2024, end-to-end, automatic speech recognition, transformer transducer, Speech and Language Processing
