Speech Translation: Model, Data, Evaluation

Ann Lee, Sravya Popuri, Xutai Ma, David Dale

SPS

Length: 02:41:09

Tutorial 16 Dec 2023

Speech translation technology, including speech-to-text (S2T) and speech-to-speech translation (S2ST), aims to convert speech in one language into speech or text in another language. Model training is challenging because the model must learn not only the alignment between the two languages but also the acoustic and linguistic characteristics of both. In recent years, several research breakthroughs have transformed speech translation systems from proofs of concept into high-performing real-world products. In this tutorial, we introduce the full pipeline for building a speech translation system, with literature reviews and discussions of model training, dataset creation, and robust evaluation. At the end, we also present examples of how to use tools open-sourced by the team to build the pipeline. This tutorial aims to democratize speech translation technology through knowledge sharing and to promote future research in the community.

Tags:

IEEE ASRU 2023

automatic speech recognition

speech translation
