Speech Translation: Model, Data, Evaluation
Ann Lee, Sravya Popuri, Xutai Ma, David Dale
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 02:41:09
Speech translation technology, including speech-to-text (S2T) and speech-to-speech translation (S2ST), aims to convert speech in one language into speech or text in another language. Model training is challenging because the model must learn not only the alignment between the two languages but also the acoustic and linguistic characteristics of each. In recent years, several research breakthroughs have transformed speech translation systems from proof-of-concept prototypes into high-performing real-world products. In this tutorial, we introduce the full pipeline for building a speech translation system, with literature reviews and discussions of model training, dataset creation, and robust evaluation. Finally, we present examples of how to use tools open-sourced by the team to build this pipeline. This tutorial aims to democratize speech translation technology through knowledge sharing and to promote future research in the community.