Skip to main content

Multilingual Phonetic Dataset For Low Resource Speech Recognition

Xinjian Li, David Mortensen, Florian Metze, Alan Black

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:12:20
10 Jun 2021

Phone Recognition is one of the most important tasks in the field of multilingual speech recognition, especially for low-resource languages whose orthographies are not available. However, most speech recognition datasets so far only focus on high-resource languages, there are very few datasets available for low-resource languages, especially datasets with detailed phone annotation. In this work, we present a large multilingual phonetic dataset, which is preprocessed and aligned from the UCLA phonetic dataset. The dataset contains around 100 low-resource languages and 7000 utterances in total. This dataset would provide an ideal training/evaluation set for universal phone recognition.

Chairs:
Shi-Xiong Zhang

Value-Added Bundle(s) Including this Product

More Like This

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00