Universal Phone Recognition With A Multilingual Allophone System
Xinjian Li, Siddharth Dalmia, Juncheng Li, Graham Neubig, David Mortensen, Alan Black, Florian Metze, Antonios Anastasopoulos, Matthew Lee, Patrick Littell, Jiali Yao
SPS
Length: 10:24
Recently, multilingual speech recognition has achieved tremendous progress by sharing parameters across languages. Multilingual acoustic models, however, generally ignore the difference between phonemes (sounds that can support lexical contrasts in a particular language) and their underlying phones (the sounds that are actually spoken, which are language-independent). This can degrade performance when a variety of training languages are combined, because identically annotated phonemes can correspond to several different underlying phonetic realizations. In this work, we propose a joint model of both language-independent phone distributions and language-dependent phoneme distributions. In multilingual ASR experiments over 11 languages, we find that modeling the underlying structure of phonemes in this way improves test performance by 2.0% phoneme error rate. Additionally, because we explicitly model language-independent phones, we can build a (nearly) universal phone recognizer that, when combined with PHOIBLE, a large manually curated database of phone inventories, can be customized into 2,000 language-dependent recognizers. Experiments on two low-resource indigenous languages, Inuktitut and Tusom, show that our recognizer improves phone accuracy by more than 17%, moving a step closer to speech recognition for all languages in the world.
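The relationship between a shared phone distribution and per-language phoneme distributions can be illustrated with a small sketch. The abstract does not say how the two distributions are tied together, so the max-pooling over allophones used here is an assumption chosen for illustration. The phone inventory, the allophone mappings, the per-frame scores, and all function names are likewise hypothetical and are not taken from the paper or from PHOIBLE.

```python
import math

# Hypothetical allophone mappings: for each language, every phoneme lists the
# language-independent phones that can realize it. Illustrative values only.
ALLOPHONES = {
    "english": {"/p/": ["p", "pʰ"], "/b/": ["b"], "/t/": ["t"], "/a/": ["a"]},
    "mandarin": {"/p/": ["p"], "/pʰ/": ["pʰ"], "/t/": ["t"], "/a/": ["a"]},
}


def softmax(scores):
    """Normalize a dict of logits into a probability distribution."""
    peak = max(scores.values())
    exps = {key: math.exp(value - peak) for key, value in scores.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}


def phoneme_distribution(phone_logits, language):
    """Map shared phone logits to one language's phoneme distribution.

    Assumption: a phoneme's logit is the maximum logit among its allophones.
    """
    mapping = ALLOPHONES[language]
    phoneme_logits = {
        phoneme: max(phone_logits[phone] for phone in phones)
        for phoneme, phones in mapping.items()
    }
    return softmax(phoneme_logits)


def restrict_to_inventory(phone_logits, inventory):
    """Customize the universal phone output to a language's phone inventory
    by dropping phones outside the inventory before normalizing."""
    allowed = {p: s for p, s in phone_logits.items() if p in inventory}
    return softmax(allowed)


# Made-up phone logits for a single acoustic frame.
frame = {"p": 0.5, "pʰ": 2.0, "b": -1.0, "t": 0.0, "a": -0.5}

english = phoneme_distribution(frame, "english")
mandarin = phoneme_distribution(frame, "mandarin")
restricted = restrict_to_inventory(frame, {"p", "t", "a"})

best_english = max(english, key=english.get)
best_mandarin = max(mandarin, key=mandarin.get)
best_restricted = max(restricted, key=restricted.get)
```

In this toy setup the same aspirated-stop evidence is read as the phoneme /p/ under the English mapping and as /pʰ/ under the Mandarin mapping, which is the kind of mismatch the abstract describes when identically annotated phonemes have different phonetic realizations. The `restrict_to_inventory` helper shows one plausible way a universal phone output could be narrowed to a single language's inventory.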