DOMAIN AND LANGUAGE ADAPTATION USING HETEROGENEOUS DATASETS FOR WAV2VEC2.0-BASED SPEECH RECOGNITION OF LOW-RESOURCE LANGUAGE
Kak Soky (Kyoto University); Sheng Li (National Institute of Information & Communications Technology (NICT)); Chenhui Chu (Kyoto University); Tatsuya Kawahara (Kyoto University)
We address the effective fine-tuning of a large-scale pre-trained model for automatic speech recognition (ASR) of low-resource languages with only a one-hour matched dataset. The fine-tuning consists of domain adaptation and language adaptation, which are conducted using heterogeneous datasets matched with either the domain or the language. For effective adaptation, we incorporate auxiliary tasks of domain identification and language identification via multi-task learning. Moreover, the embeddings produced by the auxiliary tasks are fused with the encoder output of the pre-trained model for ASR. Experimental evaluations on Khmer ASR using the corpus of the ECCC (the Extraordinary Chambers in the Courts of Cambodia) demonstrate that first conducting domain adaptation and then language adaptation is effective. In addition, multi-tasking with domain embedding gives the best performance, reducing the baseline character error rate (CER) by 6.47%.
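To make the fusion idea concrete, the following is a minimal PyTorch sketch of a wav2vec2.0 encoder with auxiliary domain and language identification heads whose utterance-level embeddings are broadcast over time and fused with the encoder output before a CTC layer. The module and parameter names (AuxHead, MultiTaskASR, fuse_proj, emb_dim) and the concatenation-based fusion are illustrative assumptions for exposition, not the authors' implementation.

```python
# Sketch of multi-task fine-tuning with auxiliary-embedding fusion.
# Names and fusion scheme are assumptions; pre-trained weights would be
# loaded in practice instead of the randomly initialized config used here.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Config, Wav2Vec2Model


class AuxHead(nn.Module):
    """Auxiliary classifier (domain or language ID) over mean-pooled encoder states."""
    def __init__(self, hidden_dim: int, emb_dim: int, num_classes: int):
        super().__init__()
        self.embed = nn.Linear(hidden_dim, emb_dim)      # bottleneck embedding
        self.classify = nn.Linear(emb_dim, num_classes)

    def forward(self, enc_out: torch.Tensor):
        pooled = enc_out.mean(dim=1)                     # (B, D) utterance-level pooling
        emb = torch.tanh(self.embed(pooled))             # (B, E) auxiliary embedding
        return emb, self.classify(emb)                   # embedding and class logits


class MultiTaskASR(nn.Module):
    """wav2vec2.0 encoder + CTC head, with domain/language auxiliary tasks
    whose embeddings are fused back into the encoder output."""
    def __init__(self, vocab_size: int, num_domains: int = 2, num_langs: int = 2,
                 emb_dim: int = 64):
        super().__init__()
        self.encoder = Wav2Vec2Model(Wav2Vec2Config())   # pre-trained weights in practice
        d = self.encoder.config.hidden_size
        self.domain_head = AuxHead(d, emb_dim, num_domains)
        self.lang_head = AuxHead(d, emb_dim, num_langs)
        self.fuse_proj = nn.Linear(d + 2 * emb_dim, d)   # fuse aux embeddings with encoder
        self.ctc_head = nn.Linear(d, vocab_size)

    def forward(self, input_values: torch.Tensor):
        enc = self.encoder(input_values).last_hidden_state       # (B, T, D)
        dom_emb, dom_logits = self.domain_head(enc)
        lang_emb, lang_logits = self.lang_head(enc)
        aux = torch.cat([dom_emb, lang_emb], dim=-1)              # (B, 2E)
        aux = aux.unsqueeze(1).expand(-1, enc.size(1), -1)        # broadcast over time
        fused = self.fuse_proj(torch.cat([enc, aux], dim=-1))     # (B, T, D)
        return self.ctc_head(fused), dom_logits, lang_logits


# Multi-task objective: CTC loss for ASR plus cross-entropy for the ID tasks.
model = MultiTaskASR(vocab_size=60)
wave = torch.randn(2, 16000)                                      # 2 utterances, 1 s @ 16 kHz
ctc_logits, dom_logits, lang_logits = model(wave)
log_probs = ctc_logits.log_softmax(-1).transpose(0, 1)            # (T, B, V) for CTCLoss
targets = torch.randint(1, 60, (2, 10))                           # dummy label sequences
loss = (nn.CTCLoss()(log_probs, targets,
                     torch.full((2,), ctc_logits.size(1)), torch.full((2,), 10))
        + nn.CrossEntropyLoss()(dom_logits, torch.tensor([0, 1]))
        + nn.CrossEntropyLoss()(lang_logits, torch.tensor([0, 1])))
```

In this sketch the three losses are simply summed; how the auxiliary losses are weighted, and whether domain and language adaptation are applied in separate stages as the paper proposes, is left to the training schedule.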