Joint Discriminator and Transfer Based Fast Domain Adaptation for End-to-End Speech Recognition
Hang Shao (Shanghai Jiao Tong University); Tian Tan (Aispeech Ltd.); Wei Wang (Shanghai Jiao Tong University); Xun Gong (Shanghai Jiao Tong University); Yanmin Qian (Shanghai Jiao Tong University)
SPS
Adapting end-to-end (E2E) models to unseen domains remains a major challenge because training E2E models requires large amounts of paired audio and text data. We propose a novel domain adaptation framework for E2E models that uses only text from the target domain. Moreover, the proposed methods keep the performance on the source domain intact while greatly improving the performance on the target domain. The proposed framework consists of two parts, the discriminator and the transfer, which were optimized separately. Finally, the optimized discriminator and transfer were combined and evaluated on two domain adaptation tasks. In the experiments adapting from English LIBRISPEECH to GIGASPEECH, we obtained average relative word error rate (WER) reductions of 11.6% and 11.8% on the target-domain dev and test sets, respectively, with almost no WER degradation on the source domain. For the in-house Chinese corpus (aviation and TV), the character error rate (CER) on the source domain increased by no more than 5%, while the CER on the target domain improved by around 85% and 42% relative, respectively. In addition, the evaluation shows that our approach is also more effective in mixed-domain scenarios.
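The results above are reported as relative error-rate changes. As a minimal sketch of how such a figure is usually computed, assuming the standard definition (baseline error minus adapted error, divided by baseline error), the helper below may be useful. The function name `relative_reduction` and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline_error: float, adapted_error: float) -> float:
    """Return the relative error-rate reduction, in percent.

    Both arguments are error rates (WER or CER) in the same unit,
    e.g. both in percent. A negative result means the error increased.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - adapted_error) / baseline_error


# Hypothetical numbers for illustration only; not results from the paper.
baseline_wer = 20.0  # WER (%) before adaptation
adapted_wer = 17.0   # WER (%) after adaptation
print(f"{relative_reduction(baseline_wer, adapted_wer):.1f}% relative WER reduction")
```

With these illustrative inputs, an absolute drop of 3 points from a 20% baseline corresponds to a 15.0% relative reduction, which is why relative and absolute improvements should not be compared directly.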