
Exploring Machine Speech Chain for Domain Adaptation

Fengpeng Yue, Tom Ko, Yu Zhang, Yan Deng, Lei He

Length: 00:10:54
09 May 2022

Machine Speech Chain integrates end-to-end (E2E) automatic speech recognition (ASR) and neural text-to-speech (TTS) into a single loop for joint training. It has been shown to leverage large amounts of unpaired data effectively, in the spirit of data augmentation. In this paper, we explore the TTS→ASR pipeline of the machine speech chain to perform domain adaptation for both E2E ASR and neural TTS models using only text data from the target domain. We conduct experiments by adapting from the audiobook domain (i.e., LibriSpeech) to the presentation domain (i.e., TED-LIUM). The method yields a relative word error rate (WER) reduction of 19.7% for the E2E ASR model on the TED-LIUM test set, and a relative WER reduction of 29.4% on synthetic speech generated by the neural TTS model in the presentation domain. Moreover, we observe that the gains from the proposed method and from conventional language model adaptation methods are additive.
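
The abstract reports its results as relative WER reductions. As a minimal sketch of how such a figure is typically computed, the Python below measures WER as word-level edit distance divided by reference length, then takes the relative change between a baseline and an adapted model. The function names and the sample sentences are illustrative assumptions, not taken from the paper, and the paper's own scoring setup may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by adaptation."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    print(word_error_rate("the cat sat down", "the cat sit"))
```

A relative reduction is measured against the baseline error rate rather than in absolute percentage points, so the same relative figure can correspond to quite different absolute WER changes depending on where the baseline starts.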
