Exploring Machine Speech Chain for Domain Adaptation

Fengpeng Yue, Tom Ko, Yu Zhang, Yan Deng, Lei He

SPS

Length: 00:10:54

09 May 2022

Machine Speech Chain integrates end-to-end (E2E) automatic speech recognition (ASR) and neural text-to-speech (TTS) into a closed loop for joint training. It has been shown to effectively leverage large amounts of unpaired data in the spirit of data augmentation. In this paper, we explore the TTS→ASR pipeline in the machine speech chain to perform domain adaptation for both E2E ASR and neural TTS models using only text data from the target domain. We conduct experiments adapting from the audiobook domain (i.e., LibriSpeech) to the presentation domain (i.e., TED-LIUM). The E2E ASR model achieves a relative word error rate (WER) reduction of 19.7% on the TED-LIUM test set, and speech synthesized by the neural TTS model in the presentation domain achieves a relative WER reduction of 29.4%. Moreover, we observe that the gains from the proposed method are additive with those from conventional language model adaptation methods.
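Both headline results are stated as relative WER reductions. The sketch below, which is an illustration and not code from the paper, shows how a corpus-level WER is typically computed from a word-level Levenshtein distance and how a relative reduction follows from a baseline and an adapted WER. The helper names (`word_errors`, `wer`, `relative_wer_reduction`) are hypothetical. The abstract does not say how the synthetic-speech WER was obtained; it is presumably measured by transcribing the TTS output with a recognizer and scoring against the input text, but that is an assumption here.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] is the edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at zero cost)
            )
        prev = curr
    return prev[-1]


def wer(references, hypotheses) -> float:
    """Corpus WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by adaptation."""
    return (baseline_wer - adapted_wer) / baseline_wer
```

For example, with made-up numbers that are not taken from the paper, a baseline WER of 20% that drops to 16% after adaptation gives `relative_wer_reduction(0.20, 0.16)`, roughly 0.2, i.e. a 20% relative reduction even though the absolute drop is only 4 points.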

Tags:

speech recognition

speech chain

speech synthesis

domain adaptation

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Tutorial: Foundational Problems in Neural Speech Recognition

Conversational Speech Processing and Recognition: Speech Separation, End-to-End Modeling, and Speaker Diarization

TARGET-DISCRIMINABILITY-INDUCED MULTI-SOURCE-FREE DOMAIN ADAPTATION