  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:12:49
10 May 2022

To improve the performance of Automatic Speech Recognition (ASR), it is common to deploy an error correction module at the post-processing stage to correct recognition errors. In this paper, we propose 1) an error correction model that accounts for both contextual and phonetic information via a dual-channel architecture, and 2) a self-supervised learning method for training the model. First, an error region detection model detects the error regions in the ASR output. We then perform dual-channel feature extraction over the error regions: one channel extracts contextual information with a pre-trained language model, while the other builds phonetic information. At the training stage, we construct error patterns at the phoneme level, which simplifies the data annotation procedure and thus allows us to leverage large-scale unlabeled data to train our model in a self-supervised manner. Experimental results on different test sets demonstrate the effectiveness and robustness of our model.
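The phoneme-level error-pattern construction described above can be sketched in miniature: corrupt clean text by swapping in phonetically confusable words, yielding (noisy, clean) pairs for self-supervised training without manual annotation. The toy lexicon, the confusion rule, and all function names below are illustrative assumptions, not the authors' implementation; a real system would use a full pronunciation dictionary and confusion patterns mined from actual ASR errors.

```python
import random

# Hypothetical toy phoneme lexicon; a real system would use a full
# pronunciation dictionary (e.g. CMUdict) instead of this stand-in.
PHONEMES = {
    "write": ["R", "AY", "T"],
    "right": ["R", "AY", "T"],
    "bare": ["B", "EH", "R"],
    "bear": ["B", "EH", "R"],
}

def phoneme_confusables(word, lexicon):
    """Return other words whose phoneme sequence matches `word` exactly.

    This toy rule only finds homophones; a realistic variant would also
    allow near matches (small phoneme edit distance).
    """
    target = lexicon.get(word)
    if target is None:
        return []
    return [w for w, p in lexicon.items() if w != word and p == target]

def corrupt_sentence(tokens, lexicon, rate=0.5, rng=None):
    """Build a (corrupted, clean) training pair by randomly replacing
    tokens with phonetically confusable words at the given rate."""
    rng = rng or random.Random(0)
    corrupted = []
    for tok in tokens:
        cands = phoneme_confusables(tok, lexicon)
        if cands and rng.random() < rate:
            corrupted.append(rng.choice(cands))
        else:
            corrupted.append(tok)
    return corrupted, tokens

noisy, clean = corrupt_sentence(["write", "it", "down"], PHONEMES, rate=1.0)
print(noisy, clean)  # with rate=1.0, "write" is replaced by its homophone "right"
```

Pairs produced this way give the correction model a corrupted input whose target is the original clean text, which is what lets training scale to unlabeled data.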
