LATTICEBART: LATTICE-TO-LATTICE PRE-TRAINING FOR SPEECH RECOGNITION
Lingfeng Dai, Lu Chen, Zhikai Zhou, Kai Yu
To improve automatic speech recognition, a growing body of work has attempted to refine the output of ASR systems with advanced sequence models. However, the output of ASR systems differs significantly from the input form expected by standard sequence models: to encompass richer information, it is often a compact lattice structure that encodes multiple candidate sentences. This mismatch in input form significantly limits the applicability of sequence models. On the one hand, widely used pre-trained models cannot directly accept lattice inputs and are therefore difficult to apply to this task. On the other hand, the scarcity of supervised training data requires the model to learn from limited examples. To address these problems, we propose LatticeBART, a model that decodes a sequence from a lattice in an end-to-end fashion. In addition, we propose a lattice-to-lattice pre-training method that can be used when annotated data is missing, training on lattices that are easily generated by the ASR system. Experimental results show that our model can effectively improve the output quality of ASR systems.
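To make the lattice-to-sequence idea concrete, the sketch below shows one plausible way (not the authors' published implementation) to feed a lattice into a BART-style encoder-decoder: each lattice arc becomes an input token, positions follow the topological order of the lattice's start nodes, and a self-attention mask restricts attention among arcs. All class names, dimensions, and the mask convention here are illustrative assumptions; the lattice-to-lattice pre-training described in the abstract would correspond to replacing the target side with another lattice encoding rather than a single reference sentence.

```python
# Minimal sketch of a lattice-conditioned encoder-decoder in PyTorch.
# Hypothetical names and shapes; not the LatticeBART implementation.
import torch
import torch.nn as nn


class LatticeToSeq(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=3, max_pos=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Positions are assumed to be topological indices of arc start nodes.
        self.pos = nn.Embedding(max_pos, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, arc_tokens, arc_positions, arc_mask, tgt_tokens):
        # arc_tokens:    (B, A)  word id carried by each lattice arc
        # arc_positions: (B, A)  topological index of each arc's start node
        # arc_mask:      (B*nhead, A, A) bool, True = attention NOT allowed
        #                (e.g. arcs that cannot lie on a common lattice path)
        # tgt_tokens:    (B, T)  decoder input, shifted right
        src = self.embed(arc_tokens) + self.pos(arc_positions)
        tgt_pos = torch.arange(tgt_tokens.size(1), device=tgt_tokens.device)
        tgt = self.embed(tgt_tokens) + self.pos(tgt_pos)
        causal = self.transformer.generate_square_subsequent_mask(
            tgt_tokens.size(1)).to(tgt_tokens.device)
        hidden = self.transformer(src, tgt, src_mask=arc_mask, tgt_mask=causal)
        return self.out(hidden)  # (B, T, vocab) logits for the rewritten hypothesis
```

Training such a model end-to-end with a cross-entropy loss over the decoder output would follow the usual sequence-to-sequence recipe; the point of the sketch is only that the encoder consumes arcs plus a structural mask rather than a single flat hypothesis.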