08 May 2022

To improve automatic speech recognition, a growing body of work has attempted to correct the output of ASR systems with advanced sequence models. However, the output of an ASR system differs significantly from the input expected by standard sequence models: to preserve richer information, it is often a compact lattice structure that encodes multiple candidate sentences. This mismatch in input form significantly limits the applicability of sequence models. On the one hand, widely used pre-trained models cannot take lattice structures as input directly and are therefore difficult to apply to this task. On the other hand, the scarcity of supervised training data requires the model to learn from limited data. To address these problems, we propose LatticeBART, a model that decodes a sequence from a lattice in an end-to-end fashion. In addition, this paper proposes a lattice-to-lattice pre-training method that can be used when annotated data is scarce, training on lattices that are easily generated with the ASR system. The experimental results show that our model effectively improves the output quality of the ASR system.
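The input mismatch described in the abstract can be made concrete with a small sketch. The Python snippet below is an illustrative assumption, not the authors' implementation: it represents a word lattice as a directed acyclic graph and linearizes it in topological order, one simple way a multi-hypothesis lattice could be turned into a token sequence for a standard sequence model. The example lattice and the `linearize` helper are hypothetical.

```python
# Minimal sketch (assumption, not the paper's code): a word lattice as a DAG
# whose topological linearization could be fed to a sequence model such as BART.
from collections import defaultdict, deque

def linearize(edges):
    """Return lattice words in topological order (one possible sequence-model input)."""
    succ = defaultdict(list)   # node -> [(next_node, word)]
    indeg = defaultdict(int)   # node -> number of incoming edges
    nodes = set()
    for u, v, word in edges:
        succ[u].append((v, word))
        indeg[v] += 1
        nodes.update((u, v))
    order = []
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        for v, word in succ[u]:
            order.append(word)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order

# Hypothetical lattice encoding two competing ASR hypotheses:
# "I want to recognize speech" vs. "I want to wreck a nice beach".
edges = [
    (0, 1, "I"), (1, 2, "want"), (2, 3, "to"),
    (3, 4, "recognize"), (4, 7, "speech"),
    (3, 5, "wreck"), (5, 6, "a"), (6, 8, "nice"), (8, 7, "beach"),
]
print(linearize(edges))
```

A single linearized sequence like this loses the arc structure that distinguishes the competing hypotheses, which is why a model that consumes the lattice directly, as LatticeBART does, can exploit information that a flat one-best or n-best input discards.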
