Sequence-Level Consistency Training for Semi-Supervised End-to-End Automatic Speech Recognition

Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Atsushi Ando, Yusuke Shinohara

04 May 2020 · Length: 12:03

This paper presents a novel semi-supervised end-to-end automatic speech recognition (ASR) method that employs consistency training on unlabeled data. In consistency training, unlabeled data are used to constrain a model so that it becomes invariant to small deformations of the input; enforcing this consistency makes the model robust to a wide variety of input examples. While previous studies have applied consistency training to simple classification problems, none have employed it for sequence-to-sequence generation problems such as end-to-end ASR. One obstacle is that existing consistency training schemes cannot take sequence-level generation consistency into account. In this paper, we propose a sequence-level consistency training scheme tailored to sequence-to-sequence generation problems. Our key idea is to enforce consistency of the generation function by utilizing beam search decoding results. For semi-supervised learning, we adopt the Transformer as the end-to-end ASR model and SpecAugment as the deformation function in consistency training. Our experiments show that the proposed semi-supervised learning with sequence-level consistency training effectively improves ASR performance using unlabeled speech data.
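
The following is a minimal, illustrative PyTorch sketch of the sequence-level consistency idea described above: decode the clean (undeformed) utterance to obtain a sequence-level pseudo-reference, deform the same utterance, and train the model to reproduce that reference on the deformed input. This is not the authors' implementation; the toy encoder-decoder stands in for the Transformer ASR model, greedy decoding stands in for beam search, simple time/frequency masking stands in for SpecAugment, and all names (ToyASR, deform, decode, sequence_consistency_loss) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

BOS, VOCAB = 1, 32       # hypothetical start token and toy vocabulary size
FEAT_DIM, HID = 40, 64   # toy acoustic-feature and hidden dimensions

class ToyASR(nn.Module):
    """Tiny encoder-decoder standing in for the Transformer ASR model."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(FEAT_DIM, HID, batch_first=True)
        self.embed = nn.Embedding(VOCAB, HID)
        self.decoder = nn.GRU(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, feats, tokens):
        _, h = self.encoder(feats)                   # summarize the utterance
        dec, _ = self.decoder(self.embed(tokens), h)
        return self.out(dec)                         # (batch, len, VOCAB) logits

def deform(feats, time_width=10, freq_width=8):
    """Random time/frequency masking, a crude stand-in for SpecAugment."""
    feats = feats.clone()
    batch, frames, dim = feats.shape
    for b in range(batch):
        t0 = torch.randint(0, max(frames - time_width, 1), (1,)).item()
        f0 = torch.randint(0, max(dim - freq_width, 1), (1,)).item()
        feats[b, t0:t0 + time_width, :] = 0.0
        feats[b, :, f0:f0 + freq_width] = 0.0
    return feats

@torch.no_grad()
def decode(model, feats, max_len=20):
    """Greedy decoding standing in for beam search on the clean input."""
    tokens = torch.full((feats.size(0), 1), BOS, dtype=torch.long)
    for _ in range(max_len):
        next_tok = model(feats, tokens)[:, -1].argmax(-1, keepdim=True)
        tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens

def sequence_consistency_loss(model, unlabeled_feats):
    """Decode the clean utterance, then train the model to reproduce that
    hypothesis when it sees the deformed version of the same utterance."""
    hyp = decode(model, unlabeled_feats)                   # sequence-level pseudo-reference
    logits = model(deform(unlabeled_feats), hyp[:, :-1])   # teacher forcing on deformed input
    return F.cross_entropy(logits.reshape(-1, VOCAB), hyp[:, 1:].reshape(-1))

if __name__ == "__main__":
    model = ToyASR()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    unlabeled = torch.randn(4, 100, FEAT_DIM)              # fake unlabeled utterances
    loss = sequence_consistency_loss(model, unlabeled)
    loss.backward()
    optimizer.step()
    print(f"sequence-level consistency loss: {loss.item():.3f}")

In a full semi-supervised setup, a loss of this form on unlabeled speech would typically be combined with the ordinary supervised loss on labeled speech.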
