RepackagingAugment: Overcoming Prediction Error Amplification in Weight-averaged Speech Recognition Models Subject to Self-training
Jae-Hong Lee (Hanyang University); Dong-Hyun Kim (Hanyang University); Joon-Hyuk Chang (Hanyang University)
Representation-based speech recognition models have demonstrated state-of-the-art performance on downstream tasks. These models are pre-trained on large-scale unlabeled data, fine-tuned on a small amount of labeled data, and subsequently improved through self-training on pseudo-labels. However, a self-trained representation model produces prediction errors caused by training on the incorrect labels in the pseudo-labeled data. A variety of studies have employed weight-averaging methods to refine the pseudo-labels; however, these methods amplify the prediction errors of each self-trained model. To alleviate this problem, we propose RepackagingAugment, a data augmentation method that improves model diversity while preventing the same incorrect labels from recurring in every epoch. Our data augmentation deconstructs the paired speech-text data into word units and repackages them into a randomly determined number of word sequences. This strategy induces the models to produce different prediction errors by mitigating overfitting to incorrect labels. Through experiments on representation models such as wav2vec 2.0 and data2vec, we demonstrate that our approach improves the performance of weight-averaged models.
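For intuition, the sketch below illustrates the repackaging idea as the abstract describes it: word-level (audio, text) pairs are pooled and regrouped into new utterances of randomly determined length. It is a minimal illustration under assumptions, not the authors' implementation; the function name `repackaging_augment`, its parameters, the shuffling step, and the availability of word-level alignments are all assumptions for the sake of the example.

```python
import random

def repackaging_augment(word_pairs, min_len=1, max_len=10, rng=None):
    """Illustrative sketch of repackaging paired speech-text data.

    `word_pairs` is a list of (audio_segment, word_text) tuples, e.g.
    obtained from a forced alignment of the pseudo-labeled corpus.
    Returns a list of repackaged (audio_segments, text) utterances.
    """
    rng = rng or random.Random()
    pool = list(word_pairs)
    rng.shuffle(pool)  # break the original utterance boundaries (assumed step)
    repackaged = []
    i = 0
    while i < len(pool):
        n = rng.randint(min_len, max_len)  # random length for the new sequence
        chunk = pool[i : i + n]
        audio = [seg for seg, _ in chunk]  # segments to be concatenated downstream
        text = " ".join(word for _, word in chunk)
        repackaged.append((audio, text))
        i += n
    return repackaged
```

Because each epoch draws different random groupings, the self-trained models see the incorrect pseudo-labels in different contexts rather than the same fixed utterances, which is how the method encourages the models to make different, rather than shared, prediction errors.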