Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition

Yu Chen (University of Hong Kong); Wen Ding (NVIDIA); Junjie Lai (NVIDIA)

06 Jun 2023

Noisy Student Training (NST) has recently demonstrated extremely strong performance in Automatic Speech Recognition (ASR). In this paper, we propose a data selection strategy named LM Filter to improve the performance of NST on non-target domain data in ASR tasks. Hypotheses are generated both with and without a Language Model (LM), and the character error rate (CER) difference between them is used as a filter threshold. Results show a significant improvement of 10.4% over baselines without data filtering. We achieve 3.31% CER on the AISHELL-1 test set, which is, to the best of our knowledge, the best result obtained without any other supervised data. We also perform evaluations on the supervised 1000-hour AISHELL-2 dataset, where a competitive result of 4.73% CER is achieved.
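The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default threshold of 0.1, the choice of the with-LM hypothesis as the reference side of the CER, and the rule of keeping utterances whose CER difference is at or below the threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp measured against ref."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def lm_filter(
    pairs: List[Tuple[str, str, str]],
    threshold: float = 0.1,  # hypothetical value, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep utterances whose two decodings agree closely.

    Each item in pairs is (utt_id, hyp_without_lm, hyp_with_lm).
    Assumption: the with-LM hypothesis serves as the reference and
    becomes the pseudo-label for the utterances that are kept.
    """
    kept = []
    for utt_id, hyp_no_lm, hyp_with_lm in pairs:
        if cer(hyp_with_lm, hyp_no_lm) <= threshold:
            kept.append((utt_id, hyp_with_lm))
    return kept
```

In this sketch, an utterance whose two decodings are identical has a CER difference of 0 and is kept, while one whose decodings share no characters has a CER difference of 1.0 and is dropped under any threshold below 1.0.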

Tags:

transfer learning

Venue: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference)