Efficient Large Scale Semi-Supervised Learning for CTC Based Acoustic Models
Prakhar Swarup, Debmalya Chakrabarty, Ashtosh Sapru, Hitesh Tulsiani, Harish Arsikere, Sri Garimella
SPS
Semi-supervised learning (SSL) is an active area of research that aims to utilize unlabelled data to improve the accuracy of speech recognition systems. While previous studies have established the efficacy of various SSL methods on varying amounts of data, this paper presents the largest ASR SSL experiment conducted to date, in which 75K hours of transcribed and 1.2 million hours of untranscribed data are used for model training. In addition, the paper introduces two novel techniques to facilitate such a large scale experiment: 1) a simple, scalable Teacher-Student based SSL method for the connectionist temporal classification (CTC) objective and 2) effective data selection mechanisms for leveraging massive amounts of unlabelled data to boost the performance of student models. Further, we apply SSL in all stages of acoustic model training, including the final stage of sequence discriminative training. Our experiments indicate encouraging word error rate (WER) gains of up to 14% in such a large scale transcribed data regime due to SSL training.
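To make the Teacher-Student pipeline concrete, the sketch below shows one common way such a setup can generate CTC pseudo-labels from untranscribed audio: a trained teacher emits per-frame posteriors, a greedy CTC decode (collapse repeats, drop blanks) turns them into a label sequence, and a simple mean-max-posterior confidence score supports data selection. This is a minimal illustrative sketch, not the paper's actual method; the function names, the greedy decoder, and the `conf_threshold` confidence filter are assumptions introduced here for illustration.

```python
import numpy as np

def ctc_greedy_decode(posteriors, blank=0):
    """Greedy CTC decoding: take the argmax symbol per frame,
    collapse consecutive repeats, then remove blank symbols."""
    best_path = np.argmax(posteriors, axis=1)
    labels = []
    prev = None
    for sym in best_path:
        if sym != prev and sym != blank:
            labels.append(int(sym))
        prev = sym
    return labels

def pseudo_label(posteriors, blank=0, conf_threshold=0.9):
    """Produce a pseudo-label for an untranscribed utterance plus a
    confidence score (mean of the per-frame max posterior).  Utterances
    below conf_threshold are rejected, a simple (hypothetical) stand-in
    for the data selection step.  Returns (labels_or_None, confidence)."""
    confidence = float(np.mean(np.max(posteriors, axis=1)))
    if confidence < conf_threshold:
        return None, confidence
    return ctc_greedy_decode(posteriors, blank), confidence
```

Accepted utterances would then be paired with their pseudo-labels and mixed into the student's CTC training data alongside the transcribed set.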