Cross-Training: A Semi-Supervised Training Scheme for Speech Recognition
Soheil Khorram (Google); Anshuman Tripathi (Google); Jaeyoung Kim (Google); Han Lu (Google); Qian Zhang (Google); Rohit Prabhavalkar (Google); Hasim Sak (Google)
Semi-supervised training can be performed by jointly optimizing supervised and unsupervised losses. In many settings, however, the two losses are inconsistent, and this inconsistency destabilizes training. As a solution, we propose cross-training: instead of training one network with two losses, we train two separate networks, each with a different loss, and tie their parameters by minimizing an additional L2 loss between the parameters. This L2 loss acts as a knowledge bridge between the networks: it forces them to stay similar, so each can learn from the other. This paper introduces the cross-training scheme to develop a stable contrastive siamese (c-siam) network. Our experiments on LibriSpeech and Google's Voice-Search/YouTube datasets show that (1) cross-training provides a 20% relative word error rate (WER) improvement over state-of-the-art (SOTA) systems on the LibriSpeech dataset; (2) cross-training stabilizes c-siam training and significantly outperforms SOTA systems on small supervised datasets; (3) cross-training is effective for cascaded encoders, unlike the original c-siam, which shows weak convergence characteristics.
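The abstract describes the mechanism but not the exact objective. A minimal sketch of one plausible joint criterion is given below, where theta_s and theta_u denote the parameters of the supervised and unsupervised networks and lambda is a tying weight; these symbols are illustrative assumptions, not notation taken from the paper.

% Assumed form of the cross-training objective; illustrative only, not from the paper.
\mathcal{L}(\theta_s, \theta_u)
  = \mathcal{L}_{\mathrm{sup}}(\theta_s)
  + \mathcal{L}_{\mathrm{unsup}}(\theta_u)
  + \lambda \, \lVert \theta_s - \theta_u \rVert_2^2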