Federated Self-Learning with Weak Supervision for Speech Recognition

Milind M Rao (Amazon); Gopinath Chennupati (Amazon Alexa); Gautam Tiwari (Amazon); Anit Kumar Sahu (Amazon Alexa AI); Anirudh Raju (Amazon Alexa); Ariya Rastrow (Amazon); Jasha Droppo (Amazon)

SPS

07 Jun 2023

Automatic speech recognition (ASR) models with a low footprint are increasingly being deployed on edge devices for conversational agents, which enhances privacy. We study the problem of federated continual incremental learning for recurrent neural network-transducer (RNN-T) ASR models in the privacy-enhancing scheme of learning on-device, without access to ground-truth human transcripts or machine transcriptions from a stronger ASR model. In particular, we study the performance of a self-learning based scheme, with a paired teacher model updated through an exponential moving average of ASR models. Further, we propose using possibly noisy weak-supervision signals, such as feedback scores and natural language understanding semantics, determined from user behavior across multiple turns in a session of interactions with the conversational agent. These signals are leveraged in a multitask policy-gradient training approach to improve the performance of self-learning for ASR. Finally, we show how catastrophic forgetting can be mitigated by combining on-device learning with a memory-replay approach using selected historical datasets. These innovations allow for a 10% relative improvement in WER on new use cases, with minimal degradation on other test sets, in the absence of strong-supervision signals such as ground-truth transcriptions.
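
As an illustration of the teacher update mentioned in the abstract, the following is a minimal sketch of an exponential moving average (EMA) over model parameters. It is not the authors' implementation: the function name `ema_update`, the plain-dictionary parameter format, and the `decay` values are assumptions chosen only to show the update rule teacher = decay * teacher + (1 - decay) * student.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher parameters as an EMA of the student parameters.

    `teacher` and `student` are hypothetical dicts mapping parameter names
    to floats; a real model would hold tensors instead of scalars.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    # Each teacher parameter moves a fraction (1 - decay) toward the student.
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy parameters, made up for illustration only.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 2.0}
updated = ema_update(teacher, student, decay=0.9)
frozen = ema_update(teacher, student, decay=1.0)
```

A decay close to 1 makes the teacher change slowly, which is the usual reason for pairing an EMA teacher with a student that is trained on its pseudo-labels; with `decay=1.0` the teacher stays unchanged.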

Tags:

Acoustic modeling for automatic speech recognition