Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models
Steven M. Hernandez (Virginia Commonwealth University); Ding Zhao (Google); Shaojin Ding (Google); Antoine Bruguier (Google); Rohit Prabhavalkar (Google); Tara Sainath (Google); Yanzhang He (Google); Ian McGraw ()
-
SPS
IEEE Members: $11.00
Non-members: $15.00
Continued improvements in machine learning techniques offer exciting new opportunities through the use of larger models and larger training datasets. However, there is a growing need to offer these new capabilities on-board low-powered devices such as smartphones, wearables and other embedded environments where only low memory is available. Towards this, we consider methods to reduce the model size of Conformer-based speech recognition models which typically require models with greater than 100M parameters down to just 5M parameters while minimizing impact on model quality. Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors. We propose model weight reuse at different levels within our model architecture: (i) repeating full conformer block layers, (ii) sharing specific conformer modules across layers, (iii) sharing sub-components per conformer module, and (iv) sharing decomposed sub-component weights after low-rank decomposition. By sharing weights at different levels of our model, we can retain the full model in-memory while increasing the number of virtual transformations applied to the input. Through a series of ablation studies and evaluations, we find that with weight sharing and a low-rank architecture, we can achieve a WER of 2.84 and 2.94 for Librispeech dev-clean and test-clean respectively with a 5M parameter model.