QUANTIZED WINOGRAD ACCELERATION FOR CONV1D EQUIPPED ASR MODELS ON MOBILE DEVICES
Yiwu Yao, Chengyu Wang, Jun Huang
SPS
The intensive computation of Automatic Speech Recognition (ASR) models prevents their deployment on mobile devices. In this work, we present a novel quantized Winograd optimization framework that combines quantization and fast convolution to achieve efficient inference acceleration for ASR models on mobile devices. To avoid the information loss caused by combining quantization with Winograd convolution, we propose a Range-Scaled Quantization (RSQ) training method that integrates integer-range scaling with quantization-noise minimization. Moreover, we design the Conv1D-equipped DFSMN (ConvDFSMN) model for mobile applications and experimental verification. We conduct extensive experiments on ConvDFSMN and Wav2letter models, demonstrating that both can be effectively optimized with the proposed framework. In particular, the optimized Wav2letter model achieves a 1.48x speedup for end-to-end inference and a 1.92x speedup for model-backbone inference on ARMv7-based mobile devices, with only an approximate 0.07% increase in WER on AIShell-1.
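To make the underlying fast-convolution idea concrete, the sketch below shows a generic Winograd F(2,3) transform for 1-D convolution: a tile of 4 inputs and a 3-tap kernel yield 2 outputs with 4 elementwise multiplies instead of 6. This is a standard textbook illustration, not the paper's RSQ-trained implementation; all names and values here are our own. The transform matrices expand the dynamic range of the intermediate values, which is precisely why naive quantization of Winograd convolution loses accuracy.

```python
import numpy as np

# Winograd F(2,3) transform matrices (Lavin & Gray convention) for 1-D
# convolution: input transform B^T, kernel transform G, output transform A^T.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float32)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]], dtype=np.float32)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float32)

def winograd_f23(d, g):
    """Two outputs of a valid 1-D convolution of a length-4 input tile d
    with a 3-tap kernel g, using only 4 elementwise multiplies."""
    U = G @ g            # transformed kernel (4 values)
    V = BT @ d           # transformed input tile (4 values)
    return AT @ (U * V)  # elementwise product, then output transform

def direct_conv(d, g):
    """Reference sliding-window (valid) 1-D convolution."""
    return np.array([d[i:i + 3] @ g for i in range(len(d) - 2)])

d = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
g = np.array([0.5, 1.0, -1.0], dtype=np.float32)
print(winograd_f23(d, g))  # matches direct_conv(d, g)
```

In a quantized deployment, U and V would be stored as low-bit integers; because G and B^T scale their inputs, the quantization ranges of the transformed tensors must be chosen carefully, which is the problem RSQ training addresses.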