Low-Latency Lightweight Streaming Speech Recognition With 8-Bit Quantized Simple Gated Convolutional Neural Networks
Jinhwan Park, Xue Qian, Youngmin Jo, Wonyong Sung
SPS
Length: 12:58
Automatic speech recognition (ASR) is an important capability for mobile devices. However, deep neural network-based ASR demands a large number of computations, while the memory bandwidth and battery capacity of mobile devices are limited. Server-based implementations are therefore commonly employed, but they increase latency and raise privacy concerns. Efficient on-device ASR addresses these issues. In this paper, we propose a low-latency on-device speech recognition system based on a simple gated convolutional network (SGCN). The SGCN shows competitive recognition accuracy even with 1M parameters. In addition, the SGCN is well suited to parallelization, which enables efficient cache utilization. 8-bit quantization is applied to reduce the memory size and computation time. The proposed system performs online recognition within the 0.4 s latency limit and operates with a real-time factor of 0.2 using only a single 900 MHz CPU core. With a 1.2 MB memory footprint, the system achieves a 19.75% word error rate (WER) with greedy decoding.
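To make the 8-bit quantization and real-time factor figures concrete, the following is a minimal sketch in plain Python. The abstract does not state which quantization scheme the authors use, so symmetric per-tensor quantization is an assumption here, and the function names (`quantize_int8`, `dequantize`, `real_time_factor`) are hypothetical. The sketch only illustrates the arithmetic: roughly 1M parameters stored at 8 bits each take about 1 MB, which is in the same range as the reported 1.2 MB footprint (the abstract does not break down the remainder).

```python
import random


def quantize_int8(weights):
    """Symmetric per-tensor quantization (an assumed scheme, not the paper's).

    Maps floats to integers in [-127, 127] using one shared scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [v * scale for v in q]


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


# Quantize some random stand-in weights and measure the round-trip error.
random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage for a model with about 1M parameters, as in the abstract.
num_params = 1_000_000
fp32_bytes = num_params * 4  # 32-bit floats
int8_bytes = num_params * 1  # 8-bit integers

print(f"max round-trip error: {max_error:.6f} (scale = {scale:.6f})")
print(f"float32: {fp32_bytes / 1e6:.1f} MB, int8: {int8_bytes / 1e6:.1f} MB")
print(f"RTF for 2 s of compute on 10 s of audio: {real_time_factor(2.0, 10.0):.1f}")
```

With round-to-nearest, the round-trip error of each weight is bounded by half the scale. A real-time factor of 0.2 means that, on average, 10 s of audio is processed in about 2 s of compute time.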