There is a vast literature on neural network compression, whether by quantizing network variables to low-precision numbers or by pruning redundant connections from the network architecture. However, these techniques suffer performance degradation when the compression ratio is pushed to an extreme. In this paper, we propose Cascaded Mixed-precision Networks (CMNs), compact yet efficient neural networks that avoid this performance drop. A CMN is designed as a cascaded framework that concatenates a group of neural networks with sequentially increasing bitwidths. The execution flow of a CMN is conditional on the difficulty of the input: easy examples are correctly classified by extremely low-bitwidth networks, while hard examples are handled by high-bitwidth networks, so the average compute is reduced. In addition, weight pruning is incorporated into the cascaded framework and jointly optimized with the mixed-precision quantization. To validate this method, we implemented a 2-stage CMN consisting of a binary neural network and a multi-bit (e.g., 8-bit) neural network. Empirical results on CIFAR-100 and ImageNet demonstrate that CMN outperforms state-of-the-art methods in terms of both accuracy and compute.
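To make the conditional execution concrete, below is a minimal sketch of the 2-stage cascade in PyTorch. It assumes a confidence-threshold early-exit rule, which is a common gating choice for cascades; the class name `TwoStageCMN`, the `confidence_threshold` value, and the gating rule itself are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a 2-stage cascade with a confidence-based gate.
# The gating rule (max softmax probability vs. a fixed threshold) is an
# assumption; the paper's actual routing criterion may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStageCMN(nn.Module):
    def __init__(self, binary_net: nn.Module, multibit_net: nn.Module,
                 confidence_threshold: float = 0.9):
        super().__init__()
        self.binary_net = binary_net      # extremely low-bitwidth stage (stage 1)
        self.multibit_net = multibit_net  # higher-bitwidth stage, e.g. 8-bit (stage 2)
        self.threshold = confidence_threshold

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stage 1: every sample first passes through the cheap binary network.
        logits = self.binary_net(x)
        confidence, _ = F.softmax(logits, dim=1).max(dim=1)

        # Easy samples (high stage-1 confidence) exit here; only the
        # remaining hard samples pay for the multi-bit stage.
        hard = confidence < self.threshold
        if hard.any():
            logits[hard] = self.multibit_net(x[hard])
        return logits
```

Under this scheme, the average compute per sample is the stage-1 cost plus the stage-2 cost weighted by the fraction of samples routed past the gate, so lowering the threshold trades accuracy for compute.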