10 Jul 2020

Parameter reduction and quantization have become the key steps in compressing convolutional neural networks to shrink model size. However, most existing work treats the two methods separately, and the work that does combine them is cumbersome. We propose a simple but highly efficient training-aware convolutional neural network compression paradigm, which elegantly combines lightweight network design and binary network architecture into a three-stage scheme.
In the proposed scheme, we keep the first layer as a standard convolution to preserve the representational ability of the network, replace the standard convolutions in the middle layers with depthwise separable convolutions, and finally adopt binary quantization for the remaining convolutional layers. The scheme is training-aware: the pruning and quantization are explicit and known to us before training, unlike the well-known deep compression method, where the pruning and quantization are unknown before training.
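
The abstract does not include code, but the three-stage layout is concrete enough to sketch. Below is a minimal, illustrative PyTorch sketch of such an architecture: a standard first convolution, depthwise separable middle layers, and binarized remaining layers. The channel widths, layer counts, and the straight-through-estimator binarization are our assumptions for illustration, not the authors' actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    # Sign binarization with a straight-through estimator (STE) gradient.
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass gradients only where |w| <= 1 (standard STE clipping).
        return grad_out * (w.abs() <= 1).float()


class BinaryConv2d(nn.Conv2d):
    # Convolution whose weights are binarized to {-1, +1} in the forward pass.
    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)
        return F.conv2d(x, w_bin, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


def depthwise_separable(in_ch, out_ch, stride=1):
    # Depthwise 3x3 convolution followed by a pointwise 1x1 convolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                  groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class ThreeStageNet(nn.Module):
    # Toy instance of the three-stage scheme described in the abstract.
    def __init__(self, num_classes=10):
        super().__init__()
        # Stage 1: standard convolution keeps full representation ability.
        self.first = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )
        # Stage 2: depthwise separable convolutions for the middle layers.
        self.middle = nn.Sequential(
            depthwise_separable(32, 64, stride=2),
            depthwise_separable(64, 128, stride=2),
        )
        # Stage 3: binary quantization for the remaining conv layers.
        self.binary = nn.Sequential(
            BinaryConv2d(128, 128, 3, padding=1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_classes)
        )

    def forward(self, x):
        return self.head(self.binary(self.middle(self.first(x))))


model = ThreeStageNet()
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])

Because each stage is declared explicitly at construction time, the compression applied to every layer is fixed and visible before training starts, which is the sense in which such a scheme is training-aware.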
Experimental results show that our proposed compression paradigm achieves a 75.4x compression ratio, compared with a maximum of 49x for deep compression, while maintaining similar performance. Both the compression ratio and the accuracy are much better than those of the state-of-the-art binary neural network BNN+.