LKBQ: PUSHING THE LIMIT OF POST-TRAINING QUANTIZATION TO EXTREME 1 BIT
Tianxiang Li, Bin Chen, QianWei Wang, Yujun Huang, ShuTao Xia
Recent advances have shown the potential of post-training quantization (PTQ) to reduce excessive hardware resources and quantize deep models to low bit-widths in a short time compared with Quantization-Aware Training (QAT). However, existing PTQ approaches suffer a substantial loss of accuracy when quantizing models to extremely low bit-widths, e.g., 1 bit. In this work, we propose layer-by-layer self-knowledge-distillation binary post-training quantization (LKBQ), the first method capable of quantizing the weights of neural networks to 1 bit in the PTQ domain. We show that careful use of layer-by-layer self-distillation within LKBQ provides a significant performance boost. Furthermore, our evaluation shows that the initialization of the quantized network weights can have a large impact on the results, and we therefore propose three weight-initialization methods. Finally, in light of the characteristics of the binarized network, we propose a method named gradient scaling to further improve efficiency. Our experiments show that LKBQ pushes the limit of PTQ to the extreme of 1 bit for the first time.
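To make the layer-by-layer idea concrete, the following is a minimal sketch (not the authors' implementation) of binarizing one linear layer and reconstructing its output against the full-precision layer on a small calibration set, which is one common way to realize layer-wise self-distillation in PTQ. The names `BinaryLinear` and `calibrate_layer`, the per-channel scale initialization, and the use of a straight-through estimator are assumptions for illustration; the paper's three initialization schemes and gradient-scaling method are not reproduced here.

```python
# Hypothetical sketch of layer-wise self-distillation for 1-bit PTQ (PyTorch).
import torch
import torch.nn as nn


class BinaryLinear(nn.Module):
    """Linear layer with weights binarized to {-1, +1} and a learnable
    per-output-channel scale; a straight-through estimator passes gradients
    to the latent full-precision weights."""

    def __init__(self, fp_layer: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(fp_layer.weight.detach().clone())
        # Assumed initialization: scale = mean |w| per output channel.
        self.scale = nn.Parameter(self.weight.abs().mean(dim=1, keepdim=True))
        self.bias = (nn.Parameter(fp_layer.bias.detach().clone())
                     if fp_layer.bias is not None else None)

    def forward(self, x):
        w_bin = torch.sign(self.weight)
        # Straight-through estimator: forward uses sign(w), backward treats it
        # as identity so gradients reach the latent weights.
        w_q = self.scale * (w_bin + self.weight - self.weight.detach())
        return nn.functional.linear(x, w_q, self.bias)


def calibrate_layer(fp_layer: nn.Linear, calib_inputs: torch.Tensor,
                    steps: int = 500, lr: float = 1e-3) -> BinaryLinear:
    """Layer-wise self-distillation: match the binary layer's output to the
    frozen full-precision layer's output on calibration data."""
    q_layer = BinaryLinear(fp_layer)
    opt = torch.optim.Adam(q_layer.parameters(), lr=lr)
    with torch.no_grad():
        target = fp_layer(calib_inputs)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(q_layer(calib_inputs), target)
        loss.backward()
        opt.step()
    return q_layer


if __name__ == "__main__":
    fp = nn.Linear(128, 64)
    calib = torch.randn(256, 128)  # small calibration batch
    q = calibrate_layer(fp, calib)
    print(nn.functional.mse_loss(q(calib), fp(calib)).item())
```

In this reading, each layer is calibrated independently against its own full-precision counterpart, so no labeled data or end-to-end retraining is needed, which is consistent with the PTQ setting described above.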