Variational Bayesian Sparsification for Distillation Compression
Yue Ming, Hao Fu
Model compression is a critical technique for reducing the memory consumption and accelerating the inference of cumbersome models. Here, we propose a novel method, called Variational Bayesian Sparsification, for distilling large models into small, sparse models while maintaining accuracy. Unlike prior work, our approach embeds Bayesian sparsification directly into distillation. The core contributions are twofold. First, a minibatch re-weighting method is proposed to dynamically balance hard and soft knowledge, which substantially boosts distillation accuracy. Second, a Bayesian deep sparsification method leverages group sparsity and element-wise sparsity simultaneously to reduce the parameter redundancy of student networks. We validate our method on the MNIST, CIFAR-10, and CIFAR-100 datasets. It achieves a 98.86% compression ratio with minor accuracy loss on MNIST, yields a compact model with only 1.99M weights on CIFAR-10, and performs favorably against state-of-the-art compression methods on CIFAR-100, verifying the algorithm's generality.
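To make the first contribution concrete, the sketch below shows a standard distillation loss that blends a hard (label) term and a soft (teacher) term. The abstract does not specify the exact re-weighting rule, so the per-minibatch coefficient `alpha` and the temperature `T` here are assumptions; the paper's minibatch re-weighting would compute `alpha` dynamically for each batch rather than fixing it.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha, T=4.0):
    """Blend hard and soft knowledge for one minibatch.

    `alpha` weights the hard cross-entropy term against the soft
    teacher-matching term. In the paper's minibatch re-weighting,
    `alpha` would be chosen per batch (assumed here, not specified).
    """
    # Hard knowledge: cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft knowledge: KL divergence to the temperature-softened
    # teacher distribution; the T*T factor keeps gradient scales
    # comparable across temperatures (standard practice).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```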
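For the second contribution, a minimal deterministic stand-in for the combined sparsity objective is sketched below: a group-lasso term over output channels (group sparseness) plus an element-wise L1 term (element sparseness). The actual method places Bayesian sparsity-inducing priors on the weights; the penalty form and the `lam_group`/`lam_elem` coefficients here are illustrative assumptions.

```python
import torch

def group_element_sparsity(weight, lam_group=1e-4, lam_elem=1e-5):
    """Simplified analogue of joint group and element sparsity.

    `weight` has shape (out_channels, in_channels, kH, kW) for a conv
    layer. The group term drives whole output channels toward zero;
    the element term prunes individual weights within surviving groups.
    """
    # One L2 norm per output channel: zeroed norms prune entire channels.
    group_norms = weight.flatten(1).norm(p=2, dim=1)
    return lam_group * group_norms.sum() + lam_elem * weight.abs().sum()
```

Added to the distillation loss during student training, such a regularizer would encourage the compact, sparse student networks the abstract describes.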