
Variational Student: Learning Compact And Sparser Networks In Knowledge Distillation Framework

Srinidhi Hegde, Ramya Hebbalaguppe, Ranjitha Prasad, Vishwajeet Kumar

Length: 14:21
04 May 2020

The holy grail in deep neural network research is porting memory- and computation-intensive network models to embedded platforms with minimal compromise in model accuracy. To this end, we propose Variational Student, which reaps the benefits of the compressibility of the knowledge distillation framework and the sparsity-inducing abilities of variational inference (VI) techniques. Essentially, we build an accurate and sparse student network whose sparsity is induced by variational parameters found by optimizing a VI-based loss function, leveraging the knowledge learnt by an accurate but complex pre-trained teacher network. To further enhance sparsity, we also employ a Block Sparse Regularizer on a concatenated tensor of teacher and student network weights. We benchmark our results on MLP and CNN variants and demonstrate a reduction in memory footprint of up to ~213× without the need to retrain the teacher network.
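To make the combined objective concrete, below is a minimal sketch, not the authors' implementation, of the kind of loss the abstract describes: a soft-target distillation term, a sparsity-inducing VI term in the style of sparse variational dropout, and a group-lasso-style block-sparse penalty on a concatenated teacher/student weight tensor. All layer sizes, the grouping choice in the block-sparse term, and the weighting coefficients are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target KD loss: KL between temperature-softened distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


def sparse_vd_kl(log_sigma2, weight):
    """Approximate KL term of sparse variational dropout (Molchanov et al.);
    log_alpha = log(sigma^2 / w^2) controls per-weight dropout rates."""
    k1, k2, k3 = 0.63576, 1.8732, 1.48695
    log_alpha = log_sigma2 - 2.0 * torch.log(torch.abs(weight) + 1e-8)
    neg_kl = (k1 * torch.sigmoid(k2 + k3 * log_alpha)
              - 0.5 * F.softplus(-log_alpha) - k1)
    return -neg_kl.sum()


def block_sparse_penalty(student_weight, teacher_weight):
    """Group-lasso penalty on a concatenation of student and teacher weights.
    Grouping by input column (so both networks drop an input unit together)
    is one plausible block structure and an assumption of this sketch; the
    fixed teacher weights are detached so only the student is regularized."""
    stacked = torch.cat([student_weight, teacher_weight.detach()], dim=0)
    return stacked.norm(dim=0).sum()


def variational_student_loss(student_logits, teacher_logits, targets,
                             student_fc, teacher_fc, log_sigma2,
                             alpha=0.9, beta=1e-4, gamma=1e-4, temperature=4.0):
    """Combined objective: cross-entropy + distillation + sparse-VD KL +
    block-sparse term. The coefficients are placeholders, not the paper's."""
    ce = F.cross_entropy(student_logits, targets)
    kd = distillation_loss(student_logits, teacher_logits, temperature)
    kl = sparse_vd_kl(log_sigma2, student_fc.weight)
    bsr = block_sparse_penalty(student_fc.weight, teacher_fc.weight)
    return (1 - alpha) * ce + alpha * kd + beta * kl + gamma * bsr


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 784)                    # e.g. a flattened MNIST batch
    targets = torch.randint(0, 10, (8,))
    teacher_fc = nn.Linear(784, 10)            # stand-in for a pre-trained teacher layer
    student_fc = nn.Linear(784, 10)            # compact student layer (same input dim)
    log_sigma2 = torch.full_like(student_fc.weight, -8.0, requires_grad=True)
    loss = variational_student_loss(student_fc(x), teacher_fc(x).detach(), targets,
                                    student_fc, teacher_fc, log_sigma2)
    loss.backward()
    print(float(loss))
```

In a full training loop the student weights would additionally be sampled with the variational noise implied by `log_sigma2`; the sketch only shows how the distillation, VI, and block-sparse terms combine into one loss.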
