
Self-Ensemble Distillation Using Mean Teachers With Long & Short Memory

Nilanjan Chattopadhyay, Geetank Raipuria, Nitin Singhal

Length: 00:04:07
28 Mar 2022

Ensembles of deep learning models are widely used to increase performance; however, doing so requires training and deploying several models. This can be mitigated by distilling the knowledge of several models into a single network, yet the cost of training numerous models remains. We propose a new consistency-regularisation-based methodology that eliminates the requirement of training several teacher networks, thus lowering training costs. We efficiently generate several teacher networks by taking exponential moving averages of the student network parameters with varying decay rates, which provide long and short memory of the training routine. Random augmentation is applied individually to each teacher input, and a consistency loss between the teacher and student outputs improves model generalisation. We test our proposed method of self-ensembling distillation on two segmentation datasets, the MICCAI 2019 Challenge dataset and the Kaggle Prostate cANcer graDe Assessment (PANDA) Challenge dataset, and show significant gains in performance over both the baseline and ensemble knowledge distillation.
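A minimal PyTorch-style sketch of the idea outlined in the abstract is given below. The function names, the decay values (0.999 for a long-memory teacher, 0.99 for a short-memory one), the use of cross-entropy as the supervised loss, and the MSE consistency term are illustrative assumptions, not the paper's exact formulation; the `augment` callable stands in for whatever random augmentation is applied independently per teacher input.

    import copy

    import torch
    import torch.nn.functional as F


    def make_teacher(student):
        # A teacher starts as a frozen copy of the student network.
        teacher = copy.deepcopy(student)
        for p in teacher.parameters():
            p.requires_grad_(False)
        return teacher


    def ema_update(teacher, student, decay):
        # Exponential moving average: teacher <- decay * teacher + (1 - decay) * student.
        with torch.no_grad():
            for t, s in zip(teacher.parameters(), student.parameters()):
                t.mul_(decay).add_(s, alpha=1.0 - decay)


    def training_step(student, teachers, images, labels, augment, optimizer, lam=1.0):
        # Supervised loss on the student's prediction.
        student_logits = student(images)
        loss = F.cross_entropy(student_logits, labels)

        # Consistency loss against each EMA teacher; every teacher sees its own
        # independently augmented copy of the batch (assumption: label-preserving
        # augmentations, so outputs are compared directly).
        for teacher, _decay in teachers:
            with torch.no_grad():
                teacher_logits = teacher(augment(images))
            loss = loss + lam * F.mse_loss(student_logits.softmax(dim=1),
                                           teacher_logits.softmax(dim=1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # After the student update, refresh each teacher with its own decay rate,
        # giving one teacher long memory and the other short memory of training.
        for teacher, decay in teachers:
            ema_update(teacher, student, decay)
        return loss.item()

A self-ensemble of two teachers could then be set up as, for example, teachers = [(make_teacher(student), 0.999), (make_teacher(student), 0.99)], so that only the single student is ever trained while the teachers are maintained for free via the EMA updates.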

