Self-Ensemble Distillation Using Mean Teachers With Long & Short Memory
Nilanjan Chattopadhyay, Geetank Raipuria, Nitin Singhal
Ensembles of deep learning models are widely used to increase performance; however, doing so requires training and deploying several models. This can be mitigated by distilling the knowledge of several models into a single network, yet the cost of training numerous models remains. We propose a new consistency-regularisation-based methodology that eliminates the requirement of training several teacher networks, thus lowering training costs. We efficiently generate several teacher networks by taking exponential moving averages of the student network parameters with varying decay rates, providing long and short memory of the training routine. Random augmentation is applied individually to each teacher input, and a consistency loss between teacher and student outputs is used to improve model generalisation. We test our proposed method of self-ensembling distillation on two segmentation datasets, the MICCAI 2019 Challenge dataset and the Kaggle Prostate cANcer graDe Assessment (PANDA) Challenge dataset, and show a significant gain in performance over both the baseline and ensemble knowledge distillation.
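The following is a minimal sketch of the core idea described above: maintaining several EMA "teachers" of the same student with different decay rates and adding a consistency loss against each one. It assumes PyTorch, a generic segmentation network, an MSE consistency loss on softmax outputs, and a hypothetical `random_augment` transform that preserves spatial alignment (e.g., noise or colour jitter); none of these specifics are given in the abstract, so treat this as an illustrative approximation rather than the authors' exact implementation.

```python
import copy
import torch
import torch.nn.functional as F

def make_teacher(student):
    """Create an EMA teacher as a frozen copy of the student."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher, student, decay):
    """teacher <- decay * teacher + (1 - decay) * student."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

def train_step(student, teachers, decays, images, masks,
               random_augment, optimizer, consistency_weight=1.0):
    # Supervised segmentation loss on the student's prediction.
    logits = student(images)
    loss = F.cross_entropy(logits, masks)

    # Consistency loss against each EMA teacher; every teacher sees its own
    # independently augmented view of the batch.
    student_prob = F.softmax(logits, dim=1)
    for teacher in teachers:
        with torch.no_grad():
            teacher_prob = F.softmax(teacher(random_augment(images)), dim=1)
        loss = loss + consistency_weight * F.mse_loss(student_prob, teacher_prob)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Update each teacher with its own decay rate: a decay close to 1 yields a
    # "long-memory" teacher, a smaller decay a "short-memory" one.
    for teacher, decay in zip(teachers, decays):
        ema_update(teacher, student, decay)
    return loss.item()
```

In this sketch the "ensemble" costs only the memory of the extra teacher copies and a few forward passes per step, since no additional networks are trained from scratch; the decay rates (e.g., 0.99 and 0.999) control how long or short each teacher's memory of the student's training trajectory is.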