MAKD: MULTIPLE AUXILIARY KNOWLEDGE DISTILLATION
Zehan Chen, Xuan Jin, Yuan He, Hui Xue
Knowledge distillation aims to learn a small student model by leveraging knowledge from a larger teacher model. The gap between these heterogeneous models hinders knowledge transfer, and the transfer becomes even more challenging when the teacher model comes from another task. We observe that such a teacher model is deficient at extracting features from samples of the other task. To improve knowledge distillation in this situation, we propose Multiple Auxiliary Subspaces (MAS). Most previous methods improve distillation performance through representation alignment; instead, we focus on improving the teacher model itself, which is better suited to cross-task distillation. MAS distills knowledge in a mutual learning manner based on an auxiliary network. During training, the teacher model is improved by the auxiliary network, which works as a trainable part of the teacher model and learns the feature distribution of the target samples from the student model. This improvement of the teacher model then benefits the student model through the subsequent distillation procedure. We further adopt a representation alignment technique and multiple auxiliary networks to enhance the proposed method. MAS works well with either limited or sufficient labeled target data.
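To make the mutual-learning idea in the abstract concrete, the following is a minimal PyTorch-style sketch of one possible training step: a frozen source-task teacher backbone feeds a trainable auxiliary head, the auxiliary head learns the target distribution from the student, and the student is in turn distilled from the improved teacher. All names (TeacherBackbone, AuxiliaryHead, StudentNet, the temperature, and the loss weighting) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def kd_loss(pred_logits, target_logits, T=4.0):
    """Softened KL divergence commonly used for distillation (assumed form)."""
    return F.kl_div(
        F.log_softmax(pred_logits / T, dim=1),
        F.softmax(target_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)


class MutualDistiller(nn.Module):
    """Sketch of mutual learning between an auxiliary head on a frozen
    cross-task teacher and a small target-task student."""

    def __init__(self, teacher_backbone, auxiliary_head, student):
        super().__init__()
        self.teacher_backbone = teacher_backbone  # pretrained on the source task, kept frozen
        self.auxiliary_head = auxiliary_head      # trainable part attached to the teacher
        self.student = student                    # small target-task model
        self.ce = nn.CrossEntropyLoss()
        for p in self.teacher_backbone.parameters():
            p.requires_grad = False

    def forward(self, x, y):
        with torch.no_grad():
            feat = self.teacher_backbone(x)       # teacher features on target samples
        t_logits = self.auxiliary_head(feat)      # auxiliary output for the target task
        s_logits = self.student(x)

        # Mutual learning: the auxiliary head learns the target-sample distribution
        # from the student, while the student is distilled from the improved teacher.
        loss_aux = self.ce(t_logits, y) + kd_loss(t_logits, s_logits.detach())
        loss_stu = self.ce(s_logits, y) + kd_loss(s_logits, t_logits.detach())
        return loss_aux + loss_stu
```

In this sketch both losses are summed and optimized jointly; the method described above could equally alternate updates of the auxiliary head and the student, and multiple auxiliary heads could be attached to different layers of the frozen backbone.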