Input-dependent Dynamical Channel Association for Knowledge Distillation
Qiankun Tang (Zhejiang Lab); Yuan Zhang (China Telecom); Xiaogang Xu (Zhejiang Gongshang University); Jun Wang (Zhejiang Lab); Yimin Guo (China Telecom Research Institute)
Feature-map based knowledge distillation has proven effective in improving the performance of the student model. Existing works mainly focus on the formulation of knowledge but ignore the mismatch in channel numbers that arises from heterogeneous teacher-student architectures. They generally adopt handcrafted matching or an input-independent association matrix, which can lead to semantic mismatch and thus suboptimal performance. To resolve this problem, we present an input-dependent channel association module. This module automatically generates an allocation matrix in a cross-attention manner, enabling each student channel to be dynamically connected to its semantically related teacher channels according to its learning state. An alternating training scheme is applied for stable optimization. Extensive experiments on image classification, covering a variety of settings based on popular architectures, demonstrate the effectiveness of our proposed strategy compared to prior works.
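As a rough illustration of the idea described in the abstract, the following PyTorch sketch builds a per-sample allocation matrix via cross-attention between pooled channel descriptors of the student and teacher feature maps, then recombines teacher channels to match the student's channel count before a feature distillation loss. All specifics here (the `ChannelAssociation` class, the 4x4 pooled descriptors, the projection dimension, and the MSE loss) are our assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAssociation(nn.Module):
    """Hypothetical sketch of an input-dependent channel association module.

    Builds a per-sample allocation matrix by cross-attention between
    student and teacher channel descriptors, so each student channel is
    softly matched to semantically related teacher channels. Details are
    illustrative assumptions, not the authors' exact architecture.
    """

    def __init__(self, dim: int = 64, pool_size: int = 4):
        super().__init__()
        # Reduce each channel's spatial map to a fixed-size descriptor.
        self.pool = nn.AdaptiveAvgPool2d(pool_size)
        d = pool_size * pool_size
        self.q_proj = nn.Linear(d, dim)  # queries from student channels
        self.k_proj = nn.Linear(d, dim)  # keys from teacher channels
        self.scale = dim ** -0.5

    def forward(self, f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        # f_s: (B, C_s, H, W), f_t: (B, C_t, H, W); spatial sizes are
        # assumed to match (e.g., via an earlier resize or same stage).
        b, c_s = f_s.shape[:2]
        # Per-channel descriptors: (B, C, pool_size**2).
        d_s = self.pool(f_s).flatten(2)
        d_t = self.pool(f_t).flatten(2)
        q = self.q_proj(d_s)             # (B, C_s, dim)
        k = self.k_proj(d_t)             # (B, C_t, dim)
        # Input-dependent allocation matrix: row i softly selects the
        # teacher channels associated with student channel i.
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        # Recombine teacher channels to align with the student's channels.
        f_t_aligned = attn @ f_t.flatten(2)          # (B, C_s, H*W)
        return f_t_aligned.view(b, c_s, *f_t.shape[2:])


def distill_loss(f_s: torch.Tensor, f_t: torch.Tensor,
                 assoc: ChannelAssociation) -> torch.Tensor:
    """Feature distillation loss against the reassociated teacher map."""
    return F.mse_loss(f_s, assoc(f_s, f_t))
```

Because the association matrix depends on the current student features, the abstract's alternating training scheme would plausibly optimize the association module and the student in separate steps, freezing one while updating the other; the exact schedule is not specified here.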