21 Sep 2021

Knowledge distillation is an effective model compression technique that encourages a small student model to mimic the features or probabilistic outputs of a large teacher model. Existing feature-based distillation methods focus mainly on formulating enriched representations, while naively bridging the channel-dimension gap with handcrafted channel-association strategies between teacher and student. This not only introduces extra parameters and computational cost, but may also transfer irrelevant information to the student. In this paper, we present a differentiable and efficient Dynamic Channel Association (DCA) mechanism that automatically associates suitable teacher channels with each student channel. DCA also enables each student channel to distill knowledge from multiple teacher channels in a weighted manner. Extensive experiments on classification tasks, with various combinations of teacher and student network architectures, demonstrate the effectiveness of the proposed approach.
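The core idea of weighted channel association can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's implementation): a learnable logit matrix of shape `(C_s, C_t)` is turned into per-student-channel softmax weights over teacher channels, each student channel's distillation target is the weighted mix of teacher channels, and the feature loss is a simple MSE against that target. The function and variable names are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dca_distill_loss(f_t, f_s, assoc_logits):
    """Hypothetical sketch of a dynamic channel association loss.

    f_t:          teacher features, shape (N, C_t, H, W)
    f_s:          student features, shape (N, C_s, H, W)
    assoc_logits: learnable association logits, shape (C_s, C_t);
                  in a real framework these would be trained jointly
                  with the student so the association stays differentiable.
    """
    # Each student channel gets a normalized weight vector over all
    # teacher channels (rows of `w` sum to 1).
    w = softmax(assoc_logits, axis=1)                    # (C_s, C_t)
    # Weighted mix of teacher channels per student channel.
    target = np.einsum('st,nthw->nshw', w, f_t)          # (N, C_s, H, W)
    # Mean-squared feature-mimicking loss against the mixed target.
    return float(np.mean((target - f_s) ** 2))
```

With uniform logits (all zeros) the sketch reduces to averaging all teacher channels for every student channel; training the logits lets each student channel concentrate its weight on the teacher channels most relevant to it, without a handcrafted one-to-one assignment.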
