Enhancing Data-Free Adversarial Distillation With Activation Regularization And Virtual Interpolation

Xiaoyang Qu, Jianzong Wang, Jing Xiao

Length: 00:11:20

09 Jun 2021

Knowledge distillation transfers knowledge from a large trained model, or an ensemble of trained models, to a small model. It normally relies on access to the original training set, which is not always available. One solution is a data-free adversarial distillation framework, which uses a generative network to transfer the teacher model's knowledge to the student model. However, data generation in this framework is inefficient. We add an activation regularizer and a virtual interpolation method to improve data generation efficiency. The activation regularizer encourages the student to match the teacher's predictions close to activation boundaries and decision boundaries. The virtual interpolation method generates virtual samples and labels in between decision boundaries. Our experiments show that this approach surpasses state-of-the-art data-free distillation methods: the student model achieves 95.42% accuracy on CIFAR-10 and 77.05% accuracy on CIFAR-100 without any original training data. On CIFAR-100, our model's accuracy is 13.8% higher than that of the state-of-the-art data-free method.
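
The abstract does not spell out how virtual samples and labels are formed. As a rough illustration only, the sketch below assumes a mixup-style convex combination: two generated inputs and the teacher's soft predictions for them are blended with the same weight. The function names (`interpolate`, `virtual_sample`, `mean_abs_discrepancy`), the mixing weight `lam`, and the mean absolute discrepancy used to compare student and teacher outputs are all assumptions made for this example, not details taken from the paper.

```python
def interpolate(a, b, lam):
    """Convex combination lam*a + (1-lam)*b of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def virtual_sample(x_a, x_b, p_a, p_b, lam):
    """Build one virtual input and its virtual soft label.

    x_a, x_b: two generated inputs, flattened to lists of floats.
    p_a, p_b: the teacher's class probabilities for those inputs.
    Assumption: inputs and labels are mixed with the same weight lam.
    """
    return interpolate(x_a, x_b, lam), interpolate(p_a, p_b, lam)


def mean_abs_discrepancy(p_student, p_teacher):
    """Mean absolute difference between two probability vectors.

    Assumption: stands in for whatever student-teacher discrepancy
    the actual method minimizes.
    """
    if len(p_student) != len(p_teacher):
        raise ValueError("vectors must have the same length")
    return sum(abs(s - t) for s, t in zip(p_student, p_teacher)) / len(p_teacher)


# Toy usage: two 3-dimensional "generated" inputs and 2-class teacher outputs.
x_a, x_b = [0.0, 1.0, 2.0], [2.0, 1.0, 0.0]
p_a, p_b = [0.9, 0.1], [0.2, 0.8]
x_mix, p_mix = virtual_sample(x_a, x_b, p_a, p_b, lam=0.5)
```

Because the virtual label is a convex combination of two probability vectors, it remains a valid probability vector, so a student could be trained against it in the same way as against the teacher's ordinary soft predictions.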

Chairs:

Danilo Comminiello

Tags:

signal processing society

IEEE ICASSP 2021

virtual conference

June 6-11, 2021