ATTENTION PROBE: VISION TRANSFORMER DISTILLATION IN THE WILD

Jiahao Wang, Mingdeng Cao, Shuwei Shi, Yujiu Yang, Baoyuan Wu

Length: 00:08:24
10 May 2022

Vision transformers require intensive computational resources to achieve high performance, which usually makes them unsuitable for mobile devices. Compressing transformers usually requires the original training data, but this requirement cannot always be satisfied due to privacy limitations or transmission restrictions. Exploiting massive unlabeled data in the wild when the training data is unavailable is an emerging paradigm for compressing convolutional neural networks (CNNs). For vision transformers, however, distilling a portable student from wild data remains an open issue, since their structure and basic computation paradigm are completely different. In this paper, we first propose a novel "Attention Probe" method, which serves as an effective tool for selecting valuable data from the wild. Then, a probe knowledge distillation algorithm is proposed to improve the performance of the student transformer. Besides maximizing the similarity between the outputs of the teacher and student networks, our method learns student transformers by inheriting intermediate feature information from the given teacher model. Experimental results on several benchmarks demonstrate that the svelte transformer obtained by the proposed method achieves performance comparable to the baseline that requires the entire original training data.
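For context, the following is a minimal PyTorch sketch of the kind of two-term distillation objective the abstract describes: an output-level term that maximizes teacher-student similarity plus a feature-level term that lets the student inherit intermediate representations. The function name, the KL/MSE choices, the temperature, and the weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def probe_style_distillation_loss(student_logits, teacher_logits,
                                  student_feats, teacher_feats,
                                  temperature=4.0, alpha=0.5):
    """Illustrative two-term distillation loss (not the paper's exact form).

    student_feats / teacher_feats: lists of intermediate token features,
    each of shape [batch, tokens, dim]; matching dimensions are assumed
    (in practice a projection layer may be needed).
    """
    # Output-level term: soften both distributions and match them with KL.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Feature-level term: encourage the student to inherit the teacher's
    # intermediate features (mean-squared error per selected layer pair).
    feat = sum(F.mse_loss(s, t) for s, t in zip(student_feats, teacher_feats))
    feat = feat / max(len(student_feats), 1)

    return alpha * kd + (1.0 - alpha) * feat
```

In this sketch the feature term averages over however many teacher-student layer pairs are supplied, so the balance between the two terms is controlled only by alpha; the actual layer selection and weighting used in the paper are not specified in the abstract.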
