Prompt-Distiller: Few-shot Knowledge Distillation for Prompt-based Language Learners with Dual Contrastive Learning
Boyu Hou (Chongqing University); Chengyu Wang (Alibaba); Xiaoqing Chen (Chongqing University); Minghui Qiu (Alibaba); Liang Feng (Chongqing University, China); Jun Huang (Alibaba Group)
Prompt-based learning has improved the few-shot learning performance of large-scale Pre-trained Language Models (PLMs). Yet, deploying large-scale PLMs in resource-constrained environments for online applications remains challenging. Knowledge Distillation (KD) is a promising approach to PLM compression. However, distilling prompt-tuned PLMs in the few-shot setting is non-trivial, owing to the scarcity of task-specific training data and the lack of KD techniques tailored to the prompting paradigm. We propose Prompt-Distiller, the first few-shot KD algorithm for prompt-tuned PLMs, which forces the student model to learn from both the pre-trained and the prompt-tuned teacher models to alleviate overfitting. We further design a contrastive learning technique that captures higher-order dependencies from intermediate-layer representations of the teacher models, accounting for the different knowledge capacities of the teacher and student models.
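To make the two components of the abstract concrete, the following is a minimal sketch (not the authors' released code) of how a dual-teacher distillation objective combined with a contrastive term over intermediate-layer representations could be assembled in PyTorch. The loss weights, temperatures, projection dimension, and hidden sizes are illustrative assumptions, not values reported in the paper.

```python
# Sketch only: dual-teacher soft-label distillation plus an InfoNCE-style
# contrastive loss on projected intermediate representations.
# All hyperparameters below are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


def dual_teacher_kd_loss(student_logits, tuned_logits, pretrained_logits,
                         alpha=0.5, temperature=2.0):
    """KL divergence of the student against the soft labels of both the
    prompt-tuned teacher and the original pre-trained teacher."""
    t = temperature
    s = F.log_softmax(student_logits / t, dim=-1)
    kd_tuned = F.kl_div(s, F.softmax(tuned_logits / t, dim=-1),
                        reduction="batchmean") * t * t
    kd_pre = F.kl_div(s, F.softmax(pretrained_logits / t, dim=-1),
                      reduction="batchmean") * t * t
    return alpha * kd_tuned + (1.0 - alpha) * kd_pre


class IntermediateContrast(nn.Module):
    """InfoNCE over projected hidden states: the student representation of an
    example is pulled toward the teacher representation of the same example
    and pushed away from those of other examples in the batch, bridging the
    capacity gap via learned projections into a shared space."""

    def __init__(self, student_dim, teacher_dim, proj_dim=128, tau=0.07):
        super().__init__()
        self.proj_s = nn.Linear(student_dim, proj_dim)
        self.proj_t = nn.Linear(teacher_dim, proj_dim)
        self.tau = tau

    def forward(self, student_hidden, teacher_hidden):
        # [batch, dim] -> [batch, proj_dim], L2-normalized
        zs = F.normalize(self.proj_s(student_hidden), dim=-1)
        zt = F.normalize(self.proj_t(teacher_hidden), dim=-1)
        logits = zs @ zt.t() / self.tau            # [batch, batch] similarities
        targets = torch.arange(zs.size(0), device=zs.device)
        return F.cross_entropy(logits, targets)    # diagonal entries are positives


if __name__ == "__main__":
    batch, n_labels = 8, 3
    student_logits = torch.randn(batch, n_labels)
    tuned_logits = torch.randn(batch, n_labels)       # prompt-tuned teacher
    pretrained_logits = torch.randn(batch, n_labels)  # pre-trained teacher
    kd = dual_teacher_kd_loss(student_logits, tuned_logits, pretrained_logits)

    contrast = IntermediateContrast(student_dim=312, teacher_dim=768)
    loss_c = contrast(torch.randn(batch, 312), torch.randn(batch, 768))
    total = kd + 0.1 * loss_c                         # weighting is illustrative
    print(total.item())
```

In practice, the student's hidden states and those of the teachers would come from selected intermediate Transformer layers; the projection heads absorb the dimensionality mismatch between a small student and a large teacher.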