
Boosting BERT Subnets with Neural Grafting

Ting Hu (Hasso Plattner Institute); Christoph Meinel (Hasso Plattner Institute); Haojin Yang (Hasso-Plattner-Institut für Digital Engineering gGmbH)

06 Jun 2023

Pre-trained language models in natural language processing have become increasingly computationally expensive and memory demanding. Recently proposed computation-adaptive BERT models facilitate their deployment in practical applications. Training such a BERT model involves jointly optimizing subnets of varying sizes, which is difficult because the subnets interfere with one another. The larger subnets in particular can deteriorate when there is a large performance gap between the smallest subnet and the supernet. In this work, we propose Neural Grafting to boost BERT subnets, especially the larger ones. Specifically, we regard the less important sub-modules of a BERT model as less active and reactivate them via layer-wise Neural Grafting. Experimental results show that the proposed method improves the average performance of BERT subnets on six datasets of the GLUE benchmark. The subnet that performs comparably to the supernet BERT-Base reduces inference latency by around 67% on GPU and 70% on CPU. Moreover, we compare two Neural Grafting strategies under varied experimental settings, hoping to shed light on the application scenarios of Neural Grafting.
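To make the abstract's idea of "reactivating less active sub-modules via layer-wise grafting" concrete, the following is a minimal sketch, not the authors' actual procedure: it assumes grafting means copying parameters from a donor model into host sub-modules whose importance falls below a threshold, layer by layer. The importance proxy (mean absolute weight magnitude) and all function names (`importance`, `graft_layer`, `graft_model`) are hypothetical illustrations.

```python
import torch


@torch.no_grad()
def importance(module: torch.nn.Module) -> float:
    """Toy importance proxy: mean absolute weight magnitude of the module."""
    means = [p.abs().mean() for p in module.parameters()]
    return torch.stack(means).mean().item() if means else 0.0


@torch.no_grad()
def graft_layer(host_layer: torch.nn.Module,
                donor_layer: torch.nn.Module,
                threshold: float) -> None:
    """Replace host sub-modules deemed 'less active' with donor parameters."""
    for (_, host_sub), (_, donor_sub) in zip(host_layer.named_children(),
                                             donor_layer.named_children()):
        if importance(host_sub) < threshold:
            host_sub.load_state_dict(donor_sub.state_dict())


def graft_model(host_layers, donor_layers, threshold: float = 0.01) -> None:
    """Apply grafting layer by layer across two architecturally identical stacks."""
    for host_layer, donor_layer in zip(host_layers, donor_layers):
        graft_layer(host_layer, donor_layer, threshold)
```

In this sketch the donor could be, for instance, an independently fine-tuned copy of the same architecture; the paper's actual grafting source, importance criterion, and the two compared grafting strategies are not specified in the abstract.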
