Improving BERT Fine-tuning via Stabilizing Cross-layer Mutual Information
Jicun Li (1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); 2. University of Chinese Academy of Sciences, Beijing, China); Xingjian Li (1. Big Data Lab, Baidu Research; 2. State Key Lab of IOTSC, University of Macau); Tianyang Wang (University of Alabama at Birmingham); Shi Wang (1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); 2. University of Chinese Academy of Sciences, Beijing, China); Yanan Cao (Institute of Information Engineering, Chinese Academy of Sciences); Cheng-Zhong Xu (University of Macau); Dejing Dou (Baidu)
Fine-tuning pre-trained language models such as BERT has shown enormous success across various NLP tasks. Though simple and effective, fine-tuning has been found to be unstable, often leading to unexpectedly poor performance. To improve stability and generalizability, most existing works resort to preserving the parameters or representations of the pre-trained model during fine-tuning. Nevertheless, little work has explored mining the reliable part of the pre-learned information that can help stabilize fine-tuning. To address this challenge, we introduce a novel solution in which we fine-tune BERT with stabilized cross-layer mutual information. Our method aims to preserve the pre-trained model's reliable cross-layer information propagation behavior, rather than the propagated information itself, and thereby circumvents domain conflicts between the pre-training and target tasks. We conduct extensive experiments with popular pre-trained BERT variants on NLP datasets, demonstrating the universal effectiveness and robustness of our method.
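To make the idea of "stabilizing cross-layer information propagation rather than the information itself" concrete, the sketch below shows one possible way to implement such a regularizer during fine-tuning. This is not the authors' released code: the mutual-information estimator is replaced here by a simple representation-similarity proxy (cosine similarity between mean-pooled adjacent-layer states), and the names `cross_layer_profile`, `stabilization_loss`, and `lambda_reg` are hypothetical. The paper's actual estimator and loss form may differ.

```python
# Minimal sketch, assuming cross-layer MI is approximated by a similarity
# proxy and stabilization is a penalty on drift from the frozen pre-trained
# model's cross-layer profile. Illustrative only, not the authors' method.
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Frozen reference copy keeps the pre-trained propagation behavior.
reference = BertModel.from_pretrained("bert-base-uncased").to(device).eval()
for p in reference.parameters():
    p.requires_grad_(False)

# The model being fine-tuned on the target task.
model = BertModel.from_pretrained("bert-base-uncased").to(device)


def cross_layer_profile(hidden_states):
    """Proxy for cross-layer MI: cosine similarity between mean-pooled
    representations of adjacent layers, one scalar per layer pair."""
    pooled = [h.mean(dim=1) for h in hidden_states]       # [batch, hidden] per layer
    sims = [F.cosine_similarity(a, b, dim=-1).mean()      # scalar per adjacent pair
            for a, b in zip(pooled[:-1], pooled[1:])]
    return torch.stack(sims)                              # [num_layer_pairs]


def stabilization_loss(texts):
    """Penalize drift of the fine-tuned model's cross-layer profile from the
    frozen pre-trained model's profile on the same batch."""
    enc = tokenizer(texts, padding=True, truncation=True,
                    return_tensors="pt").to(device)
    with torch.no_grad():
        ref_profile = cross_layer_profile(
            reference(**enc, output_hidden_states=True).hidden_states)
    cur_profile = cross_layer_profile(
        model(**enc, output_hidden_states=True).hidden_states)
    return F.mse_loss(cur_profile, ref_profile)


# Usage inside a training step (task_loss comes from the downstream head):
#   loss = task_loss + lambda_reg * stabilization_loss(texts)
```

Because the penalty is applied to the shape of the cross-layer similarity profile rather than to the hidden representations themselves, the fine-tuned model remains free to adapt its features to the target domain while keeping the layer-to-layer propagation pattern close to the pre-trained one, which is the behavior the abstract describes.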