SADE: A Self-adaptive Expert for Multi-dataset Question Answering
Yixing Peng (State Key Laboratory of Communication Content Cognition, University of Science and Technology of China); Quan Wang (Beijing University of Posts and Telecommunications); Zhendong Mao (University of Science and Technology of China); Yongdong Zhang (University of Science and Technology of China)
SPS
Multi-dataset question answering (QA) aims to combine multiple QA datasets to build models that not only perform well on the training distributions but also transfer and generalize well to new distributions. Some prior work builds a collection of dataset-specific experts on top of a shared Transformer, so as to encode both the regularities across datasets and the specificities of each dataset. This approach, however, has limitations when generalizing to an unseen distribution, and its number of extra parameters grows with the number of training datasets. In this paper, we devise the Self-ADaptive Expert (SADE), the key idea of which is to train a single expert that is automatically adapted to each individual instance according to its gradients. This gradient-based, instance-level modulation scheme makes our approach easily adaptable to any instance from an unseen distribution and keeps the number of extra parameters constant. We further design a contrastive learning mechanism to enhance the discriminability of modulation signals across different datasets. Experimental results on twelve QA datasets demonstrate that SADE consistently outperforms the previous state of the art in all three settings: in-domain learning, few-shot transfer learning, and zero-shot generalization.
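To make the idea of gradient-based, instance-level modulation concrete, the following is a minimal illustrative sketch, not the SADE implementation. The abstract does not specify the expert architecture, the loss whose gradients are used, or how gradients become modulation signals, so everything here is an assumption: the class name `SelfAdaptiveExpertSketch`, the proxy loss, the projection `P`, and the `1 + tanh` gating are all hypothetical. The sketch only shows that one shared expert can behave differently per instance while its parameter count stays fixed, regardless of how many datasets it is trained on.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class SelfAdaptiveExpertSketch:
    """Hypothetical single expert whose output is rescaled per instance.

    W: shared expert weights (assumed residual linear adapter).
    P: assumed projection from an instance gradient to a modulation signal.
    """

    def __init__(self, W, P):
        self.W = W
        self.P = P

    def instance_gradient(self, h):
        # Assumed label-free proxy loss: L(a) = 0.5 * ||a * (W h)||^2 with an
        # elementwise gain a. Its gradient at a = 1 is dL/da_i = (W h)_i ** 2.
        z = matvec(self.W, h)
        return [zi * zi for zi in z]

    def modulation(self, h):
        # Map the instance gradient to a per-dimension gain in (0, 2).
        g = self.instance_gradient(h)
        return [1.0 + math.tanh(s) for s in matvec(self.P, g)]

    def forward(self, h):
        # Residual expert: h + gamma * (W h), with gamma computed from h itself.
        gamma = self.modulation(h)
        z = matvec(self.W, h)
        return [hi + gi * zi for hi, gi, zi in zip(h, gamma, z)]

    def num_parameters(self):
        # Depends only on the hidden size, not on the number of datasets.
        return sum(len(r) for r in self.W) + sum(len(r) for r in self.P)


identity = [[1.0, 0.0], [0.0, 1.0]]
expert = SelfAdaptiveExpertSketch(W=identity, P=[[0.5, 0.0], [0.0, 0.5]])
signal_a = expert.modulation([1.0, 0.0])  # instance A
signal_b = expert.modulation([0.0, 1.0])  # instance B gets a different signal
```

In this toy setup, two different inputs yield different modulation signals from the same parameters, which is the property the abstract attributes to instance-level adaptation. The contrastive objective mentioned in the abstract, which would encourage signals from different datasets to be distinguishable, is not modeled here.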