Heuristic Masking for Text Representation Pretraining
Yimeng Zhuang (Samsung Research China - Beijing (SRC-B))
Masked language model pretraining provides a standardized way to learn contextualized semantic representations: corrupted text sequences are reconstructed by estimating the conditional probabilities of randomly masked tokens given their context. We attempt to exploit language knowledge from the model itself to boost its pretraining in a lightweight, on-the-fly fashion. In this paper, a heuristic token masking scheme is studied, in which tokens for which deep and shallow networks make inconsistent predictions are more likely to be masked. The proposed method can be applied to BERT-like architectures, and its training procedure is consistent with BERT's, which preserves training effectiveness and efficiency. Extensive experiments show that a masked language model pretrained with the heuristic masking scheme consistently outperforms previous schemes on various downstream tasks.
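To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of disagreement-weighted token masking: positions where a shallow probe and the deep network disagree are given a higher chance of being masked. The function name `heuristic_mask`, the use of KL divergence as the disagreement score, and parameters such as `mask_ratio` are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def heuristic_mask(shallow_logits: torch.Tensor,
                   deep_logits: torch.Tensor,
                   mask_ratio: float = 0.15) -> torch.Tensor:
    """Return a boolean mask of shape [seq_len] selecting tokens to mask.

    shallow_logits, deep_logits: [seq_len, vocab_size] token-prediction
    logits from a shallow and a deep network, respectively.
    """
    seq_len = shallow_logits.size(0)

    # Disagreement score: KL divergence between the two predictive
    # distributions at each position (one possible choice of heuristic).
    log_p_deep = F.log_softmax(deep_logits, dim=-1)
    p_shallow = F.softmax(shallow_logits, dim=-1)
    disagreement = F.kl_div(log_p_deep, p_shallow, reduction="none").sum(-1)

    # Turn scores into a sampling distribution over positions and draw
    # ~15% of positions without replacement, biased toward disagreement.
    num_to_mask = max(1, int(mask_ratio * seq_len))
    probs = F.softmax(disagreement, dim=-1)
    masked_positions = torch.multinomial(probs, num_to_mask, replacement=False)

    mask = torch.zeros(seq_len, dtype=torch.bool)
    mask[masked_positions] = True
    return mask

# Usage: random logits stand in for the two networks' outputs.
if __name__ == "__main__":
    torch.manual_seed(0)
    shallow = torch.randn(32, 100)   # [seq_len=32, vocab=100]
    deep = torch.randn(32, 100)
    print(heuristic_mask(shallow, deep).nonzero().flatten())
```

The selected positions would then be corrupted and predicted exactly as in standard BERT-style masked language modeling, which is why the training procedure stays compatible with BERT.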