Towards A Unified Training for Levenshtein Transformer
Kangjie Zheng (Peking University); Longyue Wang (Tencent AI Lab); Zhihao Wang (Xiamen University); Chen Binqi (Peking University); Ming Zhang (Peking University); Zhaopeng Tu (Tencent AI Lab)
Levenshtein Transformer (LevT) is a widely used text-editing model that generates a sequence through editing operations (deletion and insertion) in a non-autoregressive manner. However, training the key refinement components of LevT is challenging due to the training-inference discrepancy. Through carefully designed experiments, our work reveals that the deletion module is under-trained while the insertion module is over-trained, owing to imbalanced training signals between the two refinement modules. Based on these observations, we further propose a dual learning approach that remedies the imbalanced training by feeding an initial input to both refinement modules, which is consistent with the refinement process at inference time. Experimental results on three representative NLP tasks demonstrate the effectiveness and universality of the proposed approach.
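To make the refinement process described above concrete, below is a minimal, illustrative Python sketch of the LevT decoding loop: each iteration applies a deletion step, a placeholder-insertion step, and a token-filling step, all predicted non-autoregressively. The predictor functions (`predict_deletions`, `predict_insertions`, `fill_placeholders`) are hypothetical stand-ins, not the paper's or any library's actual API; they exist only so the control flow can be read end to end.

```python
from typing import List

PLH = "<plh>"  # placeholder token inserted before token prediction


def predict_deletions(tokens: List[str]) -> List[bool]:
    """Hypothetical deletion classifier: True means 'delete this token'.
    A real model scores all positions in parallel; this stub keeps everything."""
    return [False] * len(tokens)


def predict_insertions(tokens: List[str]) -> List[int]:
    """Hypothetical placeholder predictor: number of placeholders to insert
    before the first token and after each existing token."""
    return [0] * (len(tokens) + 1)


def fill_placeholders(tokens: List[str]) -> List[str]:
    """Hypothetical token predictor: fill each placeholder slot with a word."""
    return [t if t != PLH else "<unk>" for t in tokens]


def levt_refine(tokens: List[str], max_iters: int = 3) -> List[str]:
    """Iterative refinement: delete -> insert placeholders -> fill tokens."""
    for _ in range(max_iters):
        # 1) Deletion module: drop tokens flagged by the classifier.
        delete_flags = predict_deletions(tokens)
        tokens = [t for t, delete in zip(tokens, delete_flags) if not delete]

        # 2) Insertion module, step 1: insert placeholder slots.
        counts = predict_insertions(tokens)
        expanded: List[str] = [PLH] * counts[0]
        for tok, n in zip(tokens, counts[1:]):
            expanded.append(tok)
            expanded.extend([PLH] * n)

        # 3) Insertion module, step 2: fill all placeholders in parallel.
        new_tokens = fill_placeholders(expanded)

        if new_tokens == tokens:  # converged: no edits were made this round
            break
        tokens = new_tokens
    return tokens


if __name__ == "__main__":
    print(levt_refine(["a", "draft", "translation"]))
```

At inference time, both refinement modules operate on the model's own intermediate outputs, whereas standard LevT training feeds them teacher-constructed inputs; the dual learning approach proposed here narrows that gap by exposing both modules to the same initial input, as stated in the abstract.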