Rethinking the Reasonability of the Test Set for Simultaneous Machine Translation
Mengge Liu (Beijing Institute of Technology); Wen Zhang (Xiaomi AI Lab); Xiang Li (Xiaomi AI Lab); Jian Luan (Xiaomi AI Lab); Bin Wang (Xiaomi AI Lab); Yuhang Guo (Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Department of Computer Science and Technology, Beijing Institute of Technology); Shuoying Chen (Beijing Institute of Technology)
Simultaneous machine translation (SimulMT) models begin translating before the end of the source sentence, which makes the translation monotonically aligned with the source sentence. However, the standard full-sentence test set is produced by offline translation of the entire source sentence and is not designed for SimulMT evaluation, prompting us to rethink whether it underestimates the performance of SimulMT models. In this paper, we manually annotate a monotonic test set based on the MuST-C English-Chinese test set. Human evaluation confirms the acceptability of our annotated test set. Evaluations of three different SimulMT models verify that the underestimation problem is alleviated on our test set. Further experiments show that finetuning on an automatically extracted monotonic training set improves SimulMT models by up to 3 BLEU points.
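As an illustration only (the abstract does not specify the paper's extraction procedure), the sketch below shows one plausible way to automatically filter a monotonic subset from a parallel corpus: score each sentence pair by the Kendall's tau of its word alignment and keep near-monotonic pairs. The `monotonicity` scoring, the assumption that word alignments are supplied by an external aligner, and the 0.9 threshold are all hypothetical choices, not the authors' method.

```python
# Hypothetical sketch: filter a "monotonic" training subset from a parallel
# corpus. Assumes word alignments are already available as
# (source_index, target_index) pairs, e.g. from an external word aligner.
from scipy.stats import kendalltau

def monotonicity(alignment):
    """Kendall's tau between source and target positions of an alignment.

    alignment: list of (src_idx, tgt_idx) pairs.
    Returns a score in [-1, 1]; 1.0 means perfectly monotonic.
    """
    if len(alignment) < 2:
        return 1.0  # trivially monotonic
    src, tgt = zip(*sorted(alignment))
    tau, _ = kendalltau(src, tgt)  # tau-b, handles tied positions
    return 1.0 if tau is None else tau

def extract_monotonic(pairs, alignments, threshold=0.9):
    """Keep sentence pairs whose alignment monotonicity exceeds threshold.

    pairs: list of (source_sentence, target_sentence) tuples.
    alignments: corresponding word alignments, one per sentence pair.
    threshold: assumed cutoff; higher values keep only near-monotonic pairs.
    """
    return [p for p, a in zip(pairs, alignments)
            if monotonicity(a) >= threshold]

# Example usage on a toy pair: the alignment (0,0), (1,2), (2,1) has one
# crossing, so its tau is below 1.0 and it may be filtered out.
subset = extract_monotonic(
    [("a b c", "x y z")],
    [[(0, 0), (1, 2), (2, 1)]],
    threshold=0.9,
)
```

The resulting subset could then be used to finetune a SimulMT model, in the spirit of the finetuning experiment described above.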