NASE: A Chinese Benchmark for Evaluating Robustness of Spoken Language Understanding Models in Slot Filling
Meizheng Peng (Wuhan University); Xu Jia (Wuhan University); Min Peng (Wuhan University)
Slot filling is a core task in spoken language understanding (SLU). However, current SLU models often suffer performance degradation when they encounter unfamiliar data from other datasets. Moreover, as recent models grow more complex, retraining a model for each new application scenario becomes prohibitively expensive. It is therefore important to study the robustness and generalization capability of SLU models. We propose the Natural Adversarial Slot Evaluator (NASE), a benchmark of adversarial SLU data for evaluating the robustness and generalization capability of SLU models on slot filling. Our experiments and analysis reveal that all six evaluated SLU models suffer significant performance degradation on NASE. Further analysis shows that the models rely more on the context surrounding a slot than on the slot value itself when making predictions. In addition, because joint learning of intent detection and slot filling is widely used, unfamiliar intents also degrade slot filling performance. Based on these findings, we propose a simple data augmentation method to improve the robustness of SLU models on slot filling. F1 scores improve by up to about 30% compared to the original models.
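To make the task concrete, slot filling is commonly cast as BIO sequence labeling, and models are scored with slot-level F1. The sketch below illustrates how slots are recovered from BIO tags; the utterance, slot names, and tags are invented examples for illustration, not data from NASE.

```python
# Minimal sketch of slot extraction from BIO tags (illustrative example only;
# the tokens and slot names below are hypothetical, not taken from NASE).

def extract_slots(tokens, tags):
    """Collect (slot_name, value) pairs from BIO-tagged tokens."""
    slots, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # start of a new slot span
            if current:
                slots.append(tuple(current))
            current = [tag[2:], token]
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1] += " " + token       # continue the current span
        else:                               # "O" tag or inconsistent "I-" tag
            if current:
                slots.append(tuple(current))
            current = None
    if current:
        slots.append(tuple(current))
    return slots

tokens = ["book", "a", "flight", "to", "new", "york", "tomorrow"]
tags   = ["O", "O", "O", "O", "B-toloc", "I-toloc", "B-date"]
print(extract_slots(tokens, tags))  # [('toloc', 'new york'), ('date', 'tomorrow')]
```

Slot-level F1 then compares the extracted (slot, value) pairs against the gold annotations, which is why both the slot value and its surrounding context matter for a correct prediction.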