Joint Modeling for ASR Correction and Dialog State Tracking
Deyuan Wang (Beijing University of Posts and Telecommunications); Tiantian Zhang (Beijing University of Posts and Telecommunications); Caixia Yuan (Beijing University of Posts and Telecommunications); Xiaojie Wang (Beijing University of Posts and Telecommunications)
SPS
In spoken dialog systems, transcription errors from automatic speech recognition (ASR) degrade downstream tasks, especially dialog state tracking (DST). Existing approaches alleviate such errors by using richer information such as word lattices and word confusion networks. However, this information is not always easy to obtain. In addition, large pre-trained language models are trained on plain text, which leaves a gap between spoken DST and the original pre-trained model. In this paper, we propose a multi-task method that performs DST jointly with ASR correction to improve the performance of both tasks. To do so, we build MultiWOZ-ASR, a DST dataset containing ASR noise, and mitigate the gap with a multi-task pre-training framework. Moreover, we adopt curriculum learning because the correction task is difficult to converge in the initial stage of pre-training. Experimental results show that our model achieves significant improvements on the DSTC2 and MultiWOZ-ASR datasets.
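The abstract does not say how the curriculum is constructed, so the following is only a minimal sketch of one common form of curriculum learning, not the authors' procedure. It assumes that the difficulty of an example can be approximated by the word error rate between the ASR hypothesis and the reference transcript, and that the training pool grows linearly from the easiest examples to the full set. The function names, the `reference` and `asr` dictionary keys, and the `start_fraction` parameter are all hypothetical.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def difficulty(example):
    """Assumed difficulty proxy: word error rate of the ASR hypothesis."""
    ref = example["reference"].split()
    hyp = example["asr"].split()
    return word_edit_distance(ref, hyp) / max(len(ref), 1)


def curriculum_pool(examples, step, total_steps, start_fraction=0.25):
    """Return the easiest examples available at a given training step.

    The pool starts at `start_fraction` of the data and grows linearly
    until it covers every example at `total_steps` (assumed pacing).
    """
    ranked = sorted(examples, key=difficulty)
    progress = min(1.0, step / max(total_steps, 1))
    fraction = start_fraction + (1.0 - start_fraction) * progress
    size = max(1, round(fraction * len(ranked)))
    return ranked[:size]
```

Under these assumptions, early pre-training steps would sample only from utterances with little or no ASR noise, and noisier utterances would enter the pool as training progresses.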