Joint Modeling for ASR Correction and Dialog State Tracking

Deyuan Wang (Beijing University of Posts and Telecommunications); Tiantian Zhang (Beijing University of Posts and Telecommunications); Caixia Yuan (Beijing University of Posts and Telecommunications); Xiaojie Wang (Beijing University of Posts and Telecommunications)

06 Jun 2023

In spoken dialog systems, transcription errors from automatic speech recognition (ASR) degrade downstream tasks, especially dialog state tracking (DST). Existing approaches to alleviating such errors use richer ASR output such as word lattices and word confusion networks. However, this information is not always easy to obtain. In addition, large pre-trained language models are trained on plain text, which leaves a gap between spoken DST and the original pre-trained model. In this paper, we propose a multi-task method that performs DST jointly with ASR correction to improve the performance of both tasks. To this end, we build MultiWOZ-ASR, a DST dataset containing ASR noise, and mitigate the gap with a multi-task pre-training framework. Moreover, we adopt curriculum learning to address the difficulty of getting the correction task to converge in the early stage of pre-training. Experimental results show that our model achieves significant improvements on the DSTC2 and MultiWOZ-ASR datasets.
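
The abstract does not say how the curriculum for the correction task is defined. The following is one possible illustration under stated assumptions, and it is not the authors' method. It assumes that the difficulty of a correction pair is the word error rate between the ASR hypothesis and its clean reference. It also assumes that training draws from the easiest fraction of pairs given by the square-root competence schedule of competence-based curriculum learning (Platanios et al., 2019). All function names and the default c0 = 0.1 are hypothetical.

```python
import math
from typing import List, Tuple

Pair = Tuple[str, str]  # (ASR hypothesis, clean reference); hypothetical format


def word_edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance over whitespace-separated tokens."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        cur = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            cur[j] = min(prev[j] + 1,               # drop hyp token hw
                         cur[j - 1] + 1,            # add missing ref token rw
                         prev[j - 1] + (hw != rw))  # substitute, or match at cost 0
        prev = cur
    return prev[-1]


def difficulty(hyp: str, ref: str) -> float:
    """Assumed difficulty measure: WER of the hypothesis against the reference."""
    return word_edit_distance(hyp, ref) / max(len(ref.split()), 1)


def competence(step: int, total_steps: int, c0: float = 0.1) -> float:
    """Square-root competence: fraction of difficulty-sorted data usable at `step`."""
    if step >= total_steps:
        return 1.0
    return min(1.0, math.sqrt(step * (1 - c0 ** 2) / total_steps + c0 ** 2))


def curriculum_pool(pairs: List[Pair], step: int, total_steps: int,
                    c0: float = 0.1) -> List[Pair]:
    """Return the easiest `competence` fraction of pairs (at least one)."""
    ranked = sorted(pairs, key=lambda p: difficulty(*p))
    k = max(1, math.ceil(competence(step, total_steps, c0) * len(ranked)))
    return ranked[:k]


pairs = [
    ("eye want a sheep rest or ant", "i want a cheap restaurant"),  # WER 1.0
    ("i want a cheap restaurant", "i want a cheap restaurant"),     # WER 0.0
    ("i want a cheep restaurant", "i want a cheap restaurant"),     # WER 0.2
]
early = curriculum_pool(pairs, step=0, total_steps=100)   # only the clean pair
late = curriculum_pool(pairs, step=100, total_steps=100)  # all three pairs
```

Early in training, noisy pairs that are hard to correct are held back. As competence grows toward 1.0, the full set becomes available. This is a common way to ease convergence, but the paper's actual difficulty measure and schedule may differ.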

Tags: Discourse and dialog

Presented at IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference).

More Like This

SPASHT: Semantic and PrAgmatic SpeecH Features for automatic assessment of autism

Think before you speak: Concept-guided Explicit Persona Reasoning for Personalized Dialogue Generation

History, Present and Future: Enhancing Dialogue Generation with Few-shot History-Future Prompt

Join the IEEE Signal Processing Society