Learning Asr-Robust Contextualized Embeddings For Spoken Language Understanding
Chao-Wei Huang, Yun-Nung Chen
SPS
Employing pre-trained language models (LMs) to extract contextualized word representations has achieved state-of-the-art performance on various NLP tasks. However, applying this technique to noisy transcripts generated by an automatic speech recognizer (ASR) remains a concern. This paper therefore focuses on making contextualized representations more ASR-robust. We propose a novel confusion-aware fine-tuning method to mitigate the impact of ASR errors on pre-trained LMs. Specifically, we fine-tune LMs to produce similar representations for acoustically confusable words, which are obtained from word confusion networks (WCNs) produced by the ASR system. Experiments on multiple benchmark datasets show that the proposed method significantly improves spoken language understanding performance on ASR transcripts.
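The core idea of confusion-aware fine-tuning can be sketched as an auxiliary loss that pulls together the representations of acoustically confusable words read off a word confusion network. The snippet below is a minimal illustration with toy embeddings; the function names, the cosine-distance loss form, and the toy data are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def confusion_loss(embed, confusable_pairs):
    # Hypothetical auxiliary objective: encourage high cosine similarity
    # between embeddings of acoustically confusable word pairs
    # (e.g. pairs occupying the same slot in a word confusion network).
    losses = [1.0 - cosine_sim(embed[w1], embed[w2])
              for w1, w2 in confusable_pairs]
    return sum(losses) / len(losses)

# Toy example: "flight" is acoustically confusable with "fright"/"light".
rng = np.random.default_rng(0)
embed = {w: rng.standard_normal(8) for w in ["flight", "fright", "light"]}
pairs = [("flight", "fright"), ("flight", "light")]
loss = confusion_loss(embed, pairs)  # lower = more similar representations
```

In the actual method this kind of term would be minimized jointly with the LM's fine-tuning objective, so that gradients nudge confusable words toward shared contextualized representations.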