A Transformer-Based E2E SLU Model for Improved Semantic Parsing
Othman Istaiteh (Samsung Research Jordan); Yasmeen Kussad (Samsung Research Jordan); Yahya Daqour (Samsung Research Jordan); Maria Habib (Samsung); Mohammad Habash (Samsung Research Jordan); Dhananjaya Gowda (Samsung Electronics)
Spoken Language Understanding (SLU) is an essential part of voice and speech assistant tools. End-to-End (E2E) SLU models extract semantic meaning directly from the speech signal, without relying on an intermediate transcription step. However, SLU remains a challenging task, mainly due to the scarcity of labeled, in-domain, and multilingual datasets. The Spoken Task-Oriented Semantic Parsing (STOP) dataset addresses this problem and is the most extensive public dataset for the SLU task. This paper describes our contribution to the Spoken Language Understanding Grand Challenge at ICASSP 2023. The core idea of the proposed model is to use a pre-trained HuBERT model as the encoder alongside a transformer decoder, combined with layer-drop and ensemble learning. The combination of a HuBERT large encoder and a base transformer decoder obtained the best results, with an Exact Match (EM) accuracy of 75.05% on the STOP dataset. Ensemble decoding further improved the accuracy to 75.92%.
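A minimal sketch of the kind of architecture the abstract describes, not the authors' implementation: a pre-trained HuBERT encoder feeding a small transformer decoder that autoregressively produces semantic-parse tokens, with layer-drop applied to the decoder during training. The checkpoint name, model dimensions, and layer-drop rate below are illustrative assumptions.

```python
import random
import torch
import torch.nn as nn
from transformers import HubertModel  # assumes the HuggingFace transformers library

class SLUSeq2Seq(nn.Module):
    def __init__(self, vocab_size, d_model=1024, n_layers=6, layer_drop=0.1):
        super().__init__()
        # Pre-trained speech encoder; HuBERT large outputs 1024-dim frame features.
        self.encoder = HubertModel.from_pretrained("facebook/hubert-large-ll60k")
        self.embed = nn.Embedding(vocab_size, d_model)
        self.decoder_layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(n_layers)
        )
        self.layer_drop = layer_drop
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, speech, target_tokens):
        # speech: (batch, samples) raw waveform; target_tokens: (batch, T) parse tokens
        memory = self.encoder(speech).last_hidden_state          # (batch, T_audio, 1024)
        x = self.embed(target_tokens)                            # (batch, T, 1024)
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        for layer in self.decoder_layers:
            # Layer-drop: during training, randomly skip entire decoder layers.
            if self.training and random.random() < self.layer_drop:
                continue
            x = layer(x, memory, tgt_mask=causal)
        return self.out_proj(x)                                  # per-token logits
```

Ensemble decoding of the kind mentioned in the abstract could, for example, average the per-step output logits of several independently trained models before each decoding step; the exact ensembling scheme used by the authors is described in the paper itself.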