JOINT MODELLING OF SPOKEN LANGUAGE UNDERSTANDING TASKS WITH INTEGRATED DIALOG HISTORY

Siddhant Arora (Carnegie Mellon University); Hayato Futami (Sony Group Corporation); Emiru Tsunoo (Sony Group Corporation); Brian Yan (Carnegie Mellon University); Shinji Watanabe (Carnegie Mellon University)

07 Jun 2023

Most human interactions take the form of spoken conversations, in which the semantic meaning of an utterance depends on its context. Each utterance in a spoken conversation can be described by many semantic and speaker attributes, and there has been interest in building Spoken Language Understanding (SLU) systems that predict these attributes automatically. Recent work has shown that incorporating dialog history can improve SLU performance. However, a separate model is used for each SLU task, which increases inference time and computational cost. Motivated by this, we ask: can we jointly model all the SLU tasks while incorporating context, enabling low-latency and lightweight inference? To answer this, we propose a novel model architecture that learns dialog context to jointly predict the intent, dialog act, speaker role, and emotion of a spoken utterance. Because our joint prediction uses an autoregressive model, we must choose the order in which the dialog attributes are predicted, and this choice is not trivial. To address this, we also propose an order-agnostic training method. Our experiments show that our joint model achieves results similar to those of task-specific classifiers and can effectively integrate dialog context to further improve SLU performance.
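The joint autoregressive prediction and order-agnostic training described above can be illustrated with a small target-construction sketch. It assumes, for illustration only, that the four attributes are serialized as tagged tokens in a single output sequence and that order-agnostic training means sampling a random attribute order for each training example. The attribute tags, label values, and function names are hypothetical, and this is not necessarily the authors' exact formulation. It covers only how targets are built and parsed, not the speech encoder or the dialog-context encoder.

```python
import random
from typing import Dict, List, Optional

# Hypothetical tag names for the four attributes the abstract lists;
# the token format below is an illustrative assumption, not the paper's.
ATTRIBUTES = ["intent", "dialog_act", "speaker_role", "emotion"]


def serialize_targets(labels: Dict[str, str], order: List[str]) -> List[str]:
    """Flatten per-utterance labels into one token sequence in the given order.

    Each value is preceded by a tag token (e.g. "<emotion>") so that a decoded
    sequence can be parsed back regardless of the order it was produced in.
    """
    tokens: List[str] = []
    for attr in order:
        tokens.extend([f"<{attr}>", labels[attr]])
    return tokens


def order_agnostic_target(labels: Dict[str, str], rng: random.Random) -> List[str]:
    """Sample a fresh attribute order for each training example (assumed scheme)."""
    order = ATTRIBUTES[:]
    rng.shuffle(order)
    return serialize_targets(labels, order)


def parse_prediction(tokens: List[str]) -> Dict[str, Optional[str]]:
    """Recover attribute values from a tagged sequence in any order.

    Attributes missing from the sequence are returned as None.
    """
    result: Dict[str, Optional[str]] = {attr: None for attr in ATTRIBUTES}
    # Pair each tag token with the value token that follows it.
    for tag, value in zip(tokens[0::2], tokens[1::2]):
        attr = tag.strip("<>")
        if attr in result:
            result[attr] = value
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical label values for a single utterance.
    labels = {
        "intent": "request_info",
        "dialog_act": "question",
        "speaker_role": "caller",
        "emotion": "neutral",
    }
    for _ in range(3):
        target = order_agnostic_target(labels, rng)
        print(target)
        # Parsing ignores order, so every permutation maps back to the same labels.
        assert parse_prediction(target) == labels
```

Tagging each value keeps decoded outputs parseable in any order, which is one way a permutation-based scheme can avoid committing to a single fixed prediction order at training time.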

Tags: Discourse and dialog

Presented at IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference).