MODELING TURN-TAKING IN HUMAN-TO-HUMAN SPOKEN DIALOGUE DATASETS USING SELF-SUPERVISED FEATURES

Edmilson da Silva Morais (IBM Research Brazil); Matheus Damasceno (IBM Research); Hagai Aronowitz (IBM Research - AI); Aharon Satt (IBM Research); Ron Hoory (IBM Research)

07 Jun 2023

Self-supervised pre-trained models have consistently delivered state-of-the-art (SOTA) results in natural language and speech processing. However, we argue that their merits for modeling Turn-Taking in spoken dialogue systems still need further investigation. In this paper we therefore introduce a modular End-to-End system based on an Upstream + Downstream architecture paradigm, which makes it easy to integrate a large variety of self-supervised features for the specific Turn-Taking task of End-of-Turn Detection (EOTD). We present several architectures that model the EOTD task using audio-only, text-only, and audio+text modalities, and we carefully evaluate their performance and robustness on three different human-to-human spoken dialogue datasets. The proposed model not only achieves SOTA results for EOTD, but also highlights the potential of powerful, well fine-tuned self-supervised models for a wide variety of Turn-Taking tasks.

Tags:

Machine learning methods for language

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
