Exploring Attention Mechanisms for Multimodal Emotion Recognition in an Emergency Call Center Corpus

Theo Deschamps-Berger (Paris-Saclay University, CNRS); Lori Lamel (CNRS LIMSI); Laurence Y. Devillers (LISN-CNRS)

07 Jun 2023

Emotion detection technology that supports human decision-making is an important research topic for real-world applications, but real-life emotion datasets are relatively rare and small. The experiments in this paper use the CEMO corpus, which was collected in a French emergency call center. Two pre-trained models, one speech-based and one text-based, were fine-tuned for speech emotion recognition. Using pre-trained Transformer encoders mitigates the limited size and sparsity of our data. This paper explores different strategies for fusing these modality-specific models. In particular, fusions with and without cross-attention mechanisms were tested to gather the most relevant information from both the speech and text encoders. We show that multimodal fusion brings an absolute gain of 4-9% over either single modality, and that the symmetric multi-headed cross-attention mechanism performs better than classical late-fusion approaches. Our experiments also suggest that, for the real-life CEMO corpus, the audio component encodes more emotive information than the textual one.
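To make the idea of symmetric multi-headed cross-attention concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the random arrays stand in for the outputs of the fine-tuned speech and text encoders, the random matrices stand in for learned projection weights, and the dimensions, the mean-pooling step, and all function names are assumptions chosen for illustration. The sketch only shows the general pattern, in which each modality queries the other and the two attended representations are combined.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(queries, context, w_q, w_k, w_v, w_o, num_heads):
    """Attend from `queries` (Tq, d) to `context` (Tc, d); returns (Tq, d)."""
    t_q, d_model = queries.shape
    t_c = context.shape[0]
    d_head = d_model // num_heads
    # Project, then split the feature axis into heads: (num_heads, T, d_head).
    q = (queries @ w_q).reshape(t_q, num_heads, d_head).transpose(1, 0, 2)
    k = (context @ w_k).reshape(t_c, num_heads, d_head).transpose(1, 0, 2)
    v = (context @ w_v).reshape(t_c, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention per head: weights have shape (H, Tq, Tc).
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head), axis=-1)
    heads = weights @ v  # (H, Tq, d_head)
    # Merge the heads back into one feature axis and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(t_q, d_model)
    return merged @ w_o


def init_params(rng, d_model):
    """Random stand-ins for the learned W_q, W_k, W_v, W_o matrices."""
    return [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
            for _ in range(4)]


def symmetric_fusion(speech, text, speech_params, text_params, num_heads):
    """Run cross-attention in both directions and combine the results."""
    # Speech frames query the text tokens.
    speech_to_text = multi_head_cross_attention(
        speech, text, *speech_params, num_heads=num_heads)
    # Text tokens query the speech frames.
    text_to_speech = multi_head_cross_attention(
        text, speech, *text_params, num_heads=num_heads)
    # Assumed pooling: average over time, then concatenate both directions.
    return np.concatenate([speech_to_text.mean(axis=0),
                           text_to_speech.mean(axis=0)])


rng = np.random.default_rng(0)
d_model, num_heads = 16, 4                    # hypothetical sizes
speech = rng.standard_normal((50, d_model))   # stand-in for 50 acoustic frames
text = rng.standard_normal((12, d_model))     # stand-in for 12 text tokens
fused = symmetric_fusion(speech, text,
                         init_params(rng, d_model), init_params(rng, d_model),
                         num_heads)
print(fused.shape)  # (32,)
```

In a full system, the fused vector would feed a classification layer over the emotion labels. A late-fusion baseline, by contrast, would combine the two unimodal models' outputs without either modality attending to the other.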

Tags:

Multimodal processing of language

Venue: IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference)
