Multi-View Learning for Speech Emotion Recognition With Categorical Emotion, Categorical Sentiment, and Dimensional Scores

Daniel Tompkins (Microsoft); Dimitra Emmanouilidou (Microsoft Research); Soham Deshmukh (Microsoft); Benjamin Elizalde (Microsoft)

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
07 Jun 2023

Psychological research has postulated that emotions and sentiment are correlated with the dimensional scores of valence, arousal, and dominance. However, the Speech Emotion Recognition literature typically predicts these three views independently for a given speech recording. In this paper, we evaluate and quantify the predictive power of the dimensional scores toward categorical emotions and sentiment on two publicly available speech emotion datasets. We utilize the three emotional views in a joint multi-view training framework, where the views comprise the dimensional scores, the emotion categories, and the sentiment categories. We present a comparison for each emotional view, and for each combination thereof, using two general-purpose models for speech-related applications: CNN14 and wav2vec. To our knowledge, this is the first time such a joint framework has been explored. We found that joint multi-view training can produce results as strong as, or stronger than, models trained independently for each view.
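The joint framework described above can be sketched as a shared speech encoder feeding three "view" heads, trained with a single combined loss. The sketch below is not the authors' code: the head sizes, the linear heads, and the equal loss weights are illustrative assumptions, with random arrays standing in for the backbone embeddings (e.g. from CNN14 or wav2vec).

```python
# Hedged sketch of multi-view training: one shared embedding, three heads
# (dimensional VAD regression, emotion classification, sentiment
# classification), combined into one weighted joint loss.
import numpy as np

rng = np.random.default_rng(0)
EMB = 16               # assumed embedding size of the shared backbone
N_EMO, N_SENT = 4, 3   # assumed numbers of emotion / sentiment classes

# Illustrative linear heads on top of the shared embedding.
W_dim = rng.standard_normal((EMB, 3)) * 0.1      # valence, arousal, dominance
W_emo = rng.standard_normal((EMB, N_EMO)) * 0.1
W_sent = rng.standard_normal((EMB, N_SENT)) * 0.1

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def joint_loss(emb, vad_true, emo_true, sent_true, w=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-view losses:
    MSE for dimensional scores, cross-entropy for the two categorical views."""
    n = np.arange(len(emb))
    mse = np.mean((emb @ W_dim - vad_true) ** 2)
    ce_emo = -np.mean(np.log(softmax(emb @ W_emo)[n, emo_true]))
    ce_sent = -np.mean(np.log(softmax(emb @ W_sent)[n, sent_true]))
    return w[0] * mse + w[1] * ce_emo + w[2] * ce_sent

# Stand-in batch: random embeddings and labels.
emb = rng.standard_normal((8, EMB))
vad = rng.uniform(1, 5, size=(8, 3))     # VAD scores on an assumed 1-5 scale
emo = rng.integers(0, N_EMO, size=8)
sent = rng.integers(0, N_SENT, size=8)
print(joint_loss(emb, vad, emo, sent))
```

In an actual training loop the gradient of this joint loss would flow back through all three heads into the shared encoder, which is what lets the dimensional view inform the categorical predictions and vice versa.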
