09 May 2022

Multi-modal emotion recognition aims to extract sentiment-related information from multiple sources and integrate the different modal representations for sentiment analysis. Alignment is an effective strategy for achieving semantically consistent representations, yet current alignment models cannot simultaneously preserve word-to-sentence dependencies and the independence of unimodal learning. In this paper, we propose a Self-guided Modality Calibration Network (SMCN) that realizes multi-modal alignment, capturing global connections without interfering with unimodal learning. While keeping unimodal learning free of interference, our model leverages high-level, sentiment-related semantic features to guide modality-specific representation learning. On one hand, SMCN mimics human thinking by deriving a branch within each unimodal pathway that acquires knowledge of the other modalities; this branch learns their high-level semantic information to realize semantic alignment between modalities. On the other hand, we provide an indirect interaction mechanism that integrates unimodal features and calibrates them at different levels, preventing unimodal features from being mixed with cues from other modalities. Experiments demonstrate that our approach outperforms state-of-the-art methods on both the IEMOCAP and MELD datasets.
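To make the described architecture concrete, the following is a minimal PyTorch sketch of the two ideas in the abstract: a per-modality side branch that projects into a shared semantic space (standing in for semantic alignment between modalities), and an indirect, calibration-based fusion that gates unimodal features rather than mixing them directly. All module names, dimensions, the gating mechanism, and the alignment loss are illustrative assumptions drawn only from this abstract, not the authors' implementation.

```python
# Hedged sketch of the SMCN idea; every design choice below (GRU encoders,
# mean semantic anchor, sigmoid gating, MSE alignment loss) is an assumption
# for illustration, not the published architecture.
import torch
import torch.nn as nn


class UnimodalBranch(nn.Module):
    """Encodes one modality and derives a side branch into a shared
    semantic space, so alignment can happen in that space without
    disturbing the main unimodal pathway."""

    def __init__(self, in_dim, hid_dim, sem_dim):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hid_dim, batch_first=True)
        # Side branch: projects the unimodal summary into a
        # modality-shared semantic space (assumed mechanism).
        self.semantic_head = nn.Linear(hid_dim, sem_dim)

    def forward(self, x):
        _, h = self.encoder(x)      # h: (1, batch, hid_dim)
        h = h.squeeze(0)            # unimodal representation
        s = self.semantic_head(h)   # semantic-space projection
        return h, s


class SMCNSketch(nn.Module):
    def __init__(self, dims, hid_dim=128, sem_dim=64, n_classes=4):
        super().__init__()
        self.branches = nn.ModuleList(
            UnimodalBranch(d, hid_dim, sem_dim) for d in dims)
        # Indirect interaction: classify from calibrated (gated)
        # unimodal features instead of raw mixed ones.
        self.classifier = nn.Linear(len(dims) * hid_dim, n_classes)

    def forward(self, inputs):
        feats, sems = zip(*(b(x) for b, x in zip(self.branches, inputs)))
        sems = torch.stack(sems)            # (n_modalities, batch, sem_dim)
        anchor = sems.mean(dim=0)           # shared semantic anchor
        # Calibrate each unimodal feature by how well its semantic
        # projection agrees with the cross-modal anchor (assumed gating).
        gates = torch.sigmoid((sems * anchor).sum(-1, keepdim=True))
        fused = torch.cat([g * f for g, f in zip(gates, feats)], dim=-1)
        logits = self.classifier(fused)
        # Alignment loss pulls each modality's semantic projection
        # toward the shared anchor (proxy for semantic alignment).
        align_loss = ((sems - anchor) ** 2).mean()
        return logits, align_loss


# Usage with hypothetical text/audio/visual feature dimensions.
t = torch.randn(8, 20, 300)   # (batch, seq, dim) text
a = torch.randn(8, 50, 74)    # audio
v = torch.randn(8, 50, 35)    # visual
model = SMCNSketch(dims=[300, 74, 35])
logits, align_loss = model([t, a, v])
print(logits.shape, align_loss.item())
```

The key point the sketch tries to capture is that modalities never exchange raw features: each branch only sees the shared semantic anchor, so unimodal learning stays independent while the alignment loss and the gates carry the cross-modal signal.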
