MULTI-MODAL EMOTION RECOGNITION WITH SELF-GUIDED MODALITY CALIBRATION
Mixiao Hou, Zheng Zhang, Guangming Lu
Multi-modal emotion recognition aims to extract sentiment-related information from multiple sources and integrate the different modal representations for sentiment analysis. Alignment is an effective strategy for obtaining semantically consistent representations in multi-modal emotion recognition, but current alignment models cannot simultaneously maintain word-to-sentence dependence and the independence of unimodal learning. In this paper, we propose a Self-guided Modality Calibration Network (SMCN) that realizes multi-modal alignment capable of capturing global connections without interfering with unimodal learning. While keeping unimodal learning free of interference, our model leverages semantic sentiment-related features to guide modality-specific representation learning. On the one hand, SMCN simulates human thinking by deriving a branch within unimodal learning that acquires knowledge of the other modalities; this branch learns the high-level semantic information of other modalities to realize semantic alignment between modalities. On the other hand, we provide an indirect interaction mechanism that integrates unimodal features and calibrates features at different levels, preventing unimodal features from being mixed with clues from other modalities. Experiments demonstrate that our approach outperforms state-of-the-art methods on both the IEMOCAP and MELD databases.
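A minimal, hypothetical PyTorch sketch of the idea described in the abstract: each unimodal encoder keeps its own branch, a derived "semantic" branch is trained to match a high-level summary of the other modality, and interaction happens indirectly through that derived branch rather than by mixing raw unimodal features. All module names, dimensions, the two-modality setup, and the alignment loss are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnimodalBlock(nn.Module):
    """One modality's encoder plus a derived branch that guesses the
    other modality's high-level semantics (assumed structure)."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hid_dim, batch_first=True)  # modality-specific learning
        self.semantic_branch = nn.Linear(hid_dim, hid_dim)        # derived cross-modal branch

    def forward(self, x):
        h, _ = self.encoder(x)            # (B, T, H) word/frame-level features
        own = h.mean(dim=1)               # sentence-level unimodal summary
        guess = self.semantic_branch(own) # predicted semantics of the other modality
        return own, guess

class SMCNSketch(nn.Module):
    def __init__(self, text_dim, audio_dim, hid_dim, n_classes):
        super().__init__()
        self.text = UnimodalBlock(text_dim, hid_dim)
        self.audio = UnimodalBlock(audio_dim, hid_dim)
        self.classifier = nn.Linear(4 * hid_dim, n_classes)

    def forward(self, text_x, audio_x):
        t_own, t_guess = self.text(text_x)
        a_own, a_guess = self.audio(audio_x)
        # Semantic alignment: each derived branch is pulled toward the other
        # modality's summary; the target is detached so unimodal learning
        # is not disturbed (an indirect form of interaction).
        align_loss = (F.mse_loss(t_guess, a_own.detach()) +
                      F.mse_loss(a_guess, t_own.detach()))
        fused = torch.cat([t_own, t_guess, a_own, a_guess], dim=-1)
        return self.classifier(fused), align_loss
```

In a training loop under these assumptions, one would optimize the emotion classification loss plus a weighted `align_loss`, so the derived branches carry cross-modal semantics while each encoder's main branch remains purely unimodal.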