MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition

Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:08:10

09 May 2022

Multimodal emotion recognition study is hindered by the lack of labelled corpora in terms of scale and diversity, due to the high annotation cost and label ambiguity. In this paper, we propose a multimodal pre-training model MEmoBERT for multimodal emotion recognition, which learns multimodal joint representations through self-supervised learning from a self-collected large-scale unlabeled video data that come in sheer volume. Furthermore, unlike the conventional ``pre-train, finetune'' paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction one, bringing the downstream task closer to the pre-training. Extensive experiments on two benchmark datasets, IEMOCAP and MSP-IMPROV, show that our proposed MEmoBERT significantly enhances emotion recognition performance.

Tags:

pre-training

prompt

multimodal

emotion recognition

MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition

Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

MODALITY-AWARE OOD SUPPRESSION USING FEATURE DISCREPANCY FOR MULTI-MODAL EMOTION RECOGNITION

LEVERAGING EFFICIENT TRAINING AND FEATURE FUSION IN TRANSFORMERS FOR MULTIMODAL CLASSIFICATION

PRE-TRAINING WITH FRACTAL IMAGES FACILITATES LEARNED IMAGE QUALITY ESTIMATION

Join the IEEE Signal Processing Society