Adaptive Mask Co-optimization for Modal Dependence in Multimodal Learning

Ying Zhou (Xidian University); Xuefeng Liang (Xidian University); ShiQuan Zheng (Xidian University); Huijun Xuan (Xidian University); Takatsune Kumada (Kyoto University)

06 Jun 2023

Multimodal learning has shown a clear advantage in emotion recognition tasks because different modalities provide richer information. However, multimodal models tend to rely on the modalities that are easier to learn while under-fitting the others, which leads to sub-optimal results. To address this problem, we propose a novel plug-in module, Adaptive Mask Co-optimization (AMCo), which can be inserted into advanced models. The adaptive mask encourages the model to fit the other modalities better by making the dependent modalities harder to learn. The co-optimization preserves the model's performance on the dependent modalities. Extensive experiments on the IEMOCAP dataset show that AMCo improves four SOTA models by 1.14% to 3.03% in accuracy.
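The abstract gives no implementation details, so the following is only a rough sketch of the general idea, not the authors' AMCo module. It assumes that each modality has a scalar "dependence score" (for example, a unimodal validation accuracy), that modalities scoring above the mean get a proportionally larger share of their features zeroed out, and that co-optimization can be approximated as a weighted sum of the loss on masked and unmasked inputs. The names `mask_ratios`, `apply_mask`, `combined_loss`, `max_ratio`, and `weight` are all hypothetical.

```python
import random


def mask_ratios(dependence, max_ratio=0.5):
    """Map per-modality dependence scores to mask ratios (assumed rule).

    Modalities at or below the mean score are left unmasked; the most
    depended-on modality is masked at max_ratio.
    """
    mean = sum(dependence.values()) / len(dependence)
    top = max(dependence.values())
    if top == mean:  # all modalities equally relied on: mask nothing
        return {name: 0.0 for name in dependence}
    return {
        name: max_ratio * max(0.0, score - mean) / (top - mean)
        for name, score in dependence.items()
    }


def apply_mask(features, ratio, rng):
    """Zero out a random subset of feature entries of the given ratio."""
    n_masked = int(round(ratio * len(features)))
    masked = set(rng.sample(range(len(features)), n_masked))
    return [0.0 if i in masked else x for i, x in enumerate(features)]


def combined_loss(loss_masked, loss_unmasked, weight=1.0):
    """Assumed co-optimization objective: train on masked inputs while a
    weighted unmasked term keeps the dependent modalities from degrading."""
    return loss_masked + weight * loss_unmasked


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical scores: the model leans most heavily on text.
    ratios = mask_ratios({"text": 3.0, "audio": 2.0, "video": 1.0})
    text_features = [0.2, 0.7, 0.1, 0.9, 0.4, 0.3, 0.8, 0.5]
    print(ratios)
    print(apply_mask(text_features, ratios["text"], rng))
```

In this sketch only the text features are masked, which makes the easiest modality harder to exploit, while the unmasked term in `combined_loss` stands in for the part of the objective that protects its original performance.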

Tags:

Multi-modal signal processing and analysis (audio/visual/haptics/radar/lidar etc.)

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
