RECURSIVE JOINT ATTENTION FOR AUDIO-VISUAL FUSION IN REGRESSION BASED EMOTION RECOGNITION

Gnana Praveen Rajasekhar (Ecole Technologie Superieure); Eric Granger (ETS Montreal); Patrick Cardinal (École de technologie supérieure)

07 Jun 2023

In video-based emotion recognition (ER), it is important to effectively leverage the complementary relationship between the audio (A) and visual (V) modalities while retaining the intra-modal characteristics of each modality. In this paper, we present a recursive joint attention model that includes long short-term memory (LSTM) modules for the fusion of vocal and facial expressions in regression-based ER. Specifically, we investigate the possibility of exploiting the complementary nature of the A and V modalities using a joint cross-attention model applied recursively, with LSTMs capturing the temporal dependencies within each modality as well as among the A-V feature representations. By integrating LSTMs with recursive joint cross attention, our model can efficiently leverage both intra- and inter-modal relationships for the fusion of A and V modalities. The results of extensive experiments performed on the challenging Aff-Wild2 and Fatigue (private) datasets indicate that the proposed A-V fusion models can significantly outperform state-of-the-art methods.
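
The abstract does not give the model's equations, so the following is only a minimal NumPy sketch of what recursive joint cross attention could look like, not the authors' implementation. It assumes each modality is correlated with the concatenated (joint) A-V representation, the correlation is turned into attention weights over time steps, and the attended features are fed back as input to the next iteration. The LSTM modules are omitted, the weights are random, and all names (`make_params`, `joint_cross_attention_step`, `recursive_joint_attention`, `w_j`, `w_o`, `num_iters`) are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def make_params(d_a, d_v, rng):
    """Random illustrative weights (hypothetical; a real model would learn these)."""
    d = d_a + d_v
    return {
        "a": {"w_j": rng.normal(scale=0.1, size=(d_a, d)),
              "w_o": rng.normal(scale=0.1, size=(d_a, d_a))},
        "v": {"w_j": rng.normal(scale=0.1, size=(d_v, d)),
              "w_o": rng.normal(scale=0.1, size=(d_v, d_v))},
    }


def joint_cross_attention_step(x_a, x_v, params):
    """One attention step. x_a: (L, d_a) audio, x_v: (L, d_v) visual features."""
    joint = np.concatenate([x_a, x_v], axis=1)  # joint A-V representation, (L, d)
    d = joint.shape[1]
    attended = []
    for x, p in ((x_a, params["a"]), (x_v, params["v"])):
        # Correlation of this modality with the joint representation: (L, L)
        corr = np.tanh(x @ p["w_j"] @ joint.T / np.sqrt(d))
        # Attention weights over time steps (each row sums to 1)
        weights = softmax(corr, axis=-1)
        # Residual connection keeps the original intra-modal features
        attended.append(x + weights @ x @ p["w_o"])
    return attended[0], attended[1]


def recursive_joint_attention(x_a, x_v, params, num_iters=2):
    """Apply the attention step recursively, then concatenate both modalities."""
    for _ in range(num_iters):
        x_a, x_v = joint_cross_attention_step(x_a, x_v, params)
    return np.concatenate([x_a, x_v], axis=1)


# Toy usage: 8 time steps, 4-dim audio and 8-dim visual features (made-up sizes)
rng = np.random.default_rng(0)
x_a = rng.normal(size=(8, 4))
x_v = rng.normal(size=(8, 8))
params = make_params(4, 8, rng)
fused = recursive_joint_attention(x_a, x_v, params, num_iters=3)
print(fused.shape)
```

In this sketch the fused output would feed a regression head; in the model described above, LSTMs would additionally process the per-modality and fused sequences to capture temporal dependencies.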

Tags:

Image and video content analysis

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
