Multimodal Emotion Recognition With Capsule Graph Convolutional Based Representation Fusion

Jiaxing Liu, Sen Chen, Longbiao Wang, Zhilei Liu, Yahui Fu, Lili Guo, Jianwu Dang

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:10:04
09 Jun 2021

Audio-video multimodal emotion recognition (MER) has attracted considerable attention because it is more robust than unimodal recognition. The effectiveness of the representation fusion algorithm often determines MER performance. Although many fusion algorithms exist, they usually ignore information redundancy and information complementarity. In this paper, we propose a novel representation fusion method, the Capsule Graph Convolutional Network (CapsGCN). First, after unimodal representation learning, the extracted audio and video representations are each distilled by a capsule network and encapsulated into multimodal capsules. Through the dynamic routing algorithm, the multimodal capsules effectively reduce data redundancy. Second, the multimodal capsules, together with their inter-relations and intra-relations, are treated as a graph structure. A Graph Convolutional Network (GCN) learns this graph structure to obtain a hidden representation that supplies complementary information. Finally, the multimodal capsules and the hidden relational representation learned by CapsGCN are fed to multi-head self-attention to balance the contributions of the source representation and the relational representation. To verify performance, we provide visualizations of the representations, results for commonly used fusion methods, and ablation studies of the proposed CapsGCN. Our proposed fusion method achieves 80.83% accuracy and an 80.23% F1 score on eNTERFACE'05.
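The three stages named in the abstract (capsule routing, graph convolution over the capsules, self-attention over source and relational representations) can be illustrated with a short NumPy sketch. This is not the authors' implementation. The tensor sizes, the random inputs, the cosine-similarity adjacency matrix, and the single-head attention without learned projections are all assumptions made for illustration; the abstract does not specify how the graph edges are built, and the paper uses multi-head attention.

```python
import numpy as np

def squash(s, eps=1e-9):
    # Capsule non-linearity: keeps direction, maps vector length into [0, 1).
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, n_iter=3):
    # u_hat: (n_in, n_out, d) prediction vectors from lower-level capsules.
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                      # routing logits
    for _ in range(n_iter):
        c = softmax(b, axis=1)                       # coupling coefficients
        s = np.einsum("ij,ijd->jd", c, u_hat)        # weighted sum per output capsule
        v = squash(s)                                # (n_out, d)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)    # agreement update
    return v

def cosine_adjacency(x, eps=1e-9):
    # ASSUMPTION: edges are non-negative cosine similarities between capsules.
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    a = np.clip(xn @ xn.T, 0.0, None)
    np.fill_diagonal(a, 0.0)                         # self-loops are added in gcn_layer
    return a

def gcn_layer(h, a, w):
    # One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(0.0, d_inv_sqrt @ a_hat @ d_inv_sqrt @ h @ w)

def self_attention(x):
    # SIMPLIFICATION: single head, no learned Q/K/V projections.
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores, axis=1) @ x

rng = np.random.default_rng(0)
audio_u = rng.normal(size=(8, 4, 16))                # hypothetical audio prediction vectors
video_u = rng.normal(size=(8, 4, 16))                # hypothetical video prediction vectors

audio_caps = dynamic_routing(audio_u)                # (4, 16)
video_caps = dynamic_routing(video_u)                # (4, 16)
caps = np.concatenate([audio_caps, video_caps])      # multimodal capsules, (8, 16)

adj = cosine_adjacency(caps)                         # intra- and inter-modal relations
w = rng.normal(scale=0.1, size=(16, 16))             # untrained GCN weights
hidden = gcn_layer(caps, adj, w)                     # relational representation, (8, 16)

fused = self_attention(np.concatenate([caps, hidden]))  # (16, 16)
```

In this sketch the adjacency matrix covers both capsules from the same modality and capsules from different modalities, which is one possible reading of the "intra-relations" and "inter-relations" mentioned above. A trained model would learn the routing inputs, the GCN weights, and the attention projections end to end.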

Chairs:
Tanaya Guha
