Hybrid Fusion Based Approach For Multimodal Emotion Recognition With Insufficient Labeled Data
Puneet Kumar, Vedanti Khokher, Yukti Gupta, Balasubramanian Raman
In this paper, we propose a deep-learning-based fusion approach to classify the emotions portrayed by an image and its corresponding text into discrete emotion classes. The proposed method first performs intermediate fusion on the image and text inputs, and then applies late fusion on the image stream, the text stream, and the intermediate fusion's output. We also propose a strategy to handle the unavailability of labeled multimodal emotional data. To that end, we have prepared a new dataset built on the Balanced Twitter for Sentiment Analysis (B-T4SA) dataset, containing an image, the corresponding text, and one of four emotion labels: 'happy,' 'sad,' 'hate,' and 'anger.' The proposed method achieves an emotion recognition accuracy of 90.20%. Along with multi-class emotion recognition, we have also compared sentiment classification results and found the proposed method to outperform the benchmark approaches.
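To make the hybrid (intermediate + late) fusion idea concrete, the following is a minimal sketch of such an architecture. It assumes pre-extracted image and text feature vectors; the encoder dimensions, hidden size, and average-based late fusion are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class HybridFusionClassifier(nn.Module):
    """Sketch: unimodal heads + an intermediate-fusion head, combined by late fusion."""

    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, num_classes=4):
        super().__init__()
        # Unimodal heads producing per-modality class scores
        self.img_head = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, num_classes))
        self.txt_head = nn.Sequential(nn.Linear(txt_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, num_classes))
        # Intermediate fusion: a joint representation of the concatenated features
        self.inter_head = nn.Sequential(nn.Linear(img_dim + txt_dim, hidden), nn.ReLU(),
                                        nn.Linear(hidden, num_classes))

    def forward(self, img_feat, txt_feat):
        img_logits = self.img_head(img_feat)
        txt_logits = self.txt_head(txt_feat)
        inter_logits = self.inter_head(torch.cat([img_feat, txt_feat], dim=-1))
        # Late fusion over the three streams (simple averaging, as an assumption)
        return (img_logits + txt_logits + inter_logits) / 3.0

# Example usage on a batch of 8 samples with assumed feature sizes
model = HybridFusionClassifier()
logits = model(torch.randn(8, 2048), torch.randn(8, 768))
print(logits.shape)  # torch.Size([8, 4]) -> 'happy', 'sad', 'hate', 'anger'
```

The key design point is that the intermediate-fusion branch lets the model learn cross-modal interactions, while the late-fusion step still lets each modality contribute an independent prediction.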