MMATR: A lightweight approach for Multimodal Sentiment Analysis based on tensor methods
Panagiotis Koromilas (University of Athens); Mihalis A Nicolaou (The Cyprus Institute); Theodoros Giannakopoulos (NCSR Demokritos); Yannis Panagakis (University of Athens)
SPS
Despite the considerable research output on Multimodal Learning for affect-related tasks, most current methods have a very large number of trainable parameters and are therefore impractical for real-life applications. In this work we address this gap in the literature by introducing the Multimodal Attention Tensor Regression (MMATR) network, a lightweight model based on: (i) a static input representation (a 2D matrix of dimensions time $\times$ features) for each modality, which is processed by a CNN and thus avoids heavily parameterized sequential models; (ii) tensor contraction and tensor regression layers that replace the usual pooling, flattening, and linear layers, reducing the number of parameters while preserving the high-order structure of the multimodal data; and (iii) a bimodal attention layer that learns multimodal co-occurrences. Through a set of experiments against a variety of state-of-the-art techniques, we show that the proposed MMATR achieves results competitive with the state of the art in Multimodal Sentiment Analysis while having four orders of magnitude fewer parameters.
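To make the parameter saving in point (ii) concrete, the following minimal sketch counts the parameters of a flatten-plus-linear head and of a low-rank tensor regression head with a scalar output. It assumes a Tucker-factorized weight tensor, which is one common way to build tensor regression layers; the abstract does not state which factorization MMATR uses. The activation shape `(32, 20, 16)` and the ranks `(4, 4, 4)` are hypothetical values chosen for illustration, not figures from the paper.

```python
from math import prod


def dense_param_count(shape):
    """Parameters of a linear layer mapping the flattened tensor to one scalar.

    One weight per tensor entry, plus one bias.
    """
    return prod(shape) + 1


def tucker_trl_param_count(shape, ranks):
    """Parameters of a scalar-output regression layer whose weight tensor
    (same shape as the input) is stored in Tucker form.

    Assumption: a core of size prod(ranks) plus one (dim x rank) factor
    matrix per mode, plus one bias.
    """
    if len(shape) != len(ranks):
        raise ValueError("need exactly one rank per tensor mode")
    core = prod(ranks)
    factors = sum(dim * rank for dim, rank in zip(shape, ranks))
    return core + factors + 1


# Hypothetical CNN activation tensor: channels x time x features.
activation_shape = (32, 20, 16)
# Hypothetical Tucker ranks, one per mode.
tucker_ranks = (4, 4, 4)

dense = dense_param_count(activation_shape)                     # 10241
low_rank = tucker_trl_param_count(activation_shape, tucker_ranks)  # 337
print(f"flatten + linear: {dense} parameters")
print(f"Tucker regression: {low_rank} parameters")
```

In this toy setting the factorized head needs roughly 30 times fewer parameters than the dense one, and it operates on the activation tensor directly rather than on a flattened vector. The actual savings reported for MMATR depend on its architecture and on the baselines it is compared with.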