Lecture 10 Oct 2023

In recent years, video-based continuous affect estimation has received growing attention in computer vision, making it crucial to model the temporal information in facial expression changes robustly and accurately. We propose a transformer network that incorporates both local context and dimensional correlation to model visual information efficiently. Because the transformer's self-attention layer is insensitive to local context, noise such as instantaneous head pose and lighting changes can degrade the model's performance; we therefore adopt a local-wise transformer encoder to enhance the transformer's ability to capture local contextual information. In addition, drawing on prior knowledge of the correlation between valence and arousal, we design a VA-relevance bootstrap module and a corresponding valence-arousal relevance loss (VA loss). Experiments on the Aff-Wild2 and AFEW-VA datasets show the superior performance of our method for continuous affect estimation.
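The local-wise encoder can be read as self-attention restricted to a temporal neighborhood of each frame. The following PyTorch sketch illustrates that idea under the assumption of a fixed window size and a mask-based implementation; the class name LocalSelfAttention, the window parameter, and the masking scheme are illustrative, not the paper's actual module.

import torch
import torch.nn as nn

class LocalSelfAttention(nn.Module):
    """Self-attention restricted to a temporal window, so each frame
    attends only to nearby frames and local context is preserved."""

    def __init__(self, dim: int, num_heads: int = 4, window: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) sequence of per-frame features
        seq_len = x.size(1)
        idx = torch.arange(seq_len, device=x.device)
        # Boolean mask: True blocks attention between frames farther
        # apart than the window, keeping the receptive field local.
        mask = (idx[None, :] - idx[:, None]).abs() > self.window
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out

The VA loss is described only as exploiting the valence-arousal correlation. One plausible form, sketched below under that assumption, combines per-dimension concordance correlation coefficient (CCC) losses, which are standard for Aff-Wild2, with a term that pulls the predicted valence-arousal correlation toward the correlation observed in the labels; the weight lam and the exact relevance term are hypothetical.

def ccc(pred: torch.Tensor, gold: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Concordance correlation coefficient of two 1-D sequences."""
    pm, gm = pred.mean(), gold.mean()
    pv, gv = pred.var(unbiased=False), gold.var(unbiased=False)
    cov = ((pred - pm) * (gold - gm)).mean()
    return 2 * cov / (pv + gv + (pm - gm) ** 2 + eps)

def pearson(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Pearson correlation of two 1-D sequences."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).mean() / (x.std(unbiased=False) * y.std(unbiased=False) + eps)

def va_loss(v_pred, a_pred, v_gold, a_gold, lam: float = 0.5) -> torch.Tensor:
    # Per-dimension CCC losses for valence and arousal.
    l_ccc = (1 - ccc(v_pred, v_gold)) + (1 - ccc(a_pred, a_gold))
    # Relevance term (assumed form): match the predicted V-A correlation
    # to the correlation present in the ground-truth labels.
    l_rel = (pearson(v_pred, a_pred) - pearson(v_gold, a_gold)).abs()
    return l_ccc + lam * l_rel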
