Capsule Transformer Network for Dynamic Hand Gesture Recognition using Multimodal Data

Alexandre Lebas, Rim Slama, Hazem Wannous

Poster 10 Oct 2023

In recent years, deep learning techniques have achieved remarkable success in video analysis, especially in action and gesture recognition. Although convolutional neural networks (CNNs) remain the most widely used models, their local feature learning mechanism makes it difficult to capture global contextual information across the spatial and temporal domains or between modalities. This paper introduces a Capsule Transformer Network, which is composed of a frame capsule module for extracting hand features and a gesture transformer module for modeling temporal features and recognizing the dynamic gesture. The capsule module provides spatial attention, enhancing the spatial information of the hand image, while the transformer module provides temporal attention across the gesture sequence. We propose to use multimodal data, including RGB, depth and IR data, which improves the accuracy of our approach because it better captures the 3D structure of the hand and can distinguish between similar hand gestures. Tested on two datasets, Briareo and SHREC17, the proposed approach matches or outperforms previous methods.
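The two-stage design described in the abstract (per-frame capsule features followed by temporal attention over the sequence) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each frame has already been reduced to a small feature vector, uses the standard capsule squashing nonlinearity, and applies scaled dot-product self-attention with no learned projections. The function names `squash`, `temporal_self_attention` and `gesture_descriptor` are hypothetical, chosen only for illustration.

```python
import math


def squash(vec, eps=1e-9):
    """Capsule squashing nonlinearity: keeps direction, maps length into [0, 1)."""
    sq_norm = sum(x * x for x in vec)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Scaled dot-product self-attention across frames.

    Assumption: queries, keys and values are the frame vectors themselves
    (a real transformer would use learned projections and several heads).
    """
    d = len(frames[0])
    attended = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, frames)) for i in range(d)]
        )
    return attended


def gesture_descriptor(raw_frame_features):
    """Hypothetical pipeline: squash each frame, attend over time, mean-pool."""
    capsules = [squash(f) for f in raw_frame_features]
    attended = temporal_self_attention(capsules)
    n = len(attended)
    return [sum(f[i] for f in attended) / n for i in range(len(attended[0]))]
```

In a full model, a classifier would map the pooled descriptor to gesture classes, and one such stream could be run per modality (RGB, depth, IR) before combining the results. How the streams are fused is not stated in the abstract, so it is left out of this sketch.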

Tags: Hand gesture recognition, capsule network, transformer, multi-modal data
