Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers

Vandad Davoodnia (Queen's University); Ali Etemad (Queen's University)

06 Jun 2023

Despite their impressive performance, vision-based pose estimators generally perform poorly under adverse visual conditions and often fail to satisfy the privacy demands of customers. As a result, researchers have begun to study tactile sensing systems as an alternative. However, these systems suffer from noisy and ambiguous recordings. To tackle this problem, we propose a novel solution for pose estimation from ambiguous pressure data. Our method is a spatio-temporal vision transformer with an encoder-decoder architecture. Detailed experiments on two popular public datasets show that our model outperforms existing solutions. Moreover, we observe that increasing the number of temporal crops in the early stages of the network improves performance, and that pre-training the network in a self-supervised setting with a masked auto-encoder approach improves the results further.
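The abstract mentions temporal crops and self-supervised pre-training with a masked auto-encoder. As an illustration only, the sketch below shows a generic MAE-style preprocessing step: cutting a sequence of pressure maps into spatio-temporal tubelets (tokens) and randomly hiding most of them before encoding. It is not the authors' implementation. The helper names `tubelet_tokens` and `random_mask`, the tubelet sizes `t` and `p`, the 75% mask ratio, and the list-of-lists frame layout are all hypothetical choices made for this example.

```python
import random
from typing import List, Tuple

# Hypothetical layout: one pressure frame is an H x W grid of floats.
Frame = List[List[float]]


def tubelet_tokens(frames: List[Frame], t: int, p: int) -> List[List[float]]:
    """Split a T x H x W pressure sequence into flattened t x p x p tubelets.

    Assumes T, H and W are divisible by t, p and p respectively.
    """
    T, H, W = len(frames), len(frames[0]), len(frames[0][0])
    if T % t or H % p or W % p:
        raise ValueError("sequence shape must be divisible by the tubelet size")
    tokens = []
    for t0 in range(0, T, t):          # temporal crops of length t
        for r0 in range(0, H, p):      # patch rows
            for c0 in range(0, W, p):  # patch columns
                tokens.append([
                    frames[ti][ri][ci]
                    for ti in range(t0, t0 + t)
                    for ri in range(r0, r0 + p)
                    for ci in range(c0, c0 + p)
                ])
    return tokens


def random_mask(num_tokens: int, mask_ratio: float,
                seed: int = 0) -> Tuple[List[int], List[int]]:
    """Generic MAE-style random masking (assumed, not the paper's exact scheme).

    Returns (sorted indices of visible tokens, mask list with 1 = masked).
    """
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)
    n_keep = max(1, round(num_tokens * (1.0 - mask_ratio)))
    keep = sorted(order[:n_keep])
    mask = [1] * num_tokens
    for i in keep:
        mask[i] = 0
    return keep, mask


if __name__ == "__main__":
    # Toy sequence: 4 frames of an 8 x 4 pressure grid.
    frames = [[[float(ti * 100 + r * 10 + c) for c in range(4)]
               for r in range(8)] for ti in range(4)]
    tokens = tubelet_tokens(frames, t=2, p=2)
    keep, mask = random_mask(len(tokens), mask_ratio=0.75)
    visible = [tokens[i] for i in keep]  # only these would reach the encoder
    print(len(tokens), len(visible))     # 16 tokens in total, 4 visible
```

In a standard masked auto-encoder, only the visible tokens pass through the encoder and a lightweight decoder reconstructs the masked ones, which keeps a high mask ratio inexpensive. A smaller `t` yields more temporal crops per sequence; how this maps onto the temporal cropping described in the abstract is an assumption here.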

Tags:

Biomedical and biological image processing

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Towards Privacy and Utility in Tourette Tic Detection Through Pretraining Based on Publicly Available Video Data of Healthy Subjects

TransWnet: Integrating Transformers into CNNs via Row and Column Attention for Abdominal Multi-organ Segmentation

StackMaps: A Visualization Technique for Diabetic Retinopathy Grading