Monocular 3D Human Pose Estimation Based on Global Temporal-Attentive and Joints-Attention in Video

Ruhan He (Wuhan Textile University); Shanshan Xiang (Wuhan Textile University); Tao Peng (Wuhan Textile University); Yongsheng Yu (Wuhan University of Technology)

07 Jun 2023

Learning to capture human motion is essential to 3D human pose and shape estimation from monocular video, which is widely used in many 3D applications. However, existing methods mainly rely on recurrent or convolutional operations to model such temporal information, which limits their ability to capture non-local contextual relations of human motion and ignores human joint hierarchies. To address this problem, we propose a Global Temporal-Attentive and Joints-Attention network (GTAJA-Net). This method adds a Global Attention Feature Integration (GAFI) module and a Motion Tree Fusion Decoder (MTFD) module to a temporally consistent mesh recovery system (TCMR). GAFI aggregates a collection of temporal features into final temporal features that carry spatial information, which enhances temporal correlation and refines the features of the current frame. Meanwhile, MTFD models joint-level attention by treating pose estimation as a top-down hierarchical process similar to the SMPL kinematic tree. Though conceptually simple, our GTAJA-Net outperforms the state-of-the-art methods on the 3DPW, MPI-INF-3DHP, and Human3.6M benchmark datasets. Our code is available at https://github.com/xiangcece/GTAJA-Net.
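
The two ideas in the abstract, attending over all frames rather than only neighbouring ones and decoding joints top-down along a kinematic tree, can be illustrated with a minimal sketch. This is not the authors' GAFI or MTFD implementation (see their repository for that). It assumes plain scaled dot-product attention as a stand-in for the global attention, and uses the standard 24-joint SMPL parent indices. The function names `attend_to_frames` and `ancestor_chain` are hypothetical and chosen only for this illustration.

```python
import math

# Standard SMPL parent indices for the 24 body joints (-1 marks the root, the pelvis).
SMPL_PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8,
                9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21]


def attend_to_frames(frames, current):
    """Pool all frame features, weighted by similarity to the frame at `current`.

    Assumption: scaled dot-product attention with the current frame as the
    query; the paper's GAFI module may differ in its details.
    """
    query = frames[current]
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, f)) / math.sqrt(dim) for f in frames]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return weights, pooled


def ancestor_chain(joint, parents=SMPL_PARENTS):
    """Return the joints from the root down to `joint` (its top-down context)."""
    chain = []
    while joint != -1:
        chain.append(joint)
        joint = parents[joint]
    return chain[::-1]


if __name__ == "__main__":
    # Toy 2-D features for three frames; frame 0 is the current frame.
    frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    weights, pooled = attend_to_frames(frames, current=0)
    print("attention weights:", weights)
    print("pooled feature:", pooled)
    # Joint 20 is the left wrist in the SMPL joint ordering.
    print("chain to left wrist:", ancestor_chain(20))
```

In this sketch every frame contributes to the pooled feature regardless of its distance in time, which is the non-local property that recurrent or convolutional models lack. The ancestor chain shows the order in which a top-down decoder would visit joints before reaching a given extremity.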

Tags:

Image and video synthesis, rendering, and visualization

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

