SPS
Length: 07:17
In this paper we propose an unsupervised, unified approach that simultaneously recovers time-varying 3D shape, camera motion, and a temporal clustering into deformations, all from partial 2D point tracks in an RGB video and without assuming any pre-trained model. Because the data are drawn from sequentially ordered images, we fully exploit this temporal ordering to constrain every model parameter we estimate. We present an energy-based formulation that can be solved efficiently: all model parameters are estimated in the same loop via augmented Lagrange multipliers, in polynomial time, while enforcing similarities between images at every level. We validate the approach on a wide variety of human video sequences, including articulated and continuous motion, with both dense and missing tracks. Our approach is shown to outperform state-of-the-art solutions in terms of 3D reconstruction and clustering.
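To make the phrase "augmented Lagrange multipliers" concrete, the sketch below runs the generic augmented Lagrangian loop on a toy problem. It is not the paper's energy or solver: the objective (stay close to a vector `a`), the single linear constraint `c.x = b`, the function name `alm_project`, and the values of `mu` and `iters` are all assumptions chosen only to show the alternation between a primal update and a multiplier update.

```python
def alm_project(a, c, b, mu=10.0, iters=50):
    """Toy augmented Lagrangian loop (illustrative, not the paper's method).

    Minimizes 0.5 * ||x - a||^2 subject to the linear constraint c.x = b.
    """
    cc = sum(ci * ci for ci in c)               # ||c||^2
    ca = sum(ci * ai for ci, ai in zip(c, a))   # c.a
    lam = 0.0                                   # Lagrange multiplier
    x = list(a)
    for _ in range(iters):
        # Primal step: setting the gradient of
        #   0.5*||x - a||^2 + lam*(c.x - b) + 0.5*mu*(c.x - b)^2
        # to zero gives x = a - (lam + mu*r)*c, where r = c.x - b is the
        # constraint residual at the new x. Solving for r yields:
        r = (ca - b - lam * cc) / (1.0 + mu * cc)
        x = [ai - (lam + mu * r) * ci for ai, ci in zip(a, c)]
        # Dual step: move the multiplier along the constraint residual.
        lam += mu * r
    return x, lam


if __name__ == "__main__":
    x, lam = alm_project([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 3.0)
    print(x, lam)
```

For this input the iterates approach the exact projection of `a` onto the constraint plane, `x = [0, 1, 2]` with multiplier `1`. In a formulation like the one described above, the primal step would instead update shape, camera, and clustering variables in turn, but the overall loop structure is the same.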