Optimal Fractional Fourier Filtering For Graph Signals
Cuneyd Ozturk, Haldun M. Ozaktas, Sinan Gezici, Aykut Koc
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:12:12
Temporal analysis of workout routines comes with a major difficulty: the observed actions or processes are naturally sequences of variable length. The speed at which a pattern's temporal dynamics repeat depends heavily on the exercise itself and also on the individual's performance. In this paper, we present a Transformer-based Deep Neural Network that classifies 19 phases of 5 exercises common to CrossFit routines. From this baseline, we aim to perform workout video segmentation while creating a descriptor that enables repetition counting or feedback on exercise execution. More specifically, a 2D Human Pose Network creates heatmap-based features that capture the human body pose, and these features are fed into a Transformer Encoder with 4 attention heads. A final Multilayer Perceptron with 3 Dense layers performs the phase classification task. We trained our model on a previously acquired dataset that is naturally imbalanced, e.g. 6 classes have fewer than 8k samples and 9 classes have more than 24k. The results show that we are able to segment videos in a temporally consistent manner, outperforming a state-of-the-art model that counts repetitive actions on 4 of the 5 exercises.
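As a rough illustration of how per-frame phase predictions could support segmentation and repetition counting, the sketch below collapses a sequence of frame labels into contiguous segments and counts occurrences of an ordered phase cycle. It is a minimal sketch under stated assumptions, not the authors' method: the function names `segment_phases` and `count_repetitions`, the phase names "up" and "down", and the rule that one repetition equals one full pass through the cycle are all hypothetical.

```python
from itertools import groupby


def segment_phases(frame_labels):
    """Collapse per-frame phase labels into (phase, start, end) runs.

    `end` is exclusive. The labels are assumed to be the per-frame
    outputs of some phase classifier (hypothetical input format).
    """
    segments = []
    start = 0
    for phase, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((phase, start, start + length))
        start += length
    return segments


def count_repetitions(frame_labels, cycle):
    """Count non-overlapping occurrences of the ordered phase `cycle`.

    Assumption: one repetition is one complete pass through `cycle`
    in the segmented (run-collapsed) phase sequence.
    """
    phases = [phase for phase, _, _ in segment_phases(frame_labels)]
    cycle = list(cycle)
    count, i, n = 0, 0, len(cycle)
    while i + n <= len(phases):
        if phases[i:i + n] == cycle:
            count += 1
            i += n  # skip past the matched cycle
        else:
            i += 1
    return count


# Hypothetical two-phase exercise with made-up frame labels.
labels = ["up"] * 3 + ["down"] * 2 + ["up"] * 4 + ["down"] * 3 + ["up"] * 1
print(segment_phases(labels))
print(count_repetitions(labels, ["up", "down"]))
```

Because the segments carry start and end frame indices, the same structure could also give per-phase durations, which is one kind of signal that feedback on exercise execution might draw on. Real classifier output would likely need smoothing before this step, since a single misclassified frame splits a run.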