Dual Temporal Transformers for Fine-Grained Dangerous Action Recognition
Wenfeng Song, Xingliang Jin, Yang Ding, Yang Gao, Xia Hou
-
SPS
IEEE Members: $11.00
Non-members: $15.00
Recognizing dangerous actions is a critical task in computer vision, especially for surveillance applications. While existing deep learning methods have been successful in confined environments, they struggle with the anomalous and salient variations of human postures in dangerous actions. Additionally, finer-grained dangerous actions require more discriminative cues, adding to the complexity of the task. To address these challenges, we propose a novel solution that models the intrinsic and invariant properties of dangerous actions at multiple temporal semantic levels. Concretely, we propose a Dual Temporal Transformers (DTT) to capture temporal interactions between distinct key points in the human body at different scales, from local to global, simultaneously. By doing so, our method avoids overfitting to unrelated or minor clues in videos and achieves a generalized representation of abnormal actions. We evaluate our approach on indoor and outdoor environments and found that DTT outperforms existing methods in terms of efficiency and accuracy. Our code and dataset are pubic available on https://github.com/AveryJohnsonJJ/DTT.git.