
Dual Temporal Transformers for Fine-Grained Dangerous Action Recognition

Wenfeng Song, Xingliang Jin, Yang Ding, Yang Gao, Xia Hou

Poster 10 Oct 2023

Recognizing dangerous actions is a critical task in computer vision, especially for surveillance applications. While existing deep learning methods have been successful in confined environments, they struggle with the anomalous and salient variations of human postures in dangerous actions. Additionally, finer-grained dangerous actions require more discriminative cues, adding to the complexity of the task. To address these challenges, we propose a novel solution that models the intrinsic and invariant properties of dangerous actions at multiple temporal semantic levels. Concretely, we propose Dual Temporal Transformers (DTT) to capture temporal interactions between distinct key points in the human body at different scales, from local to global, simultaneously. By doing so, our method avoids overfitting to unrelated or minor cues in videos and achieves a generalized representation of abnormal actions. We evaluate our approach in indoor and outdoor environments and find that DTT outperforms existing methods in both efficiency and accuracy. Our code and dataset are publicly available at https://github.com/AveryJohnsonJJ/DTT.git.
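The idea of attending over time at a local and a global scale simultaneously can be illustrated with a small, dependency-free sketch. This is not the authors' DTT implementation (see their repository for that). It assumes each frame is a flattened vector of key-point coordinates, uses plain scaled dot-product self-attention with identity projections (no learned weights), and fuses the two branches by concatenation. The function names `temporal_attention` and `dual_temporal_features` and the `radius` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames, radius=None):
    """Scaled dot-product self-attention over the time axis.

    frames: list of T feature vectors (assumed: flattened key-point coordinates).
    radius: if given, frame t attends only to frames s with |t - s| <= radius
            (local branch); if None, it attends to every frame (global branch).
    Identity query/key/value projections are an illustrative simplification.
    """
    T = len(frames)
    d = len(frames[0])
    out = []
    for t in range(T):
        lo = 0 if radius is None else max(0, t - radius)
        hi = T if radius is None else min(T, t + radius + 1)
        window = range(lo, hi)
        scores = [
            sum(a * b for a, b in zip(frames[t], frames[s])) / math.sqrt(d)
            for s in window
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * frames[s][k] for w, s in zip(weights, window)) for k in range(d)]
        )
    return out


def dual_temporal_features(frames, radius=1):
    """Concatenate local-window and global temporal attention per frame."""
    local_branch = temporal_attention(frames, radius)
    global_branch = temporal_attention(frames, None)
    return [l + g for l, g in zip(local_branch, global_branch)]


# Toy input: 5 frames, 2 key points with (x, y) each, so 4 values per frame.
frames = [
    [0.0, 0.0, 1.0, 1.0],
    [0.1, 0.0, 1.1, 1.0],
    [0.2, 0.1, 1.2, 1.1],
    [0.9, 0.8, 0.3, 0.2],
    [1.0, 0.9, 0.2, 0.1],
]
features = dual_temporal_features(frames, radius=1)
```

Each output frame here holds 8 values: the first 4 summarize a short temporal neighborhood, the last 4 summarize the whole clip. A real model would add learned projections, multiple heads, and positional information.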