SCOREFORMER: SCORE FUSION-BASED TRANSFORMERS FOR WEAKLY-SUPERVISED VIOLENCE DETECTION

Yang Xiao (Xinjiang University); Liejun Wang (Xinjiang University); Tongguan Wang (Xinjiang University); Huicheng Lai (Xinjiang University)

06 Jun 2023

Violence detection is an application of anomaly detection that identifies violent content in video clips. Multimodal input can improve violence detection performance. However, existing MML Transformer-based fusion methods do not account for the differences between non-homologous modalities, and fusing such modalities directly causes their features to act as noise for one another. This paper proposes a score fusion-based Transformer framework named Scoreformer. First, the optical flow, RGB, and audio features each pass through independent self-attention Transformer blocks. Second, the optical flow and RGB features pass through cross-modal Transformer blocks, and the result is then fused with the audio features in a score fusion block. This design avoids the noise interference caused by directly fusing audio and visual features. Experiments on the XD-Violence dataset show that the proposed method achieves an AP of 84.54%, exceeding state-of-the-art methods (e.g., MSL, CRFD) by at least 2.85%.
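The idea of fusing at the score level rather than the feature level can be illustrated with a minimal Python sketch. The abstract does not specify the fusion rule, so everything here is an assumption made for illustration: the function name `fuse_scores`, the `audio_weight` parameter, the use of per-snippet logits as input, and the weighted average itself. It is not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(visual_logits, audio_logits, audio_weight=0.5):
    """Hypothetical score-level (late) fusion.

    Each branch is assumed to have already produced one logit per video
    snippet. Only the resulting scores are combined, so audio features
    never mix with visual features. The weighted average is an assumed
    fusion rule, not one taken from the paper.
    """
    if len(visual_logits) != len(audio_logits):
        raise ValueError("both branches must score the same number of snippets")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")

    fused = []
    for v, a in zip(visual_logits, audio_logits):
        visual_score = sigmoid(v)
        audio_score = sigmoid(a)
        fused.append((1.0 - audio_weight) * visual_score + audio_weight * audio_score)
    return fused


if __name__ == "__main__":
    # Made-up logits for three snippets, purely for demonstration.
    visual = [2.0, -1.0, 0.5]
    audio = [1.5, -2.0, -0.5]
    print(fuse_scores(visual, audio, audio_weight=0.3))
```

With `audio_weight=0.0` the sketch reduces to the visual branch alone, which makes it easy to check how much the audio scores contribute.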