LOOK, LISTEN AND PAY MORE ATTENTION: FUSING MULTI-MODAL INFORMATION FOR VIDEO VIOLENCE DETECTION

Dong-Lai Wei, Yang Liu, Jing Liu, Xin-Hua Zeng, Chen-Geng Liu, Xiao-Guang Zhu

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:07:49

09 May 2022

Violence detection is an essential and challenging problem in the computer vision community. Most existing works focus on single modal data analysis, which is not effective when multi-modality is available. Therefore, we propose a two-stage multi-modal information fusion method for violence detection: 1) the first stage adopts multiple instance learning strategies to refine video-level hard labels into clip-level soft labels, and 2) the next stage uses multi-modal information fused attention module to achieve fusion, and supervised learning is carried out using the soft labels generated at the first stage. Extensive empirical evidence on the XD-Violence dataset shows that our method outperforms the state-of-the-art methods.

Tags:

fused attention

multi-modal information

deep learning

weak supervision

violence detection

LOOK, LISTEN AND PAY MORE ATTENTION: FUSING MULTI-MODAL INFORMATION FOR VIDEO VIOLENCE DETECTION

Dong-Lai Wei, Yang Liu, Jing Liu, Xin-Hua Zeng, Chen-Geng Liu, Xiao-Guang Zhu

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

Signal Processing and Deep Learning for Practical Active Noise Control

Short Course Bundle: ICASSP 2023 COURSE 2: Graph Signal Processing and Geometric Learning: A Foundational Approach (Parts 1-4)

Short Course Bundle: ICASSP 2023 COURSE 1: A Hands-on Approach for Implementing Stochastic Optimization Algorithms from Scratch (Parts 1-4)

Join the IEEE Signal Processing Society