
Recurrent Fine-Grained Self-Attention Network for Video Crowd Counting

Jifan Zhang (School of Electronic and Computer Engineering, Peking University); Zhe Wu (Peng Cheng Laboratory); Xinfeng Zhang (University of Chinese Academy of Sciences); Guoli Song (Peng Cheng Laboratory); Yaowei Wang (Peng Cheng Laboratory); Jie Chen (Peking University)

07 Jun 2023

Striking a balance between exploiting spatio-temporal correlations and controlling model complexity is vital for video-based crowd counting methods. In this paper, we propose a Recurrent Fine-Grained Self-Attention Network (RFSNet) to achieve efficient and accurate counting in video scenes via the self-attention mechanism and a recurrent fine-tuning strategy. Specifically, we design a decoder consisting of patch-wise spatial self-attention and temporal self-attention. Compared with vanilla self-attention, it effectively leverages the dependencies in the spatial and temporal domains, respectively, while significantly reducing computational complexity. Moreover, RFSNet recurrently feeds the features back into the decoder to enhance the spatio-temporal representations. This strategy not only simplifies the model structure and reduces the number of parameters, but also improves the quality of the estimated density maps. Our RFSNet achieves state-of-the-art performance on three video crowd counting benchmarks, and outperforms other methods by more than 20% on the challenging FDST dataset.
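The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of what a factorized spatial/temporal self-attention decoder with recurrent refinement could look like. Everything here — the module name FactorizedSTDecoder, the helper recurrent_refine, the shapes, and the hyperparameters — is an illustrative assumption, not the authors' code.

```python
# Illustrative sketch only: names, shapes, and hyperparameters are
# assumptions, not the RFSNet authors' implementation.
import torch
import torch.nn as nn


class FactorizedSTDecoder(nn.Module):
    """Spatial self-attention within each frame, then temporal self-attention
    across frames at each patch position, instead of one joint attention over
    all T*N tokens of the clip."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, T, N, C) -- batch, frames, patch tokens per frame, channels
        B, T, N, C = x.shape

        # Spatial step: attend over the N patches of each frame independently.
        s = x.reshape(B * T, N, C)
        q = self.norm1(s)
        s = s + self.spatial_attn(q, q, q, need_weights=False)[0]

        # Temporal step: attend over the T frames at each patch position.
        t = s.reshape(B, T, N, C).permute(0, 2, 1, 3).reshape(B * N, T, C)
        q = self.norm2(t)
        t = t + self.temporal_attn(q, q, q, need_weights=False)[0]

        return t.reshape(B, N, T, C).permute(0, 2, 1, 3)


def recurrent_refine(decoder, feats, steps=3):
    """Feed features through the same decoder repeatedly, reusing one set of
    weights to refine the spatio-temporal representation."""
    for _ in range(steps):
        feats = decoder(feats)
    return feats


if __name__ == "__main__":
    clip = torch.randn(2, 8, 196, 256)  # 2 clips, 8 frames, 14x14 patches
    out = recurrent_refine(FactorizedSTDecoder(), clip)
    print(out.shape)  # torch.Size([2, 8, 196, 256])
```

The factorization illustrates the kind of complexity saving the abstract alludes to: joint self-attention over all T·N tokens of a clip costs O((T·N)²), whereas attending over N patches per frame and then over T frames per patch costs O(T·N²) + O(N·T²). Likewise, looping the features through one shared decoder, rather than stacking distinct decoder layers, is a plausible reading of how the recurrent strategy keeps the parameter count down.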
