
Recurrent Fine-Grained Self-Attention Network for Video Crowd Counting

Jifan Zhang (School of Electronic and Computer Engineering, Peking University); Zhe Wu (Peng Cheng Laboratory); Xinfeng Zhang (University of Chinese Academy of Sciences); Guoli Song (Peng Cheng Laboratory); Yaowei Wang (Peng Cheng Laboratory); Jie Chen (Peking University)

07 Jun 2023

Striking a balance between exploiting spatio-temporal correlations and controlling model complexity is vital for video-based crowd counting methods. In this paper, we propose a Recurrent Fine-Grained Self-Attention Network (RFSNet) to achieve efficient and accurate counting in video scenes via the self-attention mechanism and a recurrent fine-tuning strategy. Specifically, we design a decoder consisting of patch-wise spatial self-attention and temporal self-attention. Compared with vanilla self-attention, it effectively leverages the dependencies in the spatial and temporal domains, respectively, while significantly reducing computational complexity. Moreover, RFSNet recurrently feeds the features back into the decoder to enhance the spatio-temporal representations. This strategy not only simplifies the model structure and reduces the number of parameters, but also improves the quality of the estimated density maps. Our RFSNet achieves state-of-the-art performance on three video crowd counting benchmarks, and outperforms other methods by more than 20% on the challenging FDST dataset.
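The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of what a factorized spatial/temporal self-attention decoder with recurrent refinement could look like. Everything here — the module name FactorizedSTDecoder, the helper recurrent_refine, the shapes, and the hyperparameters — is an illustrative assumption, not the authors' code.

```python
# Illustrative sketch only: names, shapes, and hyperparameters are
# assumptions, not the RFSNet authors' implementation.
import torch
import torch.nn as nn


class FactorizedSTDecoder(nn.Module):
    """Spatial self-attention within each frame, then temporal self-attention
    across frames at each patch position, instead of one joint attention over
    all T*N tokens of the clip."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, T, N, C) -- batch, frames, patch tokens per frame, channels
        B, T, N, C = x.shape

        # Spatial step: attend over the N patches of each frame independently.
        s = x.reshape(B * T, N, C)
        q = self.norm1(s)
        s = s + self.spatial_attn(q, q, q, need_weights=False)[0]

        # Temporal step: attend over the T frames at each patch position.
        t = s.reshape(B, T, N, C).permute(0, 2, 1, 3).reshape(B * N, T, C)
        q = self.norm2(t)
        t = t + self.temporal_attn(q, q, q, need_weights=False)[0]

        return t.reshape(B, N, T, C).permute(0, 2, 1, 3)


def recurrent_refine(decoder, feats, steps=3):
    """Feed features through the same decoder repeatedly, reusing one set of
    weights to refine the spatio-temporal representation."""
    for _ in range(steps):
        feats = decoder(feats)
    return feats


if __name__ == "__main__":
    clip = torch.randn(2, 8, 196, 256)  # 2 clips, 8 frames, 14x14 patches
    out = recurrent_refine(FactorizedSTDecoder(), clip)
    print(out.shape)  # torch.Size([2, 8, 196, 256])
```

The factorization illustrates the kind of complexity saving the abstract alludes to: joint self-attention over all T·N tokens of a clip costs O((T·N)²), whereas attending over N patches per frame and then over T frames per patch costs O(T·N²) + O(N·T²). Likewise, looping the features through one shared decoder, rather than stacking distinct decoder layers, is a plausible reading of how the recurrent strategy keeps the parameter count down.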
