Stochastic Multi-Scale Aggregation Network For Crowd Counting
Mingjie Wang, Hao Cai, Jun Zhou, Minglun Gong
SPS
Length: 15:04
Crowd counting in unconstrained, congested scenes is an important task in computer vision. Its main difficulties stem from large variations in scale and density and from a tendency to overfit. This paper presents a novel end-to-end stochastic multi-scale aggregation network (SMANet) that addresses both issues. Specifically, general features are first extracted by the front-end subnetwork and then fed into the back-end subnetwork, which consists of a stochastic multi-scale aggregation module, a density map generator, and a global prior encoder. The stochastic aggregation impels the multi-branch units to learn features at different scales effectively and reduces sensitivity to scale variations, whereas the global prior encoder is designed to encode global contextual information and to keep the density of the shared representations consistent. Our proposed SMANet is the first work to fuse multi-scale features in a stochastic manner for crowd counting. Experimental results on four public datasets demonstrate that SMANet consistently outperforms state-of-the-art methods.
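
The abstract does not spell out how the stochastic fusion is performed, so the following is only a minimal sketch of one plausible reading: during training, each scale branch is kept with some probability (a drop-path-style rule, which is an assumption here and not necessarily the authors' method), and the surviving branches are averaged. The function name `stochastic_aggregate`, the `keep_prob` parameter, and the flat-list feature representation are all hypothetical and chosen for illustration.

```python
import random


def stochastic_aggregate(branch_features, keep_prob=0.5, training=True, rng=None):
    """Fuse per-branch feature vectors by averaging a random subset of branches.

    ASSUMPTION: this drop-path-style rule is illustrative only; the abstract
    does not describe the actual aggregation rule used by SMANet.
    """
    if not branch_features:
        raise ValueError("need at least one branch")
    if rng is None:
        rng = random.Random()

    n = len(branch_features)
    if training:
        # Keep each branch independently with probability keep_prob.
        kept = [i for i in range(n) if rng.random() < keep_prob]
        if not kept:
            # Always keep at least one branch so the output is defined.
            kept = [rng.randrange(n)]
    else:
        # At inference time, use every branch deterministically.
        kept = list(range(n))

    length = len(branch_features[0])
    # Element-wise mean over the branches that were kept.
    return [sum(branch_features[i][j] for i in kept) / len(kept)
            for j in range(length)]


# Three hypothetical branches, each a flattened two-element feature vector.
branches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused_eval = stochastic_aggregate(branches, training=False)
fused_train = stochastic_aggregate(branches, training=True, rng=random.Random(0))
```

In this sketch, no branch can rely on always being accompanied by the others, which is one way a network could be pushed to learn useful features at every scale; whether SMANet uses this mechanism or a different one is not stated in the abstract.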