  • SPS
    Length: 07:58
10 Jul 2020

Crowd counting is a challenging computer vision task that aims to estimate the number of people in crowded scenes. CNN-based methods have adopted multi-scale or multi-column structures to cope with scale variation within a single image. However, they have not accounted for variation in distribution features across different images, which is difficult to handle with a fixed scheme. In this paper, we propose a multi-output network named Adaptive Depth Network (ADNet) that adaptively adjusts its depth according to the features of each input. This flexible model attaches extra output blocks to internal layers to exploit their representation ability. It then takes as the final result the output of the block that produces the highest confidence value. In experiments on three crowd counting datasets, ADNet shows consistent improvement. Moreover, an ablation study demonstrates the effectiveness of the multi-output structure on both the crowd counting datasets and CIFAR-100.
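The output-selection rule described above (several output blocks, keep the most confident one) can be illustrated with a minimal sketch. The abstract does not say how confidence is computed, so this sketch assumes the classification setting mentioned for CIFAR-100. It also assumes that confidence is the maximum softmax probability of each output block. The function names `softmax` and `select_output` and the example logits are hypothetical. None of them are taken from ADNet.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_output(exit_logits: Sequence[Sequence[float]]) -> Tuple[int, int, float]:
    """Return (block_index, predicted_class, confidence) for the most confident block.

    ASSUMPTION: confidence = maximum softmax probability of a block's output.
    The original work may define confidence differently (especially for
    density-map outputs in crowd counting).
    """
    best = None
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        conf = max(probs)
        cls = probs.index(conf)
        # Keep the block with the strictly highest confidence seen so far.
        if best is None or conf > best[2]:
            best = (i, cls, conf)
    if best is None:
        raise ValueError("at least one output block is required")
    return best


if __name__ == "__main__":
    # Hypothetical logits from three output blocks, ordered shallow to deep.
    exits = [
        [0.2, 0.1, 0.3],  # shallow block: nearly uniform, low confidence
        [2.5, 0.1, 0.0],  # middle block: confident in class 0
        [1.0, 0.9, 0.8],  # final block: less confident
    ]
    idx, cls, conf = select_output(exits)
    print(idx, cls, round(conf, 3))  # prints: 1 0 0.853
```

In this sketch, the middle block wins because its softmax distribution is the most peaked. For simplicity, every block's output is computed before one is selected. A deployment aiming to save computation might instead stop at the first block that clears a confidence threshold. Whether ADNet does this is not stated in the abstract.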