FusionCount: Efficient Crowd Counting via Multiscale Feature Fusion
Yiming Ma, Victor Sanchez, Tanaya Guha
Although CNNs have powerful learning capabilities, it remains difficult for them to identify corresponding points in occluded regions. The ghosting effect that occlusion causes during feature warping is a performance bottleneck for many stereo matching networks. In this paper, we propose an Occlusion-Aware Refinement Module (OARM), which learns a rough occlusion map from a multi-scale aggregated cost volume without supervision and uses it to mask out harmful occluded regions in the warped image. Furthermore, we design simple yet efficient 2D Intra-/Cross-Level Aggregation Modules to aggregate information across scales effectively and efficiently before the disparity refinement stage. Benefiting from these modules, our proposed OMNet achieves a D1-all error rate of 1.82% on the KITTI 2015 dataset, better than most 3D-based networks. Meanwhile, OMNet maintains real-time performance and can process a 1248 × 384 image pair at 36 fps.
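The sketch below illustrates the occlusion-aware masking idea described above: a small head predicts a soft occlusion map from the aggregated cost volume, the map gates the warped right image, and a refinement head updates the disparity. This is a minimal, hedged sketch assuming a PyTorch implementation; the class name OcclusionAwareRefinement, the channel widths, and the two-layer heads are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch (not the authors' code) of occlusion-aware refinement,
# assuming a PyTorch implementation. Layer widths and heads are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OcclusionAwareRefinement(nn.Module):
    def __init__(self, cost_channels: int = 32):
        super().__init__()
        # Predicts a soft occlusion probability map from the aggregated cost volume.
        self.occlusion_head = nn.Sequential(
            nn.Conv2d(cost_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )
        # Refines disparity from the left image, the masked warped right image,
        # and the current disparity estimate (3 + 3 + 1 input channels).
        self.refine_head = nn.Sequential(
            nn.Conv2d(7, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, cost_volume, left_img, warped_right, disparity):
        # occlusion ~ 1 where a pixel is likely visible, ~ 0 where occluded.
        occlusion = torch.sigmoid(self.occlusion_head(cost_volume))
        occlusion = F.interpolate(occlusion, size=left_img.shape[-2:],
                                  mode="bilinear", align_corners=False)
        # Suppress ghosting artifacts in the warped image before refinement.
        masked_warped = warped_right * occlusion
        residual = self.refine_head(
            torch.cat([left_img, masked_warped, disparity], dim=1))
        return disparity + residual
```

In such a setup the occlusion map needs no explicit labels: gradients from the disparity refinement loss flow through the mask, which is consistent with the unsupervised occlusion learning described in the abstract.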