Cross-head supervision for crowd counting with noisy annotations
Mingliang Dai (Fudan University); Zhizhong Huang (Fudan University); Jiaqi Gao (Fudan University); Hongming Shan (Fudan University); Junping Zhang (Fudan University)
SPS
Noisy annotations, such as missing annotations and location shifts, are common in crowd counting datasets because of multi-scale head sizes, heavy occlusion, and similar factors. They severely affect model training, especially for density map-based methods. To alleviate their negative impact, we propose a novel crowd counting model with one convolution head and one transformer head that supervise each other in noisy areas, a scheme we call Cross-Head Supervision. The resultant model, CHS-Net, can synergize different types of inductive biases for better counting. In addition, we develop a progressive cross-head supervision learning strategy to stabilize the training process and provide more reliable supervision. Extensive experimental results on the ShanghaiTech and QNRF datasets demonstrate the superior performance of our proposed approach over state-of-the-art methods.
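The idea of two heads supervising each other in noisy areas can be illustrated with a small sketch. The abstract does not say how noisy areas are detected or how the supervision is scheduled, so everything below is an assumption made for illustration: the function name `cross_head_targets`, the error-quantile rule for flagging noisy pixels, and the linear ramp `alpha` standing in for the "progressive" strategy are all hypothetical and should not be read as the CHS-Net implementation.

```python
import numpy as np

def cross_head_targets(gt, pred_conv, pred_tran, epoch, total_epochs,
                       noise_quantile=0.9):
    """Build one supervision target per head (illustrative sketch only).

    gt, pred_conv, pred_tran: 2-D density maps of identical shape.
    """
    # Assumed progressive schedule: rely on the other head more as training proceeds.
    alpha = min(1.0, epoch / total_epochs)

    def refine(own_pred, other_pred):
        # Assumption: pixels where a head disagrees most with the annotation
        # are treated as noisy (e.g. a missing or shifted annotation).
        err = np.abs(own_pred - gt)
        noisy = err > np.quantile(err, noise_quantile)
        target = gt.astype(float)  # astype returns a copy, gt is untouched
        # In noisy pixels, blend the annotation with the other head's prediction.
        target[noisy] = (1.0 - alpha) * gt[noisy] + alpha * other_pred[noisy]
        return target

    target_conv = refine(pred_conv, pred_tran)  # transformer head supervises conv head
    target_tran = refine(pred_tran, pred_conv)  # conv head supervises transformer head
    return target_conv, target_tran
```

In a training loop, each head's loss would then be computed against its own refined target rather than the raw annotation. At `epoch=0` the targets equal the annotation, so under this assumed schedule early training is unaffected by unreliable predictions.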