DF-Net: Diversity-Focused Network for Video Object Detection
Zhenyu Qiu, Qiang Qi, Yan Yan, Hanzi Wang
SPS
Video object detection is challenging because object appearance often deteriorates in individual frames. One way to enhance per-frame features is to aggregate features from several support frames. However, because the region proposal network relies on fixed anchors, the proposals it generates may lack precision and diversity, which limits detection performance. We propose a novel architecture, the Diversity-Focused Network (DF-Net), which consists of three modules: 1) an affine transform module (ATM), which models the deblurring process and fuses feature maps of different receptive fields with a multi-level attention block; 2) a label assignment module (LAM), which assigns labels to the proposals used in fine-grained aggregation; and 3) a regression-guided diffusion module (RGDM), which obtains more diverse, higher-quality features. Experiments on ImageNet VID, the most representative large-scale dataset for this task, show that DF-Net achieves favorable results. Notably, DF-Net reaches 84.8% mAP with a ResNet-101 backbone without any post-processing.
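To make the idea of aggregating support-frame features concrete, here is a minimal Python sketch of a generic similarity-weighted aggregation scheme. It is an illustrative assumption, not DF-Net's ATM, LAM, or RGDM, whose details the abstract does not give. The helper names `cosine`, `softmax`, and `aggregate`, the `temperature` parameter, and the residual addition are all hypothetical choices. Each proposal feature is a plain list of floats.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a lower temperature sharpens the weights."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(target: Vector, supports: List[Vector], temperature: float = 1.0) -> Vector:
    """Hypothetical helper: enhance a target-frame proposal feature.

    Support-frame proposal features are weighted by a softmax over their
    cosine similarity to the target, summed, and added back to the target
    as a residual. With no support features, the target is returned unchanged.
    """
    if not supports:
        return list(target)
    weights = softmax([cosine(target, s) for s in supports], temperature)
    dim = len(target)
    fused = [sum(w * s[d] for w, s in zip(weights, supports)) for d in range(dim)]
    return [t + f for t, f in zip(target, fused)]


if __name__ == "__main__":
    # A support proposal that resembles the target gets the larger weight.
    print(aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

In this toy setup, support proposals that look like the target dominate the fused feature. That illustrates why the quality and diversity of the proposals being aggregated matter, which is the problem the abstract attributes to fixed-anchor proposals.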