  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
Poster 11 Oct 2023

Video object detection is challenging because object appearances in video are often deteriorated. One way to enhance per-frame features is to aggregate features from several support frames. However, because the region proposal network relies on fixed anchors, the proposals it generates may be neither precise nor diverse, which limits detection performance. We propose a novel architecture called Diversity-Focused Network (DF-Net), which consists of three modules: 1) an affine transform module (ATM), which models the deblurring process and fuses the feature maps of different receptive fields through a multi-level attention block; 2) a label assignment module (LAM), which assigns labels to the proposals used in fine-grained aggregation; 3) a regression-guided diffusion module (RGDM), which produces more diverse, higher-quality features. Experiments show that DF-Net achieves favorable results on the most representative large-scale ImageNet VID dataset. Remarkably, DF-Net achieves 84.8% mAP with a ResNet-101 backbone and without post-processing steps.
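
To make the idea of aggregating features from support frames concrete, the following is a minimal sketch of similarity-weighted aggregation, a generic technique common in video object detection. It is not DF-Net's ATM, LAM, or RGDM, whose details are not given here. The function names (`cosine`, `softmax`, `aggregate`), the `temperature` parameter, and the toy two-dimensional feature vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(key_feat, support_feats, temperature=1.0):
    """Weight each support feature by its similarity to the key-frame
    feature and return the weighted average together with the weights.
    Hypothetical helper, not the paper's aggregation scheme."""
    scores = [cosine(key_feat, s) / temperature for s in support_feats]
    weights = softmax(scores)
    dim = len(key_feat)
    fused = [sum(w * s[i] for w, s in zip(weights, support_feats))
             for i in range(dim)]
    return fused, weights

# Toy data: one key-frame proposal feature and three support-frame features.
key = [1.0, 0.0]
supports = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
fused, weights = aggregate(key, supports)
```

In this sketch, support features that resemble the key-frame feature receive larger weights, so a blurred or occluded view contributes less to the fused feature. A lower `temperature` would sharpen that preference.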
