DF-Net: Diversity-Focused Network for Video Object Detection

Zhenyu Qiu, Qiang Qi, Yan Yan, Hanzi Wang

Poster 11 Oct 2023

Video object detection is challenging because object appearances often deteriorate. One way to enhance per-frame features is to aggregate features from several support frames. However, proposals generated by the region proposal network may be neither precise nor diverse because of the fixed anchors, which limits detection performance. We propose a novel architecture called Diversity-Focused Network (DF-Net), which consists of three modules: 1) an affine transform module (ATM), which models the deblurring process and fuses feature maps of different receptive fields through a multi-level attention block; 2) a label assignment module (LAM), which assigns labels to the proposals used in fine-grained aggregation; 3) a regression-guided diffusion module (RGDM), which obtains more diverse and higher-quality features. Experiments show that DF-Net achieves favorable results on the representative large-scale ImageNet VID dataset. Notably, DF-Net achieves 84.8% mAP with ResNet-101 without post-processing steps.
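The general idea of aggregating features from support frames can be illustrated with a minimal sketch. This is not the DF-Net method: it assumes a generic similarity-weighted average, in which each support feature is weighted by a softmax over its cosine similarity to the target feature. The function names `cosine` and `aggregate` and the `temperature` parameter are hypothetical, introduced only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def aggregate(target, supports, temperature=1.0):
    """Similarity-weighted average of the target feature and its support features.

    Assumption: weights are a softmax over cosine similarity to the target.
    This is a generic aggregation scheme, not the one used in DF-Net.
    """
    candidates = [target] + list(supports)
    scores = [cosine(target, c) / temperature for c in candidates]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(target))
    ]


# One target proposal feature and two support-frame features (toy values).
target_feature = [1.0, 0.0]
support_features = [[0.9, 0.1], [0.0, 1.0]]
enhanced = aggregate(target_feature, support_features)
```

In this sketch, support features that resemble the target contribute more to the enhanced feature, while dissimilar ones are down-weighted rather than discarded.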

Tags: video object detection, Affine Transform Module, Fine-grained Aggregation, Diffusion Module
