Adaptive Scale and Spatial Aggregation for Real-time Object Detection
Wei Chen (College of Computer, National University of Defense Technology); Yulin He (National University of Defense Technology); Zhengfa Liang (Defense Innovation Institute); Yulan Guo (National University of Defense Technology)
SPS
State-of-the-art real-time detectors usually achieve their speed by adopting lightweight architectures. Their accuracy can be limited, however, because such architectures have insufficient capacity to learn powerful feature representations, a notoriously difficult task in machine vision applications. To address this problem, this study proposes adaptively aggregating features at both the scale and spatial levels in an anchor-free framework: 1) at the scale level, a Multi-scale Point Feature Fusion (MPFF) module fuses point features from multiple scales through self-adaptive re-weighting; 2) at the spatial level, a Restrained Deformable Convolution (R-DCN) focuses on the most informative features within a pre-defined region while avoiding distraction from remote features. Building on R-DCN, an Adaptive Spatial Aggregation (ASA) module alleviates the feature misalignment between the classification and regression tasks by giving each task its own spatial division. Extensive experiments on MS COCO indicate that the resulting detector, AADet, achieves state-of-the-art performance among real-time anchor-free detectors, namely 41.8 AP at 60 FPS.
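To make the two aggregation ideas concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the formulas, so the softmax re-weighting, the tanh bounding of offsets, and the names `fuse_point_features`, `restrain_offset`, `scores`, and `radius` are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_point_features(features, scores):
    """Fuse the feature vectors of one point taken from several scales.

    features: list of per-scale feature vectors, all of the same length.
    scores:   one raw score per scale (hypothetical; in a real network these
              would be predicted, here they are simply passed in).
    Assumption: the adaptive re-weighting is a softmax over the scales.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def restrain_offset(raw_dx, raw_dy, radius):
    """Bound a sampling offset to a square region of half-width `radius`.

    Assumption: the restraint is a tanh squashing, so the sampling location
    can never leave the pre-defined region around the kernel position.
    """
    return radius * math.tanh(raw_dx), radius * math.tanh(raw_dy)


if __name__ == "__main__":
    # Two scales with equal scores: the fused feature is the plain average.
    print(fuse_point_features([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
    # A very large raw offset is clipped to the region boundary.
    print(restrain_offset(100.0, -100.0, radius=2.0))
```

With equal scores the fusion reduces to averaging, while a dominant score lets a single scale take over; bounding the offsets is one simple way to keep a deformable kernel from sampling remote, distracting features.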