Adaptive Scale and Spatial Aggregation for Real-time Object Detection

Wei Chen (College of Computer, National University of Defense Technology); Yulin He (National University of Defense Technology); Zhengfa Liang (Defense Innovation Institute); Yulan Guo (National University of Defense Technology)

08 Jun 2023

State-of-the-art real-time detectors usually reach real-time speed by adopting lightweight architectures. Their detection accuracy can be limited by an insufficient ability to obtain powerful feature representations, which is a notoriously difficult task in machine vision applications. To address this problem, this study proposes adaptive feature aggregation at both the scale and spatial levels in an anchor-free framework: 1) at the scale level, a Multi-scale Point Feature Fusion (MPFF) module fuses point features from multiple scales through self-adaptive re-weighting; 2) at the spatial level, a Restrained Deformable Convolution (R-DCN) focuses on the most informative features within a pre-defined region while avoiding distraction from remote features. Based on R-DCN, an Adaptive Spatial Aggregation (ASA) module alleviates the feature misalignment problem in the classification and regression tasks through their respective spatial divisions. Extensive experiments on MS COCO indicate that AADet achieves state-of-the-art detection performance among real-time anchor-free detectors, i.e., 41.8 AP at 60 FPS.
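
The two ideas named in the abstract, self-adaptive re-weighting across scales and restraining deformable sampling to a pre-defined region, can be illustrated with a minimal sketch. The abstract does not say how either is implemented, so everything below is an assumption made for illustration: softmax is used for the re-weighting, tanh is used to bound the offsets, and the function names (fuse_point_features, restrain_offset) and the radius parameter are hypothetical. This is not the authors' MPFF or R-DCN code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_point_features(features, scores):
    """Fuse one point's feature vectors taken from several scales.

    features: list of equal-length feature vectors, one per scale.
    scores:   one raw score per scale (assumed to be predicted by the
              network; here they are plain inputs).
    Returns the softmax-weighted sum of the vectors.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def restrain_offset(raw_dx, raw_dy, radius):
    """Squash an unbounded sampling offset into [-radius, radius] per axis.

    Assumption: tanh bounding stands in for whatever restraint the paper
    uses to keep sampling points inside a pre-defined region.
    """
    return radius * math.tanh(raw_dx), radius * math.tanh(raw_dy)


if __name__ == "__main__":
    # Two scales, equal scores: each scale contributes half.
    print(fuse_point_features([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    # A very large raw offset still stays within the 2-pixel region.
    print(restrain_offset(100.0, -100.0, 2.0))
```

With equal scores every scale contributes equally, and raising one scale's score shifts the fused feature toward that scale. The bounded offset shows why remote features cannot be sampled: however large the raw prediction, the sampling point stays within radius of its base location.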

Tags:

Deep learning techniques

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
