Efficient Multi-Scale Attention Module with Cross-Spatial Learning
Daliang Ouyang, Su He, Guozhong Zhang, Mingzhu Luo, Huaiyong Guo, Jian Zhan, Zhijie Huang (Aerospace Science & Industry Shenzhen (Group) Co., Ltd.)
The remarkable effectiveness of channel and spatial attention mechanisms in producing more discernible feature representations has been demonstrated in various computer vision tasks. However, modeling cross-channel relationships with channel dimensionality reduction may bring side effects when extracting deep visual representations. In this paper, a novel efficient multi-scale attention (EMA) module is proposed. Focusing on retaining the information of each channel while decreasing the computational overhead, EMA groups the channel dimension into multiple sub-features so that the spatial semantic features are well distributed within each feature group. Specifically, apart from encoding global information to re-calibrate the channel-wise weights in each parallel branch, the output features of the two parallel branches are further aggregated by a cross-dimension interaction. Extensive experiments on widely used benchmarks, including image classification on CIFAR-100 and object detection on MS COCO and VisDrone, indicate that EMA outperforms several recent attention mechanisms significantly without changing network depth.
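To make the described pipeline concrete, the following is a minimal PyTorch sketch of an EMA-style block built from the abstract alone: channel grouping, a 1x1 branch that encodes global context along the height and width axes, a 3x3 branch for local context, and a cross-spatial matmul that aggregates the two branches. The class name, group count, and normalization choice are assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class EMASketch(nn.Module):
    """EMA-style attention sketch (hypothetical names/shapes, not the official code)."""

    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        c = channels // groups
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width -> (., c, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height -> (., c, 1, W)
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)  # shared encoding of directional context
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)  # local multi-scale branch
        self.gn = nn.GroupNorm(c, c)  # per-sub-feature normalization (an assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, ch, h, w = x.shape
        c = ch // self.groups
        g = x.reshape(b * self.groups, c, h, w)  # fold groups into the batch dim

        # 1x1 branch: concatenate H- and W-directional context, encode, split back,
        # and re-calibrate the sub-feature with the resulting channel-wise weights.
        xh = self.pool_h(g)                                   # (bg, c, h, 1)
        xw = self.pool_w(g).permute(0, 1, 3, 2)               # (bg, c, w, 1)
        hw = self.conv1x1(torch.cat([xh, xw], dim=2))
        xh, xw = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(g * xh.sigmoid() * xw.permute(0, 1, 3, 2).sigmoid())

        # 3x3 branch: complementary local spatial context.
        x2 = self.conv3x3(g)

        # Cross-dimension interaction: each branch's softmaxed global descriptor
        # attends over the other branch's spatial map.
        y1 = x1.mean((2, 3)).softmax(-1).unsqueeze(1)         # (bg, 1, c)
        y2 = x2.mean((2, 3)).softmax(-1).unsqueeze(1)         # (bg, 1, c)
        a = (y1 @ x2.reshape(b * self.groups, c, h * w)
             + y2 @ x1.reshape(b * self.groups, c, h * w))    # (bg, 1, hw)
        weights = a.reshape(b * self.groups, 1, h, w).sigmoid()
        return (g * weights).reshape(b, ch, h, w)


if __name__ == "__main__":
    # Shape-preserving, so it can be dropped into a backbone without changing depth.
    out = EMASketch(channels=64, groups=8)(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Note that folding the groups into the batch dimension is what keeps the overhead low: every convolution and matmul operates on small c-channel sub-features rather than the full channel dimension, and no channel dimensionality reduction is applied.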