Attention Meets Normalization and Beyond
Xu Ma, Jingda Guo, Qi Chen, Sihai Tang, Qing Yang, Song Fu
To make Convolutional Neural Networks (CNNs) more efficient and accurate, various lightweight self-attention modules have been proposed. In this paper, we systematically study state-of-the-art attention modules in CNNs and discover that the self-attention mechanism is closely related to normalization. Based on this observation, we propose a novel attention module, named the Normalization-Attention module (NA module for short), which is almost parameter-free. The NA module calculates the mean and standard deviation of intermediate feature maps and processes the feature context with normalization, which makes a CNN model easier to train and more responsive to informative features. Our proposed Normalization-Attention module can be integrated into various base CNN architectures and applied to many computer vision tasks, including image recognition, object detection, and more. Experimental results on the ImageNet and MS COCO benchmarks show that our method outperforms state-of-the-art methods while using fewer parameters. Code is publicly available.
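Below is a minimal sketch of what a Normalization-Attention-style module could look like, based only on the abstract's description: compute the mean and standard deviation of an intermediate feature map, normalize the feature context, and use the result to re-weight the input. The class name, the per-channel scale/shift pair, and the sigmoid gate are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a normalization-based attention module (assumed design,
# not the paper's official code). It keeps the module almost parameter-free:
# only one scale and one shift per channel are learned.
import torch
import torch.nn as nn


class NormalizationAttention(nn.Module):
    """Re-weights feature maps using statistics of the feature context."""

    def __init__(self, channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Only learnable parameters: 2 * C in total.
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) intermediate feature map from a CNN block.
        mean = x.mean(dim=(2, 3), keepdim=True)      # per-sample, per-channel mean
        std = x.std(dim=(2, 3), keepdim=True)        # per-sample, per-channel std
        context = (x - mean) / (std + self.eps)      # normalized feature context
        attention = torch.sigmoid(self.gamma * context + self.beta)
        return x * attention                          # emphasize informative features


if __name__ == "__main__":
    # Example: the module can be dropped after a convolution block in a base CNN.
    feats = torch.randn(2, 64, 32, 32)
    na = NormalizationAttention(channels=64)
    print(na(feats).shape)  # torch.Size([2, 64, 32, 32])
```

Because the re-weighting is driven by normalization statistics rather than extra convolutional or fully connected layers, such a module adds a negligible number of parameters to the base architecture.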