DECOMFORMER: DECOMPOSE SELF-ATTENTION VIA FOURIER TRANSFORM FOR VHR AERIAL IMAGE SCENE CLASSIFICATION
Yan Zhang (Chongqing University of Posts and Telecommunications); Xiyuan Gao (Chongqing University of Posts and Telecommunications); Xiao PU (Chongqing University of Posts and Telecommunications); Tao Wang (Chongqing University of Posts and Telecommunications); Xinbo Gao (Chongqing University of Posts and Telecommunications)
Very high-resolution (VHR) aerial image scene classification is an essential task for aerial image understanding. Although transformer-based models have demonstrated strong performance in natural image classification, transformer-based methods have received little attention on VHR aerial image tasks because the complexity of self-attention grows quadratically with the image resolution. To address this issue, we decompose self-attention via the Fourier transform and propose a novel Fourier self-attention (FSA) mechanism. Based on FSA, we design a highly efficient network named DecomFormer, which learns contextual relationships in the real and imaginary parts of the Fourier domain, respectively. Theoretically, DecomFormer reduces the complexity of the naive self-attention mechanism from O(n^2) to O(n log(n)). Extensive experiments on public VHR aerial image classification benchmarks demonstrate DecomFormer's efficiency, especially on images with very high resolution.
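The abstract's core idea, replacing quadratic pairwise attention with an O(n log n) Fourier-domain mixing step that treats the real and imaginary parts separately, can be illustrated with a minimal NumPy sketch. This is not the paper's actual FSA implementation; the separate linear mixing weights `w_real` and `w_imag` are assumptions introduced here to show how per-part processing in the Fourier domain could look.

```python
import numpy as np

def fourier_self_attention(x, w_real, w_imag):
    """Hypothetical sketch of Fourier-domain token mixing (not the paper's exact FSA).

    x      : (n, d) array of n token embeddings of dimension d
    w_real : (d, d) mixing weights applied to the real part
    w_imag : (d, d) mixing weights applied to the imaginary part
    """
    # FFT along the token axis: O(n log n), versus O(n^2) for pairwise attention.
    xf = np.fft.fft(x, axis=0)
    # Learn/apply contextual mixing on real and imaginary parts separately
    # (illustrative linear maps; the actual per-part operations are a guess).
    real = xf.real @ w_real
    imag = xf.imag @ w_imag
    # Recombine the two parts and return to the token domain.
    out = np.fft.ifft(real + 1j * imag, axis=0)
    return out.real

# With identity mixing weights the transform round-trips, which makes the
# sketch easy to sanity-check.
n, d = 16, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d))
y = fourier_self_attention(x, np.eye(d), np.eye(d))
```

Because the FFT and its inverse are exact, identity weights recover the input up to floating-point error; trained weight matrices would instead reshape the spectrum to capture context.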