HIGH-FREQUENCY TRANSFORMER NETWORK BASED ON WINDOW CROSS-ATTENTION FOR PANSHARPENING
Chengjie Ke (Wuhan University); Hao Liang (Wuhan University); Duidui Li (China Centre for Resources Satellite Data and Application); Xin Tian (Wuhan University)
SPS
Inspired by the vision transformer's ability to capture long-range dependencies, we propose a novel high-frequency transformer network based on window cross-attention that fuses panchromatic (PAN) and multispectral (MS) images into a high-resolution MS image. Previous transformer-based fusion networks rely on shallow feature extraction; to overcome this limitation, we combine high-pass filtering with deep feature extraction to recover more texture information. As a result, the relationship between the MS and PAN images, which is estimated from feature similarity, is more accurate. In particular, we build the cross-modality correlation with a window cross-attention mechanism that operates at the pixel level between corresponding local windows of the MS and PAN images. Compared with patch-level attention, pixel-level attention better preserves fine-grained features. Therefore, more spatial details are transferred from the PAN image to the MS image, yielding a sharper fused MS image that still preserves spectral information well. Experimental results demonstrate that the proposed method outperforms the comparison methods in both visual quality and quantitative metrics.
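The two ideas named in the abstract, high-pass filtering and pixel-level cross-attention inside local windows, can be illustrated with a minimal NumPy sketch. This is not the authors' network. It assumes the MS features have already been upsampled to the PAN grid, that both feature maps share the same channel count, and that the learned query/key/value projections and deep feature extractors can be omitted for clarity. It also uses a box-blur residual as a stand-in high-pass filter. All function names and the window size are hypothetical.

```python
import numpy as np

def high_pass(img, k=5):
    """Keep high frequencies by subtracting a k x k box blur (assumed filter choice)."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    blurred = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            blurred += padded[dy:dy + img.shape[0], dx:dx + img.shape[1], :]
    return img - blurred / (k * k)

def window_partition(x, w):
    """Split (H, W, C) into non-overlapping windows: (num_windows, w*w, C)."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

def window_merge(wins, w, H, W):
    """Inverse of window_partition: (num_windows, w*w, C) back to (H, W, C)."""
    C = wins.shape[-1]
    x = wins.reshape(H // w, W // w, w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_cross_attention(ms_feat, pan_feat, w=4):
    """Each MS pixel (query) attends to every PAN pixel (key/value) in the same window."""
    H, W, C = ms_feat.shape
    q = window_partition(ms_feat, w)    # queries from MS features
    kv = window_partition(pan_feat, w)  # keys and values from PAN features
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(C)  # pixel-to-pixel similarity
    attn = softmax(scores, axis=-1)
    out = attn @ kv                     # PAN details gathered for each MS pixel
    return window_merge(out, w, H, W), attn

# Toy usage on random data; H and W must be divisible by the window size.
rng = np.random.default_rng(0)
ms = rng.standard_normal((8, 8, 3))
pan = rng.standard_normal((8, 8, 3))
details, attn = window_cross_attention(high_pass(ms), high_pass(pan), w=4)
```

Because attention is computed between individual pixels within a window rather than between whole patches, each output pixel is a weighted mix of individual PAN pixels, which is the fine-grained behavior the abstract attributes to pixel-level attention.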