  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:09:06
04 Oct 2022

Semantic segmentation of remote sensing images (RSI) is a long-standing research topic. Existing supervised methods usually require large amounts of labeled data. In addition, the large image size, wide variation in object scale, and intricate detail of RSI make it essential to capture both long-range context and local information. To address these problems, we propose Le-BEIT, a self-supervised Transformer with an improved positional encoding, Local-Enhanced Positional Encoding (LePE). Self-supervised learning reduces the need for large amounts of labeled data, and the self-attention mechanism in the Transformer is well suited to capturing long-range context. We use LePE in place of Relative Positional Encoding (RPE) to represent local information more effectively. Moreover, given the domain gap between natural images and RSI, we pre-train Le-BEIT on GID, a very small high-resolution RSI dataset, instead of ImageNet-22K. To investigate how pre-training dataset size affects segmentation accuracy, we further conduct experiments on a larger pre-training dataset, GID-DOTA, which is 1/100 the size of ImageNet-22K, and observe considerable accuracy improvements. Despite relying on a much smaller pre-training dataset, our method achieves accuracy competitive with its counterpart pre-trained on ImageNet-22K.
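
The abstract does not spell out how LePE enters the attention computation. As a rough illustration only, the sketch below follows the commonly cited LePE formulation, in which a depthwise convolution of the value tensor is added to the ordinary self-attention output. The function names, the single-head layout, the 3x3 kernel, and the zero padding are all assumptions made for this example and are not taken from Le-BEIT itself, whose implementation may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv3x3(v_grid, kernel):
    # v_grid: (H, W, d) values laid out on the patch grid.
    # kernel: (3, 3, d), one 3x3 filter per channel (hypothetical shape).
    # Zero padding and stride 1, so the output keeps shape (H, W, d).
    H, W, _ = v_grid.shape
    padded = np.pad(v_grid, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(v_grid)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + H, dx:dx + W, :] * kernel[dy, dx, :]
    return out

def attention_with_lepe(q, k, v, grid_hw, kernel):
    # q, k, v: (H*W, d) token matrices for a single head.
    # Returns softmax(q k^T / sqrt(d)) v + DWConv(v).
    H, W = grid_hw
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))
    lepe = depthwise_conv3x3(v.reshape(H, W, d), kernel).reshape(H * W, d)
    return attn @ v + lepe
```

In this sketch the positional term depends only on each token's 3x3 neighborhood on the patch grid, which is one way to read the abstract's claim that LePE represents local information while self-attention handles long-range context.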
