  • SPS Members: Free
  • IEEE Members: $11.00
  • Non-members: $15.00
  • Length: 00:05:25
05 Oct 2022

The Vision Transformer (ViT) has been introduced into the computer vision (CV) field, using its self-attention mechanism to capture global dependencies. However, simply deploying ViT on a hyperspectral image (HSI) classification task cannot achieve satisfactory results, because ViT is a spatial-only self-attention model, while rich spectral information exists in HSI. Moreover, most HSI classifiers integrate spectral and spatial features in a cascaded pipeline, ignoring the internal correlation between spectral and spatial information. Furthermore, existing positional embedding (PE) methods cannot accommodate the 3D configuration of ViT. Therefore, this paper proposes a unified spectral-spatial 3D ViT with a cooperative 3D coordinate positional embedding; in addition, a novel local-global feature fusion strategy is proposed. The model contains no convolutional or recurrent units and achieves more competitive classification performance than other state-of-the-art (SOTA) methods. Furthermore, compared with existing ViT-based HSI classifiers, our approach obtains better results.
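To make the idea of a 3D coordinate positional embedding concrete, the following is a minimal, hypothetical sketch in PyTorch: it assumes sinusoidal encodings computed independently along the spectral (band), height, and width axes of the token grid and concatenated per token. The paper's exact "cooperative" formulation is not reproduced here; the function and parameter names (coord_pe_3d, n_bands, n_h, n_w) are illustrative only.

    # Hypothetical sketch of a 3D coordinate positional embedding for
    # spectral-spatial tokens; the actual method in the paper may differ.
    import torch

    def sincos_1d(positions: torch.Tensor, dim: int) -> torch.Tensor:
        """Standard sinusoidal encoding of shape (len(positions), dim)."""
        assert dim % 2 == 0
        freqs = torch.exp(
            -torch.arange(0, dim, 2, dtype=torch.float32)
            * (torch.log(torch.tensor(10000.0)) / dim)
        )
        angles = positions.float()[:, None] * freqs[None, :]              # (N, dim/2)
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)  # (N, dim)

    def coord_pe_3d(n_bands: int, n_h: int, n_w: int, embed_dim: int) -> torch.Tensor:
        """Positional embedding for an (n_bands, n_h, n_w) grid of 3D tokens.

        Each axis contributes roughly embed_dim // 3 channels; the result is
        zero-padded to embed_dim. Returns (n_bands * n_h * n_w, embed_dim).
        """
        d = (embed_dim // 3) // 2 * 2  # per-axis dimension, made even
        pe_b = sincos_1d(torch.arange(n_bands), d)  # spectral axis
        pe_h = sincos_1d(torch.arange(n_h), d)      # spatial height
        pe_w = sincos_1d(torch.arange(n_w), d)      # spatial width

        # Broadcast each axis encoding over the full 3D grid, then concatenate.
        pe = torch.cat(
            [
                pe_b[:, None, None, :].expand(n_bands, n_h, n_w, d),
                pe_h[None, :, None, :].expand(n_bands, n_h, n_w, d),
                pe_w[None, None, :, :].expand(n_bands, n_h, n_w, d),
            ],
            dim=-1,
        ).reshape(n_bands * n_h * n_w, 3 * d)

        # Zero-pad to the transformer embedding dimension if needed.
        if 3 * d < embed_dim:
            pe = torch.nn.functional.pad(pe, (0, embed_dim - 3 * d))
        return pe

    # Example: 30 spectral bands, a 9x9 spatial patch grid, 96-dim token embeddings.
    pe = coord_pe_3d(n_bands=30, n_h=9, n_w=9, embed_dim=96)
    print(pe.shape)  # torch.Size([2430, 96])

The resulting embedding would be added to the 3D token embeddings before the transformer encoder, so that each token carries its spectral and spatial coordinates jointly rather than through a spatial-only PE.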
