07 Oct 2022

This paper presents an efficient multi-scale vision transformer, called CBPT, that serves as a general-purpose backbone for computer vision. A challenging issue in transformer design is that window self-attention (WSA) often limits the information transmission of each token, whereas enlarging WSA's receptive field is very expensive to compute. To address this issue, we develop a Locally-Enhanced Window Self-attention mechanism that doubles the receptive field while keeping computational complexity similar to that of typical WSA. In addition, we propose Information-Enhanced Patch Merging, which addresses the loss of information when downsampling the attention map. Incorporating these designs together with the Cross Block Partial connection, CBPT not only significantly surpasses Swin by +1 box AP and mask AP on COCO object detection and instance segmentation, but also has 30% fewer parameters and 35% fewer FLOPs than Swin.
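For context, the sketch below shows the plain window self-attention baseline that the abstract contrasts against: each token attends only within its own non-overlapping window, which is the limited receptive field being addressed. This is a minimal PyTorch illustration under assumed shapes and names (the class `WindowSelfAttention` and its hyperparameters are hypothetical); it does not reproduce CBPT's Locally-Enhanced WSA, Information-Enhanced Patch Merging, or Cross Block Partial connection.

```python
# Minimal sketch of plain window self-attention (WSA). All names and
# hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    """Self-attention computed independently within non-overlapping windows.

    Each token only attends to other tokens in its own window, which is the
    restricted receptive field the abstract refers to.
    """
    def __init__(self, dim: int, window_size: int, num_heads: int):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) feature map; H and W assumed divisible by window_size.
        B, H, W, C = x.shape
        ws = self.window_size
        # Partition into (B * num_windows, ws*ws, C) token groups.
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        # Attention cost is O(num_windows * (ws*ws)^2 * C): quadratic in the
        # window area, which is why naively enlarging the window is expensive.
        x, _ = self.attn(x, x, x)
        # Reverse the partition back to (B, H, W, C).
        x = x.view(B, H // ws, W // ws, ws, ws, C)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

# Example: 8x8 windows over a 56x56 feature map with 96 channels.
wsa = WindowSelfAttention(dim=96, window_size=8, num_heads=4)
out = wsa(torch.randn(2, 56, 56, 96))  # -> (2, 56, 56, 96)
```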
