    Length: 00:10:05
19 Apr 2023

We propose Masked Attention Transformers for Surgical Instrument Segmentation (MATIS), a two-stage, fully transformer-based method that leverages modern pixel-wise attention mechanisms for instrument segmentation. MATIS exploits the instance-level nature of the task by employing a masked attention module that generates and classifies a set of fine-grained instrument region proposals. Our method also incorporates long-term video-level information through video transformers, which improves temporal consistency and enhances mask classification. We validate our proposal on the two standard public benchmarks, Endovis 2017 and Endovis 2018. Our experiments demonstrate that the MATIS baseline alone outperforms previous state-of-the-art methods, and that adding our temporal consistency module significantly boosts the model's performance. All our training and validation code and our pretrained models will be publicly released upon acceptance.
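The abstract does not spell out how the masked attention module works, so the following is only a minimal NumPy sketch of mask-restricted cross-attention in the general style popularized by Mask2Former-like models. It is not the MATIS implementation. The function name `masked_cross_attention`, the 0.5 threshold, and the fallback for empty masks are all assumptions made for illustration. Each region query attends only to the pixels that its previous mask prediction marks as foreground.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, prev_masks, threshold=0.5):
    """Illustrative mask-restricted cross-attention (hypothetical helper).

    queries:    (N, d) one embedding per region proposal
    keys:       (P, d) one embedding per pixel
    values:     (P, d) one embedding per pixel
    prev_masks: (N, P) mask probabilities in [0, 1] from the previous layer
    Returns the updated queries (N, d) and the attention weights (N, P).
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)  # scaled dot-product scores, (N, P)

    # A query may only look at pixels inside its own predicted mask.
    allowed = prev_masks >= threshold

    # Assumption: a query with an empty mask falls back to attending
    # everywhere, so the softmax below is always well defined.
    empty = ~allowed.any(axis=1, keepdims=True)
    allowed = allowed | empty

    # Disallowed pixels get a score of -inf and therefore zero weight.
    logits = np.where(allowed, logits, -np.inf)

    # Numerically stable softmax over the pixel axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    return weights @ values, weights
```

In a full model this step would sit inside a decoder layer with learned projections and a residual connection, and the masks would be re-predicted after every layer. Those parts are omitted here to keep the masking logic visible.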