Optimizing Vision Transformers for Medical Image Segmentation

Qianying Liu (University of Glasgow); Chaitanya Kaul (University of Glasgow); Jun Wang (University of Warwick); Christos Anagnostopoulos (University of Glasgow); Roderick Murray-Smith (University of Glasgow); Fani Deligianni (University of Glasgow)

07 Jun 2023

For medical image semantic segmentation (MISS), Vision Transformers have emerged as strong alternatives to convolutional neural networks thanks to their inherent ability to capture long-range correlations. However, existing research uses off-the-shelf Vision Transformer blocks based on linear projections and feature processing, which lack the spatial and local context needed to refine organ boundaries. Furthermore, because of their limited inductive biases, Transformers do not generalize well on small medical imaging datasets and rely on large-scale pre-training. To address these problems, we present the design of a compact and accurate Transformer network for MISS, CS-Unet, which introduces convolutions in a multi-stage design to hierarchically enhance the spatial and local modeling ability of Transformers. This is mainly achieved by our Convolutional Swin Transformer (CST) block, which merges convolutions with Multi-Head Self-Attention and Feed-Forward Networks to provide inherent localized spatial context and inductive biases. Experiments show that CS-Unet, without pre-training, outperforms its counterparts by large margins on multi-organ and cardiac datasets with fewer parameters, achieving state-of-the-art performance. Our code is available on GitHub.
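The abstract does not spell out the internals of the CST block. As a rough illustration only, the NumPy sketch below shows one common way to merge convolutions into a Transformer block: a depthwise 3x3 convolution applied before each Q/K/V projection (in the spirit of convolutional projections) and another between the two linear layers of the feed-forward network. All function names, the ReLU activation, the parameter-free LayerNorm, and the use of global attention instead of Swin-style shifted-window attention are assumptions made for illustration; this is not the authors' CS-Unet code.

```python
import numpy as np

# Hypothetical sketch of a convolution-augmented Transformer block.
# NOT the authors' CST implementation; names and design details are assumed.


def depthwise_conv3x3(x, w):
    """Depthwise 3x3 convolution (cross-correlation) with zero padding.

    x: (H, W, C) feature map; w: (3, 3, C), one 3x3 kernel per channel.
    """
    H, W, C = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            # Each channel is filtered independently by its own kernel tap.
            out += padded[dy:dy + H, dx:dx + W, :] * w[dy, dx, :]
    return out


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Assumption: no learnable scale/shift, to keep the sketch short.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def conv_attention(x, p, num_heads):
    """Multi-head self-attention whose Q/K/V see a 3x3 neighbourhood first."""
    H, W, C = x.shape
    assert C % num_heads == 0, "channels must split evenly across heads"
    d = C // num_heads

    def proj(name):
        # Local context (depthwise conv), then a pointwise linear projection.
        local = depthwise_conv3x3(x, p[name + "_dw"])
        flat = local.reshape(H * W, C) @ p[name + "_pw"]
        return flat.reshape(H * W, num_heads, d).transpose(1, 0, 2)

    q, k, v = proj("q"), proj("k"), proj("v")              # (heads, N, d)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(H * W, C)
    return (out @ p["o"]).reshape(H, W, C)


def conv_ffn(x, p):
    """Feed-forward network with a depthwise 3x3 conv between the two layers."""
    H, W, C = x.shape
    hidden = (x.reshape(H * W, C) @ p["fc1"]).reshape(H, W, -1)
    hidden = depthwise_conv3x3(hidden, p["ffn_dw"])
    hidden = np.maximum(hidden, 0.0)  # ReLU (assumed; GELU is also common)
    return (hidden.reshape(H * W, -1) @ p["fc2"]).reshape(H, W, C)


def conv_transformer_block(x, p, num_heads=2):
    # Pre-norm residual layout: x + Attn(LN(x)), then x + FFN(LN(x)).
    x = x + conv_attention(layer_norm(x), p, num_heads)
    x = x + conv_ffn(layer_norm(x), p)
    return x


def init_params(C, hidden, rng):
    p = {}
    for name in ("q", "k", "v"):
        p[name + "_dw"] = rng.normal(0, 0.1, (3, 3, C))
        p[name + "_pw"] = rng.normal(0, C ** -0.5, (C, C))
    p["o"] = rng.normal(0, C ** -0.5, (C, C))
    p["fc1"] = rng.normal(0, C ** -0.5, (C, hidden))
    p["ffn_dw"] = rng.normal(0, 0.1, (3, 3, hidden))
    p["fc2"] = rng.normal(0, hidden ** -0.5, (hidden, C))
    return p


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))  # an 8x8 grid of 16-dim tokens
params = init_params(C=16, hidden=32, rng=rng)
y = conv_transformer_block(x, params, num_heads=4)
print(y.shape)  # (8, 8, 16)
```

Because every token is filtered over its 3x3 neighbourhood before global attention, the block carries a locality bias that plain linear projections lack. Swapping the global attention for shifted-window attention would bring the sketch closer to the Swin-based block the abstract names.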

Tags:

Medical image feature extraction and fusion

Presented at IEEE ICASSP 2023, 4-10 June 2023, Greece (virtual and in-person conference).

More Like This

IDEAL: Improved DEnse LocAL Contrastive Learning for Semi-Supervised Medical Image Segmentation

New Interpretable Patterns and Discriminative Features from Brain Functional Network Connectivity Using Dictionary Learning

ViTASD: Robust Vision Transformer Baselines for Autism Spectrum Disorder Facial Diagnosis

Join the IEEE Signal Processing Society