Multi-stage Aggregation Transformer for Medical Image Segmentation
Xiaoyan Wang (Zhejiang University of Technology); Minghan Shao (Zhejiang University of Technology); Dongyan Guo (Zhejiang University of Technology); Ying Cui (Zhejiang University of Technology); Xiaojie Huang (Zhejiang University); Ming Xia (Zhejiang University of Technology); Cong Bai (Zhejiang University of Technology)
Capturing rich multi-scale features is essential for resolving the complex variations encountered in medical image segmentation. In this paper, we explore how to fully exploit the complementary strengths of Convolutional Neural Networks (CNNs) and Transformers, and propose a novel multi-stage aggregation architecture, named MA-Transformer, for accurate segmentation of medical images with large variations and blurring. Specifically, an encoder module is introduced at each stage, adopting a dual-branch structure that combines Transformers and convolutions in parallel. With this design, self-attention provides global context for the CNN branch to extract multi-resolution complementary features stage by stage, so that the feature representations are gradually enriched with both local details and contextual information. The multi-scale semantic features are then combined through skip connections in the decoder to produce the final segmentation. Extensive experiments on public medical imaging datasets demonstrate superior segmentation performance compared to state-of-the-art CNN-based, Transformer-based, and combined CNN-Transformer approaches. Code will be made publicly available.
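The abstract describes each encoder stage as a dual-branch block in which a convolutional path and a self-attention path run in parallel and are fused into one multi-resolution feature map. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; the class name `DualBranchStage`, the choice of patch embedding, the fusion by channel concatenation, and all hyperparameters are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DualBranchStage(nn.Module):
    """Illustrative dual-branch encoder stage (assumed design, not the paper's
    exact module): a convolutional branch captures local details while a
    parallel Transformer branch supplies global context; the two outputs are
    fused into a single downsampled feature map."""

    def __init__(self, in_ch: int, out_ch: int, num_heads: int = 4):
        super().__init__()
        # Convolutional branch: local feature extraction with 2x downsampling.
        self.conv_branch = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Transformer branch: patch embedding (also 2x downsampling) followed
        # by self-attention over the flattened spatial tokens.
        self.patch_embed = nn.Conv2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.attn = nn.TransformerEncoderLayer(
            d_model=out_ch, nhead=num_heads,
            dim_feedforward=2 * out_ch, batch_first=True,
        )
        # Fuse the two branches channel-wise with a 1x1 convolution.
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_feat = self.conv_branch(x)                 # (B, C, H/2, W/2)
        tokens = self.patch_embed(x)                     # (B, C, H/2, W/2)
        b, c, h, w = tokens.shape
        tokens = tokens.flatten(2).transpose(1, 2)       # (B, H*W/4, C)
        global_feat = self.attn(tokens)                  # global self-attention
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))


# Example usage: one stage halves the spatial resolution and widens channels,
# so several stages stacked in sequence yield the multi-scale features that
# the decoder would combine via skip connections.
stage = DualBranchStage(in_ch=64, out_ch=128)
feat = stage(torch.randn(1, 64, 56, 56))  # -> torch.Size([1, 128, 28, 28])
```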