A Dynamic Cross-scale Transformer with Dual-compound Representation for 3D Medical Image Segmentation
Ruixia Zhang, Zhiqiong Wang, Zhongyang Wang, and Junchang Xin (Northeastern University)
Transformer models exploit multi-head self-attention to capture long-range information. Window-based self-attention further mitigates the quadratic computational complexity of global attention and offers a practical route to dense prediction on 3D images. However, Transformers lose structural information due to the naive tokenization scheme, and single-scale attention fails to balance feature representation against semantic information. To address these problems, we propose a window-based dynamic cross-scale cross-attention Transformer (DCS-Former) for precise representation of diverse features. DCS-Former first constructs dual-compound feature representations through an Ante-hoc Structure-aware Module and a Post-hoc Class-aware Module. A bidirectional attention structure then interactively fuses the structural features with the class representations. Experimental results show that our method outperforms various competing segmentation models on three public datasets.
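The abstract does not spell out how the bidirectional fusion is implemented; the sketch below is one plausible reading, in which structural tokens and class tokens cross-attend to each other in both directions. The module name, token shapes, and residual-plus-norm fusion are illustrative assumptions, not the authors' reference implementation.

```python
# A minimal sketch of bidirectional cross-attention fusion between the two
# compound representations (structure-aware and class-aware tokens).
# All names and dimensions here are hypothetical.
import torch
import torch.nn as nn


class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # One attention block per direction: struct->class and class->struct.
        self.struct_to_class = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.class_to_struct = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_c = nn.LayerNorm(dim)

    def forward(self, struct_tokens: torch.Tensor, class_tokens: torch.Tensor):
        # Structural tokens query the class representation...
        s_fused, _ = self.struct_to_class(
            query=struct_tokens, key=class_tokens, value=class_tokens
        )
        # ...while class tokens query the structural representation.
        c_fused, _ = self.class_to_struct(
            query=class_tokens, key=struct_tokens, value=struct_tokens
        )
        # Residual connections preserve each original representation.
        return self.norm_s(struct_tokens + s_fused), self.norm_c(class_tokens + c_fused)


# Example: fuse 128 structural tokens with 16 class tokens, embedding dim 96.
struct_tokens = torch.randn(2, 128, 96)
class_tokens = torch.randn(2, 16, 96)
fused_s, fused_c = BidirectionalCrossAttention(dim=96)(struct_tokens, class_tokens)
print(fused_s.shape, fused_c.shape)  # (2, 128, 96) and (2, 16, 96)
```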