CATS: Complementary CNN and Transformer Encoders for Segmentation

Hao Li, Dewei Hu, Han Liu, Jiacheng Wang, Ipek Oguz

28 Mar 2022

Recently, deep learning methods have achieved state-of-the-art performance in many medical image segmentation tasks. Many of these are based on convolutional neural networks (CNNs). For such methods, the encoder is the key part for global and local information extraction from input images; the extracted features are then passed to the decoder for predicting the segmentations. In contrast, several recent works show a superior performance with the use of transformers, which can better model long-range spatial dependencies and capture low-level details. However, a transformer as the sole encoder underperforms for some tasks where it cannot efficiently replace the convolution-based encoder. In this paper, we propose a model with double encoders for 3D biomedical image segmentation. Our model is a U-shaped CNN augmented with an independent transformer encoder. We fuse the information from the convolutional encoder and the transformer, and pass it to the decoder to obtain the results. We evaluate our methods on three public datasets from three different challenges: BTCV, MoDA and Decathlon. Compared to the state-of-the-art models with and without transformers, our proposed method obtains higher Dice scores across the board.
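The dual-encoder idea described above (a CNN encoder and an independent transformer encoder whose features are fused before decoding) can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: the downsampling stages stand in for the real convolutional and transformer blocks, the element-wise addition is one simple fusion choice, and nearest-neighbor upsampling stands in for the decoder.

```python
import numpy as np

def cnn_encoder(x):
    # Stand-in for the convolutional encoder: strided subsampling replaces
    # conv + pooling blocks; returns a list of multi-scale feature maps.
    feats = []
    f = x
    for _ in range(3):
        f = f[:, ::2, ::2]  # downsample spatial dims by 2 per stage
        feats.append(f)
    return feats

def transformer_encoder(x):
    # Stand-in for the independent transformer encoder; here it reuses the
    # same subsampling so its outputs match the CNN feature shapes.
    feats = []
    f = x
    for _ in range(3):
        f = f[:, ::2, ::2]
        feats.append(f)
    return feats

def fuse_and_decode(cnn_feats, tr_feats):
    # Fuse corresponding scales by element-wise addition (an assumption,
    # one of several possible fusion schemes), then "decode" by upsampling
    # the coarsest fused map back to the input resolution.
    fused = [c + t for c, t in zip(cnn_feats, tr_feats)]
    out = fused[-1]
    for _ in range(3):
        out = out.repeat(2, axis=1).repeat(2, axis=2)  # nearest-neighbor upsample
    return out

x = np.random.rand(1, 32, 32)            # one-channel toy "image"
seg = fuse_and_decode(cnn_encoder(x), transformer_encoder(x))
print(seg.shape)  # (1, 32, 32): prediction at input resolution
```

The key point the sketch illustrates is that the two encoders run in parallel on the same input and their multi-scale features are merged before decoding, rather than one encoder replacing the other.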