Multi-Dimension Unified Swin Transformer for 3D Lesion Segmentation In Multiple Anatomical Locations
Shaoyan Pan
SPS
Length: 00:02:16
Accurate 3D segmentation of lesions from CT scans is essential for computing 3D radiomic features and for modeling lesion growth kinetics, both of which aid the prediction of treatment response in clinical oncology. This work proposes a novel U-shaped neural network, the multi-dimension unified Swin transformer (MDU-ST), which combines a convolutional neural network (CNN) with a shifted-window transformer (Swin-transformer) for automatic 3D lesion segmentation using both 2D and 3D inputs. The MDU-ST consists of 1) a Swin-transformer encoder that learns semantic features from the input CT scans and 2) a CNN decoder that contains multiple upsampling convolutional layers. The Swin-transformer encoder adapts automatically to 2D and 3D inputs, so a single encoder can learn semantic information from inputs of either dimensionality. Motivated by this property, we introduce a novel three-stage framework for 3D lesion segmentation: 1) we apply multiple self-supervised pretext tasks to unlabeled 3D lesion volumes to learn the underlying pattern of lesion anatomy; 2) we fine-tune the Swin-transformer encoder to perform 2D lesion segmentation on 2D RECIST slices to learn slice-level segmentation information; 3) we further fine-tune the Swin-transformer encoder to perform 3D lesion segmentation on labeled 3D volumes to learn volume-level segmentation information. We compare the proposed MDU-ST with state-of-the-art CNN-based and transformer-based segmentation models on an internal 3D lesion dataset. The network’s performance is evaluated with the Dice similarity coefficient (DSC) for volume-based accuracy and the Hausdorff distance (HD) for surface-based accuracy. The proposed MDU-ST with the pre-training framework demonstrates significant improvement over the competing models. In the future, the network may be applied to automatic lesion segmentation to assist lung lesion response prediction.
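The two evaluation metrics named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code: the function names, the `spacing` parameter, and the handling of empty masks are assumptions made for illustration. For simplicity the Hausdorff distance here is computed by brute force over all foreground voxels, whereas a surface-based evaluation would first extract the boundary voxels of each mask.

```python
import numpy as np


def dice_coefficient(pred, target):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def hausdorff_distance(pred, target, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between the foreground voxels of two 3D masks.

    `spacing` (hypothetical parameter) is the voxel size along each axis.
    Brute force: memory grows with (#pred voxels) x (#target voxels).
    """
    p = np.argwhere(np.asarray(pred, dtype=bool)) * np.asarray(spacing)
    t = np.argwhere(np.asarray(target, dtype=bool)) * np.asarray(spacing)
    if len(p) == 0 or len(t) == 0:
        # Assumed convention: distance is undefined for an empty mask.
        return float("inf")
    # Pairwise Euclidean distances, shape (len(p), len(t)).
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    # Max over the two directed distances.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


# Toy example: the prediction misses one slab of the target along the last axis.
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True      # 8 voxels
target = np.zeros((4, 4, 4), dtype=bool)
target[1:3, 1:3, 1:4] = True    # 12 voxels, contains pred

dsc = dice_coefficient(pred, target)   # 2 * 8 / (8 + 12) = 0.8
hd = hausdorff_distance(pred, target)  # missed slab is 1 voxel away: 1.0
```

DSC rewards volumetric overlap, while HD is driven by the single worst boundary disagreement, which is why the two are often reported together.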