GLFA-Net: A Hybrid Network for MR-to-CT Synthesis via Global and Local Feature Aggregation
Zeli Chen
Synthesis of Computed Tomography (CT) images from Magnetic Resonance (MR) images is of clinical significance for MR-only treatment planning, as it eliminates the co-registration errors between MR and CT images. Existing convolutional neural network (CNN)-based methods, constrained by their inherent local inductive biases, struggle to distinguish bone from air, both of which show low signal in conventional MR images. ViT-based methods can learn long-range contextual information through a global self-attention mechanism, but they are limited by quadratic computational complexity and have difficulty generating locally detailed structures. Combining the merits of these two architectures, we propose GLFA-Net, a hybrid network for MR-to-CT synthesis via global and local feature aggregation from a Transformer and a CNN. Specifically, we add a global patch embedding branch to supplement patch-based global representative features taken directly from the image, and we design a residual dilated Swin Transformer block that aggregates local detailed features with global features to improve the synthesis of bone and air while reducing computational overhead. Furthermore, we adopt a wavelet PatchGAN discriminator to enhance the high-frequency detail of the synthetic CT. GLFA-Net was evaluated on a dataset of 154 pairs of 3D head-and-neck MR-CT images. Experiments show that GLFA-Net achieves an MAE of 71.12 ± 10.87, an SSIM of 0.771 ± 0.028, and a PSNR of 28.91 ± 1.33. Visual comparisons of the synthetic CT results also show that GLFA-Net discriminates bone and air better and achieves higher structural similarity than other state-of-the-art methods.
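
As a rough illustration of the global-and-local aggregation idea described above (not the paper's actual residual dilated Swin Transformer block), the following PyTorch sketch fuses a patch-embedded self-attention branch with a plain convolutional branch. All module names, dimensions, and the concatenation-based fusion are assumptions made for illustration only.

import torch
import torch.nn as nn

class GlobalLocalAggregation(nn.Module):
    # Hypothetical fusion of a patch-based global branch and a convolutional
    # local branch; layer sizes are illustrative, not taken from the paper.
    def __init__(self, in_ch=1, dim=64, patch=8, heads=4):
        super().__init__()
        # Local branch: convolutions capture fine anatomical detail.
        self.local = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Global branch: non-overlapping patch embedding plus self-attention
        # provides long-range context at reduced (patch-level) resolution.
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.up = nn.Upsample(scale_factor=patch, mode="bilinear",
                              align_corners=False)
        # 1x1 convolution fuses the concatenated global and local feature maps.
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, x):
        local = self.local(x)                      # (B, dim, H, W)
        g = self.embed(x)                          # (B, dim, H/p, W/p)
        b, c, h, w = g.shape
        tokens = g.flatten(2).transpose(1, 2)      # (B, h*w, dim)
        tokens, _ = self.attn(tokens, tokens, tokens)
        g = tokens.transpose(1, 2).reshape(b, c, h, w)
        g = self.up(g)                             # back to (B, dim, H, W)
        return self.fuse(torch.cat([local, g], dim=1))

if __name__ == "__main__":
    mr_slice = torch.randn(1, 1, 256, 256)         # dummy MR slice
    feats = GlobalLocalAggregation()(mr_slice)
    print(feats.shape)                             # torch.Size([1, 64, 256, 256])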
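
For concreteness, the reported image-quality metrics can be computed roughly as in the NumPy sketch below. It assumes CT intensities in Hounsfield units and an optional body mask, neither of which is specified in the abstract; SSIM is typically obtained with skimage.metrics.structural_similarity.

import numpy as np

def mae(sct, ct, mask=None):
    # Mean absolute error between synthetic and real CT (in HU if inputs are in HU).
    diff = np.abs(sct.astype(np.float64) - ct.astype(np.float64))
    return diff[mask].mean() if mask is not None else diff.mean()

def psnr(sct, ct, data_range=None):
    # Peak signal-to-noise ratio in dB; data_range defaults to the real CT's span.
    if data_range is None:
        data_range = float(ct.max() - ct.min())
    mse = np.mean((sct.astype(np.float64) - ct.astype(np.float64)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)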