Accurate segmentation of primary tumors from Computed Tomography (CT) and Positron Emission Tomography (PET) images is essential for Head-and-Neck (H&N) cancer radiotherapy planning. However, manual delineation is time-consuming, and its accuracy depends heavily on the operator's experience. Despite the great success of Convolutional Neural Networks (CNNs) in medical image segmentation, the inherent locality of convolutional layers limits the performance of most existing methods. Given the effectiveness of self-attention mechanisms in modeling long-range dependencies, Transformers have been introduced into CNNs as an alternative. However, pure Transformers require large amounts of data to learn inductive biases. This paper proposes a novel hybrid structure called FRNet, equipped with a learnable Feature Refinement Module (FRM), which combines Transformer blocks and convolutional layers to enhance contextual information. Furthermore, we redesign the encoder block to improve feature quality. Experimental results on a large public dataset demonstrate that our method outperforms existing state-of-the-art methods.
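To make the hybrid CNN-Transformer idea concrete, the sketch below shows a generic block that fuses a local convolutional path with a global self-attention path. This is only an illustration of the general design pattern the abstract describes, not the paper's FRNet or FRM; all class and parameter names (e.g., `HybridConvTransformerBlock`, `channels`, `num_heads`) are hypothetical.

```python
# Illustrative sketch (not the authors' code): a hybrid block combining a
# local convolutional path with a global self-attention path, in the spirit
# of the CNN + Transformer design described in the abstract.
import torch
import torch.nn as nn


class HybridConvTransformerBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local path: 3x3 convolution captures fine-grained spatial detail.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Global path: multi-head self-attention models long-range dependencies.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.conv(x)
        # Flatten spatial dims into a token sequence for self-attention.
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        tokens = self.norm(tokens)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        global_ctx = attn_out.transpose(1, 2).reshape(b, c, h, w)
        # Fuse local and global context with a residual connection.
        return x + local + global_ctx


if __name__ == "__main__":
    block = HybridConvTransformerBlock(channels=64)
    feat = torch.randn(1, 64, 32, 32)
    print(block(feat).shape)  # torch.Size([1, 64, 32, 32])
```

The residual fusion of convolutional and attention outputs is one common way such hybrid blocks enhance contextual information while retaining local detail; the actual FRNet architecture and FRM design are detailed in the full paper.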