Automatic semantic segmentation of endoscopic images is an essential part of computer-assisted surgical intervention. Recently, Convolutional Neural Networks (CNNs) have been widely applied to endoscopic image segmentation, but their performance remains limited by a weak ability to capture global long-range dependencies. This paper proposes a model that combines CNNs and Transformers to address this problem, named the Multi-scale Convolution-Transformer Fusion Network (MCTFNet). It consists of three components: 1) Multiple-parallel Multi-scale Transformer Convolution (MMTC) modules in parallel branches that extract multi-scale information; 2) a Multi-scale Information Fusion (MIF) module that fuses the parallel-branch features, allowing interaction between different resolutions; and 3) a High-resolution Information Processing (HIP) module that preserves high-resolution features and avoids loss of detail. We evaluated our method on the HeiSurF dataset, where it achieved an average Dice score of 80.07%, outperforming state-of-the-art CNNs including HRNet (79.93%) and DeepLabv3 (78.34%), as well as several networks designed for medical image segmentation.
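The cross-resolution interaction the abstract attributes to the MIF module can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's actual architecture: it assumes nearest-neighbor resampling and simple additive fusion, where each parallel branch receives the sum of all branches resampled to its own resolution.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    # 2x downsampling by strided subsampling of a (C, H, W) feature map.
    return x[:, ::2, ::2]

def fuse_branches(feats):
    """Toy multi-scale fusion: each branch gets the sum of all branches
    resampled to its resolution, so information flows between resolutions
    (the role the abstract assigns to MIF; the exact operators here are
    assumptions, not the paper's)."""
    fused = []
    for i, target in enumerate(feats):
        acc = target.copy()
        for j, other in enumerate(feats):
            if j == i:
                continue
            f = other
            # Resample branch j to branch i's resolution, one octave at a time.
            while f.shape[1] > target.shape[1]:
                f = downsample2x(f)
            while f.shape[1] < target.shape[1]:
                f = upsample2x(f)
            acc = acc + f
        fused.append(acc)
    return fused

# Three parallel branches at full, 1/2, and 1/4 resolution (8 channels each).
branches = [np.ones((8, 32, 32)), np.ones((8, 16, 16)), np.ones((8, 8, 8))]
out = fuse_branches(branches)
```

Each fused output keeps its branch's original resolution, which is consistent with the abstract's emphasis on retaining a high-resolution path (the HIP module) alongside coarser branches.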