PYRAMID TRANSFORMER DRIVEN MULTIBRANCH FUSION FOR POLYP SEGMENTATION IN COLONOSCOPIC VIDEO IMAGES
Ao Wang, Ming Wu, Hao Qi, Hong Shi, Jianhua Chen, Yinran Chen, Xiongbiao Luo
Automatic and accurate segmentation of colorectal polyps in colonoscopic video images is essential and valuable for the early diagnosis and treatment of colorectal cancer. Accurately extracting these polyps remains challenging due to their small sizes, irregular shapes, image artifacts, and illumination variations. This work proposes a new encoder-decoder architecture, called pyramid transformer driven multibranch fusion, to precisely segment different types of colorectal polyps during colonoscopy. Unlike current convolutional neural networks, our deep-learning architecture employs a simple, convolution-free pyramid transformer as its encoder, which serves as a flexible and powerful feature extractor. A multibranch fusion decoder is then used to preserve detailed appearance information and fuse global semantic cues, which helps handle blurred polyp edges caused by nonuniform illumination and colonoscope motion. Additionally, a hybrid spatial-frequency loss function is introduced for accurate training. We evaluate the proposed architecture on colonoscopic images of four polyp types with different pathological features, and the experimental results show that our architecture significantly outperforms other deep-learning models. In particular, our method improves the average Dice similarity coefficient and intersection over union to 90.7% and 84.8%, respectively.
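To make the training objective more concrete, the minimal PyTorch sketch below illustrates one plausible way to combine a spatial segmentation term with a frequency-domain term. The exact form of the paper's hybrid spatial-frequency loss is not given in this section, so the BCE-plus-Dice spatial term, the FFT-magnitude frequency term, the weights w_spatial and w_freq, and the function names are illustrative assumptions rather than the authors' implementation.

# Illustrative sketch only: assumes a spatial term (BCE + soft Dice) combined with
# an L1 penalty on the 2-D FFT magnitude spectra of prediction and ground truth.
# All names and weights are hypothetical, not taken from the paper.
import torch
import torch.nn.functional as F


def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss over a batch of probability maps of shape (B, 1, H, W).
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()


def hybrid_spatial_frequency_loss(logits, target, w_spatial=1.0, w_freq=0.1):
    prob = torch.sigmoid(logits)

    # Spatial term: pixel-wise BCE plus region-level Dice.
    spatial = F.binary_cross_entropy_with_logits(logits, target) + dice_loss(prob, target)

    # Frequency term: L1 distance between FFT magnitude spectra, which
    # penalizes missing or blurred high-frequency (edge) structure.
    freq_pred = torch.fft.fft2(prob, norm="ortho").abs()
    freq_gt = torch.fft.fft2(target, norm="ortho").abs()
    frequency = F.l1_loss(freq_pred, freq_gt)

    return w_spatial * spatial + w_freq * frequency


if __name__ == "__main__":
    logits = torch.randn(2, 1, 128, 128)                  # raw network outputs
    masks = (torch.rand(2, 1, 128, 128) > 0.5).float()    # binary ground-truth masks
    print(hybrid_spatial_frequency_loss(logits, masks).item())

Under these assumptions, the frequency term complements the spatial term by emphasizing boundary fidelity, which is consistent with the stated goal of handling blurred polyp edges.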