  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:12:58
05 Oct 2022

Convolutional Neural Networks (CNNs) have been the dominant deep learning approach in automated medical image diagnosis for a decade. Recently, vision transformers (ViTs) have emerged as a competitive alternative to CNNs in computer vision, yielding similar levels of performance while possessing several interesting properties that could prove beneficial for the explainability of deep neural networks. Because most medical images are grayscale scans (CT, MRI, etc.) in 3-dimensional (3-D) space and differ greatly from natural images, we explore whether it is possible to move to transformer-based models or whether we should keep working with CNNs in the domain of 3-D medical image classification. If a move is possible, what are the advantages and drawbacks of switching to ViTs for medical image diagnosis? We examine these questions in a series of experiments on three 3-D medical image datasets. Our findings show that, while CNNs perform better when trained from scratch, ViTs benefit strongly from pre-training on ImageNet and outperform their CNN counterparts when trained with self-supervised learning and the sharpness-aware minimizer optimization method on the large datasets.
