  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
Lecture 10 Oct 2023

As an effective way for Vision Transformers (ViT) to obtain better initializations and representations in natural image analysis, masked image modeling (MIM) performs the pretext task of reconstructing images from partial observations, without any labels. Several works have adopted different masking strategies so that ViT aggregates contextual information to infer the missing content. However, chest radiographs differ conspicuously from photographic images, and conducting MIM on chest X-rays remains challenging. This paper therefore proposes cxrMIM, a pre-training recipe with a masking strategy specialized for chest radiographs and based on their physiological characteristics. In cxrMIM, the out-of-lung region is analyzed first, and the lung region is then reconstructed with the help of mechanical connections and anatomical and physiological similarities. cxrMIM helps ViT capture the commonalities of pulmonary structures and improves performance on downstream tasks. We conducted experiments on the ChestX-ray14 dataset, comparing against advanced self-supervised methods (e.g., MoCo v3, MAE). Quantitative and qualitative results indicate that cxrMIM improves the effectiveness of Vision Transformers on the multi-label thorax disease classification problem, and that cxrMIM-pretrained Swin-B performs comparably to state-of-the-art CNN models.
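The abstract does not spell out the exact masking rule, so the following is only a minimal sketch of what a lung-aware patch mask could look like. It assumes that a binary lung segmentation is available for each radiograph, and that every ViT patch whose lung-pixel fraction reaches a hypothetical threshold `min_lung_fraction` is hidden and must be reconstructed, while out-of-lung patches stay visible as context. The visible/masked token split mirrors MAE. The function names `lung_aware_patch_mask` and `split_tokens` are hypothetical and are not taken from the paper.

```python
import numpy as np


def lung_aware_patch_mask(lung_mask: np.ndarray, patch_size: int = 16,
                          min_lung_fraction: float = 0.5) -> np.ndarray:
    """Hypothetical lung-aware masking rule (an assumption, not cxrMIM's exact recipe).

    lung_mask: (H, W) boolean lung segmentation, True inside the lungs.
    Returns a flat boolean array with one entry per ViT patch in row-major
    order. True means the patch is hidden from the encoder and must be
    reconstructed; False means it stays visible as context.
    """
    h, w = lung_mask.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # View the image as a (gh, gw) grid of patch_size x patch_size blocks
    # and compute the fraction of lung pixels inside each block.
    blocks = lung_mask.astype(np.float32).reshape(gh, patch_size, gw, patch_size)
    lung_fraction = blocks.mean(axis=(1, 3))
    # Hide lung patches; out-of-lung patches remain visible to the encoder.
    return (lung_fraction >= min_lung_fraction).reshape(-1)


def split_tokens(patch_mask: np.ndarray):
    """Return (visible_idx, masked_idx) patch-token indices, MAE-style."""
    visible_idx = np.flatnonzero(~patch_mask)
    masked_idx = np.flatnonzero(patch_mask)
    return visible_idx, masked_idx


# Usage: a 64x64 toy image with a square "lung" region and 16x16 patches (4x4 grid).
lung = np.zeros((64, 64), dtype=bool)
lung[16:48, 16:48] = True
patch_mask = lung_aware_patch_mask(lung, patch_size=16)
visible_idx, masked_idx = split_tokens(patch_mask)
print(masked_idx.tolist())  # [5, 6, 9, 10]
```

In this sketch the encoder would see only the 12 out-of-lung tokens and a decoder would be trained to reconstruct the 4 lung tokens. A real recipe might additionally randomize which lung patches are hidden or mask some out-of-lung patches, which this example deliberately leaves out.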
