LT-VIT: A VISION TRANSFORMER FOR MULTI-LABEL CHEST X-RAY CLASSIFICATION

Umar Marikkar, Sara Atito, Muhammad Awais, Adam Mahdi

SPS

Lecture 09 Oct 2023

Vision Transformers (ViTs) are widely adopted in medical imaging tasks, and some existing efforts have been directed towards vision-language training for chest X-rays (CXRs). However, we believe there is still room for improvement in vision-only training for CXRs using ViTs by aggregating information from multiple scales, an approach that has proven beneficial for non-transformer networks. Hence, we have developed LT-ViT, a transformer that applies combined attention between image tokens and randomly initialized auxiliary tokens that represent labels. Our experiments demonstrate that LT-ViT (1) surpasses the state-of-the-art performance using pure ViTs on two publicly available CXR datasets, (2) generalizes to other pre-training methods and is therefore agnostic to model initialization, and (3) enables model interpretability without Grad-CAM and its variants.
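
The combined attention between image tokens and label tokens described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention head with identity query/key/value projections, does not model the multi-scale aggregation, and all names (`joint_attention`, `label_probabilities`, `heads`) and sizes are hypothetical. Reading the label-to-image attention weights as a per-label map is one plausible route to interpretability without Grad-CAM, stated here as an assumption rather than the method used in the lecture.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(image_tokens, label_tokens):
    """Single-head self-attention over image and label tokens together.

    Assumption: identity query/key/value projections, for brevity.
    A trained transformer layer would learn these projections.
    Returns (updated_tokens, attention_weights); label rows come last.
    """
    tokens = image_tokens + label_tokens
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    updated, weights = [], []
    for query in tokens:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        w = softmax(scores)
        weights.append(w)
        updated.append(
            [sum(w[j] * tokens[j][d] for j in range(len(tokens))) for d in range(dim)]
        )
    return updated, weights


def label_probabilities(label_outputs, heads):
    """One sigmoid per label token, so several labels can be active at once."""
    probs = []
    for out, head in zip(label_outputs, heads):
        logit = sum(a * b for a, b in zip(out, head))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


# Hypothetical sizes: 16 image patches, 3 labels, 8-dimensional tokens.
rng = random.Random(0)
num_patches, num_labels, dim = 16, 3, 8
image_tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_patches)]
# Label tokens start from random values, as the abstract describes.
label_tokens = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_labels)]
# Hypothetical per-label linear heads (untrained here).
heads = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_labels)]

updated, weights = joint_attention(image_tokens, label_tokens)
label_outputs = updated[num_patches:]
probs = label_probabilities(label_outputs, heads)

# How strongly each label token attends to each image patch.
label_to_image_attention = [row[:num_patches] for row in weights[num_patches:]]
```

In this sketch each label token yields an independent probability, which is what makes the setup multi-label rather than multi-class, and `label_to_image_attention` holds one attention row per label that could be reshaped to the patch grid for inspection.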

Tags: transformers, medical imaging, multi-label classification
