
Finding Optimal Numerical Format for Sub-8-bit Post-Training Quantization of Vision Transformers

Janghwan Lee (Hanyang University); Youngdeok Hwang (Baruch College - The City University of New York (CUNY)); Jungwook Choi (Hanyang University)

07 Jun 2023

Vision Transformers (ViTs) have gained significant attention for their exceptional accuracy on computer vision tasks, but their demanding memory requirements and computational complexity have hindered deployment. Post-training quantization (PTQ) is a practical way to tackle this challenge by directly reducing the bit-precision of ViTs. However, the diverse data characteristics across the different operations of a ViT cannot be captured well by a single numerical format (fixed- or floating-point). This work proposes an analytical framework that selects the optimal numerical format for each matrix multiplication in a ViT, enabling mixed-format sub-8-bit quantization. Extensive evaluation demonstrates that the proposed method reduces PTQ error and achieves state-of-the-art accuracy on popular ViT models.
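
As a rough illustration of the mixed-format idea (not the authors' actual framework), the sketch below compares per-tensor quantization error under a symmetric fixed-point (integer) format and a toy low-bit floating-point format, then picks whichever format yields the lower error for each tensor. It assumes NumPy; the function names quantize_fixed, quantize_float, and pick_format are hypothetical and chosen only for this example.

    import numpy as np

    def quantize_fixed(x, bits=6):
        """Symmetric fixed-point (integer) quantization with a per-tensor scale."""
        qmax = 2 ** (bits - 1) - 1
        scale = np.max(np.abs(x)) / qmax
        return np.round(x / scale).clip(-qmax, qmax) * scale

    def quantize_float(x, exp_bits=3, man_bits=2):
        """Toy low-bit floating-point quantization: round each value's mantissa."""
        # Decompose x = mantissa * 2**exponent and keep a few mantissa bits.
        mant, exp = np.frexp(x)
        mant = np.round(mant * 2 ** (man_bits + 1)) / 2 ** (man_bits + 1)
        # Clamp the exponent to the range representable with `exp_bits`.
        exp = np.clip(exp, -(2 ** (exp_bits - 1)), 2 ** (exp_bits - 1) - 1)
        return np.ldexp(mant, exp)

    def pick_format(x):
        """Choose the format with the lower mean-squared quantization error."""
        err_fixed = np.mean((x - quantize_fixed(x)) ** 2)
        err_float = np.mean((x - quantize_float(x)) ** 2)
        return ("fixed", err_fixed) if err_fixed <= err_float else ("float", err_float)

    # Example: a narrow Gaussian-like tensor vs. a heavy-tailed tensor.
    rng = np.random.default_rng(0)
    print(pick_format(rng.normal(size=10_000)))
    print(pick_format(rng.standard_t(df=2, size=10_000)))

The intuition this sketch captures is that fixed-point formats suit narrowly distributed tensors, while floating-point formats tolerate heavy-tailed distributions better, so an error-driven per-operation choice can outperform a single format; the paper's analytical framework formalizes this selection for sub-8-bit PTQ of ViTs.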
