TAQ: TOP-K ATTENTION-AWARE QUANTIZATION FOR VISION TRANSFORMERS

Lili Shi, Haiduo Huang, Bowei Song, Meng Tan, Wenzhe Zhao, Tian Xia, Pengju Ren

Poster, 10 Oct 2023

Model quantization can reduce the memory footprint of a neural network and improve its computing efficiency. However, the sparse attention in Transformer models is difficult to quantize: the main challenge is that changing the order of attention values and shifting the attended regions can lead to incorrect predictions. To address this problem, we propose a quantization method, termed TAQ, which uses the proposed TOP-K attention-aware loss to search for quantization parameters. Further, we combine sequential and parallel quantization methods to optimize the procedure. We evaluate the generalization ability of TAQ on various vision Transformer variants, as well as its performance on image classification and object detection tasks. TAQ makes the TOP-K attention ranking more consistent before and after quantization and significantly reduces the attention shifting rate. Compared with PTQ4ViT, TAQ improves performance by 0.66 on ImageNet and 0.45 on COCO, achieving state-of-the-art results.
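The abstract does not give the exact form of the TOP-K attention-aware loss, so the snippet below is only a minimal sketch of the general idea: comparing the quantized attention map against the full-precision one at the positions of the top-k attention values, so that deviations that would reorder the ranking or shift the attended region are penalized when searching quantization parameters. The helper `quantize_attention`, the MSE-based penalty, and the grid search over scales are illustrative assumptions, not the method described in the paper.

```python
# Sketch of a top-k attention-aware objective for selecting quantization
# parameters. Assumption: the loss compares quantized and full-precision
# attention only at the full-precision top-k positions.
import torch


def topk_attention_loss(attn_fp: torch.Tensor,
                        attn_q: torch.Tensor,
                        k: int = 8) -> torch.Tensor:
    """attn_fp, attn_q: (batch, heads, queries, keys) attention maps from
    the full-precision and quantized model, respectively."""
    # Indices of the k largest attention values per query (full precision).
    topk_idx = attn_fp.topk(k, dim=-1).indices          # (B, H, Q, k)
    # Compare quantized vs. full-precision values at those positions:
    # errors here are the ones that reorder the top-k ranking.
    fp_vals = attn_fp.gather(-1, topk_idx)
    q_vals = attn_q.gather(-1, topk_idx)
    return torch.nn.functional.mse_loss(q_vals, fp_vals)


def quantize_attention(attn: torch.Tensor, scale: float) -> torch.Tensor:
    # Hypothetical uniform 8-bit fake-quantization of attention values.
    q = torch.clamp(torch.round(attn / scale), 0, 255)
    return q * scale


# Usage sketch: pick the scale that best preserves the top-k attention.
attn_fp = torch.softmax(torch.randn(2, 4, 16, 16), dim=-1)
best_scale = min(
    (s / 1000.0 for s in range(1, 20)),
    key=lambda s: topk_attention_loss(attn_fp, quantize_attention(attn_fp, s)).item(),
)
print(best_scale)
```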
