TRANSFORMER-BASED DEEP HASHING METHOD FOR MULTI-SCALE FEATURE FUSION

Chao He (Inner Mongolia University); Hongxi Wei (Inner Mongolia University)


06 Jun 2023

Deep image hashing aims to map an input image to simple binary hash codes via deep neural networks. Motivated by recent advances in Vision Transformers (ViT), many ViT-based deep hashing methods have been proposed. Nevertheless, ViT has an enormous number of model parameters and high computational complexity. Moreover, only the classification token output by the last ViT layer is used as the image feature vector, while the remaining output vectors are discarded. As a result, model computation is inefficient and useful image information is neglected. This paper therefore proposes a Transformer-based deep hashing method for multi-scale feature fusion (TDH). Specifically, we use a hierarchical Transformer backbone to capture both global and local image features. The hierarchical Transformer uses a local self-attention mechanism to process image blocks in parallel, which reduces computational complexity and improves computational efficiency. A multi-scale feature fusion module collects the feature vectors output by the hierarchical Transformer to obtain richer image feature information. We perform comprehensive experiments on three widely studied datasets: CIFAR-10, NUS-WIDE and ImageNet. The experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods.
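The abstract does not specify how the fusion module or the hash layer is built, so the following is only a minimal sketch of the general idea under stated assumptions: each stage of a hierarchical backbone yields a set of token vectors, each set is average-pooled, the pooled vectors are concatenated, and a linear projection followed by tanh and sign produces the binary code. The stage shapes, the pooling-plus-concatenation fusion rule, the tanh relaxation, and all function names are hypothetical illustrations, not the authors' TDH implementation. Random numbers stand in for real backbone features.

```python
import math
import random


def global_avg_pool(tokens):
    """Average one stage's token vectors into a single vector."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def fuse_stages(stage_outputs):
    """Pool every stage and concatenate the results (assumed fusion rule)."""
    fused = []
    for tokens in stage_outputs:
        fused.extend(global_avg_pool(tokens))
    return fused


def hash_code(fused, weights):
    """Project to num_bits values, squash with tanh, binarize to {-1, +1}."""
    relaxed = [math.tanh(sum(w * x for w, x in zip(row, fused)))
               for row in weights]
    return [1 if v >= 0 else -1 for v in relaxed]


def hamming(a, b):
    """Number of positions where two hash codes differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)

# Hypothetical (num_tokens, channels) per stage: fewer tokens and more
# channels at deeper stages, as in a typical hierarchical backbone.
stage_shapes = [(16, 4), (4, 8), (1, 16)]


def fake_image_features():
    """Random stand-in for the per-stage outputs of a real backbone."""
    return [[[rng.gauss(0, 1) for _ in range(c)] for _ in range(n)]
            for n, c in stage_shapes]


num_bits = 16
fused_dim = sum(c for _, c in stage_shapes)
# Untrained random projection; a real model would learn these weights.
weights = [[rng.gauss(0, 1) for _ in range(fused_dim)]
           for _ in range(num_bits)]

code_a = hash_code(fuse_stages(fake_image_features()), weights)
code_b = hash_code(fuse_stages(fake_image_features()), weights)
distance = hamming(code_a, code_b)
```

At retrieval time, database images would be ranked by the Hamming distance between their codes and the query's code, which is why short binary codes make search cheap.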

Tags:

Image and video storage and retrieval


Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
