Sample-Adapt Fusion Network for RGB-D Hand Detection in the Wild

Xingyu Liu (Beijing University of Posts and Telecommunications); Pengfei Ren (Beijing University of Posts and Telecommunications); Yuchen Chen (Beijing University of Posts and Telecommunications); Cong Liu (China Mobile); Jing Wang (Beijing University of Posts and Telecommunications); Haifeng Sun (Beijing University of Posts and Telecommunications); Qi Qi (Beijing University of Posts and Telecommunications); Jingyu Wang (Beijing University of Posts and Telecommunications)

06 Jun 2023

RGB and depth modalities provide complementary information that can be effectively exploited to improve hand detection in the wild. Most existing fusion-based methods model channel-wise or spatial-wise cross-modal correlations with operations that are shared across all input samples. However, images captured in the wild exhibit highly diverse modes owing to the variety of scenes, and this inter-sample variance cannot be effectively perceived by static modeling operations shared across all samples. To address this problem, we propose a Sample-Adapt Fusion Network (SAFNet) with a Channel Dynamic Refinement Module (CDRM) and a Spatial Dynamic Aggregation Module (SDAM) to adaptively model channel-wise and spatial-wise cross-modal correlations. Specifically, we propose a Multi-kernel Attention Module (MAM) that generates an attention map for each input sample individually by applying learnable weighting operations to multiple convolutional kernels. Our method outperforms state-of-the-art methods on the CUG Hand dataset.
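The page does not include code, but the abstract's description of MAM, per-sample learnable weights applied to multiple convolutional kernels, follows the general pattern of dynamic convolution. The PyTorch sketch below illustrates that pattern only; it is not the authors' implementation, and all names and hyperparameters (MultiKernelAttention, num_kernels, the sigmoid attention head) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiKernelAttention(nn.Module):
    # Hypothetical sketch of a MAM-style module: a gating branch produces
    # per-sample weights over K candidate kernels, which are aggregated
    # into one dynamic kernel that yields a spatial attention map.
    def __init__(self, in_channels, num_kernels=4, kernel_size=3):
        super().__init__()
        self.padding = kernel_size // 2
        # K candidate kernels, each mapping in_channels -> 1 attention channel.
        self.kernels = nn.Parameter(
            torch.randn(num_kernels, 1, in_channels, kernel_size, kernel_size) * 0.01
        )
        # Gating branch: global context -> softmax weights over the K kernels.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, num_kernels),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        weights = F.softmax(self.gate(x), dim=1)             # (B, K)
        # Weighted sum of the K kernels, per sample -> (B, 1, C, k, k).
        agg = torch.einsum("bk,kocij->bocij", weights, self.kernels)
        # Grouped-conv trick: fold the batch into groups so each sample
        # is convolved with its own aggregated kernel.
        out = F.conv2d(
            x.reshape(1, b * c, h, w),
            agg.reshape(b, c, agg.shape[-2], agg.shape[-1]),
            padding=self.padding,
            groups=b,
        )
        return torch.sigmoid(out.reshape(b, 1, h, w))        # per-sample map

A typical use would gate one modality's features with the map computed from the other, e.g.:

mam = MultiKernelAttention(in_channels=64)
rgb_feat = torch.randn(2, 64, 32, 32)
attn = mam(rgb_feat)           # (2, 1, 32, 32), values in (0, 1)
fused = rgb_feat * attn        # sample-adaptive reweighting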
