CROSS-MODAL MATCHING AND ADAPTIVE GRAPH ATTENTION NETWORK FOR RGB-D SCENE RECOGNITION

Yuhui Guo (Renmin University of China); Xun Liang (Renmin University of China); James Kwok (The Hong Kong University of Science and Technology); Xiangping Zheng (Renmin University of China); Bo Wu (Renmin University of China); Yuefeng Ma (Qufu Normal University)

07 Jun 2023

Despite the significant advances in RGB-D scene recognition, several major limitations need further investigation. For example, simply extracting modal-specific features neglects the complex relationships among features from multiple modalities. Moreover, most existing methods do not consider cross-modal features. To address these concerns, we propose to integrate the tasks of cross-modal matching and modal-specific recognition, termed the Matching-to-Recognition Network (MRNet). Specifically, the cross-modal matching network enhances the descriptive power of the recognition network via a layer-wise semantic loss. The recognition network obtains multi-modal features from a two-stream CNN: global features are extracted from a higher layer of the CNN to preserve semantic content, while local layout features are learned by a graph attention network, which better captures the key object regions and models their relationships. Extensive experimental results demonstrate that MRNet achieves superior performance to state-of-the-art methods, especially for recognition based solely on a single modality.
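As a rough illustration of the recognition stream described above (not the authors' implementation), the sketch below combines a global CNN feature with region features aggregated by a minimal graph-attention layer and feeds the result to a scene classifier; all module names, dimensions, and the fully-connected region graph are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionGraphAttention(nn.Module):
    """Single graph-attention layer over region (node) features,
    in the spirit of GAT; regions are treated as a fully connected graph."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        # attention score computed from concatenated source/target embeddings
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x):
        # x: (num_regions, in_dim) -- region features from one image
        h = self.proj(x)                                       # (N, out_dim)
        n = h.size(0)
        # pairwise concatenation of node embeddings -> (N, N, 2*out_dim)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)],
            dim=-1,
        )
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1), 0.2)  # (N, N)
        alpha = torch.softmax(scores, dim=-1)                  # attention weights
        return F.elu(alpha @ h)                                # aggregated nodes


class SingleModalityStream(nn.Module):
    """One stream (RGB or depth): global CNN feature + attended region features."""

    def __init__(self, global_dim, region_dim, hidden_dim, num_classes):
        super().__init__()
        self.gat = RegionGraphAttention(region_dim, hidden_dim)
        self.classifier = nn.Linear(global_dim + hidden_dim, num_classes)

    def forward(self, global_feat, region_feats):
        # global_feat: (global_dim,)   region_feats: (num_regions, region_dim)
        local = self.gat(region_feats).mean(dim=0)             # pool attended regions
        return self.classifier(torch.cat([global_feat, local], dim=-1))


# Toy usage with made-up feature dimensions and class count.
stream = SingleModalityStream(global_dim=512, region_dim=256,
                              hidden_dim=128, num_classes=19)
logits = stream(torch.randn(512), torch.randn(8, 256))
print(logits.shape)  # torch.Size([19])
```

In the full method, two such streams (RGB and depth) would be trained jointly, with the cross-modal matching network providing the layer-wise semantic loss mentioned in the abstract; that loss is not sketched here.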
