DEPTH ESTIMATION OF MULTI-MODAL SCENE BASED ON MULTI-SCALE MODULATION
Anjie Wang, Zhijun Fang, Xiaoyan Jiang, Yongbin Gao, Gaofeng Cao, Siwei Ma
Because multimodal information is complementary, effectively exploiting the multiple modalities of a scene has become an increasingly active research topic. This paper proposes a novel multi-scale global learning strategy that takes both echo and visual data as inputs to estimate scene depth. The framework constructs a multi-scale feature extractor built on pyramid pooling modules, which aggregates contextual information from different regions and improves the acquisition of global information. Furthermore, a recurrent multi-scale feature modulation module is introduced to generate more semantic and spatially accurate representations at each iterative update. Additionally, a multi-scale fusion method is designed to fuse the echo and visual modalities. Extensive experiments on the Replica dataset demonstrate the superior performance of the proposed method.
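To make the pyramid-pooling idea concrete, the following is a minimal PyTorch sketch of a PSPNet-style pyramid pooling module that aggregates context from regions at several grid scales. The class name, the pool sizes (1, 2, 3, 6), and the channel layout are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPooling(nn.Module):
    """Illustrative PSPNet-style pyramid pooling: pool the feature map to
    several grid sizes, project each pooled map with a 1x1 conv, upsample
    back, and concatenate with the input to aggregate multi-region context.
    (A sketch of the general technique, not the paper's exact module.)"""

    def __init__(self, in_channels: int, pool_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = in_channels // len(pool_sizes)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),  # pool to a size x size grid
                nn.Conv2d(in_channels, branch_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for size in pool_sizes
        )
        # Fuse the concatenated input + branch features back to in_channels.
        self.project = nn.Sequential(
            nn.Conv2d(in_channels + branch_channels * len(pool_sizes),
                      in_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = [x]
        for branch in self.branches:
            y = branch(x)
            # Upsample each pooled context map back to the input resolution.
            feats.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                       align_corners=False))
        return self.project(torch.cat(feats, dim=1))


if __name__ == "__main__":
    ppm = PyramidPooling(in_channels=256)
    x = torch.randn(1, 256, 32, 32)  # e.g. a backbone feature map
    print(ppm(x).shape)              # torch.Size([1, 256, 32, 32])
```

The design choice behind the module is that global average pooling at several grid resolutions captures context at different spatial extents, so the concatenated output carries both local detail (the original features) and region-level cues, which is the global-information aggregation the abstract refers to.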