Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with Depth Guidance

Lei Zhang (Beijing Jiaotong University); Chunyu Lin (Beijing Jiaotong University); Kang Liao (Beijing Jiaotong University); Yao Zhao (Beijing Jiaotong University)

09 Jun 2023

Image outpainting technology generates visually plausible content regardless of its authenticity, which makes it unreliable in practical applications. We therefore propose a reliable image outpainting task that introduces sparse depth from LiDARs (Light Detection And Ranging devices) to extrapolate authentic RGB scenes. The large field of view of LiDARs allows their data to be used for data enhancement and further multimodal tasks. Concretely, we propose a Depth-Guided Outpainting Network that models the different feature representations of the two modalities and learns structure-aware cross-modal fusion. It consists of two components: 1) the Multimodal Learning Module, which produces distinct depth and RGB feature representations according to the characteristics of each modality; and 2) the Depth Guidance Fusion Module, which leverages the complete depth modality to guide the construction of RGB content through progressive multimodal feature fusion. Furthermore, we design an additional constraint strategy, consisting of a Cross-modal Loss and an Edge Loss, to enhance ambiguous contours and expedite reliable content generation. Extensive experiments on the KITTI and Waymo datasets demonstrate, both quantitatively and qualitatively, that our method outperforms the state-of-the-art method.
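The abstract does not give the formulation of the Edge Loss. As a hedged illustration only, one common way to build an edge-aware term is to compare the gradient magnitudes of the generated and ground-truth images, so that blurred or missing contours are penalized. The sketch below assumes a 3x3 Sobel operator and an L1 (mean absolute) distance on small grayscale images stored as nested lists; the names `sobel_magnitude` and `edge_loss` are hypothetical and this is not the authors' implementation.

```python
"""Hypothetical edge-loss sketch (NOT the paper's exact formulation):
L1 distance between Sobel gradient magnitudes of two grayscale images."""
import math
from typing import List

Image = List[List[float]]

# Assumed edge operator: standard 3x3 Sobel kernels.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Image) -> Image:
    """Gradient magnitude at interior pixels (3x3 Sobel filter, no padding)."""
    h, w = len(img), len(img[0])
    out: Image = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            gx = sum(SOBEL_X[a][b] * img[i + a - 1][j + b - 1]
                     for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * img[i + a - 1][j + b - 1]
                     for a in range(3) for b in range(3))
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def edge_loss(pred: Image, target: Image) -> float:
    """Mean absolute difference between the edge maps of pred and target."""
    ep, et = sobel_magnitude(pred), sobel_magnitude(target)
    diffs = [abs(p - t) for rp, rt in zip(ep, et) for p, t in zip(rp, rt)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Ground truth with a vertical step edge; prediction that lost the edge.
    target = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    flat = [[0.0] * 4 for _ in range(4)]
    print(edge_loss(target, target))  # identical images -> 0.0
    print(edge_loss(flat, target))    # missing contour -> 4.0
```

In a training setup such a term would typically be weighted and added to the reconstruction loss; the weighting and its combination with the Cross-modal Loss are not specified in the abstract and are left open here.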

Tags:

Image and video representation

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle

More Like This

SSGD: A smartphone screen glass dataset for defect detection

YOLOX-B: A BETTER YOLOX MODEL FOR REAL-TIME DRIVER BEHAVIOR DETECTION

MEMORY-AUGMENTED CONTRASTIVE LEARNING FOR TALKING HEAD GENERATION