Deep Monocular Video Depth Estimation Using Temporal Attention

Haoyu Ren, Mostafa El-khamy, Jungwon Lee

Length: 14:58

04 May 2020

Monocular video depth estimation (MVDE) plays a crucial role in 3D computer vision. In this paper, we propose an end-to-end monocular video depth estimation network based on temporal attention. Our network starts with a motion compensation module, in which a spatial-temporal transformer network (STN) warps the input frames using the estimated optical flow. Next, a temporal attention module combines features from the warped frames while emphasizing temporal consistency. A monocular depth estimation network then estimates depth from the temporally combined features. Experimental results show that the proposed framework outperforms state-of-the-art single-image depth estimation (SIDE) networks as well as existing MVDE methods.
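
The first two stages described in the abstract (flow-based warping, then attention-weighted fusion across time) can be illustrated with a toy sketch. This is not the authors' network: the paper's modules are learned, whereas the functions below are hand-written stand-ins. The names `warp` and `temporal_attention`, the nearest-neighbour sampling, the scalar per-pixel features, and the squared-difference similarity score are all assumptions made for illustration only.

```python
import math


def warp(frame, flow):
    """Backward-warp a 2D frame with a per-pixel flow field.

    Assumption: out[y][x] = frame[y + dy][x + dx], sampled with
    nearest-neighbour rounding and clamped to the image border.
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            out[y][x] = frame[sy][sx]
    return out


def temporal_attention(warped_frames, ref_index):
    """Fuse warped frames with per-pixel softmax weights over time.

    Assumption: the score of a frame at a pixel is the negative squared
    difference to the reference frame, so frames that agree with the
    reference after warping receive larger weights.
    """
    ref = warped_frames[ref_index]
    h, w = len(ref), len(ref[0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            scores = [-(f[y][x] - ref[y][x]) ** 2 for f in warped_frames]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            fused[y][x] = sum(
                (e / total) * f[y][x] for e, f in zip(exps, warped_frames)
            )
    return fused
```

In this sketch, a neighbouring frame whose warped value disagrees strongly with the reference frame contributes almost nothing to the fused feature, which is one simple way to read "emphasizing temporal consistency". In the full pipeline, the fused features would then be passed to a depth estimation network, which is omitted here.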

Tags:

sps conference

icassp 2020 virtual conference

May 2020

icassp 2020