04 May 2020

Monocular video depth estimation (MVDE) plays a crucial role in 3D computer vision. In this paper, we propose an end-to-end monocular video depth estimation network based on temporal attention. Our network starts with a motion compensation module, in which a spatial temporal transformer network (STN) is used to warp the input frames using the estimated optical flow. Next, a temporal attention module combines features from the warped frames while emphasizing temporal consistency. A monocular depth estimation network then estimates depth from the temporally combined features. Experimental results demonstrate that the proposed framework outperforms state-of-the-art single image depth estimation (SIDE) networks as well as existing MVDE methods.
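The sketch below is not the authors' code; it is a minimal, hedged illustration of the two steps named in the abstract, assuming a PyTorch implementation. The flow-based warping (the differentiable STN-style resampling) and the per-frame attention weighting are shown; the function and module names (warp_with_flow, TemporalAttention) and all shapes and hyper-parameters are illustrative placeholders, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_with_flow(frame, flow):
    """Warp `frame` (B, C, H, W) toward the reference view using an
    optical-flow field (B, 2, H, W): the STN-style differentiable
    resampling step used for motion compensation."""
    b, _, h, w = frame.shape
    # Base sampling grid in pixel coordinates (x, y).
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                              # (B, 2, H, W)
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)        # (B, H, W, 2)
    return F.grid_sample(frame, sample_grid, align_corners=True)


class TemporalAttention(nn.Module):
    """Fuse per-frame features with learned attention weights so that
    temporally consistent frames contribute more to the fused feature."""

    def __init__(self, channels, num_frames):
        super().__init__()
        self.score = nn.Conv2d(channels * num_frames, num_frames, kernel_size=1)

    def forward(self, feats):                      # feats: (B, T, C, H, W)
        b, t, c, h, w = feats.shape
        weights = self.score(feats.reshape(b, t * c, h, w))         # (B, T, H, W)
        weights = torch.softmax(weights, dim=1).unsqueeze(2)        # (B, T, 1, H, W)
        return (weights * feats).sum(dim=1)                         # (B, C, H, W)
```

In this reading of the abstract, neighboring frames are first warped to the reference frame with warp_with_flow, their features are fused by TemporalAttention, and the fused feature map is passed to a standard encoder-decoder depth estimation head.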
