A LIGHTWEIGHT SELF-SUPERVISED TRAINING FRAMEWORK FOR MONOCULAR DEPTH ESTIMATION
Tim Heydrich, Yimin Yang, Shan Du
SPS
Depth estimation attracts great interest in various sectors such as robotics, human-computer interfaces, intelligent visual surveillance, and wearable augmented reality gear. Monocular depth estimation is of particular interest due to its low complexity and cost. Research in recent years has shifted away from supervised learning towards unsupervised or self-supervised approaches. While there have been great achievements, most of the research has focused on large, heavy networks that are highly resource-intensive, making them unsuitable for systems with limited resources. We are particularly concerned about the increased complexity during training that current self-supervised approaches bring. In this paper, we propose a lightweight self-supervised training framework which utilizes computationally cheap methods to compute ground truth approximations. In particular, we utilize a stereo pair of images during training to compute a photometric reprojection loss and a disparity ground truth approximation. Due to the ground truth approximation, our framework removes the need for pose estimation and the corresponding heavy pose-prediction networks that current self-supervised methods require. Our experiments demonstrate that our framework is capable of increasing the generator's performance at a fraction of the size required by the current state-of-the-art self-supervised approach.
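As a rough illustration of the photometric reprojection loss mentioned above, the sketch below warps the right image of a stereo pair into the left view using a per-pixel disparity map and measures the L1 photometric error. This is a hedged, simplified sketch, not the paper's implementation: the function names are hypothetical, sampling is nearest-neighbour rather than differentiable bilinear interpolation, and the SSIM term commonly combined with L1 in such losses is omitted.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Warp the right image into the left view by shifting each pixel
    horizontally by its (left-view) disparity.

    Nearest-neighbour sampling for simplicity; practical self-supervised
    depth methods use differentiable bilinear sampling instead.
    """
    h, w = right.shape
    cols = np.tile(np.arange(w), (h, 1))                # per-pixel column index
    src = np.clip(np.round(cols - disparity).astype(int), 0, w - 1)
    rows = np.tile(np.arange(h)[:, None], (1, w))       # per-pixel row index
    return right[rows, src]

def photometric_reprojection_loss(left, right, disparity):
    """Mean absolute photometric error between the left image and the
    right image warped into the left view (L1 only, SSIM omitted)."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))
```

With a correct disparity map the warped right image reconstructs the left image and the loss is near zero; an incorrect disparity increases the error, which is the signal that drives self-supervised training.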