Poster 10 Oct 2023

Although context-based monocular depth estimation has shown remarkable improvement, adaptation to unseen contexts remains a major challenge. In contrast, physical depth cues, such as defocus arising from lens aberration, enable context-independent depth estimation. However, explicitly supervising physical depth cues is costly and limits versatility, because obtaining the ground truth requires expensive equipment. We therefore propose a novel self-supervised learning method for single-shot neural depth from defocus (DfD) that uses structure-from-motion (SfM) depth computed from images taken with the target lens. Since the scale of SfM depth is ambiguous, we train the network with a ranking loss. To demonstrate the versatility of our method, we conducted validation experiments using not only DSLR cameras but also smartphones with small image sensors. We confirmed that our method outperforms state-of-the-art approaches, including a physically calibrated neural single-shot DfD and context-based methods, by a large margin.
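To illustrate the key idea of supervising a DfD network with scale-ambiguous SfM depth, the sketch below shows one common form of pairwise ranking loss: only the relative ordering of sampled pixel pairs is penalized, so the unknown global scale of the SfM reconstruction cancels out. This is a minimal, hypothetical PyTorch sketch, not the authors' released implementation; all names (pairwise_rank_loss, num_pairs, margin) and the hinge formulation are illustrative assumptions.

```python
# Minimal sketch (assumption, not the authors' code): a pairwise ranking loss
# for training a depth-from-defocus network against scale-ambiguous SfM depth.
import torch
import torch.nn.functional as F

def pairwise_rank_loss(pred_depth, sfm_depth, valid_mask, num_pairs=2048, margin=0.0):
    """Penalize predicted depth orderings that disagree with sparse SfM depth.

    pred_depth: (B, 1, H, W) network output
    sfm_depth:  (B, 1, H, W) sparse SfM depth (arbitrary scale, 0 where unavailable)
    valid_mask: (B, 1, H, W) bool mask of pixels that have SfM depth
    """
    losses = []
    for b in range(pred_depth.shape[0]):
        idx = valid_mask[b].flatten().nonzero(as_tuple=True)[0]
        if idx.numel() < 2:
            continue
        # Randomly sample pixel pairs among locations with known SfM depth.
        i = idx[torch.randint(idx.numel(), (num_pairs,), device=idx.device)]
        j = idx[torch.randint(idx.numel(), (num_pairs,), device=idx.device)]
        d_pred_i = pred_depth[b].flatten()[i]
        d_pred_j = pred_depth[b].flatten()[j]
        d_sfm_i = sfm_depth[b].flatten()[i]
        d_sfm_j = sfm_depth[b].flatten()[j]
        # Ordinal target: +1 if pixel i is farther than j in SfM depth, -1 otherwise.
        sign = torch.sign(d_sfm_i - d_sfm_j)
        keep = sign != 0  # ignore pairs with equal SfM depth
        # Hinge-style ranking loss: only the ordering is supervised, so the
        # unknown SfM scale does not affect the training signal.
        loss = F.relu(margin - sign[keep] * (d_pred_i[keep] - d_pred_j[keep]))
        losses.append(loss.mean())
    return torch.stack(losses).mean() if losses else pred_depth.sum() * 0.0
```

A usage note: in this formulation the loss depends only on which pixel of each pair is predicted farther, which is what makes supervision from an unscaled SfM point cloud possible; metric scale, if needed, would have to come from elsewhere (e.g. the defocus cue itself).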