Depth Estimation From Single Image Through Multi-Path-Multi-Rate Diverse Feature Extractor
Wen-Yi Lo, Ching-Te Chiu, Jie-Yu Luo
Convolutional neural networks can effectively learn features and predict depth across different scene types. However, previous studies have not predicted depth accurately when the objects or scenes are small and the background is complex. These studies used bilinear up-sampling to enlarge the feature maps during training, or failed to transfer multi-scale information to the end of the network. As a result, the predicted depth maps contain blurred regions and lose object contours. This paper proposes a multi-path-multi-rate feature extractor that effectively extracts multi-scale information for accurate depth prediction. We adopt the U-Net architecture to obtain high-resolution depth maps and use the proposed multi-path-multi-rate feature extractor to transfer useful features from the encoder to the decoder. Dilated convolutions with different rates provide different fields of view, which increases the precision of depth estimation and preserves object contours. Finally, we conducted experiments on an indoor-scene dataset (NYUv2). The results show that the proposed framework achieves improvements of 12.9% in RMSE, 9.9% in REL, and 9.3% in log10, and it requires approximately 0.048 seconds to predict a depth map from a single image.
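
To illustrate the idea of parallel dilated convolutions at different rates feeding a U-Net skip connection, the following is a minimal PyTorch sketch. The number of paths, the dilation rates (1, 2, 4, 8), the channel sizes, and the fusion by a 1x1 convolution are assumptions chosen for demonstration; the paper's exact configuration is not reproduced here.

```python
# Illustrative sketch only: branch count, dilation rates, and channel sizes
# are assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn

class MultiPathMultiRateExtractor(nn.Module):
    """Parallel dilated-convolution paths with different rates, fused by a
    1x1 convolution, intended to refine features on a U-Net skip connection."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Each path sees a different field of view; concatenating the paths
        # keeps both fine contours and large-scene context before fusion.
        return self.fuse(torch.cat([p(x) for p in self.paths], dim=1))

# Example: refine a hypothetical encoder feature map before passing it
# to the corresponding decoder stage.
skip = torch.randn(1, 256, 60, 80)
refined = MultiPathMultiRateExtractor(256, 256)(skip)
print(refined.shape)  # torch.Size([1, 256, 60, 80])
```

Because each 3x3 convolution uses padding equal to its dilation rate, every path preserves the spatial resolution of the input, so the fused output can be concatenated with decoder features exactly as a plain skip connection would be.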