Contextualized CNN for Scene-Aware Depth Estimation From Single RGB Image. IEEE Trans. Multimedia
Shuai Li, Ji Liu, Aimin Hao, Qinping Zhao, Hong Qin, Wenfeng Song
Benefiting directly from deep learning techniques, depth estimation from a single image has gained great momentum in recent years. However, most existing approaches treat depth prediction as an isolated problem without taking high-level semantic context into consideration, which results in inefficient use of the training dataset and unavoidably requires a large amount of captured depth data during the training phase. To ameliorate this, this paper develops a novel scene-aware contextualized convolutional neural network (CCNN), which characterizes semantic context relationships at the class level and refines depth at the pixel level. Our newly-proposed CCNN is built upon the intrinsic exploitation of context-dependent depth associations, including priors on continuous depth within an object and depth changes between nearby objects. Specifically, rather than regressing depth in a single CNN, we make the first attempt to integrate both class-level and pixel-level conditional random field (CRF) based probabilistic graphical models into the powerful CNN framework to simultaneously learn features at different levels within the same CNN layer. Within our CCNN, the class-level model guides the pixel-level model to learn the contextualized RGB-to-depth mapping. Hence, CCNN has desirable properties in both class-level integrity and pixel-level discrimination, which makes it ideal to share such two-level convolutional features in parallel during end-to-end training with the commonly-used back-propagation algorithm. We conduct extensive experiments and comprehensive evaluations on public benchmarks involving various indoor and outdoor scenes, and all the experiments confirm that our method outperforms state-of-the-art depth estimation methods, especially in cases where only small-scale training data are readily available.
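As a rough illustration of the feature-sharing idea described above (not the authors' implementation: the CRF inference layers are omitted, and the class name `TwoLevelDepthNet`, all layer sizes, and the placeholder losses are assumptions made for this sketch), a minimal PyTorch-style model with a shared convolutional backbone, a class-level branch, and a pixel-level depth branch might look like this:

```python
import torch
import torch.nn as nn

class TwoLevelDepthNet(nn.Module):
    """Hypothetical sketch: shared conv features feed a class-level branch
    (per-pixel semantic logits) and a pixel-level depth-regression branch."""

    def __init__(self, num_classes=40):
        super().__init__()
        # Shared convolutional feature extractor (stands in for the CNN backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Class-level branch: semantic context cue.
        self.class_head = nn.Conv2d(64, num_classes, 1)
        # Pixel-level branch: depth regression conditioned on the class logits
        # by concatenating them with the shared features.
        self.depth_head = nn.Sequential(
            nn.Conv2d(64 + num_classes, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, rgb):
        feats = self.backbone(rgb)
        class_logits = self.class_head(feats)
        depth = self.depth_head(torch.cat([feats, class_logits], dim=1))
        return class_logits, depth

# Both branches are trained jointly so the semantic branch can guide the depth
# branch, loosely mirroring the paper's end-to-end back-propagation scheme.
model = TwoLevelDepthNet()
rgb = torch.randn(2, 3, 64, 64)
class_logits, depth = model(rgb)
loss = depth.abs().mean() + class_logits.pow(2).mean()  # placeholder losses only
loss.backward()
```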