CLIP4STEREO: REVISITING DOMAIN GENERALIZED STEREO MATCHING VIA CLIP
Chihao Ma, Pengcheng Zeng, Jucai Zhai, Yang Liu, Yong Zhao, Xinan Wang
Although supervised deep stereo matching networks achieve impressive performance given sufficient training data, their poor generalization under domain shift prevents them from being applied to unseen domains. Recent progress has shown that CLIP is a promising alternative for zero-shot visual representation tasks under natural language supervision. In this paper, we present a new framework for domain generalized stereo matching that leverages contrastive language-image pre-training (CLIP) to distill text-guided, discriminative content information rather than task-irrelevant style information. Extensive experiments show that generalization to unseen domains improves significantly when transferring from SceneFlow to Middlebury, ETH3D, and KITTI.
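To make the abstract's idea of "text-guided content distillation" concrete, the following is a minimal sketch of how a frozen CLIP model could supply a style-agnostic content target for a stereo feature extractor. It assumes a HuggingFace CLIP checkpoint; the prompt set, the stereo logits, and the loss form are illustrative placeholders and are not the paper's actual design.

import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical content-describing prompts; the actual prompt set is not
# specified in this section.
prompts = ["a photo of a driving scene", "a photo of an indoor scene"]

@torch.no_grad()
def clip_content_target(image: Image.Image) -> torch.Tensor:
    """Text-guided content descriptor: similarity of the CLIP image embedding
    to each content prompt in CLIP's shared embedding space."""
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True).to(device)
    img = F.normalize(clip.get_image_features(pixel_values=inputs["pixel_values"]), dim=-1)
    txt = F.normalize(clip.get_text_features(input_ids=inputs["input_ids"],
                                             attention_mask=inputs["attention_mask"]), dim=-1)
    return (img @ txt.T).squeeze(0)          # shape: (num_prompts,)

def distillation_loss(stereo_logits: torch.Tensor, image: Image.Image) -> torch.Tensor:
    """Illustrative distillation loss: match the stereo backbone's prompt
    logits (shape (num_prompts,)) to the frozen CLIP target."""
    return F.mse_loss(stereo_logits, clip_content_target(image))

Because the CLIP encoders stay frozen, the stereo backbone is pushed toward prompt-aligned semantic content rather than dataset-specific appearance statistics, which is the intuition behind the content-versus-style distinction stated above.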