Huoling Luo1, Congcong Wang2, Xingguang Duan3, Hao Liu4, Ping Wang5, Qingmao Hu1, Fucang Jia6. 1. Research Lab for Medical Imaging and Digital Surgery, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China. 2. School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway. 3. Advanced Innovation Centre for Intelligent Robots & Systems, Beijing Institute of Technology, Beijing, China. 4. State Key Lab for Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China. 5. Department of Hepatobiliary Surgery, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China. 6. Research Lab for Medical Imaging and Digital Surgery, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China; Pazhou Lab, Guangzhou, China. Electronic address: fc.jia@siat.ac.cn.
Abstract
BACKGROUND: Learning-based methods have achieved remarkable performances on depth estimation. However, the premise of most self-learning and unsupervised learning methods is built on rigorous, geometrically-aligned stereo rectification. The performances of these methods degrade when the rectification is not accurate. Therefore, we explore an approach for unsupervised depth estimation from stereo images that can handle imperfect camera parameters. METHODS: We propose an unsupervised deep convolutional network that takes rectified stereo image pairs as input and outputs corresponding dense disparity maps. First, a new vertical correction module is designed for predicting a correction map to compensate for the imperfect geometry alignment. Second, the left and right images, which are reconstructed based on the input image pair and corresponding disparities as well as the vertical correction maps, are regarded as the outputs of the generative term of the generative adversarial network (GAN). Then, the discriminator term of the GAN is used to distinguish the reconstructed images from the original inputs to force the generator to output increasingly realistic images. In addition, a residual mask is introduced to exclude pixels that conflict with the appearance of the original image in the loss calculation. RESULTS: The proposed model is validated on the publicly available Stereo Correspondence and Reconstruction of Endoscopic Data (SCARED) dataset and the average MAE is 3.054 mm. CONCLUSION: Our model can effectively handle imperfect rectified stereo images for depth estimation.
BACKGROUND: Learning-based methods have achieved remarkable performances on depth estimation. However, the premise of most self-learning and unsupervised learning methods is built on rigorous, geometrically-aligned stereo rectification. The performances of these methods degrade when the rectification is not accurate. Therefore, we explore an approach for unsupervised depth estimation from stereo images that can handle imperfect camera parameters. METHODS: We propose an unsupervised deep convolutional network that takes rectified stereo image pairs as input and outputs corresponding dense disparity maps. First, a new vertical correction module is designed for predicting a correction map to compensate for the imperfect geometry alignment. Second, the left and right images, which are reconstructed based on the input image pair and corresponding disparities as well as the vertical correction maps, are regarded as the outputs of the generative term of the generative adversarial network (GAN). Then, the discriminator term of the GAN is used to distinguish the reconstructed images from the original inputs to force the generator to output increasingly realistic images. In addition, a residual mask is introduced to exclude pixels that conflict with the appearance of the original image in the loss calculation. RESULTS: The proposed model is validated on the publicly available Stereo Correspondence and Reconstruction of Endoscopic Data (SCARED) dataset and the average MAE is 3.054 mm. CONCLUSION: Our model can effectively handle imperfect rectified stereo images for depth estimation.