Marco Visentini-Scarzanella1, Takamasa Sugiura2, Toshimitsu Kaneko2, Shinichiro Koto2. 1. Multimedia Laboratory, Toshiba Corporate Research and Development Center, 1, Komukai-Toshiba-cho, Kawasaki, 212-8582, Japan. marco.visentiniscarzanella@gmail.com. 2. Multimedia Laboratory, Toshiba Corporate Research and Development Center, 1, Komukai-Toshiba-cho, Kawasaki, 212-8582, Japan.
Abstract
PURPOSE: In bronchoscopy, computer vision systems for navigation assistance are an attractive low-cost solution to guide the endoscopist to target peripheral lesions for biopsy and histological analysis. We propose a decoupled deep learning architecture that projects input frames onto the domain of CT renderings, thus allowing offline training from patient-specific CT data. METHODS: A fully convolutional network architecture is implemented on GPU and tested on a phantom dataset involving 32 video sequences and ~60k frames with aligned ground truth and renderings, which is made available as the first public dataset for bronchoscopy navigation. RESULTS: An average estimated depth accuracy of 1.5 mm was obtained, outperforming conventional direct depth estimation from input frames by 60%, with a computational time of ~30 ms on modern GPUs. Qualitatively, the estimated depth and renderings closely resemble the ground truth. CONCLUSIONS: The proposed method demonstrates a novel architecture for real-time monocular depth estimation in bronchoscopy without losing patient specificity. Future work will include integration within SLAM systems and collection of in vivo datasets.
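To make the methods concrete, the following is a minimal, hypothetical sketch of a fully convolutional encoder-decoder that maps a single RGB endoscopic frame to a dense depth map, in the spirit of the monocular depth estimation network described above. This is not the authors' architecture; the layer counts, channel widths, and class name DepthFCN are illustrative assumptions only.

# Minimal sketch (assumed layout, not the published network) of a fully
# convolutional monocular depth estimator for bronchoscopy frames.
import torch
import torch.nn as nn

class DepthFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: progressively downsample the RGB frame while increasing channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: upsample back to the input resolution, ending in a one-channel depth map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        # x: (batch, 3, H, W) RGB frame -> (batch, 1, H, W) estimated depth.
        return self.decoder(self.encoder(x))

if __name__ == "__main__":
    model = DepthFCN()
    frame = torch.randn(1, 3, 256, 256)  # one synthetic 256x256 RGB frame
    depth = model(frame)                 # predicted depth map, shape (1, 1, 256, 256)
    print(depth.shape)

Because the network is fully convolutional, it can be applied to frames of other resolutions without architectural changes, which is consistent with the per-frame inference times on the order of tens of milliseconds reported in the abstract.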
Keywords:
3D reconstruction; Assisted navigation; Bronchoscopy; Deep learning