| Literature DB >> 33930829 |
Kutsev Bengisu Ozyoruk1, Guliz Irem Gokceler2, Taylor L Bobrow3, Gulfize Coskun2, Kagan Incetan2, Yasin Almalioglu4, Faisal Mahmood5, Eva Curto6, Luis Perdigoto6, Marina Oliveira6, Hasan Sahin2, Helder Araujo6, Henrique Alexandrino7, Nicholas J Durr3, Hunter B Gilbert8, Mehmet Turan9.
Abstract
Deep learning techniques hold promise to develop dense topography reconstruction and pose estimation methods for endoscopic videos. However, currently available datasets do not support effective quantitative benchmarking. In this paper, we introduce a comprehensive endoscopic SLAM dataset consisting of 3D point cloud data for six porcine organs, capsule and standard endoscopy recordings, synthetically generated data as well as clinically in use conventional endoscope recording of the phantom colon with computed tomography(CT) scan ground truth. A Panda robotic arm, two commercially available capsule endoscopes, three conventional endoscopes with different camera properties, two high precision 3D scanners, and a CT scanner were employed to collect data from eight ex-vivo porcine gastrointestinal (GI)-tract organs and a silicone colon phantom model. In total, 35 sub-datasets are provided with 6D pose ground truth for the ex-vivo part: 18 sub-datasets for colon, 12 sub-datasets for stomach, and 5 sub-datasets for small intestine, while four of these contain polyp-mimicking elevations carried out by an expert gastroenterologist. To verify the applicability of this data for use with real clinical systems, we recorded a video sequence with a state-of-the-art colonoscope from a full representation silicon colon phantom. Synthetic capsule endoscopy frames from stomach, colon, and small intestine with both depth and pose annotations are included to facilitate the study of simulation-to-real transfer learning algorithms. Additionally, we propound Endo-SfMLearner, an unsupervised monocular depth and pose estimation method that combines residual networks with a spatial attention module in order to dictate the network to focus on distinguishable and highly textured tissue regions. The proposed approach makes use of a brightness-aware photometric loss to improve the robustness under fast frame-to-frame illumination changes that are commonly seen in endoscopic videos. To exemplify the use-case of the EndoSLAM dataset, the performance of Endo-SfMLearner is extensively compared with the state-of-the-art: SC-SfMLearner, Monodepth2, and SfMLearner. The codes and the link for the dataset are publicly available at https://github.com/CapsuleEndoscope/EndoSLAM. A video demonstrating the experimental setup and procedure is accessible as Supplementary Video 1.Entities:
Keywords: Capsule endoscopy; Monocular depth estimation; SLAM dataset; Spatial attention module; Standard endoscopy; Visual odometry
Year: 2021 PMID: 33930829 DOI: 10.1016/j.media.2021.102058
Source DB: PubMed Journal: Med Image Anal ISSN: 1361-8415 Impact factor: 8.545