| Literature DB >> 33909264 |
Jorge F Lazo1,2, Aldo Marzullo3, Sara Moccia4,5, Michele Catellani6, Benoit Rosa7, Michel de Mathelin7, Elena De Momi8.
Abstract
PURPOSE: Ureteroscopy is an efficient endoscopic minimally invasive technique for the diagnosis and treatment of upper tract urothelial carcinoma. During ureteroscopy, the automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. In order to obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on convolutional neural networks (CNNs).
Keywords: Convolutional neural networks; Deep learning; Image segmentation; Upper tract urothelial carcinoma (UTUC); Ureteroscopy
Year: 2021 PMID: 33909264 PMCID: PMC8166718 DOI: 10.1007/s11548-021-02376-3
Source DB: PubMed Journal: Int J Comput Assist Radiol Surg ISSN: 1861-6410 Impact factor: 2.924
Fig. 1 Sample images in our dataset showing: a the hue variability of the surrounding tissue as well as the shape and location of the lumen (the hollow lumen is highlighted in green to clearly show the variety of shapes it can take). b–e Samples of artifacts (the lumen is not highlighted, to give a clear view of the image artifacts)
Fig. 2 Workflow of the proposed ensemble for lumen segmentation in ureteroscopic videos. Blocks of 3 consecutive frames of size p × q × ch (where p and q refer to the spatial dimensions and ch to the number of channels of each individual frame) are fed into the ensemble. The spatial-temporal models (orange line) take these blocks as input, whereas the spatial models (red line) take only the central frame. The predictions made by the individual models are ensembled with the function defined in Eq. 1 to produce the final output
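To make the workflow in Fig. 2 concrete, the following is a minimal sketch of the ensemble forward pass. Eq. 1 itself is not reproduced in this record, so pixel-wise averaging of the models' probability maps followed by thresholding is assumed here; the model callables, shapes, and threshold are illustrative, not the authors' exact implementation.

```python
import numpy as np

def ensemble_predict(spatial_models, temporal_models, frame_block, threshold=0.5):
    """frame_block: array of shape (3, p, q, ch) holding 3 consecutive frames."""
    central_frame = frame_block[1]  # spatial models see only the middle frame
    prob_maps = [m(central_frame) for m in spatial_models]   # each returns a (p, q) probability map
    prob_maps += [m(frame_block) for m in temporal_models]   # spatial-temporal models see the full block
    fused = np.mean(prob_maps, axis=0)                       # assumed fusion, standing in for Eq. 1
    return (fused >= threshold).astype(np.uint8)             # binary lumen mask
```

With dummy callables that return (p, q) probability maps, the function yields a binary mask of the same spatial size as one frame.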
Fig. 3 The initial stage of the models M. The blocks of consecutive frames of size p × q × ch (where p and q refer to the spatial dimensions and ch to the number of channels of each individual frame) pass through an initial 3D convolution with a given number of kernels. The output of this step is zero-padded in the second and third dimensions and then reshaped to fit as input for the m core models
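A hedged PyTorch sketch of this initial stage follows. The kernel count, the padding target, and the reshape that folds the temporal depth into channels are assumptions, since the record omits the exact sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Initial3DStage(nn.Module):
    def __init__(self, ch=3, n_kernels=16):  # kernel count is an assumption
        super().__init__()
        # input: (batch, ch, depth=3 frames, p, q)
        self.conv3d = nn.Conv3d(ch, n_kernels, kernel_size=3, padding=1)

    def forward(self, x, target_hw=(256, 256)):
        y = self.conv3d(x)                      # (batch, n_kernels, 3, p, q)
        b, k, d, p, q = y.shape
        pad_h, pad_w = target_hw[0] - p, target_hw[1] - q
        y = F.pad(y, (0, pad_w, 0, pad_h))      # zero-pad the two spatial dims
        return y.reshape(b, k * d, *target_hw)  # fold depth into channels for the 2D core models

# e.g. Initial3DStage()(torch.randn(1, 3, 3, 250, 250)).shape -> (1, 48, 256, 256)
```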
Information about the dataset collected
| Patient no. | Video no. | No. of annotated frames | Image size (pixels) |
|---|---|---|---|
| 1 | Video 1 | 21 | 356 |
| 1 | Video 2 | 240 | 256 |
| 2 | Video 3 | 462 | 296 |
| 2 | Video 4 | 234 | 296 |
| 3 | Video 5 | 51 | 296 |
| 4 | Video 6 | 201 | 296 |
| 6 | Video 8 | 387 | 256 |
| 6 | Video 9 | 234 | 256 |
| 6 | Video 10 | 117 | 256 |
| 6 | Video 11 | 360 | 256 |
| Total | – | 2673 | – |
The videos marked in bold indicate the patient case that was used for testing
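Note that the split implied by the table is patient-wise rather than frame-wise: every video of the held-out patient goes to the test set. A minimal sketch, assuming a simple list-of-dicts representation of the table (the actual data-loading code is not part of this record):

```python
# Hypothetical records mirroring the table rows above
records = [
    {"patient": 1, "video": 1, "frames": 21},
    {"patient": 1, "video": 2, "frames": 240},
    {"patient": 2, "video": 3, "frames": 462},
    # ... remaining rows of the table ...
]

def split_by_patient(records, test_patient):
    """Hold out all videos of one patient, train on the rest."""
    train = [r for r in records if r["patient"] != test_patient]
    test = [r for r in records if r["patient"] == test_patient]
    return train, test
```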
Fig. 4 Box plots of the precision (Prec), recall (Rec) and the Dice similarity coefficient (DSC) for the models tested: ResUNet with single image frames (yellow), ResUNet using consecutive temporal frames (green), Mask-RCNN with single image frames (brown), Mask-RCNN using consecutive temporal frames (pink), and the proposed ensemble method (blue) formed by all the previous models. The asterisks represent statistically significant differences between the architectures according to the Kruskal–Wallis test (*, **, ***)
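The three reported metrics follow the standard per-image definitions; the sketch below computes them from binary masks, assuming numpy arrays (the variable names are illustrative). The per-architecture score distributions in Fig. 4 can then be compared with scipy.stats.kruskal.

```python
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-8):
    """Precision, recall and Dice similarity coefficient for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = (pred & gt).sum()    # lumen pixels correctly predicted
    fp = (pred & ~gt).sum()   # background predicted as lumen
    fn = (~pred & gt).sum()   # lumen pixels missed
    prec = tp / (tp + fp + eps)
    rec = tp / (tp + fn + eps)
    dsc = 2 * tp / (2 * tp + fp + fn + eps)
    return prec, rec, dsc
```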
Average Dice similarity coefficient (DSC), precision (Prec) and recall (Rec) in the cases in which the ensembles were formed only by: (1) the spatial models; (2) the spatial-temporal models; (3) ResUNet with both spatial and temporal inputs; and (4) Mask-RCNN with the same setup
| Ensemble components | DSC | Prec | Rec |
|---|---|---|---|
| (1) Spatial models | 0.78 | 0.65 | 0.71 |
| (2) Spatial-temporal models | 0.71 | 0.55 | 0.57 |
| (3) ResUNet (spatial + temporal) | 0.72 | 0.56 | 0.66 |
| (4) Mask-RCNN (spatial + temporal) | 0.68 | 0.51 | 0.63 |
The ensemble function used in every case is the one defined in Eq. 1; the components forming each ensemble are listed in the first column
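As a rough illustration of how such sub-ensembles can be scored, here is a self-contained sketch; the averaging fusion is an assumption standing in for Eq. 1, and the probability-map variables are hypothetical.

```python
import numpy as np

def fuse(prob_maps, threshold=0.5):
    """Fuse a subset of model probability maps (assumed averaging, cf. Eq. 1)."""
    return (np.mean(prob_maps, axis=0) >= threshold).astype(np.uint8)

def dsc(pred, gt, eps=1e-8):
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2 * (pred & gt).sum() / (pred.sum() + gt.sum() + eps)

# e.g. compare a spatial-only sub-ensemble against the full ensemble:
# dsc(fuse([p_resunet_spatial, p_maskrcnn_spatial]), gt_mask)
# dsc(fuse(all_prob_maps), gt_mask)
```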
Fig. 5 Samples of segmentation with the different models tested. The colors in the overlay images represent the following for each pixel: true positive (TP) yellow, false positive (FP) pink, false negative (FN) blue, true negative (TN) black. The first two rows depict images where the lumen is clear, with the respective segmentation from each model. Rows 3–4 show cases in which some kind of occlusion appears. Finally, rows 5–6 depict cases in which the lumen is contracted and/or there is debris crossing the FOV
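The pixel-wise color coding of Fig. 5 can be reproduced with a few lines of numpy. The RGB values below are plausible stand-ins for the caption's yellow/pink/blue/black, not the authors' exact palette.

```python
import numpy as np

def error_overlay(pred, gt):
    """Color-code each pixel of a binary prediction against the ground truth."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    rgb = np.zeros((*pred.shape, 3), dtype=np.uint8)  # TN pixels stay black
    rgb[pred & gt] = (255, 255, 0)      # TP: yellow
    rgb[pred & ~gt] = (255, 105, 180)   # FP: pink
    rgb[~pred & gt] = (0, 0, 255)       # FN: blue
    return rgb
```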