| Literature DB >> 35246913 |
Erica Padovan1, Giorgia Marullo1, Leonardo Tanzi1, Pietro Piazzolla2, Sandro Moos1, Francesco Porpiglia2, Enrico Vezzetti1.
Abstract
INTRODUCTION: The current study presents a deep learning framework to determine, in real-time, position and rotation of a target organ from an endoscopic video. These inferred data are used to overlay the 3D model of patient's organ over its real counterpart. The resulting augmented video flow is streamed back to the surgeon as a support during laparoscopic robot-assisted procedures.Entities:
Keywords: Kidney; abdominal; prostate
Mesh:
Year: 2022 PMID: 35246913 PMCID: PMC9286374 DOI: 10.1002/rcs.2387
Source DB: PubMed Journal: Int J Med Robot ISSN: 1478-5951 Impact factor: 2.483
FIGURE 1 Schematic illustration of the system workflow. The video frame is fed to a Segmentation Convolutional Neural Network (CNN) and, depending on the Case Study, a Rotation CNN. The target's position and scale are obtained through the Segmentation CNN in every Case Study. The target's orientation is retrieved in different ways depending on the Case Study: the Rotation CNN outputs the predicted rotation for each frame (Case Study 1); the Rotation CNN performs registration in an initial frame and optical flow (OF) tracking handles subsequent frames (Case Study 2); manual registration is performed in an initial frame and OF tracking handles subsequent frames (Case Study 3)
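The position-and-scale step described above can be sketched as follows. This is a minimal illustration only: the paper does not specify how position and scale are defined, so the centroid and bounding-box-diagonal choices here are assumptions.

```python
import numpy as np

def target_position_and_scale(mask):
    """Estimate the target's 2D position (centroid) and scale
    (bounding-box diagonal) from a binary segmentation mask.

    A minimal sketch; the exact definitions used in the paper
    are not stated and may differ.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # target not visible in this frame
    centroid = (float(xs.mean()), float(ys.mean()))
    w = xs.max() - xs.min() + 1
    h = ys.max() - ys.min() + 1
    scale = float(np.hypot(w, h))  # bounding-box diagonal as a scale proxy
    return centroid, scale

# Toy mask: a 3x3 square of foreground pixels
mask = np.zeros((10, 10), dtype=np.uint8)
mask[3:6, 4:7] = 1
(cx, cy), scale = target_position_and_scale(mask)
```

In a real pipeline the mask would come from the Segmentation CNN's per-frame prediction rather than being constructed by hand.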
Mean accuracies when testing on a synthetic dataset, a real dataset, and the same real dataset after fine-tuning the network
| Dataset | Target | X [−5, +5] | Y [−5, +5] | Z [−5, +5] | X [−10, +10] | Y [−10, +10] | Z [−10, +10] | # of images |
|---|---|---|---|---|---|---|---|---|
| Synthetic dataset | C | 0.9987 | ‐ | ‐ | 0.9987 | ‐ | ‐ | 3000 |
| Synthetic dataset | P | 0.9971 | 1 | 0.9983 | 0.9994 | 1 | 0.9994 | 1750 |
| Synthetic dataset | K | 0.9990 | 1 | 0.9990 | 0.9997 | 1 | 0.9987 | 3000 |
| Real dataset | C | 0.5333 | 0.9457 | ‐ | 0.7333 | 0.9886* | ‐ | 30 |
| Real dataset | P | 0.5000 | 0.0000 | 0.2778 | 0.5556 | 0.0833 | 0.3611 | 36 |
| Real dataset | K | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.2000 | 30 |
| Real dataset (after fine-tuning) | P | 0.3333 | 0.5000 | 0.6667 | 0.7778 | 0.7222 | 0.9444 | 36 |
| Real dataset (after fine-tuning) | K | 0.0000 | 0.0000 | 0.5000 | 0.0000 | 0.0000 | 0.7333 | 30 |
Note: Each value is computed for the considered target classes: catheter (C), prostate (P) and kidney (K). The accuracies for X, Y and Z were computed as the number of correct predictions over the total number of samples, using different acceptable ranges: predictions with an error in the range [−5, +5] and in the range [−10, +10].
*Y−Axis rotation values for C were retrieved from the semantic segmentation CNN.
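The tolerance-based accuracy used in the table above can be sketched directly: a prediction counts as correct when its angular error falls within the acceptance range. The example values below are illustrative, not taken from the paper's data.

```python
def tolerance_accuracy(pred, true, tol):
    """Fraction of predictions whose absolute error is within ±tol degrees,
    mirroring the table's [−5, +5] and [−10, +10] acceptance ranges."""
    assert len(pred) == len(true)
    correct = sum(1 for p, t in zip(pred, true) if abs(p - t) <= tol)
    return correct / len(pred)

# Illustrative predicted vs. tagged rotation angles (degrees)
preds = [12.0, -3.0, 7.5, 30.0]
truth = [10.0, -2.0, 15.0, 29.0]
acc5 = tolerance_accuracy(preds, truth, 5)
acc10 = tolerance_accuracy(preds, truth, 10)
```

With these toy values, one error (7.5°) exceeds the ±5° range but not the ±10° one, so the two accuracies differ.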
FIGURE 2 Overall architecture of the Rotation Neural Network: a modified version of the ResNet50 model, with a separate output branch for each rotation axis
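The branch-per-axis design can be illustrated with a toy forward pass: a shared feature vector (standing in for ResNet50 backbone features) feeds one independent linear head per rotation axis. All dimensions, names and the head structure here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def three_branch_heads(features, heads):
    """Apply one linear head per rotation axis to shared backbone features."""
    return {axis: features @ W + b for axis, (W, b) in heads.items()}

# Assumed sizes: 2048-d features (ResNet50-like), 51 output units per axis
feat_dim, n_out = 2048, 51
features = rng.standard_normal(feat_dim)
heads = {axis: (rng.standard_normal((feat_dim, n_out)) * 0.01,
                np.zeros(n_out))
         for axis in ("x", "y", "z")}

logits = three_branch_heads(features, heads)
```

Keeping the heads independent lets each axis be supervised (or frozen) separately, which matches the paper's choice of predicting only the X rotation for the catheter.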
Number of images used for dataset creation, both synthetic and real, for the segmentation and rotation Convolutional Neural Networks
| Surgical operation | Case study | Target | # Real images | # Synthetic images | Rotation ranges in degrees (−X, +X)/(−Y, +Y)/(−Z, +Z) |
|---|---|---|---|---|---|
| RARP | 1 | Catheter | 375 | 40000 | (−40, 10)/‐/‐ |
| RARP | 2 | Prostate | 388 | 35000 | (−15, 20)/(−25, 25)/(−5, 15) |
| RAPN | 3 | Kidney | 208 | 40000 | (−10, 10)/(−10, 10)/(−10, 10) |
Note: For the rotation network, the considered rotation ranges are also shown. For the catheter, the Rotation CNN was trained to predict only the X rotation, as Y was derived directly from the segmentation map and Z was considered irrelevant.
IoU scores referring to the mean value over the actor classes (background, tool and target) and to the target class only: catheter (C), prostate (P) and kidney (K)
| Target class | Mean IoU | Target-only IoU |
|---|---|---|
| C | 0.9450 | 0.8940 |
| P | 0.7296 | 0.8067 |
| K | 0.8602 | 0.9069 |
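The per-class and mean IoU scores reported above can be computed as in the following sketch. The three-class label maps (0 = background, 1 = tool, 2 = target) follow the actor classes named in the caption, but the example arrays are invented for illustration.

```python
import numpy as np

def iou(pred, true, cls):
    """Intersection-over-Union for one class label between two label maps."""
    p, t = pred == cls, true == cls
    union = np.logical_or(p, t).sum()
    if union == 0:
        return float("nan")  # class absent from both maps
    return np.logical_and(p, t).sum() / union

# Toy 3x3 label maps: 0 = background, 1 = tool, 2 = target
true = np.array([[0, 0, 1],
                 [2, 2, 1],
                 [2, 2, 0]])
pred = np.array([[0, 0, 1],
                 [2, 1, 1],
                 [2, 2, 0]])

per_class = [iou(pred, true, c) for c in (0, 1, 2)]
mean_iou = float(np.mean(per_class))  # mean over the actor classes
target_iou = per_class[2]             # target class only
```

This mirrors the table's two columns: the mean over all actor classes versus the score for the target class alone.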
FIGURE 3 Sample images showing original endoscopic images, segmentation masks and mask overlays (columns) for each case study (rows): catheter (C), prostate (P) and kidney (K)
FIGURE 4 Trends of the organ orientation on the X, Y and Z axes, comparing tagged (blue) and OF-estimated (red) rotation values for each sample frame