| Literature DB >> 34707141 |
Yuta Kumazu1,2, Nao Kobayashi2, Naoki Kitamura3, Elleuch Rayan3, Paul Neculoiu3, Toshihiro Misumi4, Yudai Hojo5, Tatsuro Nakamura5, Tsutomu Kumamoto5, Yasunori Kurahashi5, Yoshinori Ishida5, Munetaka Masuda1, Hisashi Shinohara6.
Abstract
The prediction of anatomical structures within the surgical field by artificial intelligence (AI) is expected to support surgeons' experience and cognitive skills. We aimed to develop a deep-learning model to automatically segment loose connective tissue fibers (LCTFs) that define a safe dissection plane. The annotation was performed on video frames capturing a robot-assisted gastrectomy performed by trained surgeons. A deep-learning model based on U-net was developed to output segmentation results. Twenty randomly sampled frames were provided to evaluate model performance by comparing Recall and F1/Dice scores with a ground truth and with a two-item questionnaire on sensitivity and misrecognition that was completed by 20 surgeons. The model produced high Recall scores (mean 0.606, maximum 0.861). Mean F1/Dice scores reached 0.549 (range 0.335-0.691), showing acceptable spatial overlap of the objects. Surgeon evaluators gave a mean sensitivity score of 3.52 (with 88.0% assigning the highest score of 4; range 2.45-3.95). The mean misrecognition score was a low 0.14 (range 0-0.7), indicating very few acknowledged over-detection failures. Thus, AI can be trained to predict fine, difficult-to-discern anatomical structures at a level convincing to expert surgeons. This technology may help reduce adverse events by determining safe dissection planes.Entities:
Mesh:
Year: 2021 PMID: 34707141 PMCID: PMC8551298 DOI: 10.1038/s41598-021-00557-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Deep-learning algorithm and the AI model developed in this study. (a) The deep-learning architecture implementing U-Net. Conv, convolution; concat, concatenation. (b) Development and performance evaluation of the AI model. MR, misrecognition.
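The U-Net pattern in panel (a), an encoder that progressively downsamples followed by a decoder that upsamples and concatenates the matching encoder features ("concat"), can be illustrated without learned weights. The sketch below is purely schematic: it uses NumPy average pooling and nearest-neighbour upsampling in place of learned convolutions, and the shapes and channel counts are assumptions, not the authors' configuration.

```python
import numpy as np

def downsample(x):
    """2x2 average pooling over a (C, H, W) feature map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """2x nearest-neighbour upsampling over a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_like(x):
    # Encoder: keep intermediate feature maps for the skip connections.
    e1 = x                       # full resolution
    e2 = downsample(e1)          # 1/2 resolution
    bottleneck = downsample(e2)  # 1/4 resolution
    # Decoder: upsample, then concatenate the skip features along channels
    # (the "concat" arrows in Fig. 1a).
    d2 = np.concatenate([upsample(bottleneck), e2], axis=0)
    d1 = np.concatenate([upsample(d2), e1], axis=0)
    return d1

out = unet_like(np.zeros((4, 8, 8)))
print(out.shape)  # (12, 8, 8): channels grow as skip features are concatenated
```

The point of the skip connections is visible in the shapes: decoder features at each resolution carry both upsampled coarse context and the fine-grained encoder features from the same resolution, which is what lets U-Net localize thin structures such as LCTFs.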
Figure 2. The questionnaire for qualitative evaluation of the AI’s segmentation performance, completed by expert surgeons.
Figure 3. Comparison of segmentation performance at different stages of deep learning. (a) An original frame. CHA, common hepatic artery; F, fat tissue; LN, lymph node; *, nerve. (b) Magnified view of the square in (a), showing prediction of loose connective-tissue fibers (LCTFs) highlighted in turquoise by the prototype AI model. The white circle indicates an area of over-detection. (c) Prediction by the latest AI model. Arrows indicate LCTFs that the prototype AI model could not detect.
Performance metrics and qualitative scores for the 20 randomly sampled video frames.
| Frame | Recall score | F1/Dice score | Sensitivity score mean (SD) | MR score mean (SD) |
|---|---|---|---|---|
| 1 | 0.792 | 0.532 | 3.80 (0.41) | 0.05 (0.22) |
| 2 | 0.522 | 0.509 | 3.40 (0.50) | 0.20 (0.41) |
| 3 | 0.583 | 0.587 | 3.90 (0.31) | 0.55 (0.60) |
| 4 | 0.338 | 0.341 | 3.75 (0.44) | 0.70 (0.57) |
| 5 | 0.626 | 0.630 | 3.35 (0.75) | 0 (0) |
| 6 | 0.750 | 0.642 | 3.90 (0.31) | 0 (0) |
| 7 | 0.445 | 0.571 | 3.20 (0.70) | 0.20 (0.41) |
| 8 | 0.609 | 0.462 | 3.90 (0.70) | 0 (0) |
| 9 | 0.819 | 0.601 | 3.90 (0.70) | 0 (0) |
| 10 | 0.861 | 0.587 | 3.95 (0.22) | 0.10 (0.31) |
| 11 | 0.230 | 0.335 | 2.50 (0.61) | 0 (0) |
| 12 | 0.458 | 0.521 | 2.95 (0.76) | 0 (0) |
| 13 | 0.777 | 0.691 | 3.80 (0.41) | 0 (0) |
| 14 | 0.667 | 0.649 | 3.55 (0.60) | 0.35 (0.59) |
| 15 | 0.544 | 0.511 | 3.25 (0.55) | 0.05 (0.22) |
| 16 | 0.748 | 0.621 | 3.65 (0.59) | 0 (0) |
| 17 | 0.660 | 0.575 | 3.75 (0.55) | 0.05 (0.22) |
| 18 | 0.705 | 0.590 | 3.95 (0.22) | 0.45 (0.51) |
| 19 | 0.454 | 0.493 | 2.45 (0.60) | 0.05 (0.22) |
| 20 | 0.538 | 0.541 | 3.40 (0.60) | 0 (0) |
| Mean | 0.606 | 0.549 | 3.52 (0.46) | 0.14 (0.21) |
MR, misrecognition; SD, standard deviation.
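The Recall and F1/Dice columns above are standard overlap metrics for binary segmentation masks; for binary masks, F1 and the Dice coefficient coincide. A minimal sketch of how such scores might be computed, assuming predicted and ground-truth masks flattened to 0/1 lists (the mask format here is an assumption for illustration, not taken from the paper):

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    for paired binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn

def recall(pred, truth):
    """Fraction of ground-truth positive pixels that were detected."""
    tp, _, fn = confusion_counts(pred, truth)
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_dice(pred, truth):
    """F1 score, identical to the Dice coefficient for binary masks:
    2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

truth = [1, 1, 1, 0, 0, 1, 0, 0]
pred  = [1, 1, 0, 1, 0, 1, 0, 0]
print(recall(pred, truth))   # 0.75 (3 of 4 positives found)
print(f1_dice(pred, truth))  # 0.75
```

Note the asymmetry the paper exploits: Recall penalizes only missed fibers (under-detection), while F1/Dice additionally penalizes false positives, which is why the two columns can diverge for the same frame.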
Figure 4. Relations between computed performance metrics and qualitative scores. (a) A mosaic diagram showing the distribution of all scores assigned by 20 evaluators to 20 randomly sampled frames. Blue, light blue, and gray panels represent scores of 4, 3, and 2, respectively, for Question 1 (see Fig. 2). The vertical and horizontal axes represent the proportions of scores assigned to Questions 1 and 2, respectively. Values in the rectangles represent the ratio of each category to the total. There were no scores below 1 for Question 1 and no scores above 3 for Question 2. (b) Scatter plot showing the relation between sensitivity and misrecognition (MR) scores for each frame. The blue area is the 95% confidence ellipse for the plotted points. (c) Scatter plot showing the relation between sensitivity and Recall scores. The correlation coefficient was 0.733 (95% confidence interval 0.430–0.887). The blue line represents the regression line Y = 2.302 + 2.001X, where Y is the sensitivity score and X is the Recall score.
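The correlation coefficient and regression line reported for panel (c) are, in principle, a Pearson correlation and an ordinary least-squares fit. A sketch of that computation on hypothetical (Recall, sensitivity) pairs; the values below are illustrative, not the study's data:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def least_squares_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope) of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical per-frame values (Recall score, mean sensitivity score).
recall_scores = [0.2, 0.4, 0.6, 0.8]
sensitivity   = [2.5, 3.0, 3.4, 4.0]
a, b = least_squares_line(recall_scores, sensitivity)
print(a, b)                                   # intercept and slope of the fit
print(pearson_r(recall_scores, sensitivity))  # strength of the association
```

With the study's 20 frame-level pairs in place of the hypothetical lists, this procedure would yield the reported r = 0.733 and the line Y = 2.302 + 2.001X.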
Figure 5. AI prediction results for (a) frame 6, which received the highest sensitivity score, and (b) frame 19, which received the lowest sensitivity score. The area surrounded by the broken line is an area of under-detection.
Figure 6. Examples where the AI misrecognized (a) gauze mesh fibers, (b) fine grooves at the tips of forceps, and (c) minor halation on fat or blood surfaces as loose connective tissue.