Daichi Kitaguchi, Toru Fujino, Nobuyoshi Takeshita, Hiro Hasegawa, Kensaku Mori, Masaaki Ito.
Abstract
Clarifying the generalizability of deep-learning-based surgical-instrument segmentation networks across diverse surgical environments is important for recognizing the risk of overfitting in surgical-device development. This study comprehensively evaluated the generalizability of deep neural networks for surgical-instrument segmentation using 5238 images randomly extracted from 128 intraoperative videos. The video dataset contained 112 laparoscopic colorectal resection, 5 laparoscopic distal gastrectomy, 5 laparoscopic cholecystectomy, and 6 laparoscopic partial hepatectomy cases. Deep-learning-based surgical-instrument segmentation was performed on test sets with (1) the same conditions as the training set; (2) the same recognition target surgical instruments and surgery type but different laparoscopic recording systems; (3) the same laparoscopic recording system and surgery type but slightly different recognition target laparoscopic surgical forceps; and (4) the same laparoscopic recording system and recognition target surgical instruments but different surgery types. The mean average precision and mean intersection over union for test sets 1, 2, 3, and 4 were 0.941 and 0.887, 0.866 and 0.671, 0.772 and 0.676, and 0.588 and 0.395, respectively. Thus, recognition accuracy decreased even under slightly different conditions. These results reveal the limited generalizability of deep neural networks in surgical artificial intelligence and caution against biased datasets and models in deep-learning-based development.
Trial Registration Number: 2020-315; date of registration: October 5, 2020.
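The two metrics reported in the abstract can be sketched for binary segmentation masks as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code; the helper names (`iou`, `mean_iou`) are hypothetical:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union (IoU) for two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: count as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

def mean_iou(per_class_ious) -> float:
    """Mean IoU (mIoU): the unweighted mean over instrument classes."""
    return float(np.mean(per_class_ious))

# Toy 2x2 masks: 1 overlapping pixel out of 3 pixels in the union.
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [1, 0]])
print(round(iou(pred, gt), 3))  # 0.333
```

Per-class IoU values (e.g. for T1, T2, T3) would then be averaged into the mIoU values reported above.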
Year: 2022 PMID: 35869249 PMCID: PMC9307578 DOI: 10.1038/s41598-022-16923-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1Representative images of recognition target surgical instruments in this study. (A) Surgical instruments contained in the training set (T1: harmonic shears; T2: endoscopic surgical electrocautery; T3: Aesculap AdTec atraumatic universal forceps). (B) Laparoscopic surgical forceps not contained in the training set (T4: Maryland; T5: Croce-Olmi; T6: needle holder).
Dataset characteristics.
| | Number of videos | Number of annotated images | Laparoscopic recording system | Recognition target surgical instruments | Type of surgery |
|---|---|---|---|---|---|
| Training set | 85 | 4788 | Olympus | T1, T2, T3 | LCRR |
| Validation set | 9 | 345 | Olympus | T1, T2, T3 | LCRR |
| Test set 1 | 10 | 369 | Olympus | T1, T2, T3 | LCRR |
| Test set 2 | 5 | 103 | | T1, T2, T3 | LCRR |
| Sub test set 2.1 | 2 | 40 | Stryker | T1, T2, T3 | LCRR |
| Sub test set 2.2 | 3 | 63 | Karl Storz | T1, T2, T3 | LCRR |
| Test set 3 | 3 | 124 | Olympus | | LCRR |
| Sub test set 3.1 | 1 | 31 | Olympus | T4 | LCRR |
| Sub test set 3.2 | 1 | 74 | Olympus | T5 | LCRR |
| Sub test set 3.3 | 1 | 19 | Olympus | T6 | LCRR |
| Test set 4 | 16 | 223 | Olympus | T1, T2, T3 | |
| Sub test set 4.1 | 5 | 65 | Olympus | T1, T2, T3 | LDG |
| Sub test set 4.2 | 5 | 81 | Olympus | T1, T2, T3 | LC |
| Sub test set 4.3 | 6 | 77 | Olympus | T1, T2, T3 | LPH |
T1 harmonic shears, T2 endoscopic surgical electrocautery, T3 Aesculap AdTec atraumatic universal forceps, T4 Maryland, T5 Croce-Olmi, T6 needle holder; LCRR laparoscopic colorectal resection, LDG laparoscopic distal gastrectomy, LC laparoscopic cholecystectomy, LPH laparoscopic partial hepatectomy.
Figure 2Surgical-instrument recognition-accuracy results (AP average precision, IoU intersection over union, mAP mean average precision, mIoU mean intersection over union). (A) AP and IoU under the same condition as the training set (T1: harmonic shears; T2: endoscopic surgical electrocautery; T3: Aesculap AdTec atraumatic universal forceps). (B) mAP and mIoU for different types of laparoscopic recording systems. (C) AP and IoU for different types of laparoscopic surgical forceps (T3: Aesculap AdTec atraumatic universal forceps; T4: Maryland; T5: Croce-Olmi; T6: needle holder). (D) mAP and mIoU for different types of surgery (LCRR laparoscopic colorectal resection, LDG laparoscopic distal gastrectomy, LC laparoscopic cholecystectomy, LPH laparoscopic partial hepatectomy).
Figure 3Representative images recorded by each laparoscopic recording system. (A) Endoeye laparoscope (Olympus Co., Ltd., Tokyo, Japan) and Visera Elite II system (Olympus Co., Ltd, Tokyo, Japan). (B) 1488 HD 3-Chip camera system (Stryker Corp., Kalamazoo, MI, USA). (C) Image 1 S camera system (Karl Storz SE & Co., KG, Tuttlingen, Germany).
Figure 4Representative images of each type of surgery. (A) LCRR; (B) LDG; (C) LC; (D) LPH.
Surgical-instrument segmentation accuracy for each test set.
| | AP | IoU |
|---|---|---|
| Test set 1 | | |
| T1 | 0.958 (± 0.015) | 0.892 (± 0.011) |
| T2 | 0.969 (± 0.011) | 0.895 (± 0.011) |
| T3 | 0.895 (± 0.009) | 0.876 (± 0.001) |
| Mean | 0.941 (± 0.035) | 0.887 (± 0.012) |
| Test set 2 | | |
| Sub test set 2.1 (Stryker) | 0.893 (± 0.021) | 0.608 (± 0.068) |
| Sub test set 2.2 (Karl Storz) | 0.839 (± 0.021) | 0.735 (± 0.019) |
| Test set 3 | | |
| Sub test set 3.1 (T4) | 0.715 (± 0.010) | 0.678 (± 0.014) |
| Sub test set 3.2 (T5) | 0.756 (± 0.020) | 0.592 (± 0.008) |
| Sub test set 3.3 (T6) | 0.846 (± 0.041) | 0.758 (± 0.020) |
| Test set 4 | | |
| Sub test set 4.1 (LDG) | 0.782 (± 0.013) | 0.565 (± 0.025) |
| Sub test set 4.2 (LC) | 0.468 (± 0.071) | 0.300 (± 0.022) |
| Sub test set 4.3 (LPH) | 0.513 (± 0.051) | 0.319 (± 0.022) |
Mean (± SD).
AP average precision, IoU intersection over union, T1 harmonic shears, T2 endoscopic surgical electrocautery, T3 Aesculap AdTec atraumatic universal forceps, T4 Maryland, T5 Croce-Olmi, T6 needle holder; LDG laparoscopic distal gastrectomy, LC laparoscopic cholecystectomy, LPH laparoscopic partial hepatectomy; SD standard deviation.
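As a consistency check, the reported mAP for test set 1 is the unweighted mean of the three per-class AP values. A quick sketch (the exact averaging scheme used by the authors is an assumption here):

```python
import statistics

# Per-class AP for test set 1 (T1, T2, T3), taken from the table above.
per_class_ap = [0.958, 0.969, 0.895]
map_value = statistics.mean(per_class_ap)
print(round(map_value, 3))  # 0.941, matching the reported mean
```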