Yitong Zhang, Sophia Bano, Ann-Sophie Page, Jan Deprest, Danail Stoyanov, Francisco Vasconcelos.
Abstract
PURPOSE: Laparoscopic sacrocolpopexy is the gold standard procedure for the management of vaginal vault prolapse. Studying surgical skills and different approaches to this procedure requires an analysis at the level of each of its individual phases, which motivates the investigation of automated surgical workflow segmentation to expedite this research. Phase durations in this procedure are significantly longer and more variable than in commonly available benchmarks such as Cholec80, and we assess these differences.
Keywords: Laparoscopic sacrocolpopexy; Long short-term memory networks; Machine learning; Surgical workflow segmentation; Transformer networks
Year: 2022 PMID: 35050468 PMCID: PMC8873061 DOI: 10.1007/s11548-021-02544-5
Source DB: PubMed Journal: Int J Comput Assist Radiol Surg ISSN: 1861-6410 Impact factor: 2.924
Fig. 1 Surgical phases of laparoscopic sacrocolpopexy with average duration in seconds: 1) promontory preparation (); 2) dissection of vault and gutter (); 3) mesh fixation to vault (); 4) mesh fixation to promontory (); 5) peritonealisation ()
Fig. 2 Network architectures for coarse-level sequential models. The main differences between the sequence-to-sequence model and the many-to-many model are: 1) the sequence-to-sequence model has an encoder-decoder structure, which allows input and output sequences to have different sizes; 2) in addition to a sequence of feature vectors (input sequence), the input to the sequence-to-sequence model also includes a sequence of label classifications (target sequence). For the colour legend, refer to Fig. 2
Fig. 3 Seq2seq network architecture with a sequential input of 100 clips. The length of the target and output sequences depends on the configuration of the network: (a) in the time-synchronous configuration, the target, input and output sequences correspond to the same time interval of 100 clips; (b) in the time-shifted configuration, the target and output sequences each have a length of 90 time steps, with a shift of 10 between them. Together they span 100 clips, which corresponds to the size of the input sequence obtained from the Conv3D feature extractor. To obtain segmentations for consecutive sequences in a video, the seq2seq predictions become the target sequence of the next prediction iteration
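The index arithmetic behind the two configurations can be sketched as follows. This is an illustrative sketch only, not the authors' code: the function name `window_indices` and its parameters are hypothetical, and it assumes that in the time-shifted case the target sequence covers the first 90 clips of the window while the output sequence covers the last 90.

```python
def window_indices(start, input_len=100, shift=0):
    """Return (input, target, output) clip ranges for one window.

    shift=0 gives the time-synchronous layout; shift=10 gives the
    time-shifted layout with 90-step target and output sequences.
    """
    if not 0 <= shift < input_len:
        raise ValueError("shift must satisfy 0 <= shift < input_len")
    seq_len = input_len - shift
    input_range = range(start, start + input_len)
    # Assumption: the target covers the earlier part of the window ...
    target_range = range(start, start + seq_len)
    # ... and the output covers the later part, offset by `shift` clips.
    output_range = range(start + shift, start + input_len)
    return input_range, target_range, output_range


sync = window_indices(0, shift=0)      # all three ranges cover clips 0..99
shifted = window_indices(0, shift=10)  # target 0..89, output 10..99
```

Under this layout the target and output ranges overlap on 80 clips and jointly span the full 100-clip input, which matches the description in the caption.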
Ablation results for phase recognition (%) across the proposed architectures on the Sacrocolpopexy dataset. The best result in each configuration is shown in italic for the 100 series and in bold italic for the 90 series
| Architecture | Series | Variant | Precision (Macro) | Recall (Macro) | F1-Score | Accuracy (Micro) |
|---|---|---|---|---|---|---|
| LSTM(L) | 100 | Baseline | 61.6 ± 6.7 | 74.8 ± 9.7 | 0.68 | 70.7 ± 9.0 |
| LSTM(L) | 100 | Pred | 72.8 ± 12.8 | 69.6 ± 17.6 | 0.71 | 80.4 ± 13.0 |
| LSTM(L) | 100 | Noised | 74.6 ± 11.8 | 78.8 ± 11.5 | 0.77 | |
| LSTM(L) | 90 | Baseline | 53.7 ± 24.1 | 54.4 ± 17.5 | 0.54 | 67.2 ± 22.3 |
| LSTM(L) | 90 | Pred | 57.7 ± 16.0 | 59.0 ± 15.1 | 0.58 | 75.5 ± 20.2 |
| LSTM(L) | 90 | Noised | 53.5 ± 16.3 | 58.8 ± 11.7 | 0.56 | 76.5 ± 16.0 |
| Transformer(T) | 100 | Baseline | 64.6 ± 13.7 | 63.2 ± 14.7 | 0.64 | 73.1 ± 13.4 |
| Transformer(T) | 100 | Pred | | 69.4 ± 14.2 | 0.72 | 80.6 ± 16.1 |
| Transformer(T) | 100 | Noised | 72.9 ± 14.2 | 68.6 ± 15.7 | 0.71 | 82.7 ± 13.5 |
| Transformer(T) | 90 | Baseline | | | | 81.1 ± 15.5 |
| Transformer(T) | 90 | Pred | 71.7 ± 14.2 | 65.1 ± 13.1 | 0.68 | 80.4 ± 14.1 |
| Transformer(T) | 90 | Noised | 74.9 ± 13.6 | 71.2 ± 15.5 | 0.73 | 81.9 ± 14.1 |
Bold values indicate the best performance
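The F1 values reported above are consistent with the harmonic mean of macro precision and macro recall (for example, 61.6 and 74.8 give 0.68). The following minimal sketch shows how the four reported quantities could be derived from a confusion matrix. It is an assumption about the metric definitions, not the authors' evaluation code, and the function names and toy counts are hypothetical.

```python
def harmonic_mean(p, r):
    """F1 as the harmonic mean of a precision and a recall value."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def phase_metrics(confusion):
    """Return (macro precision, macro recall, F1, micro accuracy).

    confusion[i][j] counts clips whose true phase is i and whose
    predicted phase is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precisions.append(confusion[k][k] / predicted_k if predicted_k else 0.0)
        recalls.append(confusion[k][k] / actual_k if actual_k else 0.0)
    macro_p = sum(precisions) / n
    macro_r = sum(recalls) / n
    return macro_p, macro_r, harmonic_mean(macro_p, macro_r), correct / total


# Hypothetical two-phase example: 20 clips, 17 classified correctly.
toy = [[8, 2],
       [1, 9]]
macro_p, macro_r, f1, acc = phase_metrics(toy)
```

Macro averaging weights every phase equally regardless of its duration, whereas micro accuracy is dominated by the longest phases, which is one reason the two can diverge on procedures with highly variable phase lengths.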
Fig. 5 Phase diagrams for six different configurations, showing output sequences from the best Sacrocolpopexy fold. Ground truth (blue) and predicted (orange) labels are shown
Fig. 4 Sacrocolpopexy per-phase results: confusion matrices (%) averaged over all cross-validation folds and normalized by the number of samples in each phase, for the two best sequential models. (Note: the transition phase is omitted from the figure)
Comparison of phase recognition results (%) with other methods on the Sacrocolpopexy (average of 1389 clips per video) and Cholec80 (average of 360 clips per video) datasets
| Method | Sacrocolpopexy Pre. (Macro) | Sacrocolpopexy Rec. (Macro) | Sacrocolpopexy F1-Score | Sacrocolpopexy Acc. (Micro) | Cholec80 Pre. (Macro) | Cholec80 Rec. (Macro) | Cholec80 Acc. (Micro) |
|---|---|---|---|---|---|---|---|
| C3D+LSTM+Tool(Endo3D)* | 81.3 | 91.2 | | | | | |
| ResNet-50+LSTM+PKI (SV-RCNet)* | 86.2 ± 15.3 | | | | | | |
| ResNet-50+LSTM* | | | | | 80.7 ± 7.0 | 83.5 ± 7.5 | 85.3 ± 7.3 |
| ResNet-50+TCN(TeCNO Stage I)* | | | | | | | |
| C3D | 58.5 ± 6.8 | 68.6 ± 10.1 | 0.63 | 69.2 ± 8.8 | 67.5 ± 8.1 | 74.7 ± 7.4 | 71.0 ± 8.5 |
| C3D + Mode average | 73.9 ± 10.6 | 81.2 ± 9.9 | 79.5 ± 8.1 | ||||
| C3D+TCN | 76.6 ± 12.6 | 74.3 ± 15.3 | 0.72 | 82.6 ± 12.4 | 81.3 ± 5.9 | 83.8 ± 7.8 | |
| C3D+LSTM | 71.6 ± 22.6 | 64.8 ± 19 | 0.68 | 77.1 ± 18.8 | |||
| C3D+LSTM+Sliding Window | 71.2 ± 17.5 | 65.8 ± 15.7 | 0.68 | 79.2 ± 14.7 | |||
| C3D+T90 noised(Proposed) | 74.9 ± 13.6 | 71.2 ± 15.5 | 0.73 | 81.9 ± 14.1 | 43.7 ± 18.7 | 48.1 ± 16.0 | 71.1 ± 13.9 |
| C3D+L100 noised(Proposed) | 74.6 ± 11.8 | 78.8 ± 11.5 | 0.77 | | 64.9 ± 9.6 | 73.5 ± 10.6 | 81.1 ± 5.3 |
Bold and Bold italic values indicate the best performance
An asterisk (*) denotes that the Cholec80 results were taken directly from the respective publications, while the others are our own implementations. The table is grouped into (rows 1–2) methods that use models specific to cholecystectomy (tools or priors), as reported in previous literature; (rows 3–4) models with a ResNet-50 backbone, as reported in previous literature; (rows 5–11) models with a C3D backbone, as proposed in this paper
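The "C3D + Mode average" row suggests that per-clip predictions are smoothed by taking the most frequent label in a local window. The details are not given in this record, so the sketch below is only one plausible reading: the function name `mode_filter`, the centred window, and the window size are all assumptions.

```python
from collections import Counter


def mode_filter(labels, window=5):
    """Replace each label by the most common label in a centred window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        # The window is truncated at the start and end of the sequence.
        neighbourhood = labels[max(0, i - half): i + half + 1]
        # On ties, most_common keeps the label seen first in the window.
        smoothed.append(Counter(neighbourhood).most_common(1)[0][0])
    return smoothed


# Hypothetical noisy per-clip predictions for two phases (0 and 1).
noisy = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
clean = mode_filter(noisy, window=3)
```

A filter of this kind removes isolated misclassifications but cannot use long-range temporal context, which is what the TCN, LSTM and transformer rows add on top of the same C3D features.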
Ward Metric results summed over all Sacrocolpopexy cross-validation folds
| Method | F | C | F’ | event ratio |
|---|---|---|---|---|
| C3D | 79 | 4 | 2299 | 0.015 |
| C3D+Mode Average | 49 | 33 | 218 | 0.172 |
| C3D+TCN | 66 | 15 | 578 | 0.072 |
| C3D+LSTM(no-tool) | 30 | 41 | 123 | 0.287 |
| C3D+LSTM+sliding window | 39 | 35 | 150 | 0.237 |
| L100 noised(Proposed) | 63 | 19 | 415 | 0.098 |
| T90 noised(Proposed) | ||||
| LSTM avg. | 40 | 24 | 347 | 0.217 |
| Trans avg. | 36 | 34 | 188 | 0.215 |
| 100 series avg. | 54 | 22 | 427 | 0.123 |
| 90 series avg. | 22 | 36 | 107 | 0.309 |
Bold values indicate the best performance in each category
F and F’ represent the fragmentation labels, where an event F in the ground truth is fragmented into multiple F’ events in the predictions. C represents correct events, that is, prediction events that are matched with the corresponding events in the ground truth. The table is grouped into (rows 1–7) the Ward metric for each tested method and (rows 8–11) the average Ward metric over each category of the proposed methods
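The event-level counts described above can be illustrated with a simplified sketch. It assumes that a ground-truth event counts as correct (C) when exactly one prediction event with the same label overlaps it, and as fragmented (F) when several do, with each overlapping prediction event counted as F’. The full Ward metric distinguishes further categories (such as merges, insertions and deletions) that are ignored here, and the function names are hypothetical.

```python
from itertools import groupby


def to_events(labels):
    """Collapse a label sequence into (label, start, end) runs, end exclusive."""
    events, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        events.append((label, start, start + length))
        start += length
    return events


def fragmentation_counts(truth, pred):
    """Return (F, C, F_prime) under the simplified definition above."""
    pred_events = to_events(pred)
    f = c = f_prime = 0
    for label, start, end in to_events(truth):
        # Prediction events with the same label that overlap this event.
        overlaps = [e for e in pred_events
                    if e[0] == label and e[1] < end and e[2] > start]
        if len(overlaps) == 1:
            c += 1
        elif len(overlaps) > 1:
            f += 1
            f_prime += len(overlaps)
    return f, c, f_prime


# Hypothetical example: phase 0 is split in two by a spurious phase-1 clip.
truth = [0] * 5 + [1] * 5
pred = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
counts = fragmentation_counts(truth, pred)
```

In this toy case one ground-truth event is fragmented into two prediction events and one is matched correctly. Clip-level accuracy stays high at 90%, which is why an event-level measure is reported alongside the clip-level metrics.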