| Literature DB >> 35581213 |
Dhiraj J Pangal1, Guillaume Kugener1, Yichao Zhu2, Aditya Sinha1, Vyom Unadkat2, David J Cote1, Ben Strickland1, Martin Rutkowski3, Andrew Hung4, Animashree Anandkumar5,6, X Y Han7, Vardan Papyan8, Bozena Wrobel9, Gabriel Zada1, Daniel A Donoho10.
Abstract
Major vascular injury resulting in uncontrolled bleeding is a catastrophic and often fatal complication of minimally invasive surgery. At the outset of these events, surgeons do not know how much blood will be lost or whether they will successfully control the hemorrhage (achieve hemostasis). We evaluate the ability of a deep neural network (DNN) to predict hemostasis control using the first minute of surgical video, and compare model performance with that of human experts viewing the same video.

The publicly available SOCAL dataset contains 147 videos of attending and resident surgeons managing hemorrhage in a validated, high-fidelity cadaveric simulator. Videos are labeled with outcome and blood loss (mL). The first minute of 20 videos was shown to four blinded, fellowship-trained skull-base neurosurgery instructors and to SOCALNet (a DNN trained on SOCAL videos). The SOCALNet architecture comprises a convolutional network (ResNet) identifying spatial features and a recurrent network (LSTM) identifying temporal features. Experts independently assessed surgeon skill and predicted outcome and blood loss (mL); their predictions were compared with SOCALNet's.

Expert inter-rater reliability was 0.95. Experts correctly predicted 14/20 trials (sensitivity: 82%, specificity: 55%, positive predictive value (PPV): 69%, negative predictive value (NPV): 71%). SOCALNet correctly predicted 17/20 trials (sensitivity 100%, specificity 66%, PPV 79%, NPV 100%) and correctly identified all successful attempts. Expert predictions of the highest- and lowest-skill surgeons, and expert predictions reported with maximum confidence, were more accurate. Experts systematically underestimated blood loss (mean error −131 mL, RMSE 350 mL, R² 0.70), and fewer than half of expert predictions identified blood loss > 500 mL (47.5%, 19/40). SOCALNet had superior performance (mean error −57 mL, RMSE 295 mL, R² 0.74) and detected most episodes of blood loss > 500 mL (80%, 8/10).
In validation experiments, SOCALNet evaluations of a critical on-screen surgical maneuver and of high/low-skill composite videos were concordant with expert evaluations. Using only the first minute of video, experts and SOCALNet can predict outcome and blood loss during surgical hemorrhage. Experts systematically underestimated blood loss, whereas SOCALNet had no false negatives. DNNs can provide accurate, meaningful assessments of surgical video. We call for the creation of datasets of surgical adverse events for quality improvement research.
Year: 2022 PMID: 35581213 PMCID: PMC9114003 DOI: 10.1038/s41598-022-11549-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
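The abstract's classification metrics (sensitivity, specificity, PPV, NPV) follow directly from a 2 × 2 confusion matrix. A minimal sketch, with SOCALNet's cell counts inferred from the reported figures (11 successes, 9 failures; 17/20 correct; sensitivity 100%; specificity 66%) and treating "success" as the positive class:

```python
# Confusion-matrix cells inferred from the abstract (assumption: "success"
# is the positive class; counts reverse-engineered from the reported metrics).
tp, fn = 11, 0   # all 11 successful trials predicted correctly (Sn 100%)
tn, fp = 6, 3    # 6 of 9 failures predicted correctly (Sp ~66%)

sensitivity = tp / (tp + fn)                 # 11/11 = 1.00
specificity = tn / (tn + fp)                 # 6/9  ~= 0.66
ppv = tp / (tp + fp)                         # 11/14 ~= 0.79
npv = tn / (tn + fn)                         # 6/6  = 1.00
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 17/20 = 0.85

print(f"Sn={sensitivity:.0%} Sp={specificity:.0%} "
      f"PPV={ppv:.0%} NPV={npv:.0%} Acc={accuracy:.0%}")
```

These recovered values match the abstract's rounded figures, which is a useful consistency check on the reported counts.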
Figure 1. SOCALNet architecture. Deep learning model used to predict blood loss and task success in a critical hemorrhage control task. (A) Video is snapshotted into individual frames. (B) A pretrained ResNet convolutional neural network (CNN) is fine-tuned on SOCAL images from (A) to predict blood loss and task success in each individual frame. The penultimate layer of the network is removed and a 1 × 4 matrix of values predictive of success/failure or blood loss is obtained. This is repeated for all frames, generating a new matrix with N (number of frames) rows and 4 columns. The output matrix from (B) and the Tool Presence Information (C) [e.g. 'Is suction present? Yes (check); is muscle present? No (X)'; encoded as 8 binary values per frame (N × 8)] are input into a temporal layer. (D) Temporal layer: a long short-term memory (LSTM) recurrent neural network allows temporal analysis across all frames. The 2D matrix of ResNet features and Tool Presence Information ('check mark', 'X') from each frame is fed into the temporal layer. All LSTM predictions are consolidated in one dense layer, and (E) a final prediction of success/failure and blood loss (in mL) is output.
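The CNN-then-LSTM pipeline of Figure 1 can be sketched in PyTorch. This is an illustrative re-implementation, not the authors' code: the 4 per-frame features, 8 tool-presence bits, and dual success/blood-loss outputs come from the caption, while the hidden size and the tiny stand-in CNN (in place of a fine-tuned ResNet) are assumptions.

```python
import torch
import torch.nn as nn

class SOCALNetSketch(nn.Module):
    """Illustrative sketch of the Figure 1 pipeline (not the published model).
    Per-frame CNN features (4) + tool-presence bits (8) feed an LSTM, whose
    final state is read out by dense heads for success/failure and blood loss."""

    def __init__(self, hidden=64):
        super().__init__()
        # (B) Frame-level CNN: a tiny stand-in for a fine-tuned ResNet whose
        # penultimate layer has been replaced by a 4-unit output.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4),
        )
        # (D) Temporal layer over [4 CNN features + 8 tool bits] per frame.
        self.lstm = nn.LSTM(input_size=4 + 8, hidden_size=hidden, batch_first=True)
        # (E) Dense heads: success/failure logit and blood loss (mL).
        self.success_head = nn.Linear(hidden, 1)
        self.blood_loss_head = nn.Linear(hidden, 1)

    def forward(self, frames, tools):
        # frames: (batch, N, 3, H, W); tools: (batch, N, 8)
        b, n = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, n, 4)  # (A)->(B): N x 4
        seq = torch.cat([feats, tools], dim=-1)               # (B)+(C): N x 12
        out, _ = self.lstm(seq)                               # (D)
        last = out[:, -1]                                     # consolidate over time
        return self.success_head(last), self.blood_loss_head(last)

model = SOCALNetSketch()
frames = torch.randn(1, 16, 3, 64, 64)            # 16 frames from the first minute
tools = torch.randint(0, 2, (1, 16, 8)).float()   # binary tool-presence vector
success_logit, blood_loss_ml = model(frames, tools)
```

The `batch_first=True` LSTM keeps the (batch, frames, features) layout from the CNN stage, so no transposition is needed between the spatial and temporal stages.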
Results comparing the deep learning model with expert surgeons.
| | Accuracy (SN %, SP %) | RMSE (R²) | M-S agreementᵃ: success/failure | M-S agreementᵇ: blood loss |
|---|---|---|---|---|
| Ground truth | 11 successes, 9 failures | – | – | Avg blood loss: 568 mL (range: 20–1640) |
| Model | 17/20 (85%) (100, 66) | 295 (0.74) | – | – |
| Expert cohort | 55/80 (68.75%) (79, 56) | 351 (0.70) | 0.43‡ | 0.73ᶜ |
| Surgeon 1 | 13/20 (65%) (73, 55) | 306 (0.73) | 0.34 | 0.74 |
| Surgeon 2 | 14/20 (70%) (81, 55) | 335 (0.66) | 0.43 | 0.66 |
| Surgeon 3 | 14/20 (70%) (81, 55) | 423 (0.65) | 0.43 | 0.65 |
| Surgeon 4 | 14/20 (70%) (81, 55) | 329 (0.74) | 0.43 | 0.72 |
SN: sensitivity; SP: specificity; M-S: model-surgeon.
ᵃKappa coefficient.
ᵇIntraclass correlation coefficient.
ᶜInter-surgeon agreement: success/failure = 0.95; blood loss = 0.72.
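The model-surgeon agreement column uses Cohen's kappa, which discounts agreement expected by chance from the raters' marginal frequencies. A self-contained sketch on made-up binary predictions (not SOCAL data):

```python
# Cohen's kappa for two binary raters; labels below are hypothetical.
def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal frequency of "success" (1).
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

model_preds   = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]   # hypothetical
surgeon_preds = [1, 0, 1, 0, 1, 0, 1, 1, 1, 1]   # hypothetical
kappa = cohens_kappa(model_preds, surgeon_preds)  # ~0.52 here
```

A kappa of 0.43, as in the table, is conventionally read as moderate agreement; the 0.95 inter-surgeon success/failure agreement in footnote ᶜ is near-perfect by the same convention.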
Figure 2. Association between expert confidence, surgeon skill level, and accuracy of prediction. Experts are most accurate when viewing trials of surgeons with low or high skill, or when they (the experts) are maximally confident. For surgeons of moderate skill, or when experts report moderate confidence, prediction accuracy is lower. Circle size denotes number of trials; color denotes accuracy.
Figure 3. Expert and SOCALNet blood loss quantification. Predicted versus observed blood loss estimates by individual surgeons (grey), the surgeon mean (blue), and the model (green). Red points represent measured blood loss (ground truth).
Figure 4. Outcome predictions of experts and SOCALNet. Outcomes of experts (blue) and the model (red) in predicting task success using 1 min of video. Circle size denotes number of trials (N). Success (S) and failure (F) are denoted underneath each N. Taking the union of successful predictions, the model + expert grouping would correctly predict outcome in 18/20 cases. In the 2 remaining cases (bottom left quadrant), a critical error took place after the cessation of the video and was evaluated in subsequent counterfactual experiments.