Samir Touma, Fares Antaki, Renaud Duval.
Abstract
This study assessed the performance of automated machine learning (AutoML) in classifying cataract surgery phases from surgical videos. Two ophthalmology trainees without coding experience designed a deep learning model in Google Cloud AutoML Video Classification to classify 10 different cataract surgery phases. We used two open-access datasets (122 surgeries in total) for model training, validation and testing. External validation was performed on 10 surgeries from another dataset. The AutoML model demonstrated excellent discriminating performance, even outperforming bespoke deep learning models handcrafted by experts. The area under the precision-recall curve was 0.855. At the 0.5 confidence threshold, the overall performance metrics were as follows: precision (81.0%), recall (77.1%), accuracy (96.0%) and F1 score (0.79). The per-segment metrics varied across the surgical phases: precision 66.7-100%, recall 46.2-100% and specificity 94.1-100%. Hydrodissection and phacoemulsification were the most accurately predicted phases (100% and 92.31% correct predictions, respectively). During external validation, the average precision was 54.2% (0.00-90.0%), the recall was 61.1% (0.00-100%) and the specificity was 96.2% (91.0-99.0%). In conclusion, a code-free AutoML model can classify cataract surgery phases from videos with an accuracy comparable to or better than that of models developed by experts.
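As a quick consistency check, the reported F1 score follows from the 81.0% and 77.1% figures, which the performance table lists as PPV (precision) and sensitivity (recall). The standard definition of F1 as the harmonic mean of precision $P$ and recall $R$ is assumed here; the source does not spell out its formula.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2(0.810)(0.771)}{0.810 + 0.771}
    = \frac{1.249}{1.581}
    \approx 0.79
```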
Year: 2022 PMID: 35165304 PMCID: PMC8844421 DOI: 10.1038/s41598-022-06127-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Example image for each of the ten phases of cataract surgery from the Cataract 101 dataset.
Performance and evaluation of the overall model and by phase.
| Phase | Number | TP | FP | TN | FN | AUPRC | PPV | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | 144 | NR | NR | NR | NR | 0.855 | 81.0% | 77.1% | 98.0% | 0.79 |
| Incision | 12 | 10 | 3 | 129 | 2 | NR | 76.9% | 83.3% | 97.7% | 0.80 |
| Viscous agent injection | 26 | 21 | 7 | 111 | 5 | NR | 75.0% | 80.8% | 94.1% | 0.79 |
| Rhexis | 13 | 6 | 0 | 131 | 7 | NR | 100.0% | 46.2% | 100.0% | 0.63 |
| Hydrodissection | 12 | 12 | 3 | 129 | 0 | NR | 80.0% | 100.0% | 97.7% | 0.89 |
| Phacoemulsification | 13 | 12 | 3 | 128 | 1 | NR | 80.0% | 92.3% | 97.7% | 0.86 |
| Irrigation & aspiration | 16 | 11 | 3 | 125 | 5 | NR | 78.6% | 68.8% | 97.7% | 0.73 |
| Capsule polishing | 14 | 10 | 5 | 125 | 4 | NR | 66.7% | 71.4% | 96.2% | 0.68 |
| Lens implantation | 12 | 9 | 0 | 132 | 3 | NR | 100.0% | 75.0% | 100.0% | 0.86 |
| Viscous agent removal | 13 | 9 | 1 | 130 | 4 | NR | 90.0% | 69.2% | 99.2% | 0.78 |
| Tonifying & antibiotics | 13 | 11 | 0 | 131 | 2 | NR | 91.7% | 84.6% | 100.0% | 0.88 |
TP True positive, FP False positive, TN True negative, FN False negative, AUPRC Area under the precision-recall curve, PPV Positive predictive value.
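The per-phase columns can be recomputed from the confusion counts. The following is a minimal sketch, assuming the standard definitions of PPV, sensitivity, specificity and F1; the function name `phase_metrics` is illustrative and not from the source.

```python
def phase_metrics(tp, fp, tn, fn):
    """Return PPV, sensitivity, specificity and F1 from confusion counts.

    Standard textbook definitions are assumed; a ratio with an empty
    denominator is returned as NaN rather than raising an error.
    """
    nan = float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else nan
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else nan
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Incision row of the table: TP=10, FP=3, TN=129, FN=2
incision = phase_metrics(tp=10, fp=3, tn=129, fn=2)
for name, value in incision.items():
    print(f"{name}: {value:.3f}")
```

For the incision row this gives a PPV of about 0.769, a sensitivity of about 0.833, a specificity of about 0.977 and an F1 of 0.800, in line with the tabulated 76.9%, 83.3%, 97.7% and 0.80.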
Misclassification matrix.
| True label | Correct prediction | Most common misclassification | 2nd most common misclassification | 3rd most common misclassification |
|---|---|---|---|---|
| Incision | 91.67% | Viscous injection (8.33%) | – | – |
| Viscous agent injection | 80.77% | Incision (7.69%) | Capsule polishing (7.69%) | Hydrodissection (3.85%) |
| Rhexis | 53.85% | Viscous injection (30.77%) | Hydrodissection (7.69%) | Phacoemulsification (7.69%) |
| Hydrodissection | 100.00% | – | – | – |
| Phacoemulsification | 92.31% | Hydrodissection (7.69%) | – | – |
| Irrigation & aspiration | 68.75% | Capsule polishing (12.5%) | Phacoemulsification (12.5%) | Viscous removal (6.25%) |
| Capsule polishing | 71.43% | Viscous injection (14.29%) | Incision (7.14%) | Phacoemulsification (7.14%) |
| Lens implantation | 75.00% | Tonifying (16.67%) | Viscous injection (8.33%) | – |
| Viscous agent removal | 76.92% | Irrigation (23.08%) | – | – |
| Tonifying & antibiotics | 84.62% | Capsule polishing (15.38%) | – | – |
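Each row of the misclassification matrix is a set of percentages over the test videos of one true phase. The sketch below shows that row normalisation for rhexis. The raw counts are an assumption: they are back-calculated from the reported percentages and the 13 rhexis test videos, and are not given in the source. The helper name `row_percentages` is likewise illustrative.

```python
def row_percentages(counts):
    """Convert predicted-label counts for one true label into percentages."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Assumed counts for the 13 rhexis test videos (inferred, not from the source)
rhexis_counts = {
    "Rhexis": 7,
    "Viscous agent injection": 4,
    "Hydrodissection": 1,
    "Phacoemulsification": 1,
}

for label, pct in row_percentages(rhexis_counts).items():
    print(f"{label}: {pct}%")
```

Under these assumed counts the row reproduces the reported 53.85%, 30.77%, 7.69% and 7.69%.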
External validation of the overall model and by phase.
| Phase | Number | Proportion | TP | FP | TN | FN | PPV | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | 100 | 100.0% | NR | NR | NR | NR | 54.2% | 61.1% | 96.2% | 93.0% |
| Incision | 10 | 10.0% | 8 | 5 | 95 | 2 | 61.5% | 80.0% | 95.0% | 93.6% |
| Viscous agent injection | 20 | 20.0% | 10 | 9 | 91 | 0 | 52.6% | 100.0% | 91.0% | 91.8% |
| Rhexis | 10 | 10.0% | 1 | 4 | 96 | 9 | 20.0% | 10.0% | 96.0% | 88.2% |
| Hydrodissection | 10 | 10.0% | 7 | 5 | 95 | 3 | 58.3% | 70.0% | 95.0% | 92.7% |
| Phacoemulsification | 10 | 10.0% | 9 | 1 | 99 | 1 | 90.0% | 90.0% | 99.0% | 98.2% |
| Irrigation & aspiration | 10 | 10.0% | 0 | 1 | 99 | 10 | 0.0% | 0.0% | 99.0% | 90.0% |
| Capsule polishing | 0 | 0.0% | 0 | 0 | 100 | 0 | NR | NR | NR | NR |
| Lens implantation | 10 | 10.0% | 2 | 1 | 99 | 8 | 66.7% | 20.0% | 99.0% | 91.8% |
| Viscous agent removal | 10 | 10.0% | 9 | 4 | 96 | 1 | 69.2% | 90.0% | 96.0% | 95.5% |
| Tonifying & antibiotics | 10 | 10.0% | 9 | 4 | 96 | 1 | 69.2% | 90.0% | 96.0% | 95.5% |
TP True positive, FP False positive, TN True negative, FN False negative, PPV Positive predictive value.
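The overall row of the external validation table appears consistent with an unweighted (macro) average of the nine phases that have reported values. This is an observation from the numbers, not a method stated in the source. The sketch below checks it, and also recomputes per-phase accuracy under the usual definition (TP + TN) / (TP + FP + TN + FN). The function names are illustrative.

```python
from statistics import mean

# (PPV, sensitivity, specificity, accuracy) in percent, copied from the table.
# Capsule polishing is left out because all of its metrics are NR.
external = {
    "Incision": (61.5, 80.0, 95.0, 93.6),
    "Viscous agent injection": (52.6, 100.0, 91.0, 91.8),
    "Rhexis": (20.0, 10.0, 96.0, 88.2),
    "Hydrodissection": (58.3, 70.0, 95.0, 92.7),
    "Phacoemulsification": (90.0, 90.0, 99.0, 98.2),
    "Irrigation & aspiration": (0.0, 0.0, 99.0, 90.0),
    "Lens implantation": (66.7, 20.0, 99.0, 91.8),
    "Viscous agent removal": (69.2, 90.0, 96.0, 95.5),
    "Tonifying & antibiotics": (69.2, 90.0, 96.0, 95.5),
}


def macro_average(rows):
    """Unweighted mean of each metric column, rounded to one decimal."""
    columns = zip(*rows.values())
    return tuple(round(mean(column), 1) for column in columns)


def accuracy_percent(tp, fp, tn, fn):
    """Accuracy in percent from confusion counts, rounded to one decimal."""
    return round(100 * (tp + tn) / (tp + fp + tn + fn), 1)


ppv, sensitivity, specificity, accuracy = macro_average(external)
print(ppv, sensitivity, specificity, accuracy)

# Phacoemulsification row: TP=9, FP=1, TN=99, FN=1
print(accuracy_percent(tp=9, fp=1, tn=99, fn=1))
```

The macro averages come out at 54.2, 61.1, 96.2 and 93.0, matching the overall row, and the phacoemulsification accuracy comes out at 98.2.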
Summary of cataract phase classification models.
| Study | Dataset name | Dataset size | Method | AUPRC | AUROC | PPV | Sensitivity | Specificity | Accuracy | F1 score |
|---|---|---|---|---|---|---|---|---|---|---|
| Touma et al. (this study) | Cataract 21 & Cataract 101 | 122 | Google AutoML Video Intelligence | 0.855 | NR | 81.0% | 77.1% | 98.0% | 96.0% | 0.79 |
| Model #3 | Own dataset | 100 | CNN input with cross-sectional image data | NR | 0.712 | 78.6% | 74.5% | 97.5% | 95.6% | NR |
| Model #4 | Own dataset | 100 | CNN-RNN input with a time series of images | NR | 0.752 | 62.0% | 59.3% | 95.6% | 92.1% | NR |
| Primus et al. | Cataract 21 | 21 | GoogLeNet CNN | NR | NR | 69.0% (74.0%) | 67.0% (72.0%) | NR | NR | 0.68 (0.73) |
| Zisimopoulos et al. | CATARACTS | 50 | ResNet-RNN | NR | NR | NR | NR | NR | 78.3% | 0.75 |
| Quellec et al. | Own dataset | 186 | Adaptive spatiotemporal polynomial (real-time detection) | NR | NR | NR | NR | NR | 85.3% | NR |
| Qi et al. | Cataract 101 | 101 | ResNet (real-time detection) | NR | NR | NR | NR | NR | 88.1% | NR |
| Charrière et al. | Own dataset | 30 | Bayesian network (real-time detection) | NR | 0.828 | NR | NR | NR | NR | NR |
| Lalys et al. | Own dataset | 20 | Dynamic time warping and hidden Markov models (real-time detection) | NR | NR | NR | NR | NR | 95.0% | NR |
AUPRC Area under the precision-recall curve, AUROC Area under the receiver operating characteristic curve, PPV Positive predictive value, NR Not reported.