Abdallah Abbas, Ciara O'Byrne, Dun Jack Fu, Gabriella Moraes, Konstantinos Balaskas, Robbert Struyven, Sara Beqiri, Siegfried K Wagner, Edward Korot, Pearse A Keane.
Abstract
PURPOSE: Neovascular age-related macular degeneration (nAMD) is a major global cause of blindness. Whilst anti-vascular endothelial growth factor (anti-VEGF) treatment is effective, response varies considerably between individuals. Thus, patients face substantial uncertainty regarding their future ability to perform daily tasks. In this study, we evaluate the performance of an automated machine learning (AutoML) model which predicts visual acuity (VA) outcomes in patients receiving treatment for nAMD, in comparison to a manually coded model built using the same dataset. Furthermore, we evaluate model performance across ethnic groups and analyse how the models reach their predictions.
Keywords: Anti-VEGF; Artificial intelligence; Automated machine learning; Model interpretability; Neovascular age-related macular degeneration; OCT
Year: 2022 PMID: 35122132 PMCID: PMC9325856 DOI: 10.1007/s00417-021-05544-y
Source DB: PubMed Journal: Graefes Arch Clin Exp Ophthalmol ISSN: 0721-832X Impact factor: 3.535
Fig. 1 Summary of project workflow
Fig. 2 Segmentation of retinal compartments using a deep learning algorithm. Exemplar OCT scan and segmentation map for a patient with neovascular age-related macular degeneration. The colour key shows the features quantified by the segmentation algorithm. Output volumes were scaled from voxels (2.60 × 11.72 × 47.24 cuboids) to cubic millimetres before being used as input features in this study. PED = pigment epithelium detachment
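The voxel-to-volume scaling mentioned in the Fig. 2 caption can be sketched as below. The caption does not state the unit of the voxel dimensions; this sketch assumes micrometres, and the function name and the example voxel count are hypothetical.

```python
# Assumption: the voxel dimensions quoted in the Fig. 2 caption are in micrometres.
VOXEL_DIMS_UM = (2.60, 11.72, 47.24)
UM3_PER_MM3 = 1e9  # (1000 um)^3 per cubic millimetre


def voxels_to_mm3(n_voxels: int) -> float:
    """Convert a segmented voxel count to a volume in cubic millimetres."""
    voxel_volume_um3 = VOXEL_DIMS_UM[0] * VOXEL_DIMS_UM[1] * VOXEL_DIMS_UM[2]
    return n_voxels * voxel_volume_um3 / UM3_PER_MM3


# Hypothetical example: 100,000 voxels labelled as a single compartment.
example_volume = voxels_to_mm3(100_000)
print(f"{example_volume:.3f} mm3")
```

Under that unit assumption, each voxel contributes roughly 1.44 × 10⁻⁶ mm³, which is consistent with the sub-millimetre volumes reported for the OCT features.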
Input feature summary statistics categorised by outcome label. For continuous variables, we report the median (Q1–Q3) because the Shapiro–Wilk test indicated that all were non-normally distributed. Differences between outcome groups were analysed using the Mann–Whitney U test for continuous variables and Fisher’s exact test for categorical variables. *HRF to 5 d.p.: Total = 0.00071 (0.00021–0.00236); Above = 0.00061 (0.00016–0.00211); Below = 0.00077 (0.00023–0.00256)
| Variable | Category | Total | Above | Below | P value |
|---|---|---|---|---|---|
| Age (years) | | 80 (73–85) | 78 (71–83) | 81 (75–86) | < 0.01 |
| Ethnicity | White | 868 (53.2%) | 388 (58.5%) | 480 (49.6%) | < 0.01 |
| | Asian | 170 (10.4%) | 64 (9.7%) | 106 (11.0%) | 0.41 |
| | Black | 33 (2.0%) | 18 (2.7%) | 15 (1.5%) | 0.11 |
| | Other | 380 (23.3%) | 131 (19.8%) | 249 (25.7%) | 0.01 |
| | Unknown | 180 (11.0%) | 62 (9.4%) | 118 (12.2%) | 0.08 |
| Gender | Female | 988 (60.6%) | 400 (60.3%) | 588 (60.7%) | 0.88 |
| | Male | 643 (39.4%) | 263 (39.7%) | 380 (39.3%) | 0.88 |
| Baseline VA (ETDRS) | | 58 (46–68) | 67 (60–70) | 50 (38–60) | < 0.01 |
| OCT Features (mm3) | RPE | 0.81 (0.77–0.86) | 0.83 (0.78–0.87) | 0.80 (0.76–0.85) | < 0.01 |
| | IRF | 0.00 (0.00–0.08) | 0.00 (0.00–0.03) | 0.01 (0.00–0.13) | < 0.01 |
| | SRF | 0.20 (0.03–0.57) | 0.19 (0.03–0.60) | 0.20 (0.03–0.55) | 0.85 |
| | HRF* | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.00 (0.00–0.00) | 0.01 |
| | SHRM | 0.13 (0.02–0.41) | 0.07 (0.01–0.25) | 0.19 (0.04–0.55) | < 0.01 |
| | PED | 0.37 (0.13–0.90) | 0.28 (0.10–0.71) | 0.44 (0.17–1.04) | < 0.01 |
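As an illustration of the categorical comparison named in the table caption, the sketch below applies a two-sided Fisher's exact test to ethnicity counts taken from the table. The source does not say which software was used or exactly how each category was tested; this from-scratch version assumes each ethnicity is compared against all others combined, and it sums the probabilities of all 2 × 2 tables no more likely than the observed one. The function and variable names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# (Above, Below) counts per ethnicity, copied from the table.
ethnicity = {
    "White": (388, 480), "Asian": (64, 106), "Black": (18, 15),
    "Other": (131, 249), "Unknown": (62, 118),
}
n_above = sum(above for above, _ in ethnicity.values())
n_below = sum(below for _, below in ethnicity.values())


def p_value(group: str) -> float:
    """Compare one ethnicity against all others (an assumed test design)."""
    above, below = ethnicity[group]
    return fisher_exact_two_sided(above, n_above - above, below, n_below - below)


p_white = p_value("White")
p_asian = p_value("Asian")
print(f"White: p = {p_white:.4f}; Asian: p = {p_asian:.2f}")
```

Summing the ethnicity rows also recovers the group sizes implied by the table, which is a quick check that the counts were transcribed consistently.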
Fig. 3 Receiver operating characteristic (ROC) curves and confusion matrices for AutoML and XGBoost models. a ROC curves for both models on test data, showing discriminative performance in predicting whether patients with nAMD would have a VA ‘Above’ or ‘Below’ 70 after one year of treatment. The grey line represents a random classifier. b Confusion matrices for AutoML and XGBoost models. Predicted labels were assigned using the default classification threshold of 0.5
Summary performance metrics for the AutoML Tables and XGBoost models. Metrics were calculated at the default classification threshold of 0.5. PPV = positive predictive value; NPV = negative predictive value
| Model | AUROC | Sensitivity | Specificity | PPV | NPV | Accuracy | F1 score |
|---|---|---|---|---|---|---|---|
| AutoML | 0.849 | 69.0% | 82.1% | 72.6% | 79.3% | 76.7% | 0.71 |
| XGBoost | 0.847 | 67.0% | 84.8% | 75.3% | 78.8% | 77.6% | 0.71 |
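The F1 scores in the table can be cross-checked against the reported PPV and sensitivity, since F1 is the harmonic mean of precision (PPV) and recall (sensitivity). The short sketch below does that check; the function name is illustrative, and the inputs are the rounded percentages from the table rather than the underlying confusion-matrix counts.

```python
def f1_from_ppv_sensitivity(ppv: float, sensitivity: float) -> float:
    """F1 score as the harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# (PPV, sensitivity) as reported in the table, expressed as proportions.
reported = {"AutoML": (0.726, 0.690), "XGBoost": (0.753, 0.670)}

f1_scores = {
    model: round(f1_from_ppv_sensitivity(ppv, sens), 2)
    for model, (ppv, sens) in reported.items()
}
print(f1_scores)
```

Both models come out at 0.71 after rounding, matching the F1 column.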
Fig. 4 AutoML Tables and XGBoost model architectures. a Simplified diagram of one of the neural networks from the AutoML Tables ensemble model, consisting of one input layer, two hidden layers each with 128 nodes, a dropout of 0.25 and dense skip connections (curved arrows). Lines represent flow of information through the network from top to bottom. b Diagram of decision tree number 20 from the XGBoost model. Leaf values displayed are summed across all 50 trees and transformed using a logistic function to give the model’s estimated probability of an eye belonging to the ‘Above’ class. Full hyperparameter information for both models is available in Supplementary Tables 3 and 4
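The step described in panel b, summing one leaf value per tree and passing the total through a logistic function, can be sketched as follows. The leaf values here are hypothetical placeholders rather than values from the study's model, and the `base_margin` offset is an assumption included only for generality.

```python
import math


def predict_probability(leaf_values, base_margin: float = 0.0) -> float:
    """Sum one leaf value per tree, then apply the logistic function."""
    margin = base_margin + sum(leaf_values)
    return 1.0 / (1.0 + math.exp(-margin))


# Hypothetical leaf values: the single leaf an eye falls into in each of 50 trees.
leaf_values = [0.02] * 30 + [-0.01] * 20

p_above = predict_probability(leaf_values)
predicted_label = "Above" if p_above >= 0.5 else "Below"  # default threshold of 0.5
print(f"P(Above) = {p_above:.3f} -> {predicted_label}")
```

A summed leaf value of zero maps to a probability of exactly 0.5, so the sign of the total decides which side of the default threshold an eye falls on.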
Fig. 5 Feature importance and partial dependence plots (PDPs). a Relative feature importance, showing the average marginal contribution of each feature to each model’s predictions. These values were normalised to sum to 1.0 (see the ‘Methods’ section). PDPs: These show how the inference score (the model’s predicted probability that an eye belongs to the ‘Above’ class) changes when a specified input feature is varied and all other features are held at their true values. This is averaged over all datapoints in the test set to give the average inference score. The horizontal black line represents the default classification threshold of 0.5. b PDP for baseline VA. c PDP for age. d PDP for intraretinal fluid volume. e PDP for pigment epithelium detachment volume
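The PDP procedure in the caption (vary one feature, keep the others at their true values, average the inference score over the test set) and the normalisation of feature importances can be sketched as below. The model, rows, grid and importance values are all hypothetical stand-ins chosen for illustration; they are not the study's models or data.

```python
import math


def partial_dependence(model, rows, feature, grid):
    """Average model score over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Other features keep each row's true values; only `feature` is overridden.
        scores = [model({**row, feature: value}) for row in rows]
        curve.append(sum(scores) / len(scores))
    return curve


def normalise(importances):
    """Rescale raw importances so that they sum to 1.0."""
    total = sum(importances.values())
    return {name: value / total for name, value in importances.items()}


def toy_model(row):
    """Hypothetical stand-in for a trained classifier's inference score."""
    z = 0.08 * (row["baseline_va"] - 60) - 0.03 * (row["age"] - 80)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical test-set rows and grid of baseline VA values.
rows = [
    {"baseline_va": 50, "age": 85},
    {"baseline_va": 67, "age": 72},
    {"baseline_va": 58, "age": 80},
]
grid = [40, 55, 70]

pdp_baseline_va = partial_dependence(toy_model, rows, "baseline_va", grid)
relative_importance = normalise({"baseline_va": 3.0, "age": 1.0})  # made-up raw values
print(pdp_baseline_va, relative_importance)
```

Evaluating the same loop for a single row instead of averaging over all rows gives an individual conditional expectation (ICE) curve of the kind shown in the case-study figures.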
Fig. 6 Patient A: true positive case study. a Input feature values for patient A. b VA changes throughout the first year of treatment, as measured at each follow-up appointment. NB: Only baseline information was used to train the model. c Local feature importance showing how each feature affected the AutoML model’s inference score relative to the baseline score of 0.48. d Individual conditional expectation (ICE) plot showing how the model’s inference score changes as SRF volume is hypothetically varied and other features are kept as shown in Fig. 6a. The indicated point represents patient A’s actual SRF volume at baseline, whilst the horizontal black line represents the default classification threshold of 0.5
Fig. 7 Patient B: false negative case study. a Input feature values for patient B. b VA changes throughout the first year of treatment. c Local feature importance. d ICE plot for baseline VA. The indicated point represents patient B’s actual baseline VA