Edward Korot, Nikolas Pontikos, Xiaoxuan Liu, Siegfried K Wagner, Livia Faes, Josef Huemer, Konstantinos Balaskas, Alastair K Denniston, Anthony Khawaja, Pearse A Keane.
Abstract
Deep learning may transform health care, but model development has largely been dependent on the availability of advanced technical expertise. Herein we present the development of a deep learning model by clinicians without coding, which predicts reported sex from retinal fundus photographs. A model was trained on 84,743 retinal fundus photos from the UK Biobank dataset. External validation was performed on 252 fundus photos from a tertiary ophthalmic referral center. For internal validation, the area under the receiver operating characteristic curve (AUROC) of the code-free deep learning (CFDL) model was 0.93. Sensitivity, specificity, positive predictive value (PPV) and accuracy (ACC) were 88.8%, 83.6%, 87.3% and 86.5%, respectively; for external validation they were 83.9%, 72.2%, 78.2% and 78.6%. Clinicians are currently unaware of distinct retinal feature variations between males and females, highlighting the importance of model explainability for this task. The model performed significantly worse when foveal pathology was present in the external validation dataset (ACC 69.4%, compared to 85.4% in healthy eyes; OR (95% CI): 0.36 (0.19, 0.70), p = 0.0022), suggesting the fovea is a salient region for model performance. Automated machine learning (AutoML) may enable clinician-driven automated discovery of novel insights and disease biomarkers.
Year: 2021 PMID: 33986429 PMCID: PMC8119673 DOI: 10.1038/s41598-021-89743-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
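As a rough illustration of how the classification metrics quoted in the abstract relate to one another, the sketch below computes sensitivity, specificity, PPV and accuracy from a 2x2 confusion matrix. The counts are hypothetical (the paper does not publish a confusion matrix), chosen only to be consistent with the internal validation figures: 1,287 images and 88.8% / 83.6% / 87.3% / 86.5%.

```python
# Sensitivity, specificity, PPV and accuracy from a 2x2 confusion matrix.
# The counts below are HYPOTHETICAL, chosen to approximate the internal
# validation metrics reported in the abstract; the paper does not publish
# the underlying confusion matrix.

def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, PPV, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    ppv = tp / (tp + fp)                   # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, ppv, accuracy

# Hypothetical counts summing to the 1,287-image internal validation set.
sens, spec, ppv, acc = classification_metrics(tp=640, fp=93, tn=473, fn=81)
print(f"sens={sens:.1%} spec={spec:.1%} ppv={ppv:.1%} acc={acc:.1%}")
# → sens=88.8% spec=83.6% ppv=87.3% acc=86.5%
```

Note that AUROC cannot be recovered from a single confusion matrix; it summarizes performance across all classification thresholds.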
Comparison of reported fundus photo sex prediction algorithms.
| Model | AUROC | Dataset source | Training dataset images | Mean training set age | Mean test set age |
|---|---|---|---|---|---|
| CFDL | 0.93 | UK Biobank | 173,819 | 56.8 | 55.7 |
| Poplin et al. | 0.97 | UK Biobank + EyePACS | 1,779,020 | 54.1 | 56.6 |
| Yamashita et al. | 0.78 | Kagoshima University Hospital | 111* | 25.8* | 25.8* |
*Cross-validation.
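The AUROC values compared in the table above have a useful probabilistic reading: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A minimal sketch (toy data, not from the paper) computing AUROC directly from that pairwise definition:

```python
# AUROC via its rank interpretation: the fraction of (positive, negative)
# pairs in which the positive example is scored higher (ties count 0.5).
# Toy scores and labels for illustration only; not data from the paper.

def auroc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # → 0.75
```

This pairwise definition is O(n²); production libraries compute the same quantity from sorted ranks in O(n log n).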
Figure 1: Precision-recall curve.
Model performance on external validation dataset subgrouped by presence of foveal pathology.
| | Percent correct prediction | Odds ratio (95% CI) for correct prediction | P-value |
|---|---|---|---|
| Foveal pathology: none | 85.4% | Ref | |
| Foveal pathology: present | 69.4% | 0.36 (0.19, 0.70) | 0.0022 |
| Age (years) | | 1.00 (0.99, 1.02) | 0.66 |
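The foveal-pathology odds ratio can be reproduced in form, though not exactly, from a 2x2 table: the paper's estimate comes from a model that also adjusts for age, and the per-subgroup counts are not published. The counts below are hypothetical, picked only to match the reported percent-correct figures:

```python
import math

# Unadjusted odds ratio for a correct prediction (pathology vs. none) with
# a 95% Wald CI on the log-odds scale. Counts are HYPOTHETICAL: the paper
# reports percentages, not counts, and its OR is adjusted for age.
correct_none, wrong_none = 140, 24    # 140/164 ≈ 85.4% correct
correct_path, wrong_path = 59, 26     # 59/85  ≈ 69.4% correct

or_ = (correct_path / wrong_path) / (correct_none / wrong_none)
se = math.sqrt(1/correct_path + 1/wrong_path + 1/correct_none + 1/wrong_none)
lo, hi = (math.exp(math.log(or_) + z * se) for z in (-1.96, 1.96))
print(f"OR={or_:.2f} (95% CI {lo:.2f}, {hi:.2f})")
# → OR=0.39 (95% CI 0.21, 0.73)
```

The result is close to, but not identical with, the reported 0.36 (0.19, 0.70), as expected for an unadjusted estimate from invented counts.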
Figure 2: Region-based saliency maps for model prediction. Colors represent regions in order of decreasing performance: yellow, green, blue. Images were sourced at random from the validation set, with the addition of an ungradable image.
Dataset characteristics of UK Biobank and Moorfields external validation sets.
| | UK Biobank development (train + tuning) | UK Biobank validation | Moorfields external validation |
|---|---|---|---|
| Patients | 84,743 | 728 | 252 |
| Images | 173,819 | 1,287 | 252 |
| Mean age | 56.8 | 55.7 | 64.0 |
| St. Dev. age | 8.0 | 8.1 | 17.7 |
| Gender (% female) | 53.6% | 56.0% | 54.2% |
*Numbers reported are post-removal of ungradable images.
Figure 3: Representative fundus photos. Correct (A) and incorrect (B) cases without (1) and with (2) foveal pathology.