Atalie C. Thompson, Alessandro A. Jammal, Felipe A. Medeiros.
Abstract
Because of recent advances in computing technology and the availability of large datasets, deep learning has risen to the forefront of artificial intelligence, with performances that often equal, or sometimes even exceed, those of human subjects on a variety of tasks, especially those related to image classification and pattern recognition. As one of the medical fields that is highly dependent on ancillary imaging tests, ophthalmology has been in a prime position to witness the application of deep learning algorithms that can help analyze the vast amount of data coming from those tests. In particular, glaucoma stands as one of the conditions where application of deep learning algorithms could potentially lead to better use of the vast amount of information coming from structural and functional tests evaluating the optic nerve and macula. The purpose of this article is to critically review recent applications of deep learning models in glaucoma, discussing their advantages but also focusing on the challenges inherent to the development of such models for screening, diagnosis and detection of progression. After a brief general overview of deep learning and how it compares to traditional machine learning classifiers, we discuss issues related to the training and validation of deep learning models and how they specifically apply to glaucoma. We then discuss specific scenarios where deep learning has been proposed for use in glaucoma, such as screening with fundus photography, and diagnosis and detection of glaucoma progression with optical coherence tomography and standard automated perimetry. Translational Relevance: Deep learning algorithms have the potential to significantly improve diagnostic capabilities in glaucoma, but their application in clinical practice requires careful validation, with consideration of the target population, the reference standards used to build the models, and potential sources of bias. Copyright 2020 The Authors.
Keywords: deep learning; glaucoma; optical coherence tomography; visual fields
Year: 2020 PMID: 32855846 PMCID: PMC7424906 DOI: 10.1167/tvst.9.2.42
Source DB: PubMed Journal: Transl Vis Sci Technol ISSN: 2164-2591 Impact factor: 3.283
Figure 1. A diagram showing the classification of machine learning algorithms.
Figure 2. Schematic representation of “neurons” in an artificial neural network. The input data corresponds to the data one is trying to classify. The number of neurons in the input layer depends on the input data (e.g., number of pixels in an image). These input neurons are then connected to neurons in hidden layers. There may be many hidden layers, which can be quite complex depending on the type of model. For convolutional neural networks, the hidden layers are of the convolutional type, specializing in spatial patterns. Finally, all calculations converge to a final model prediction in the output layer.
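The layered computation described above can be illustrated with a minimal sketch in plain Python: an input vector passes through one fully connected hidden layer with a ReLU activation, and a sigmoid on the output layer turns the result into a class probability. All weights here are arbitrary illustrative values, not taken from any trained model.

```python
import math

def relu(v):
    # Rectified linear unit applied element-wise to a layer's outputs
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    # One fully connected layer: each row of `weights` is one neuron
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]

def forward(x, hidden, output):
    h = relu(dense(x, *hidden))                        # hidden layer
    z = dense(h, *output)                              # output layer
    return [1.0 / (1.0 + math.exp(-s)) for s in z]     # sigmoid -> probability

# Toy network: 4 inputs -> 3 hidden neurons -> 1 output probability
hidden = ([[0.5, -0.2, 0.1, 0.3],
           [-0.4, 0.6, 0.2, -0.1],
           [0.1, 0.1, -0.3, 0.5]], [0.0, 0.0, 0.0])
output = ([[0.7, -0.5, 0.2]], [0.0])
p = forward([0.2, 0.5, 0.1, 0.9], hidden, output)
```

In a convolutional network the hidden layers would instead apply learned spatial filters, but the overall input-to-output flow is the same.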
Summary of Studies Using Deep Learning Models in Glaucoma
| Citation | Training/Validation Dataset | Test Dataset | Reference | Network | Data type | Output | Results |
|---|---|---|---|---|---|---|---|
| Ting et al. | Train: 125,189 | Test: 71,896 | Subjective grading of photographs | Custom deep learning system | Color Fundus Photos | “Referable for glaucoma” vs. not | AUC 0.942; Sensitivity 96.4%, Specificity 87.2% |
| Li et al. | Train: 31,745 | 8000 | Subjective grading of photographs | Inception-v3 | Color Fundus Photos | “Referable for glaucoma” vs. not | AUC 0.986; Sensitivity 95.6%, Specificity 92.0% |
| Christopher et al. | 9189 healthy, 5633 GON: divided randomly into multiple folds for 10-fold cross-validation. | 10% test | Subjective grading of photographs | VGG16, Inception-v3, ResNet50 | Color Fundus Photos | “GON” vs. healthy | ResNet50 AUC 0.91; Sensitivity 85% at 80% Specificity |
| Liu et al. | Train: 29,865 GON, 11,046 probable GON, 200,121 unlikely GON | Validation: 4514 GON, 571 Probable GON, 23,484 unlikely GON | Subjective grading of photographs | ResNet | Color Fundus Photos | “Referable GON” vs. not | AUC 0.996, Sensitivity 96.2%, Specificity 97.7% |
| Ahn et al. | Train: 228 Advanced glaucoma, 131 Early glaucoma, 385 Normal; Validation: 98 Advanced glaucoma, 61 Early glaucoma, 165 Normal | Test: 141 Advanced glaucoma, 87 Early glaucoma, 236 Normal | Subjective grading of visual field, OCT and RNFL photographs | Inception-v3; Custom 3-layer CNN | Color Fundus Photos | Glaucoma vs. Normal | Inception-v3 model: AUC 0.93; Average accuracy 84.5%; Custom 3-layer CNN: AUC 0.94, Average accuracy 87.9% |
| Phene et al. | Train: 35,877 Non-glaucomatous, 20,740 Low-risk GS, 13,180 High-risk GS, 5307 Likely glaucoma, 18,487 Referable glaucoma; Tuning: 849 Non-glaucomatous, 259 Low-risk GS, 268 High-risk GS, 110 Likely glaucoma, 378 Referable glaucoma | Validation set A: 687 Non-glaucomatous, 290 Low-risk GS, 170 High-risk GS, 48 Likely glaucoma, 218 Referable glaucoma; Validation set B: 8753 Non-glaucomatous, N/A Low-risk GS, N/A High-risk GS, 890 Likely glaucoma, 890 Referable glaucoma; Validation set C: 63 Non-glaucomatous, N/A Low-risk GS, 175 High-risk GS, 108 Likely glaucoma, 283 Referable glaucoma | Validation set A: Referable GON based on subjective gradings of photographs; Validation set B: Referable GON based on glaucoma-related International Classification of Diseases codes; Validation set C: referable GON based on full glaucoma workup by glaucoma specialists including clinical exam, history, VF assessment, and OCT | Inception-v3 | Color Fundus Photos | “Referable glaucoma” vs. Not | Validation set A: AUC 0.945; Validation set B: AUC 0.855; Validation set C: AUC 0.881 |
| Shibata et al. | Train: 1364 glaucomatous appearance vs. 1768 not glaucomatous appearance; 3-fold cross-validation | Test: 33 non-highly myopic glaucoma, 28 highly myopic glaucoma, 27 non-highly myopic normal, 22 highly myopic normal | Train: subjective gradings of photographs; Test: subjective gradings of photographs and categorization of RNFL and macular inner retinal thickness measurements based on OCT normative database | ResNet | Color Fundus Photos | Glaucomatous vs. Not | AUC 0.965 |
| Li et al. | Train 20,793/Validation 2,311: 11,176 GON-confirmed, 599 GON-suspected, 11,329 Normal; 10-fold cross-validation with a random selection of 9:1 for participants within each fold | Test: 1442 GON-confirmed, 515 GON-suspected, 1524 Normal | Subjective grading of photographs | ResNet101 | Color Fundus Photos | GON-confirmed vs. GON-suspected vs. Normal; Referrals (GON-confirmed and GON-suspected) vs. Observation (Normal) | Comparison of GON-confirmed vs. GON-suspected vs. Normal: Accuracy 0.941, Sensitivity 0.957, Specificity 0.929. AUC 0.992 for Referrals (GON-confirmed and GON-suspected) vs. Observation (Normal) |
| Medeiros et al. | Train + validation (80% train, 20% validation): 9,136 Glaucoma, 13,410 Suspect, 3982 Healthy | Test: 2070 Glaucoma, 3345 Suspect, 877 Healthy | SDOCT global RNFL value; Abnormal (Glaucoma) vs. Normal (Normal + Borderline) RNFL based on classification of global RNFL by SDOCT normative database | ResNet34 | Color Optic Disc Photos paired to SDOCT global RNFL | SDOCT global RNFL value; Abnormal (Glaucoma) vs. Normal RNFL | Pearson |
| Thompson et al. | Train + validation (80% train, 20% validation): 4,570 Glaucoma, 1924 Suspect, 1046 Healthy | Test: 970 Glaucoma, 432 Suspect, 340 Healthy | Global and sector BMO-MRW thickness values; Abnormal (Glaucoma) vs. Normal (Suspect + Normal) based on classification of BMO-MRW global and sector values by SDOCT normative database | ResNet34 | Color Optic Disc Photos paired to SDOCT global BMO-MRW | Global and sector BMO-MRW thickness values; Abnormal (Glaucoma) vs. Normal | Global BMO-MRW Pearson r = 0.88 |
| Devalla et al. | 40 control/60 glaucoma; training on datasets of 10, 20, 30 or 40 B-scans, with equal number of glaucoma and healthy scans in each cross-validation experiment | Cross-validation experiments with test sets of 90, 80, 70, or 60 B-scans | Manual segmentation of ONH OCT | Custom eight-layer CNN | Horizontal B-scan through ONH | Digital stain of RNFL+prelamina, RPE, all other retinal layers, choroid, peripapillary sclera, lamina cribrosa | Dice coefficient 0.84, Sensitivity 92%, specificity 99%, accuracy 94% |
| Mariottoni et al. | Train 10,520/Validation 2742 | Test Set 1 (images without segmentation errors or artifacts) 11,010; Test Set 2 (low-quality images with segmentation errors) 237; Test Set 3 (images with other artifacts) 776 | Global RNFL thickness value | ResNet34 | SDOCT raw B-scans of peripapillary RNFL | Global RNFL thickness value | Test set 1: Pearson |
| Thompson et al. | Train + Validation (50%+20%): 4828 Glaucoma, 9638 Normal | Test (30%): 3897 Glaucoma, 2443 Normal | Glaucoma (based on GON and reproducible glaucomatous visual field defects) vs. Healthy | ResNet34 | SDOCT raw B-scans of peripapillary RNFL | Glaucoma vs. Healthy | AUC 0.96 for DL algorithm vs. AUC 0.87 for global RNFL thickness |
| Maetschke et al. | Train (80%): 672 POAG, 216 Healthy; Validation (10%): 30 Healthy, 82 POAG | Test (10%): 93 POAG, 17 Healthy | Glaucoma (based on glaucomatous VF defects on 2 consecutive tests) vs. Healthy | Custom 5-layer CNN | OCT of the ONH | Glaucoma vs. Healthy | AUC 0.94 |
| Asaoka et al. | Pretraining: 1371 Open angle glaucoma, 193 Healthy; Training: 94 Open angle glaucoma, 84 Healthy | Test: 114 Open angle glaucoma and MD >−5 dB, 82 Healthy | Glaucoma (based on GON and glaucomatous VF defects) vs. Healthy | Custom 6-layer CNN | 8 × 8 macular grid | Glaucoma vs. Healthy | AUC 0.937 |
| Xu et al. | Cross-validation (85%: 80% training/20% validation): 1632 open, 1764 closed | Test (15%): 311 open, 329 closed | Angle closed vs. open based on gonioscopic grade | ResNet18; Inception-v3 | Anterior Segment-OCT | Angle closed vs. open | AUC 0.928 |
| Fu et al. | 7375 open angle, 895 angle closure: 5-fold cross-validation - four groups, each with 1654 angle closure tests for training, and one group of 1654 angle closure for testing | 1654 angle closure for testing within each fold | Angle closed vs. open based on gonioscopic grade | VGG-16 | Anterior Segment-OCT | Angle closed vs. open | AUC 0.96, sensitivity 90%, specificity 92% |
| Mariottoni et al. | Training/Validation: 3980 Glaucoma, 3732 Normal | Test: 1061 Glaucoma, 1057 Normal | GON vs. GON suspects vs. Normal based on SAP and OCT objective criteria (see table of proposed objective criteria below) | ResNet50 | Optic Disc Photos | GON vs. Normal | AUC 0.92, Sensitivity 77% at Specificity 95% |
| Li et al. | Overall: 2389 Glaucoma, 1623 Non-glaucoma: Train: 3712 | Test: 300 | Glaucoma (based on glaucomatous damage to ONH and reproducible glaucomatous VF defects) vs. Healthy | VGG | Pattern Deviation plots from Humphrey Field Analyzer 30-2 or 24-2 visual field tests | Glaucoma vs. Healthy | AUC 0.966, Sensitivity 93.2%, Specificity 82.6% |
| Kucur et al. | 1979 control (Rotterdam 244; Budapest 1735), 2811 Early glaucoma (Rotterdam 2279; Budapest 532); 10-fold cross-validation | 10-fold cross-validation; unclear if separate test and validation datasets were used | Early glaucoma (based on glaucomatous neuroretinal rim loss, reproducible VF defects, and IOP) vs. Healthy | Custom 7-layer CNN | OCTOPUS 101 G1 and Humphrey Field Analyzer 24-2 visual field tests | Early Glaucoma vs. Healthy | Average Precision: Rotterdam 87.4%, Budapest 98.6% |
| Asaoka et al. | 171 Preperimetric glaucoma vs. 108 Normal and 63 artificially generated Normal; leave-one-out cross-validation | Leave-one-out cross-validation; a separate test dataset was not used | Preperimetric OAG (based on ONH changes, VF preceding perimetric field changes) vs. Healthy | Custom DL feed-forward neural network | Humphrey Field Analyzer 24-2 | Preperimetric glaucoma vs. Healthy | AUC 0.926 |
| Berchuck et al. | Train (81%): 768 Glaucoma, 1793 Glaucoma suspects, 547 Normal; Validation (9%): 83 Glaucoma, 222 Glaucoma suspect, 58 Normal; 5-fold cross-validation | Test (9%): 93 Glaucoma, 206 Glaucoma suspect, 62 Normal | Glaucoma (repeatable glaucomatous VF defect and corresponding optic nerve damage) vs. Glaucoma suspect (high IOP or suspicious optic nerve but no VF defect) vs. Normal (no visual field or optic nerve defect) | Deep variational autoencoder | Humphrey Field Analyzer 24-2 | Rates of VF progression compared to SAP MD; Prediction of future VF compared to point-wise regression predictions | Rate of progression significantly higher for VAE than MD at 2 years (25% vs. 9%) and 4 years (35% vs. 15%) from baseline. MAE for prediction of 4th, 6th, and 8th visits significantly smaller for VAE than PW |
| Wen et al. | Train + validation (80%): 25,723 and 10-fold cross-validation | Test (20%): 6720 | Actual HFA points and Mean Deviation from HVF | CascadeNet- 5 | Humphrey Field Analyzer 24-2 | HFA points and Mean Deviation | PMAE 2.47; Mean difference in MD between predicted and actual MD = 0.41 dB, Pearson |
BAE, best available estimate; DL, deep learning; GON, glaucomatous optic neuropathy; VF, visual field; HFA, Humphrey Field Analyzer; HVF, Humphrey Visual Field; IOP, intraocular pressure; MAE, mean absolute error; PMAE, point-wise mean absolute error; POAG, primary open angle glaucoma; OAG, open angle glaucoma; ONH, optic nerve head; RPE, retinal pigment epithelium.
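Most of the results in the table above are reported as AUC, sensitivity, and specificity. As a reference for how these metrics are computed, here is a minimal plain-Python sketch using toy labels and scores (not data from any of the cited studies): sensitivity and specificity come from a confusion matrix at a fixed threshold, and AUC is the Mann-Whitney rank statistic over all positive-negative pairs.

```python
def sensitivity_specificity(labels, scores, threshold):
    # labels: 1 = glaucoma, 0 = healthy; scores: model outputs
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    # AUC = probability a random positive outranks a random negative
    # (Mann-Whitney U statistic); ties count as half a win
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
sens, spec = sensitivity_specificity(labels, scores, 0.5)
area = auc(labels, scores)  # sens = spec = 2/3, AUC = 8/9 here
```

Varying the threshold traces out the ROC curve; AUC summarizes performance across all thresholds, which is why sensitivity is often reported at a fixed specificity (as in the Christopher et al. and Mariottoni et al. rows).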
Figure 3. Examples of optic disc photographs and corresponding actual SDOCT measurements of average RNFL. Above each photo are also shown the DL predictions of average RNFL thickness from the optic disc photograph by the M2M algorithm. Note that the predictions from the DL algorithm can be quite close to actual SDOCT RNFL thickness measurements for a variety of photos. Adapted from Medeiros et al.
Figure 4. Class activation maps (CAM) for several examples of deep learning models. (A) Gradient-weighted CAM from the M2M model to predict RNFL thickness from fundus photographs. It can be seen that the heatmap correctly highlights the area of the optic nerve and adjacent RNFL as most relevant for the predictions (adapted from Medeiros et al.). (B) Gradient-weighted CAM from the M2M model used to predict rim width in an eye with glaucoma. Note that the heatmap strongly highlights the cup and rim regions (adapted from Thompson et al.). (C) CAM showing the regions in a spectral-domain optical coherence tomography volume identified as the most important for the classification of the scan into healthy versus glaucoma. For glaucoma eyes the map generally highlighted regions that agree with established clinical markers for glaucoma diagnosis, such as the optic disc cup and neuroretinal rim. It should be noted, however, that the highlighted areas are often very broad, sometimes extending even to the vitreous (adapted from Maetschke et al.).
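For networks that end in global average pooling, the original (non-gradient) CAM of Zhou et al. is simply a weighted sum of the final convolutional feature maps, using the classifier weights of the class of interest; Grad-CAM, used in panels A and B, generalizes this by deriving the weights from gradients instead. A minimal sketch of the plain-CAM computation on toy feature maps (all values illustrative, not from any of the models above):

```python
def class_activation_map(feature_maps, class_weights):
    # CAM: sum the final convolutional feature maps, each scaled by the
    # classifier weight connecting it to the class of interest
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * fmap[i][j]
    return cam

# Two toy 2x2 feature maps; the class of interest weights them 1.0 and 0.5
fmaps = [[[1.0, 0.0], [0.0, 2.0]],
         [[0.0, 4.0], [0.0, 0.0]]]
cam = class_activation_map(fmaps, [1.0, 0.5])  # -> [[1.0, 2.0], [0.0, 2.0]]
```

In practice the resulting map is upsampled to the input image size and overlaid as the heatmaps shown in the figure; high values mark the regions the model relied on most.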
Summary of Proposed Objective Criteria for Definition of GON
| | SDOCT | SAP |
|---|---|---|
| GON | | |
| Global loss | Global RNFL thickness outside normal limits | GHT outside normal limits or PSD, |
| Localized loss | RNFL thickness outside normal limits in at least one superior sector (temporal superior and/or nasal superior) | Inferior MD, |
| | RNFL thickness outside normal limits in at least one inferior sector (temporal inferior and/or nasal inferior) | Superior MD, |
| Normal | RNFL thickness within normal limits for all sectors and global | PSD probability not significant |
To be considered glaucomatous optic neuropathy, it was necessary to meet the criteria for global or localized loss. To be considered normal, it was required that both SDOCT and SAP results were normal. SDOCT-SAP pairs that do not meet the criteria for GON or normal are considered suspects. GHT, glaucoma hemifield test; PSD, pattern standard deviation; GON, glaucomatous optic neuropathy; SDOCT, spectral-domain optical coherence tomography; SAP, standard automated perimetry.