Atalie C. Thompson, Alessandro A. Jammal, Felipe A. Medeiros.
Abstract
Because of recent advances in computing technology and the availability of large datasets, deep learning has risen to the forefront of artificial intelligence, with performances that often equal, or sometimes even exceed, those of human subjects on a variety of tasks, especially those related to image classification and pattern recognition. As one of the medical fields that is highly dependent on ancillary imaging tests, ophthalmology has been in a prime position to witness the application of deep learning algorithms that can help analyze the vast amount of data coming from those tests. In particular, glaucoma stands as one of the conditions where application of deep learning algorithms could potentially lead to better use of the vast amount of information coming from structural and functional tests evaluating the optic nerve and macula. The purpose of this article is to critically review recent applications of deep learning models in glaucoma, discussing their advantages but also focusing on the challenges inherent to the development of such models for screening, diagnosis and detection of progression. After a brief general overview of deep learning and how it compares to traditional machine learning classifiers, we discuss issues related to the training and validation of deep learning models and how they specifically apply to glaucoma. We then discuss specific scenarios where deep learning has been proposed for use in glaucoma, such as screening with fundus photography, and diagnosis and detection of glaucoma progression with optical coherence tomography and standard automated perimetry. Translational Relevance: Deep learning algorithms have the potential to significantly improve diagnostic capabilities in glaucoma, but their application in clinical practice requires careful validation, with consideration of the target population, the reference standards used to build the models, and potential sources of bias. Copyright 2020 The Authors.
Keywords: deep learning; glaucoma; optical coherence tomography; visual fields
Year: 2020 PMID: 32855846 PMCID: PMC7424906 DOI: 10.1167/tvst.9.2.42
Source DB: PubMed Journal: Transl Vis Sci Technol ISSN: 2164-2591 Impact factor: 3.283
Figure 1. A diagram showing the classification of machine learning algorithms.
Figure 2. Schematic representation of “neurons” in an artificial neural network. The input data corresponds to the data one is trying to classify. The number of neurons in the input layer depends on the input data (e.g., number of pixels in an image). These input neurons are then connected to neurons in hidden layers. There may be many hidden layers, which can be quite complex depending on the type of model. For convolutional neural networks, the hidden layers are of the convolutional type, specializing in spatial patterns. Finally, all calculations converge to a final model prediction in the output layer.
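The layered computation described above can be illustrated with a minimal sketch in plain Python: an input vector passes through one fully connected hidden layer with a ReLU activation, and a sigmoid on the output layer turns the result into a class probability. All weights here are arbitrary illustrative values, not taken from any trained model.

```python
import math

def relu(v):
    # Rectified linear unit applied element-wise to a layer's outputs
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    # One fully connected layer: each row of `weights` is one neuron
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]

def forward(x, hidden, output):
    h = relu(dense(x, *hidden))                        # hidden layer
    z = dense(h, *output)                              # output layer
    return [1.0 / (1.0 + math.exp(-s)) for s in z]     # sigmoid -> probability

# Toy network: 4 inputs -> 3 hidden neurons -> 1 output probability
hidden = ([[0.5, -0.2, 0.1, 0.3],
           [-0.4, 0.6, 0.2, -0.1],
           [0.1, 0.1, -0.3, 0.5]], [0.0, 0.0, 0.0])
output = ([[0.7, -0.5, 0.2]], [0.0])
p = forward([0.2, 0.5, 0.1, 0.9], hidden, output)
```

In a convolutional network the hidden layers would instead apply learned spatial filters, but the overall input-to-output flow is the same.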
Summary of Studies Using Deep Learning Models in Glaucoma
| Citation | Training/Validation Dataset | Test Dataset | Reference | Network | Data type | Output | Results |
|---|---|---|---|---|---|---|---|
| Ting et al. | Train: 125,189 | Test: 71,896 | Subjective grading of photographs | Custom deep learning system | Color Fundus Photos | “Referable for glaucoma” vs. not | AUC 0.942; Sensitivity 96.4%, Specificity 87.2% |
| Li et al. | Train: 31,745 | 8000 | Subjective grading of photographs | Inception-v3 | Color Fundus Photos | “Referable for glaucoma” vs. not | AUC 0.986; Sensitivity 95.6%, Specificity 92.0% |
| Christopher et al. | 9189 healthy, 5633 GON: divided randomly into multiple folds for 10-fold cross-validation. | 10% test | Subjective grading of photographs | VGG16, Inception-v3, ResNet50 | Color Fundus Photos | “GON” vs. healthy | ResNet50 AUC 0.91; Sensitivity 85% at 80% Specificity |
| Liu et al. | Train: 29,865 GON, 11,046 probable GON, 200,121 unlikely GON | Validation: 4514 GON, 571 Probable GON, 23,484 unlikely GON | Subjective grading of photographs | ResNet | Color Fundus Photos | “Referable GON” vs. not | AUC 0.996, Sensitivity 96.2%, Specificity 97.7% |
| Ahn et al. | Train: 228 Advanced glaucoma, 131 Early glaucoma, 385 Normal; Validation: 98 Advanced glaucoma, 61 Early glaucoma, 165 Normal | Test: 141 Advanced glaucoma, 87 Early glaucoma, 236 Normal | Subjective grading of visual field, OCT and RNFL photographs | Inception-v3; Custom 3-layer CNN | Color Fundus Photos | Glaucoma vs. Normal | Inception-v3 model: AUC 0.93; Average accuracy 84.5%; Custom 3-layer CNN: AUC 0.94, Average accuracy 87.9% |
| Phene et al. | Train: 35,877 Non-glaucomatous, 20,740 Low-risk GS, 13,180 High-risk GS, 5307 Likely glaucoma, 18,487 Referable glaucoma; Tuning: 849 Non-glaucomatous, 259 Low-risk GS, 268 High-risk GS, 110 Likely glaucoma, 378 Referable glaucoma | Validation set A: 687 Non-glaucomatous, 290 Low-risk GS, 170 High-risk GS, 48 Likely glaucoma, 218 Referable glaucoma; Validation set B: 8753 Non-glaucomatous, N/A Low-risk GS, N/A High-risk GS, 890 Likely glaucoma, 890 Referable glaucoma; Validation set C: 63 Non-glaucomatous, N/A Low-risk GS, 175 High-risk GS, 108 Likely glaucoma, 283 Referable glaucoma | Validation set A: Referable GON based on subjective gradings of photographs; Validation set B: Referable GON based on glaucoma-related International Classification of Diseases codes; Validation set C: referable GON based on full glaucoma workup by glaucoma specialists including clinical exam, history, VF assessment, and OCT | Inception-v3 | Color Fundus Photos | “Referable glaucoma” vs. Not | Validation set A: AUC 0.945; Validation set B: AUC 0.855; Validation set C: AUC 0.881 |
| Shibata et al. | Train: 1364 glaucomatous appearance vs. 1768 not glaucomatous appearance; 3-fold cross-validation | Test: 33 non-highly myopic glaucoma, 28 highly myopic glaucoma, 27 non-highly myopic normal, 22 highly myopic normal | Train: subjective gradings of photographs; Test: subjective gradings of photographs and categorization of RNFL and macular inner retinal thickness measurements based on OCT normative database | ResNet | Color Fundus Photos | Glaucomatous vs. Not | AUC 0.965 |
| Li et al. | Train 20,793/Validation 2,311: 11,176 GON-confirmed, 599 GON-suspected, 11,329 Normal; 10-fold cross-validation with a random selection of 9:1 for participants within each fold | Test: 1442 GON-confirmed, 515 GON-suspected, 1524 Normal | Subjective grading of photographs | ResNet101 | Color Fundus Photos | GON-confirmed vs. GON-suspected vs. Normal; Referrals (GON-confirmed and GON-suspected) vs. Observation (Normal) | Comparison of GON-confirmed vs. GON-suspected vs. Normal: Accuracy 0.941, Sensitivity 0.957, Specificity 0.929. AUC 0.992 for Referrals (GON-confirmed and GON-suspected) vs. Observation (Normal) |
| Medeiros et al. | Train + validation (80% train, 20% validation): 9,136 Glaucoma, 13,410 Suspect, 3982 Healthy | Test: 2070 Glaucoma, 3345 Suspect, 877 Healthy | SDOCT global RNFL value; Abnormal (Glaucoma) vs. Normal (Normal + Borderline) RNFL based on classification of global RNFL by SDOCT normative database | ResNet34 | Color Optic Disc Photos paired to SDOCT global RNFL | SDOCT global RNFL value; Abnormal (Glaucoma) vs. Normal RNFL | Pearson |
| Thompson et al. | Train + validation (80% train, 20% validation): 4,570 Glaucoma, 1924 Suspect, 1046 Healthy | Test: 970 Glaucoma, 432 Suspect, 340 Healthy | Global and sector BMO-MRW thickness values; Abnormal (Glaucoma) vs. Normal (Suspect + Normal) based on classification of BMO-MRW global and sector values by SDOCT normative database | ResNet34 | Color Optic Disc Photos paired to SDOCT global BMO-MRW | Global and sector BMO-MRW thickness values; Abnormal (Glaucoma) vs. Normal | Global BMO-MRW Pearson r = 0.88 |
| Devalla et al. | 40 control/60 glaucoma; training on datasets of 10, 20, 30 or 40 B-scans, with equal number of glaucoma and healthy scans in each cross-validation experiment | Cross-validation experiments with test sets of 90, 80, 70, or 60 B-scans | Manual segmentation of ONH OCT | Custom eight-layer CNN | Horizontal B-scan through ONH | Digital stain of RNFL+prelamina, RPE, all other retinal layers, choroid, peripapillary sclera, lamina cribrosa | Dice coefficient 0.84, Sensitivity 92%, specificity 99%, accuracy 94% |
| Mariottoni et al. | Train 10,520/Validation 2742 | Test Set 1 (images without segmentation errors or artifacts) 11,010; Test Set 2 (low-quality images with segmentation errors) 237; Test Set 3 (images with other artifacts) 776 | Global RNFL thickness value | ResNet34 | SDOCT raw B-scans of peripapillary RNFL | Global RNFL thickness value | Test set 1: Pearson |
| Thompson et al. | Train + Validation (50%+20%): 4828 Glaucoma, 9638 Normal | Test (30%): 3897 Glaucoma, 2443 Normal | Glaucoma (based on GON and reproducible glaucomatous visual field defects) vs. Healthy | ResNet34 | SDOCT raw B-scans of peripapillary RNFL | Glaucoma vs. Healthy | AUC 0.96 for DL algorithm vs. AUC 0.87 for global RNFL thickness |
| Maetschke et al. | Train (80%): 672 POAG, 216 Healthy; Validation (10%): 30 Healthy, 82 POAG | Test (10%): 93 POAG, 17 Healthy | Glaucoma (based on glaucomatous VF defects on 2 consecutive tests) vs. Healthy | Custom 5-layer CNN | OCT of the ONH | Glaucoma vs. Healthy | AUC 0.94 |
| Asaoka et al. | Pretraining: 1371 Open angle glaucoma, 193 Healthy; Training: 94 Open angle glaucoma, 84 Healthy | Test: 114 Open angle glaucoma and MD >−5 dB, 82 Healthy | Glaucoma (based on GON and glaucomatous VF defects) vs. Healthy | Custom 6-layer CNN | 8 × 8 macular grid | Glaucoma vs. Healthy | AUC 0.937 |
| Xu et al. | Cross-validation (85%: 80% training/20% validation): 1632 open, 1764 closed | Test (15%): 311 open, 329 closed | Angle closed vs. open based on gonioscopic grade | ResNet18; Inception-v3 | Anterior Segment-OCT | Angle closed vs. open | AUC 0.928 |
| Fu et al. | 7375 open angle, 895 angle closure: 5-fold cross-validation - four groups, each with 1654 angle closure tests for training, and one group of 1654 angle closure for testing | 1654 angle closure for testing within each fold | Angle closed vs. open based on gonioscopic grade | VGG-16 | Anterior Segment-OCT | Angle closed vs. open | AUC 0.96, sensitivity 90%, specificity 92% |
| Mariottoni et al. | Training/Validation: 3980 Glaucoma, 3732 Normal | Test: 1061 Glaucoma, 1057 Normal | GON vs. GON suspects vs. Normal based on SAP and OCT objective criteria (see table of proposed objective criteria below) | ResNet50 | Optic Disc Photos | GON vs. Normal | AUC 0.92, Sensitivity 77% at Specificity 95% |
| Li et al. | Overall: 2389 Glaucoma, 1623 Non-glaucoma: Train: 3712 | Test: 300 | Glaucoma (based on glaucomatous damage to ONH and reproducible glaucomatous VF defects) vs. Healthy | VGG | Pattern Deviation plots from Humphrey Field Analyzer 30-2 or 24-2 visual field tests | Glaucoma vs. Healthy | AUC 0.966, Sensitivity 93.2%, Specificity 82.6% |
| Kucur et al. | 1979 control (Rotterdam 244; Budapest 1735), 2811 Early glaucoma (Rotterdam 2279; Budapest 532); 10-fold cross-validation | 10-fold cross-validation; unclear if separate test and validation datasets were used | Early glaucoma (based on glaucomatous neuroretinal rim loss, reproducible VF defects, and IOP) vs. Healthy | Custom 7-layer CNN | OCTOPUS 101 G1 and Humphrey Field Analyzer 24-2 visual field tests | Early Glaucoma vs. Healthy | Average Precision: Rotterdam 87.4%, Budapest 98.6% |
| Asaoka et al. | 171 Preperimetric glaucoma vs. 108 Normal and 63 artificially generated Normal; leave-one-out cross-validation | Leave-one-out cross-validation; a separate test dataset was not used | Preperimetric OAG (based on ONH changes, VF preceding perimetric field changes) vs. Healthy | Custom DL feed-forward neural network | Humphrey Field Analyzer 24-2 | Preperimetric glaucoma vs. Healthy | AUC 0.926 |
| Berchuck et al. | Train (81%): 768 Glaucoma, 1793 Glaucoma suspects, 547 Normal; Validation (9%): 83 Glaucoma, 222 Glaucoma suspect, 58 Normal; 5-fold cross-validation | Test (9%): 93 Glaucoma, 206 Glaucoma suspect, 62 Normal | Glaucoma (repeatable glaucomatous VF defect and corresponding optic nerve damage) vs. Glaucoma suspect (high IOP or suspicious optic nerve but no VF defect) vs. Normal (no visual field or optic nerve defect) | Deep variational autoencoder | Humphrey Field Analyzer 24-2 | Rates of VF progression compared to SAP MD; Prediction of future VF compared to point-wise regression predictions | Rate of progression significantly higher for VAE than MD at 2 years (25% vs. 9%) and 4 years (35% vs. 15%) from baseline. MAE for prediction of 4th, 6th, and 8th visits significantly smaller for VAE than PW |
| Wen et al. | Train + validation (80%): 25,723 and 10-fold cross-validation | Test (20%): 6720 | Actual HFA points and Mean Deviation from HVF | CascadeNet- 5 | Humphrey Field Analyzer 24-2 | HFA points and Mean Deviation | PMAE 2.47; Mean difference in MD between predicted and actual MD = 0.41 dB, Pearson |
BAE, best available estimate; DL, deep learning; GON, glaucomatous optic neuropathy; VF, visual field; HFA, Humphrey Field Analyzer; HVF, Humphrey Visual Field; IOP, intraocular pressure; MAE, mean absolute error; PMAE, point-wise mean absolute error; POAG, primary open angle glaucoma; OAG, open angle glaucoma; ONH, optic nerve head; RPE, retinal pigment epithelium.
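Most of the results in the table above are reported as AUC, sensitivity, and specificity. As a reference for how these metrics are computed, here is a minimal plain-Python sketch using toy labels and scores (not data from any of the cited studies): sensitivity and specificity come from a confusion matrix at a fixed threshold, and AUC is the Mann-Whitney rank statistic over all positive-negative pairs.

```python
def sensitivity_specificity(labels, scores, threshold):
    # labels: 1 = glaucoma, 0 = healthy; scores: model outputs
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    # AUC = probability a random positive outranks a random negative
    # (Mann-Whitney U statistic); ties count as half a win
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
sens, spec = sensitivity_specificity(labels, scores, 0.5)
area = auc(labels, scores)  # sens = spec = 2/3, AUC = 8/9 here
```

Varying the threshold traces out the ROC curve; AUC summarizes performance across all thresholds, which is why sensitivity is often reported at a fixed specificity (as in the Christopher et al. and Mariottoni et al. rows).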
Figure 3. Examples of optic disc photographs and corresponding actual SDOCT measurements of average RNFL. Above each photo are also shown the DL predictions of average RNFL thickness from the optic disc photograph by the M2M algorithm. Note that the predictions from the DL algorithm can be quite close to actual SDOCT RNFL thickness measurements for a variety of photos. Adapted from Medeiros et al.
Figure 4. Class activation maps (CAM) for several examples of deep learning models. (A) Gradient-weighted CAM from the M2M model to predict RNFL thickness from fundus photographs. It can be seen that the heatmap correctly highlights the area of the optic nerve and adjacent RNFL as most relevant for the predictions (adapted from Medeiros et al.). (B) Gradient-weighted CAM from the M2M model used to predict rim width in an eye with glaucoma. Note that the heatmap strongly highlights the cup and rim regions (adapted from Thompson et al.). (C) CAM showing the regions in a spectral-domain optical coherence tomography volume identified as the most important for the classification of the scan into healthy versus glaucoma. For glaucoma eyes the map generally highlighted regions that agree with established clinical markers for glaucoma diagnosis, such as the optic disc cup and neuroretinal rim. It should be noted, however, that the highlighted areas are often very broad, sometimes extending even to the vitreous (adapted from Maetschke et al.).
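For networks that end in global average pooling, the original (non-gradient) CAM of Zhou et al. is simply a weighted sum of the final convolutional feature maps, using the classifier weights of the class of interest; Grad-CAM, used in panels A and B, generalizes this by deriving the weights from gradients instead. A minimal sketch of the plain-CAM computation on toy feature maps (all values illustrative, not from any of the models above):

```python
def class_activation_map(feature_maps, class_weights):
    # CAM: sum the final convolutional feature maps, each scaled by the
    # classifier weight connecting it to the class of interest
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * fmap[i][j]
    return cam

# Two toy 2x2 feature maps; the class of interest weights them 1.0 and 0.5
fmaps = [[[1.0, 0.0], [0.0, 2.0]],
         [[0.0, 4.0], [0.0, 0.0]]]
cam = class_activation_map(fmaps, [1.0, 0.5])  # -> [[1.0, 2.0], [0.0, 2.0]]
```

In practice the resulting map is upsampled to the input image size and overlaid as the heatmaps shown in the figure; high values mark the regions the model relied on most.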
Summary of Proposed Objective Criteria for Definition of GON
| | SDOCT | SAP |
|---|---|---|
| GON | | |
| Global loss | Global RNFL thickness outside normal limits | GHT outside normal limits or PSD, |
| Localized loss | RNFL thickness outside normal limits in at least one superior sector (temporal superior and/or nasal superior) | Inferior MD, |
| | RNFL thickness outside normal limits in at least one inferior sector (temporal inferior and/or nasal inferior) | Superior MD, |
| Normal | RNFL thickness within normal limits for all sectors and global | PSD probability not significant |
To be considered glaucomatous optic neuropathy, it was necessary to meet the criteria for global or localized loss. To be considered normal, it was required that both SDOCT and SAP results were normal. SDOCT-SAP pairs that do not meet the criteria for GON or normal are considered suspects. GHT, glaucoma hemifield test; PSD, pattern standard deviation; GON, glaucomatous optic neuropathy; SDOCT, spectral-domain optical coherence tomography; SAP, standard automated perimetry.