Literature DB >> 32879754

Evidence Based Prediction and Progression Monitoring on Retinal Images from Three Nations.

Lutfiah Al Turk¹, Su Wang², Paul Krause², James Wawrzynski³, George M Saleh³, Hend Alsawadi⁴, Abdulrahman Zaid Alshamrani⁵, Tunde Peto⁶, Andrew Bastawrous⁷, Jingren Li⁸, Hongying Lilian Tang².

Abstract

Purpose: The aim of this work is to demonstrate how a retinal image analysis system, DAPHNE, supports the optimization of diabetic retinopathy (DR) screening programs for grading color fundus photography. Method: Retinal image sets, graded by trained and certified human graders, were acquired from Saudi Arabia, China, and Kenya. Each image was subsequently analyzed by the DAPHNE automated software. The sensitivity, specificity, and positive and negative predictive values for the detection of referable DR or diabetic macular edema were evaluated, taking human grading or clinical assessment outcomes to be the gold standard. The automated software's ability to identify co-pathology and to correctly label DR lesions was also assessed.
Results: In all three datasets the agreement between the automated software and human grading was between 0.84 to 0.88. Sensitivity did not vary significantly between populations (94.28%-97.1%) with specificity ranging between 90.33% to 92.12%. There were excellent negative predictive values above 93% in all image sets. The software was able to monitor DR progression between baseline and follow-up images with the changes visualized. No cases of proliferative DR or DME were missed in the referable recommendations. Conclusions: The DAPHNE automated software demonstrated its ability not only to grade images but also to reliably monitor and visualize progression. Therefore it has the potential to assist timely image analysis in patients with diabetes in varied populations and also help to discover subtle signs of sight-threatening disease onset. Translational Relevance: This article takes research on machine vision and evaluates its readiness for clinical use. Copyright 2020 The Authors.

Entities: Chemical Disease Gene Species

Keywords: AI algorithm; deep learning; diabetes; diabetic retinopathy; lesion detection

Mesh：

Year: 2020 PMID： 32879754 PMCID： PMC7443119 DOI： 10.1167/tvst.9.2.44

Source DB: PubMed Journal: Transl Vis Sci Technol ISSN： 2164-2591 Impact factor: 3.283

Introduction

Diabetic retinopathy (DR) is a common complication of diabetes mellitus. Among patients with diabetes, DR prevalence is approximately 28.5% in the United States, 34.08% in China, and 34.6% in Saudi Arabia. The diagnosis of DR early in the presymptomatic phase through screening is critical to the eventual visual outcome and relies on a detailed analysis of fundus photographs taken regularly (e.g., often annually) within DR screening programs. At present, photographs are most commonly analyzed by ophthalmologists, optometrists, and professional graders. Several classification systems have been developed and adopted to guide DR screening frequency and ophthalmic referral based on a population's needs or the resources available in different parts of the world. Two commonly used systems are the International Clinical Diabetic Retinopathy and Diabetic Macular Oedema Severity Scale (ICDRS) and the one defined by the UK National Screening Committee (NSC)., Guidance on screening intervals, investigation, and treatment are also incorporated into these guidelines.– In England and Wales, where more than 80% of those with diabetes undergo DR screening at least annually, DR is no longer the leading cause of blindness in the working-age population. However, the majority of countries around the world have no such established screening program; one of the barriers remains the need to have sufficient trained staff to manually grade every fundus image captured. The availability of automated image grading might become a facilitator to support DR screening services in resource limited settings. Over the past two decades, automated retinal image analysis for DR detection and grading has been studied extensively. Computer vision and machine learning methods have been proposed.– The rise of deep learning,, typically implemented as convolutional neural networks (CNNs),– has given a significant boost to the field of automated DR detection. This facilitated the usage of large datasets of fundus images to improve the accuracy and scalability of DR recognition and classification., However, such systems continue to suffer from significant limitations: The inability of many deep learning systems to provide detail or evidence to support their “black-box” DR classification; Absence of the capacity to detect and grade DR within the broader context of other possible diagnoses such as age-related macular degeneration (AMD) and retinal vein occlusion, to minimize false-positive results and indicate the presence of non-DR pathologies.

Hypothesis of the Study

This study aims to evaluate the ability of a retinal image analysis software system to provide effective detection of referable DR and surrogate markers of diabetic macular edema (DME) and monitor the progression of DR and DME based on either the ICDRS or NSC grading criteria. The measure of its performance is based on its agreement with human graders and clinical assessment. Validations were carried out on external testing data collected from three geographic locations, Kenya, Saudi Arabia, and China, with different camera types and settings. The software also provides evidence of its prediction through visualizing relevant lesions and identifies the presence of copathology while not confusing it with DR.

Methods

DAPHNE Automated Software

DAPHNE is an automated system for retinal image analysis developed by the University of Surrey, UK. DAPHNE was originally developed as a software system for diabetic retinopathy filtering of normal images. Over the years it has evolved with the addition of a range of components with the aim of supporting a holistic reading of retinal images on key pathologic manifestations of DR and other disorders. Two major components are evaluated in this work; one is an image-based classifier, the other is an object detector. The DAPHNE classifier is an end to end CNN architecture with multiple output layers for different classifications based on the training samples annotated on their quality, as well as DR grade in either ICDRS or NSC (see below, in the Retinal Images Datasets section of this article). Images were first preprocessed to subtract local average color to reduce differences in lighting. These were then augmented to increase spatial, rotational, and scale variance. To speed up the “learning” process, batch normalization and pre-initialization were used. Pre-initialization also improved performance of the CNN network. The CNN framework was trained to provide multiple outputs, including (1) quality assessment; (2) ICDRS DR grades (e.g., 0—no DR, 1—mild, 2—moderate, 3—severe, or 4—proliferative); (3) UK NSC DR grades (e.g., R0—no DR, R1—background, R2—preproliferative, or R3—proliferative); and (4) presence of referable DR. The DAPHNE object detector has a number of components. A set of U-net and CNN-based detectors were trained to detect retinal anatomic structures, such as the optic disc and macula; the DR lesions, such as microaneurysms, hemorrhages, exudates, and other lesions, such as intraretinal microvascular abnormality (IRMA), new vessel on disc, new vessel elsewhere, cotton wool, drusen, and venous beading. The rest of this section will first describe the prediction workflow after the algorithms are trained, followed by further information on training and validation datasets, prediction output categories as well as how the prediction accuracy is reported.

DAPHNE Workflow

In the first stage of the processing, raw data in any image format are cropped by removing any black mask borders around the retina region then passed through the classifier network to obtain a prediction on both quality and DR scales. To minimize the throwing away of those low readability images caused by the presence of certain pathologies, the probability outputs on both quality measure and disease measures are fed into a logistic regression model parametrized by some samples of images with pure quality issues and those with conditions such as cataract and retinal detachment. This filtering process does not intend to achieve 100% accuracy but aims to pick up some portion of the low quality images caused by different pathological conditions, if any, so they could be processed further. Otherwise those images with low quality scores are marked as ungradable. All the images that are deemed to have a quality score indicating adequate quality, are passed to the next stage. In the second stage, DAPHNE detectors output the locations of anatomical structures in the fundus image, as well as the likelihood that a certain region is pathologic. This works together with the DAPHNE classifier that outputs the DR severity grading. The detected pathological regions that are consistent with the predicted grading level are visualized as evidence for the predicted grade. With regard to DME analysis, the DAPHNE detector detects and visualizes the location of the optic disc, fovea, and any exudate around the macula region, as well as any appearance of microaneurysms or hemorrhages within 1-disc diameter of the fovea.

Analysis for Progression

DR is a progressive disease, and UK data suggest that it may be possible to stratify patients for risk, using grading outcomes only, into groups with low and high risk of progressing to proliferative DR. Subsequently, screening intervals for such diverse groups of patients could then safely be modified according to their risk stratification. In our work the DR progression monitoring was carried out by combining the results from individual image classification on disease level and then adding in the DAPHNE detectors to visualize the changes between time-points. When analyzing a set of images taken for DR screening of the same patients at different examination time points, the system first applies detected anatomical structures to register between baseline and follow-up images of the same eye. It then computes any change in the severity of DR by comparing the grading results of these images. After registration, the lesion detectors are used to extract and visualize the following morphologic changes in pathology (see Fig. 1):

Figure 1.

Top row: images from the same eye taken on October 26, 2015 (baseline), November 3, 2015 (dot hemorrhage), and March 8, 2016 (MA, hemorrhage and preretinal hemorrhage). Bottom row: the comparison of morphological changes for DR signs between a baseline image (October 26, 2015) and a follow-up retinal image (March 8, 2016).

Any new lesion; Any disappearing lesion; Any change of existing lesions (smaller or bigger compared with baseline images). Top row: images from the same eye taken on October 26, 2015 (baseline), November 3, 2015 (dot hemorrhage), and March 8, 2016 (MA, hemorrhage and preretinal hemorrhage). Bottom row: the comparison of morphological changes for DR signs between a baseline image (October 26, 2015) and a follow-up retinal image (March 8, 2016).

Retinal Image Datasets

Training Sets

The development of the DAPHNE classifier was undertaken partly using 35,124 macula-centered retinal fundus images generated by EyePACS and available at Kaggle. These images were already labeled by EyePACS using the ICDRS scheme. We also annotated these images using the NSC grading scheme and used 28,100 of them as part of our training and testing dataset. The remaining 20% (7024 images) served as an internal validation test set. Cameras used to capture the images include the Optovue iCam, Centervue DRS, Topcon NW and Canon CR1/DGi/CR2 using 45° fields-of-view. As with any typical real-world dataset, this dataset included photographs that contained artefacts or could be out of focus, underexposed or overexposed. Another 4980 images were selected from our own collections, from different cameras and ethnic groups, with the annotations in both ICDRS and NSC agreed by three trained graders. Together these form 33,800 training sample images for learning each grading standard (ICDRS and NSC). For lesion detection in the DAPHNE detector, we first used a public database (DiaRetDB1). All images were obtained using the same 50° field-of-view fundus camera with varying imaging settings. It contains 89 digital retinal images and a human-expert annotated ground truth for several common DR lesions, including microaneurysms, hemorrhages and exudates. We then added a further 1952 images annotated by three trained graders and a medical retina specialist on retinal pathologic regions. After random sampling and preprocessing, 50,000 sub-images in smaller patch sizes are sampled from 2041 (89 + 1952) images for training. These subimages include those regions with or without pixels where the lesions are located to form negative and positive samples. For negative samples, normal patches on various locations of the retina without any pathology can be used. There are many possibilities to sample such subimages with the lesions appearing at different positions of the patch for positive samples. Positive samples include those features appearing in DR, as well as those non-DR but pathological. Fifty thousand patch samples allowed the algorithm to be trained under different scenarios. As a general strategy, during the training, all the training data were divided randomly into training and testing sets on the basis of an 80/20 split strategy. These data were not used for any internal or external validation. There was no patient-level overlap in the training and testing sets in either the internal or external validation.

External Validation Test Sets

To test the generalizability and reproducibility of the software, we used three distinct datasets for external validation acquired with varying imaging settings and cameras from the three countries: 15,000 from China, 10,026 from Saudi Arabia, and 24,700 from Kenya.

China

Images were obtained from DR screening, fully anonymized locally and with appropriate permissions in place. The gold-standard for DR grading using the NSC grading criteria was carried out by trained and certified graders. Two fundus images were taken in each eye; one optic disc centered and the other macula centered. No follow up images or clinical assessments were received.

Saudi Arabia

Images were collected at an Eye Clinic after appropriate approvals were put in place. Zeiss Visucam 500 cameras were used once eyes were dilated using pharmacological dilation. As this was a clinic based population, most patients had eye conditions but not necessarily DR. Multiple fundus images were acquired from the same patients with varied examination intervals between image capture (ranging from one month to one year). The ground truth on images was extracted based on clinical assessment and converted to DR grades using NSC grading including features of DME being noted.

Kenya

Data were from a population-based survey undertaken in 2007 to 2008 in Nakuru district, Kenya, as the baseline using a Topcon NW6S Non Mydriatic camera model (Topcon, Tokyo, Japan), then in 2013 to 2014 as follow-up in the same population using a DRS Digital Fundus Camera (Haag-Streit, Köniz, Switzerland). Two 45° fundus photographs were taken in each eye; one optic disc centered and the other macula centered. The gold-standard for DR grading of DR was carried out using ICDRS grading criteria. In addition, age related macular degeneration (AMD) and optic disc changes based on retinal photographs were completed by trained graders at the Moorfields Eye Hospital Reading Centre, London, UK. This was the most complex image and grading set of the 3 but also the one with the most complete grading. A public database, Messidor-2, was also included in the external validation, consisting of 874 subjects with diabetes (1748 digital retinal color images, one fovea-centered image per eye). These subjects were imaged, without pharmacological dilation, using a Topcon TRC NW6 non-mydriatic fundus camera with a 45-degree field of view, centered on the fovea, at varying imaging settings. Two categories of disease have been provided by the medical experts for each image; the ICDR scales and a definition of DME risk based on the distance between macula and hard exudates. Table 1 provides an overview of the training, internal and external validation test sets.

Table 1.

An Overview of the Training and Internal and External Validation Test Sets

	0	1	2	3	4
ICDRS
Training samples on DAPHNE classifier using 28,100 from Kaggle	20647	1955	4234	698	566
Internal validation (7024 from Kaggle)	5161	488	1058	175	142
External validation datasets
Kenya	11479	9463	3395	329	34
NSC	R0	R1	R2	R3	—
Additional training samples on DAPHNE classifier (only its distribution in NSC is shown for simplicity)	3659	346	750	224	—
External validation datasets
NSC	R0	R1	R2	R3	—
China	9986	3279	1240	495	—
Saudi Arabia	7451	1854	582	139	—

0, no DR; 1, mild; 2, moderate; 3, severe; 4, proliferative; R0, no DR; R1; background; R2, preproliferative; R3, proliferative.

An Overview of the Training and Internal and External Validation Test Sets 0, no DR; 1, mild; 2, moderate; 3, severe; 4, proliferative; R0, no DR; R1; background; R2, preproliferative; R3, proliferative.

The Grading Categories and Definitions

Different countries adopt different grading schemes depending on their healthcare resources and policies. The DAPHNE software is trained to predict the probabilities of the following categories: Image quality (gradable or non-gradable): The quality grading standard is based on using the UK National Screening Programme for Diabetic Retinopathy's guidelines for the definition of acceptable quality. Diabetic retinopathy severity: The DAPHNE classifier grades images based on either ICDRS or UK NSC classification schemes; whichever is suitable for the country's needs. A modified definition of DME (0–1): Fundus photography does not reliably identify DME but allows for surrogate markers to be identified. These surrogate markers of edema such as presence of exudates, or microaneurysms within one-disc diameter of the macula, are identified by the DAPHNE software as well. According to the UK NSC guidelines, diabetic maculopathy (M1) is defined as follows: “A group of exudates is an area of exudates that is greater than or equal to half the disc area and this area is all within the macular area,” whereas the macula is defined as “that part of the retina which lies within a circle centered on the center of the fovea whose radius is the distance between the center of the fovea and the temporal margin of the disc.” The detection of DME markers is carried out by the DAPHNE object detector and subsequently classified as a referable disease. Non-referable DR versus referable DR: In ICDRS level 0 or 1 and in UK NSC R0 or R1 are nonreferable DR, whilst ICDRS level 2, 3, 4 and UK NSC R2 or R3 are Referable DR.

Statistical Analysis of Performance

Evaluation was conducted by measuring sensitivity (SN), specificity (SP), positive and negative predictive values (PPV and NPV), and their 95% confidence intervals (CIs). We also calculated the agreement of the DR and DME grading results between the DAPHNE system and human experts by using the quadratic weighted kappa. These analyses were measured through the StatsModels version 0.8.0 and SciPy version 1.0.0 python packages.

Results

The DAPHNE system is being evaluated in its intended stage in the care pathway: reading of referable cases with evidence, noting any progression changes.

On External Validation Datasets

This dataset was graded as 95% gradable by our system. Referable DR prevalence was 42.5% (21,133 images). According to the NSC and ICDRS grading standards, DAPHNE classifies data into referral (R2 and above in NSC or moderate DR and above in ICDRS) and nonreferral cases (R0/R1 in NSC or no apparent retinopathy and Mild NPDR in ICDRS). Because there is no overlapping in images graded in NSC and ICDRS, we report here the combined calculation on referable retinopathy. Any image with detected DME markers is also referable. The kappa scores to measure the agreement between the ground truth and the software on referable diseases in China, Saudi Arabia and Kenya datasets are as follows: 0.85, 0.88, and 0.84, respectively. The performance of DAPHNE with regard to the detection of referable retinopathy at high sensitivity operating points was as follows: sensitivity, 94.1% (95% CI: 92.3%–95.6%); specificity, 87.0% (95% CI: 84.9%–88.9%); negative predictive value, 93.9% (95% CI: 93.9%–96.3%); and positive predictive value, 85.3% (95% CI: 82.1%–86.1%). At high-specificity operating points, the sensitivity of our system was 88.2% (95% CI: 85.9%–90.3%), specificity was 93.0% (95% CI: 91.4%–94.5%), the negative predictive value was 91.5% (95% CI: 89.9%–92.8%), and the positive predictive value was 90.4% (95% CI: 88.3%–92.1%). Tables 2A through 2C show the detail of the software performance on each of these three populations.

Table 2A.

Disease Level	Daphne Predicted Results	Sensitivity	Specificity
Referral vs Non- Referral	Referral	94.28% (93.1%–95.22%)	92.12% (88.27%–93.33%)
	PDR	100% (95.5%–100%)	—
	DME	—
PDR vs Non-PDR	PDR	97.35% (92.3%–99.7%)	85.78% (83.2%–87.81%)

Table 2C.

Disease Level	Daphne Predicted Results	Sensitivity	Specificity
Referral vs. Non- Referral	Referral	95.51% (93.1%–97.50%)	91.11% (85.11%–92.63%)
	PDR	100% (95.8%–100%)	—
	DME	—
PDR vs. Non-PDR	PDR	97.18% (91.2%–99.6%)	87.77% (85.3%–88.80%)

DAPHNE's Performance on External Validations: (a) Sensitivity, Specificity and Corresponding 95% CIs for Referral Level Output to Detect Referral, PDR and DME, and PDR Level Output to Detect PDR on the Kenya Dataset DAPHNE's Performance on External Validations: Sensitivity, Specificity and Corresponding 95% CIs for Referral Level Output to Detect Referral, PDR and DME, and PDR Level Output to Detect PDR on the Saudi Arabian Dataset The ground truth of DME was obtained from eye clinic, to assess the detection of DME markers by the DAPHNE detector. DAPHNE's Performance on External Validations: Sensitivity, Specificity and Corresponding 95% CIs for Referral Level Output to Detect Referral, PDR and DME, and PDR Level Output to Detect PDR on the China Dataset We chose 3548 eyes with baseline and follow-up images. The measure of changes on their disease levels were calculated. The kappa score is 0.827 when comparing those changes assessed by human graders. Figures 2 and 3 show the detected DR signs across baseline and follow-up images using the DAPHNE lesion detector.

Figure 2.

Figure 3.

The detected results of DR progression changes within one month (between images columns 1 and 2) and two to four months (between images in columns 2 and 3). Each row shows images from one patient.

The detected results of DR progression changes over a five-year period: First and second rows: from normal images (R0) to background retinopathy (R1); last row: from preproliferative retinopathy (R2) to stable treated proliferative retinopathy (R3s). First column: baseline images; second column: follow-up fundus images. The detected results of DR progression changes within one month (between images columns 1 and 2) and two to four months (between images in columns 2 and 3). Each row shows images from one patient. We also carried out an evaluation based on the consistency between the detected features by the software and the DR severity level for the whole image annotated by human graders. If the software detects sufficient features that can be mapped to the same level of DR severity graded by human graders, it is considered as an agreement. The DAPHNE system achieved a weighted kappa score of 0.87 on these external validation sets. The software, however, detected some of the other non-DR lesions and individual artefacts as DR related. This showed that further work is still needed to refine the detection. On the other hand, this will aid flaging up any non-DR pathology.

On External Public Dataset Messidor-2

This consisted of 1748 retinal images from 874 subjects. This dataset was assessed as 100% gradable by our system. Two hundred sixty-four images are Referable DR and 125 DME. Table 2D shows the performance of the algorithm for detecting the different levels of diabetic retinopathy.

Table 2D.

Disease Level	Daphne Predicted Results	Sensitivity	Specificity
Referral vs. Nonreferral	Referral	95.8% (94%–97.42%)	91.32% (86.7%–93.53%)
	PDR	100% (96.5%–100%)	—
	DME	100% (95.8%–100%)	—
PDR vs. Non-PDR	PDR	98.55% (91.3%–99.6%)	86.78% (84.2%–88.8%)

DAPHNE's Performance on External Validations: Sensitivity, Specificity and Corresponding 95% CIs for Referral Level Output to Detect Referral, PDR and DME, and PDR Level Output to Detect PDR on the Messidor-2 Dataset At the high sensitivity operating point of detecting referable DR levels (according to ICDR scales and DME scales), the sensitivity of our system was 97.1% (95% CI: 94.8%–98.6%) and specificity was 88.3% (95% CI: 86.5%–90.0%), with a negative predictive value of 99.1% (95% CI: 98.5%–99.5%), and positive predictive value of 69.8% (95% CI: 66.6%–72.8%). At the high specificity operating point, the sensitivity of our system was 89.2% (95% CI: 85.7%–92.2%), and specificity was 95.6% (95% CI: 94.4%–96.6%), with a negative predictive value of 97.0% (95% CI: 96.0%–97.7%) and a positive predictive value of 85.0% (95% CI: 81.5%–87.9%). Sensitivity, based on referral level prediction when proliferative diabetic retinopathy (PDR) cases are in the referable category, was 100% (95% CI: 96.5%–100%, which means no cases of PDR cases were missed), and sensitivity for detecting DME was also 100% using the referral threshold (95% CI: 95.8%–100%; i.e., no cases of DME were missed). The AUC for detecting the referral DR was 0.983 (95% CI: 0.969%–0.993%). On the other hand, if a threshold is set to separate PDR and non-PDR, there were cases of PDR classified as severe DR but in the referable category as shown in Table 2.

On Internal Validation Dataset

As a part of the internal validation on 7024 images from the Kaggle dataset, two operating points were selected for the detection of referable DR levels (according to the ICDR scales) for fully gradable images; one for high sensitivity and another for high specificity. At the high sensitivity operating point, the sensitivity of our system was 94.2% (95% CI: 93.7%–94.7%), and specificity was 76.5% (95% CI: 76.1%–76.9%), a negative predictive value of 98.2% (95% CI: 98.1%–98.4%), and positive predictive value of 48.7% (95% CI: 48.2%–49.1%). At the high-specificity operating point, the sensitivity of our system was 80.1% (95% CI: 80.0%–81.5%) and specificity was 92.6% (95% CI: 92.4%–92.9%), with a negative predictive value of 95.3% (95% CI: 95.1%–95.5%) and positive predictive value of 72.1% (95% CI: 71.4%–72.8%). The AUC for detected referable DR level was 0.985 (95% CI: 0.969%–0.993%). For grading against the ICDR scales, our proposed system obtained a quadratic weighted kappa score of 0.857, which is slightly lower than the winner of the DR competition but higher than other published methods.

Evidence-based Visualization

The current DAPHNE lesion detectors can visualize lesions such as MAs, hemorrhage, exudate, drusen, IRMA, and new vessels explicitly when the prediction probability confidence of these is high. A general category of “abnormal region” is used in the visualization when explicit labeling of the region is of low probability but high as abnormal. Once the CNN outputs any grade indicating the image is not normal (not 0 in ICDRS or not R0 in NSC), the lesion detectors will visualize at least one of the above pathologic regions that are within the definition of the particular grade. On the other hand, if CNN grades an image as normal, the lesion detectors will search for any missed lesion/abnormal regions using higher prediction probability values.

Discussion

This study demonstrates that all components for the DAPHNE software, such as image quality assessment, image grading, lesion detection, and visualization performed well. These results show good generalizability of the DAPHNE software results to detect gradable quality images, the relevant abnormalities to identify referable DR and DME and to visualize the relevant changes that happened over time. The study was carried out on large sets of images captured from a diverse population with varying camera types and settings. Therefore we believe that so far the evaluation of the DAPHNE software showed sufficiently promising results for it to be useful in the intended stage in the DR screening and imaging analysis care pathway. In many DR screening programs, human graders do not have access to any other information on the patient but the fundus images they read. Therefore the way the algorithm learns and generates the results needs to mimic the ground truth based on how human graders read the images and come to the conclusion of referral being required. Our purpose was to determine whether the algorithm can learn to perform at an acceptable level of reading fundus images to safely determine quality of the images and then subsequently place patients in the correct referral pathway. Gradeability of the images can determine the quality of the program and so first of all, DAPHNE looked into any significant impact on software performance when images are from different cameras, with either dilated, or undilated eyes. Testing the data collected from three nations with variation of these factors showed that, as long as the quality assessment is in place in the workflow, the performance of the algorithm is consistent. This is in line with other studies,, where AI was evaluated in coordination with either assessing the quality and protocol adherence of images, or data imaged with mydriasis through a high-quality imaging platform. Moreover, an effective algorithm should learn about the true pattern across a large set of images, even if there may be a certain level of noise or variation in some individual samples. Because the DAPHNE software's agreement with human grading was above 84% in all validation datasets, it shows potential for further testing with regard to how it might be incorporated into clinical practice. The software is very sensitive to sight-threatening disease. DAPHNE copes well across different datasets with a varying proportion of normal versus abnormal cases. When patients do not have DR or DME but do have another pathology, the lesion detectors in DAPHNE are able to recognize them as having abnormal regions. DAPHNE thus has the ability to identify the existence of many common ocular comorbidities in eyes without diabetic retinopathy. The software, however, still needs to be refined to differentiate DR from other visually similar diseases or images with abnormal regions. The software also shows an ability to monitor DR progression changes between baseline and follow-up images. Because the changes of condition can be measured quantitatively, this progression monitoring may potentially benefit patient care management. Furthermore, it might potentially assist with decision making for optimal screening intervals for patients with diabetes in varied populations. This automated interpretation addresses only one of the several challenges involved in implementing a successful screening program. This work, however, is not trying to redesign any screening program but rather just to focus on how an automated software system may assist the analysis of the images produced within a functioning screening program. Further work is required to understand a holistic interaction and integration of an AI software into clinical workflows within a given health care system.

Table 2B.

Disease Level	Daphne Predicted Results	Sensitivity	Specificity
Referral vs. Non- Referral	Referral	97.1% (95.1%–97.25%)	90.33% (85.71%–92.17%)
	PDR	100% (94.5%–100%)	—
	DME	100% (94.5%–100%)	—
PDR vs. Non-PDR	PDR	98.23% (93.3%–99.6%)	83.78% (82.12%–88.87%)

The ground truth of DME was obtained from eye clinic, to assess the detection of DME markers by the DAPHNE detector.

18 in total

1. Grading diabetic retinopathy from stereoscopic color fundus photographs--an extension of the modified Airlie House classification. ETDRS report number 10. Early Treatment Diabetic Retinopathy Study Research Group.

Authors:
Journal: Ophthalmology Date: 1991-05 Impact factor: 12.079

2. Automated analysis of retinal images for detection of referable diabetic retinopathy.

Authors: Michael D Abràmoff; James C Folk; Dennis P Han; Jonathan D Walker; David F Williams; Stephen R Russell; Pascale Massin; Beatrice Cochener; Philippe Gain; Li Tang; Mathieu Lamard; Daniela C Moga; Gwénolé Quellec; Meindert Niemeijer
Journal: JAMA Ophthalmol Date: 2013-03 Impact factor: 7.389

3. Optimal wavelet transform for the detection of microaneurysms in retina photographs.

Authors: Gwénolé Quellec; Mathieu Lamard; Pierre Marie Josselin; Guy Cazuguel; Béatrice Cochener; Christian Roux
Journal: IEEE Trans Med Imaging Date: 2008-09 Impact factor: 10.048

4. Diagnostic accuracy of diabetic retinopathy grading by an artificial intelligence-enabled algorithm compared with a human standard for wide-field true-colour confocal scanning and standard digital retinal images.

Authors: Abraham Olvera-Barrios; Tjebo Fc Heeren; Konstantinos Balaskas; Ryan Chambers; Louis Bolter; Catherine Egan; Adnan Tufail; John Anderson
Journal: Br J Ophthalmol Date: 2020-05-06 Impact factor: 4.638

5. Improved Automated Detection of Diabetic Retinopathy on a Publicly Available Dataset Through Integration of Deep Learning.

Authors: Michael David Abràmoff; Yiyue Lou; Ali Erginay; Warren Clarida; Ryan Amelon; James C Folk; Meindert Niemeijer
Journal: Invest Ophthalmol Vis Sci Date: 2016-10-01 Impact factor: 4.799

6. Prevalence and causes of blindness and diabetic retinopathy in Southern Saudi Arabia.

Authors: Saad Hajar; Ali Al Hazmi; Mustafa Wasli; Ahmed Mousa; Mansour Rabiu
Journal: Saudi Med J Date: 2015-04 Impact factor: 1.484

7. Prevalence of diabetic retinopathy among 13473 patients with diabetes mellitus in China: a cross-sectional epidemiological survey in six provinces.

Authors: Yan Liu; Yifan Song; Liyuan Tao; Weiqiang Qiu; Huibin Lv; Xiaodan Jiang; Mingzhou Zhang; Xuemin Li
Journal: BMJ Open Date: 2017-01-09 Impact factor: 2.692

Review 8. The English National Screening Programme for diabetic retinopathy 2003-2016.

Authors: Peter H Scanlon
Journal: Acta Diabetol Date: 2017-02-22 Impact factor: 4.280

9. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices.

Authors: Michael D Abràmoff; Philip T Lavin; Michele Birch; Nilay Shah; James C Folk
Journal: NPJ Digit Med Date: 2018-08-28

10. The Nakuru eye disease cohort study: methodology & rationale.

Authors: Andrew Bastawrous; Wanjiku Mathenge; Tunde Peto; Helen A Weiss; Hillary Rono; Allen Foster; Matthew Burton; Hannah Kuper
Journal: BMC Ophthalmol Date: 2014-05-01 Impact factor: 2.209

1 in total

1. Systematic Bibliometric and Visualized Analysis of Research Hotspots and Trends on the Application of Artificial Intelligence in Ophthalmic Disease Diagnosis.

Authors: Junqiang Zhao; Yi Lu; Shaojun Zhu; Keran Li; Qin Jiang; Weihua Yang
Journal: Front Pharmacol Date: 2022-06-08 Impact factor: 5.988

1 in total