
THEIA™ development, and testing of artificial intelligence-based primary triage of diabetic retinopathy screening images in New Zealand.

E Vaghefi¹,², S Yang²,³, L Xie², S Hill⁴, O Schmiedel⁵, R Murphy⁶, D Squirrell¹,⁴.

Abstract

AIM: To develop and evaluate an artificial intelligence triage system with high sensitivity for detecting referable diabetic retinopathy and maculopathy, while maintaining high specificity for non-referable disease, for clinical implementation within the New Zealand national diabetic retinopathy screening programme.
METHODS: The THEIA™ artificial intelligence system for retinopathy and maculopathy screening was developed at Toku Eyes using routinely collected retinal screening datasets from two of the largest district health boards in Auckland, New Zealand: the Auckland District Health Board and the Counties Manukau District Health Board. All retinal images from consecutive individuals receiving retinal screening between January 2009 and December 2018 were used. Images were labelled as non-sight-threatening, potentially referable or sight-threatening for New Zealand implementation, or as referable (potentially referable + sight-threatening)/non-referable (non-sight-threatening) for global comparison.
RESULTS: Data from 32 354 unique people with diabetes (63 843 when including multiple visits) were available, of which 95-97%, 0.9-2.4% and 1.1-3.1% were categorized as non-sight-threatening, potentially referable and sight-threatening, respectively. Using the referable/non-referable categories, THEIA achieved overall sensitivity of 94% (95% CI 92-95) in the Auckland District Health Board and 95% (95% CI 92-97) in the Counties Manukau District Health Board datasets, while preserving specificity of 63% (95% CI 62-64) for the Auckland District Health Board and 61% (95% CI 60-62) for the Counties Manukau District Health Board. Implementing THEIA into a New Zealand national diabetic screening programme could significantly reduce the manual grading load.
CONCLUSION: THEIA, an artificial intelligence tool to assist in clinical decision-making, tailored to the needs of the New Zealand national diabetic screening programme, delivered high sensitivity for detecting referable retinopathy within the multi-ethnic New Zealand population with diabetes.

Year: 2020 PMID: 32794618 PMCID: PMC8048953 DOI: 10.1111/dme.14386

Source DB: PubMed Journal: Diabet Med ISSN: 0742-3071 Impact factor: 4.359

We have developed and validated an artificial intelligence (AI) system to detect referable diabetic retinopathy and maculopathy, using independent screening datasets covering 25% of the New Zealand population. We have evaluated the clinical load‐saving capacity of this AI system within the New Zealand national diabetic retinopathy screening programme. This system provides an automated decision rule to ensure rapid, accurate separation of the large proportion of normal images from the few with abnormal features that need prompt, accurate clinical grading, but is not designed to replicate a screening programme.

INTRODUCTION

Artificial intelligence (AI) has progressed rapidly during the past decade with the advent of deep learning. The field of ophthalmology has been an early adopter of these technologies, and possibly the most promising application of AI in ophthalmology is as a screening tool for detecting diabetic retinopathy (DR). The accuracy of AI‐based models for detecting DR has been demonstrated in many previous studies. The results from a landmark prospective study evaluating the performance of a DR diagnostic system in a primary care setting represented an important clinical milestone, as these results were used to form the basis of the first fully autonomous AI‐based system approved by the US Food and Drug Administration. However, there are still many clinical and technical challenges with regard to clinical implementation. Firstly, there is a problem with generalizability, as this and many other studies have used training datasets from relatively homogeneous populations. Secondly, many studies focus on training algorithms to simply distinguish between non‐referable and referable DR and do not distinguish between retinopathy and maculopathy. Thirdly, there is a concern about the ‘black box’ phenomenon of many AI systems, a term which refers to the inability of the user to determine how the AI derived its output. Arguably, it is difficult to ask people with diabetes, clinicians and regulators to trust a system when its workings, and thus its inherent biases, are unknown. Finally, besides the technical challenges, there are also a number of legal and ethical issues that need to be addressed before AI is implemented within healthcare environments.

In addressing these issues, we have firstly combined historic data from the two largest New Zealand (NZ) district health boards, covering 22% of the country’s population and ensuring that our AI (named THEIA) was trained and tested on data that are representative of the NZ general population. Secondly, we have created THEIA to be compatible with NZ Ministry of Health standards for diabetic eye screening (Table S1), and, thirdly, we have designed THEIA so that it generates ‘attention maps’ for its grading decisions, allowing clinicians to examine the basis of the AI‐generated grades.

With these challenges in mind, we designed THEIA as a primary triage tool, in effect allowing the New Zealand screening programme to transition to a semi‐automated model of care, with THEIA being used to safely and rapidly triage two groups: (1) people with diabetes with no or minimal disease who could be issued their results at the time of screening and (2) those with sight‐threatening disease whose images needed urgent review by the human grading team. This then left a third group of images, comprising people who had mild to moderate disease which may require onward referral but did not require urgent review by the secondary human grading team (Figure 1). Thus, the anticipated position of THEIA, relative to the current DR screening pathway, was as an initial triage tool, designed to reduce the number of images being reviewed manually.

Figure 1

Flow chart of outputs generated by THEIA. PR, potentially referable; NST, non‐sight‐threatening; ST, sight‐threatening

Specifically, our aims were to: (1) train AI (THEIA) to detect referable DR and maculopathy, then validate it using two independent DR screening datasets, the Auckland District Health Board (ADHB) and Counties Manukau District Health Board (CMDHB) datasets, each with different cameras, disease profiles and patient demographics; and (2) determine the diagnostic performance of THEIA for the automated detection of non‐referable and referable diabetic eye disease.

METHODS

Study population

This was a retrospective study using all consecutive retinal screening images acquired as part of routine diabetes photo‐screening between January 2009 and December 2018, from two district health boards within the Auckland region: ADHB and CMDHB. These two organizations provide public‐funded services to approximately 1 108 850 people (2018/2019 data), which represents approximately 68% of Auckland’s population or 22% of the entire New Zealand population.

Ethics

The study protocol was approved by the Health and Disability Ethics Committee at New Zealand Health, and the Disability Ethics Committees at the ADHB (Eye‐AI 18/CEN/124 and Eye‐AI A+8335) and at the CMDHB (Eye‐AI 947). Appropriate protocols were embedded within THEIA to comply with both Māori data sovereignty and local data privacy regulations.

Study design, sample size and power

To address any class imbalance in our dataset, a weighted loss function strategy was adopted. An initial 5000 images, 500 images from each of the five DR severity categories (according to NZ grading standards) and 500 images from each of the five maculopathy severity categories, were randomly selected for AI training. These 5000 images were then re‐graded by a retinal specialist (D.S.) and, where a discrepancy arose between the original grade and the re‐grade, the image was sent for adjudication before a 'final' training grade was issued. These data were then split in a 70%, 15% and 15% ratio for training, validation and testing, respectively. THEIA’s performance was then evaluated on both the ADHB and CMDHB datasets to determine its accuracy, specificity, sensitivity and generalizability. In NZ, every citizen is assigned a unique National Health Index number. The training and test datasets were separated at the patient level, using this index to avoid data leakage between sets. There were no duplicate images in the datasets used for training or testing.
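The patient‐level separation described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_by_patient`, the record layout and the random seed are assumptions made for illustration, and `patient_id` merely stands in for the National Health Index number. Because whole patients are assigned to a set, the 70/15/15 ratio holds for patients and only approximately for images.

```python
import random
from collections import defaultdict


def split_by_patient(records, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split (patient_id, image_id) records into train/validation/test sets.

    All images of one patient land in the same set, so no patient can leak
    between sets. `patient_id` is a hypothetical stand-in for the NHI number.
    """
    by_patient = defaultdict(list)
    for patient_id, image_id in records:
        by_patient[patient_id].append(image_id)

    # Sort first so the shuffle is reproducible for a given seed.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(ratios[0] * len(patients))
    n_val = round(ratios[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "validation": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    # Expand each patient group back into its image records.
    return {
        name: [(p, image) for p in ids for image in by_patient[p]]
        for name, ids in groups.items()
    }
```

Splitting on the patient identifier rather than on the image is what prevents two images of the same eye from appearing in both the training and the test set.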

Data capture and reference standard retinal image grading/classification

The NZ Ministry of Health standard mandates that two fundus images per eye be acquired from people with type 2 diabetes (macular and disc‐centred) and four fundus images are taken per eye for people with type 1 diabetes (as above, plus inferior and superior to the optic disc). Neither of the two validation datasets from the ADHB or CMDHB were manually curated before being presented to THEIA. Both datasets were collected independently and the images were obtained from multiple different models of fundus camera. All images were graded according to the NZ Ministry of Health standard by primary and secondary grading teams at their respective district health board, and audited by the lead ophthalmologist of the respective screening programmes. Based on the outcome of screening, the patient is either directly re‐enrolled into screening, sent to the tertiary grader for adjudication, or referred to the eye clinic directly (Table 1).

Table 1

New Zealand diabetic eye screening standard for primary grading and patient outcome

	Retinopathy	Maculopathy	Patient outcome
Healthy/Non‐sight‐threatening	R0, R1, R2	M0, M1, M2	Re‐enrolled into screening
Potentially referable	R3	M3	Sent for adjudication
Referable (Sight‐threatening)	R4, R5	M4, M5	Referred to the eye clinic


Image processing and development of the THEIA algorithm

The fundus images were first cropped and resized to 800×800 pixels. These were then enhanced using a Gaussian blur technique before being passed into the THEIA algorithm. All image pre‐processing steps were fully automated. THEIA comprised a series of AI tools, the first of which was a quality‐check convolutional neural network trained to confirm that the image is of the retina and is of sufficient quality to be graded. Ungradable images were automatically identified and excluded from the analysis. The same convolutional neural network also classified whether the image belonged to the left or the right eye and whether the image was centred on the macula or optic nerve. Having passed through these steps, the image was passed on to an ensemble of grading AI tools, which were trained to grade retinopathy and maculopathy as separate entities. The resultant grades were combined to produce a grading report 'per patient' (Figure 2). The grading AIs were designed to classify each image into one of three categories (Table 1): non‐sight‐threatening retinopathy; potentially referable disease; and sight‐threatening diabetic eye disease.

Figure 2

Flow chart of image processing pathway through THEIA

In designing THEIA, an ensemble of convolutional neural networks was created and trained to find the optimal design. The best‐performing architecture was then selected and went through another hyperparameter optimization process. THEIA was trained for 200 epochs and the area under the receiver operating characteristic curve was monitored.

Our basic premise in designing this primary triage AI system was that no individual with referable disease should be missed. When a human grader is reading a sequence of retinal images they issue a result based on the disease load visible across all the images acquired per eye; thus, on occasions, there will be only minimal disease visible in the individual macular and disc‐centred images but the total load of disease across the two images represents referable disease. In contrast to the way the human grading team approaches a set of images, an AI system is only able to read and issue results one image at a time. Consequently, it is possible that referable disease would be missed by the AI system if the individual images of the sequence had minimal disease but the combined set had a total disease load that was greater than any of the individual images. Thus, to ensure that no disease was missed, an 'add‐up' function was also selectively applied if the images recorded the same retinopathy scores. For example, if two images from the same eye were issued with an R2 grade, it would result in an R3 grade being issued for the eye. Issuing the maculopathy grade was more straightforward, being generated solely from the macula‐centred image. Finally, and in recognition of the fact that, ultimately, it is the combined retinopathy and maculopathy grade across the two eyes that dictates the overall patient outcome, THEIA was designed to produce a per‐patient outcome: non‐sight‐threatening, potentially referable, or sight‐threatening (Figure 2).
In NZ, we have incorporated optical coherence tomography (OCT) into our community screening pathways; however, in the process of screening, small flecks of exudate can be easily overlooked, and consequently, in those cases when it would be most useful to have an OCT image, it is often not performed. To avoid this scenario, THEIA has been designed to alert the technician that sight‐threatening maculopathy may be present, and the technician is then invited to acquire an OCT whenever the AI detects potentially sight‐threatening maculopathy. This step was designed to ensure that an OCT is always available to the human grading team when the images are being manually graded.
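The 'add‐up' rule and the per‐patient outcome described in this section can be sketched as follows. This is an illustrative reading, not THEIA's implementation. The text documents only the R2 + R2 → R3 example and says the rule was applied 'selectively', so restricting the rule to grade 2 (`add_up_levels`) and triggering it when the worst grade appears on at least two images are assumptions. All function names and the input layout are hypothetical. Grades are given as integers 0 to 5, and the category thresholds follow Table 1.

```python
from collections import Counter

CATEGORIES = ("non-sight-threatening", "potentially referable", "sight-threatening")


def category_index(level):
    """Table 1 mapping: grades 0-2 -> 0, grade 3 -> 1, grades 4-5 -> 2."""
    if not 0 <= level <= 5:
        raise ValueError(f"grade out of range: {level}")
    if level <= 2:
        return 0
    return 1 if level == 3 else 2


def eye_retinopathy_grade(image_grades, add_up_levels=frozenset({2})):
    """Combine per-image retinopathy grades into one grade for the eye.

    Assumption: the eye takes its worst image grade, and the grade is raised
    by one step when that grade is in `add_up_levels` and occurs on two or
    more images (the documented case is R2 + R2 -> R3).
    """
    if not image_grades:
        raise ValueError("at least one image grade is required")
    worst = max(image_grades)
    if worst in add_up_levels and Counter(image_grades)[worst] >= 2:
        return min(worst + 1, 5)
    return worst


def patient_outcome(eyes):
    """Return the per-patient category from both eyes.

    Each eye is a dict with 'retinopathy' (list of per-image grades) and
    'maculopathy' (single grade from the macula-centred image). The patient
    outcome is the worst category found across all grades.
    """
    worst = 0
    for eye in eyes:
        r = eye_retinopathy_grade(eye["retinopathy"])
        worst = max(worst, category_index(r), category_index(eye["maculopathy"]))
    return CATEGORIES[worst]
```

Under these assumptions, an eye with two R2 images and no maculopathy is reported as potentially referable, whereas an eye with one R2 and one R1 image stays non‐sight‐threatening.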

Assessing performance of the THEIA algorithm

As THEIA generates three possible outputs, namely, non‐sight‐threatening, potentially referable and sight‐threatening disease, its performance is best described by way of 3×3 confusion matrices for both district health board datasets. To disclose the full performance of THEIA, these data are presented in Tables S1 to S9. However, the traditional metrics for describing the performance of screening methods are the sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) of the system to detect referable diabetic eye disease (combining both maculopathy and retinopathy together as a single entity) per patient. In order to create the bimodal outcome model needed to calculate these data, the outcomes of all images issued with a potentially referable (retinopathy and/or maculopathy) grade were combined with those issued with a sight‐threatening (retinopathy and/or maculopathy) grade. This created two groups for the bimodal data analysis: non‐referable disease (non‐sight‐threatening) and referable disease (potentially referable + sight‐threatening). It should be noted that NPV and PPV depend on the prevalence of the diseased and healthy cases, but since the prevalence of referable disease (potentially referable + sight‐threatening) was approximately 10% in our overall dataset, we do not anticipate this to be a statistical issue.
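The collapse from three categories to the referable/non‐referable outcome, and the four metrics derived from it, can be written out as a short sketch. The function names are hypothetical, and the row/column order of the confusion matrix (reference category in rows, THEIA output in columns, ordered non‐sight‐threatening, potentially referable, sight‐threatening) is an assumption about layout, not something specified in the text.

```python
def collapse_to_binary(cm3):
    """Collapse a 3x3 confusion matrix to (tp, fp, fn, tn).

    cm3[i][j] is the count with reference category i and predicted category j,
    in the order: non-sight-threatening, potentially referable,
    sight-threatening. 'Referable' merges the last two categories.
    """
    tn = cm3[0][0]
    fp = cm3[0][1] + cm3[0][2]
    fn = cm3[1][0] + cm3[2][0]
    tp = sum(cm3[i][j] for i in (1, 2) for j in (1, 2))
    return tp, fp, fn, tn


def screening_metrics(tp, fp, fn, tn):
    """Standard screening metrics for the referable-disease outcome."""
    return {
        "sensitivity": tp / (tp + fn),  # referable cases correctly flagged
        "specificity": tn / (tn + fp),  # non-referable cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are truly referable
        "npv": tn / (tn + fn),          # cleared cases that are truly non-referable
    }


# Made-up counts for illustration only; these are not study data.
example = [[600, 300, 100],
           [5, 60, 35],
           [1, 19, 80]]
metrics = screening_metrics(*collapse_to_binary(example))
```

Note that a confusion between potentially referable and sight‐threatening counts as a true positive after the collapse, since both belong to the referable group.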

THEIA clinical integration plan

THEIA’s typical work cycle was <10 s per eye. Our proposed clinical implementation of THEIA into the NZ diabetic screening programme is shown in Figure 3.

Figure 3

Proposed implementation of THEIA within the New Zealand diabetic screening programme, compared with the current model of care. AI, artificial intelligence; OCT, optical coherence tomography; PR, potentially referable; NST, non‐sight‐threatening; ST, sight‐threatening

RESULTS

After excluding those images in which either the maculopathy or retinopathy grade clinical reference standard was missing, the ADHB dataset included 160 585 images from 75 469 eyes, derived from 40 160 visits of unselected consecutive people with diabetes between 2009 and 2018, while the CMDHB dataset included 62 192 images from 37 147 eyes, derived from 23 683 visits of unselected consecutive people with diabetes between 2012 and 2018. In this project, only those eyes that had retinopathy and maculopathy grade results were included. Hence, there were 780 people with diabetes where the reference standard grading was missing within the ADHB dataset and 7858 people with diabetes where the reference standard grading was missing within the CMDHB dataset.

The first component of the THEIA ensemble was designed to monitor the quality of the input image, and only when approved was an image allowed to proceed through the rest of the ensemble. During this process, images from just 2482 people with diabetes (4%) were rejected, the majority because of media opacities (cataract or corneal disease) or lid‐generated artefacts (Fig. S1). This is very close to the current NZ clinical rates (image rejection rate based on CMDHB data: 2019/2020: 4.02%; 2018/2019: 5.66%).

The baseline demographics of the two analysed district health board datasets are shown in Table 2. The retinopathy and maculopathy grades of the individuals whose photographs were included in this study are shown in Table 3. As is common with all DR screening programmes, over 50% (>30 000) of individuals in the CMDHB and ADHB datasets had no retinopathy. Fewer than 5% of individuals in the study population had sight‐threatening DR. The two district health boards had slightly different distributions of disease severity, with a greater number of people with diabetes in the CMDHB population graded as having severe disease. A paired t‐test on the disease distributions of the ADHB and CMDHB datasets was performed and they were found not to be significantly different (P = 0.15).

Table 2

Demographics of the Auckland District Health Board and Counties Manukau District Health Board datasets

	ADHB	CMDHB	Combined
Number of images	160 585	62 192	222 777
Number of eyes	75 469	37 147	112 616
Number of visits	40 160	23 683	63 843
Number of unique cases	18 070	14 284	32 354
Mean (range) age, years	56 (7–98)	61 (7–104)	59 (7–104)
Women, %	46	48	47
Ethnicity, %
NZ European	31	17	28
Māori/Cook Island Māori	12	20	16
Pacific	19	30	25
Other	36	28	28
Missing	2	5	3
Mean ± sd diabetes duration, years	9.1 ± 11.3	8.2 ± 7.3	8.8 ± 8.6

It should be noted that some images included missing labels (e.g. the reference standard for the maculopathy result) and were therefore not included in the analysis. Hence the number of eyes will be lower than expected (assuming that most people with diabetes will have two eyes in the screening programme).

Table 3

Distribution of retinopathy and maculopathy grades (per eye) for Auckland District Health Board and Counties Manukau District Health Board datasets

ADHB
Retinopathy, MoH grade	Allocation, % (n)	THEIA grade, % (n)	Maculopathy, MoH grade	Allocations, % (n)	THEIA grade, % (n)
R0	53 (84 427)	Non‐sight‐threatening	M0	68 (108 633)	Non‐sight‐threatening
R1	33 (53 213)	97.1 (155 825)	M1	13 (20 204)	96.5 (154 865)
R2	11 (18 185)		M2	16 (25 999)
R3	2 (2955)	Potentially referable	M3	2 (2374)	Potentially referable
		1.8 (2995)			1.5 (2374)
R4	1 (1515)	Sight‐threatening	M4	2 (3064)	Sight‐threatening
R5	0.2 (250)	1.1 (1765)	M5	0.1 (141)	2.0 (3205)

Abbreviations: ADHB, Auckland District Health Board; CMDHB, Counties Manukau District Health Board. Numbers in brackets are indicative of the number of images in each category: R0, no retinopathy; R1, minimal non‐proliferative retinopathy; R2, mild non‐proliferative retinopathy; R3, moderate non‐proliferative retinopathy; R4, severe non‐proliferative retinopathy; R5, proliferative retinopathy; M0, no maculopathy; M1, at least one microaneurysm beyond 1 disc diameter of the fovea but within the macular arcades; M2 at least one microaneurysm within 1 disc diameter of the fovea; M3, exudate beyond 1 disc diameter of the fovea, but within the macular arcades; M4, exudate within 1 disc diameter of the fovea; M5, similar to M4, but with vision loss.


Assessing the performance of THEIA per patient evaluated

In both validation datasets, the performance of THEIA to detect sight‐threatening DR per patient was evaluated, based on retinopathy and maculopathy being treated as a single entity, compared to the reference standard (the result issued by the grading team at the time of screening). The calculated sensitivity, specificity and NPV for THEIA’s performance to detect sight‐threatening disease was 94%, 63% and 99.4%, respectively, in the ADHB validation dataset, and 95%, 61% and 99.4%, respectively, in the CMDHB validation dataset. Very similar results were observed if the data were analysed per eye rather than per patient. The full set of results are provided in Figs S2 and S3 and Tables S2 to S9.

Grading load‐saving estimate

A total of 65–70% of images screened with THEIA were issued with a non‐sight‐threatening grade at the time screening was performed. Arguably, the very high NPV means that images issued with a non‐sight‐threatening grade would no longer need review by the grading team. Hence, THEIA would reduce the image grading workload by approximately 65%.
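The load‐saving argument is simple arithmetic, sketched below under stated assumptions. The function `triage_estimate` and its parameter names are hypothetical. The sketch also applies the per‐patient NPV to the fraction issued a non‐sight‐threatening grade, which the text reports per image; mixing the two units is a simplification made only for illustration.

```python
def triage_estimate(n_screened, nst_fraction, npv):
    """Estimate grading workload after primary triage.

    n_screened:   number of screening episodes considered
    nst_fraction: fraction issued a non-sight-threatening grade by the AI
    npv:          negative predictive value for referable disease
    """
    if not (0 <= nst_fraction <= 1 and 0 <= npv <= 1):
        raise ValueError("nst_fraction and npv must lie in [0, 1]")
    auto_cleared = n_screened * nst_fraction
    return {
        "auto_cleared": auto_cleared,                         # no manual review
        "sent_to_graders": n_screened - auto_cleared,         # remaining workload
        "expected_missed_referable": auto_cleared * (1 - npv),
    }


# Per 1000 screening episodes, using the lower bound of 65% and an NPV of 99.4%.
per_thousand = triage_estimate(1000, 0.65, 0.994)
```

With these inputs, 650 of every 1000 episodes would bypass manual grading and 350 would remain, and about 3.9 of the 650 cleared episodes would be expected to carry referable disease.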

Audit of discordant grading

For those designing large healthcare AI systems, the use of retrospective data requires very large datasets. This approach ensures that, although discordant images will exist by random chance, this difference will be equally distributed across all outcomes. In order to fully understand the performance of THEIA, we undertook an audit of those cases where the result generated by THEIA and the reference standard differed (Fig. S4). These errors fell into two groups: cases where THEIA genuinely produced the incorrect result and cases where THEIA generated the correct result and the original grading was incorrect. As expected by the inherent design of the system, the majority of instances where THEIA genuinely produced the incorrect result were false‐positive errors. Of these, the most frequent in the maculopathy datasets were the foveal reflex and drusen being mislabelled as exudate and the most frequent false‐positive errors in the retinopathy datasets were hypertensive flame haemorrhages being interpreted as blots, and small patches of resolving blot haemorrhage being interpreted as intraretinal microvascular abnormalities. The rate of false‐negative errors was very low (0.2% and 0.5% in the two datasets); the commonest example being a small isolated patch of intraretinal microvascular abnormalities in the absence of any other DR being mislabelled by THEIA as non‐sight‐threatening. However, there were examples in both the maculopathy and retinopathy datasets where the original grading (reference standard) was wrong and THEIA was correct. The types of inconsistencies were very similar to those listed above.

DISCUSSION

THEIA recorded a high sensitivity for detecting sight‐threatening retinopathy (94% and 95%) and a very high NPV (99.4% in both) in the ADHB and CMDHB validation datasets, respectively. As THEIA categorized 65–70% of the images as non‐sight‐threatening disease, the time saved by removing this volume of cases from the human grading team’s workload represents a significant cost saving for the programme. It would also ensure the consistency of grading nationally.

While direct comparison of different AI systems' performance is challenging, as the data distribution and disease severity will be different between datasets used for training/validation, providing such a comparison will give the reader a frame of reference by which to judge THEIA’s performance. In comparison to recently published data, by design, THEIA exceeded EyeArt™ V2.0 in sensitivity (91%) and NPV (98%), but not in specificity (91%) and PPV (73%). Similarly, THEIA exceeded the Pegasus™ system in sensitivity (84–93%) but not in specificity (85–94%). The difference in the reported specificities and PPVs between THEIA, EyeArt™ and Pegasus™ is explained by the differing intended role for AI in DR screening. In contrast to other DR AI developers, we designed THEIA to maximize NPV so that it could safely perform primary triage, rapidly identifying the majority of people with diabetes with non‐referable eye disease and thus reducing the burden on the human grading team. However, in order to minimize missed disease, we accepted a relatively low specificity; therefore, THEIA generates a number of false‐positives. Whilst acknowledging that false‐positives in any screening programme can cause significant anxiety to patients, THEIA was not designed to be a fully automated, stand‐alone DR screening platform. Instead, it has been designed to work within an established screening programme, allowing the NZ DR screening programme to transition to a semi‐automated model, much as described recently by Xie et al.

Within this model, THEIA is designed to perform the role of primary triage tool, rapidly identifying those individuals with no disease who could be issued their results at the time of screening and those with sight‐threatening disease whose images need urgent review by the human grading team. Whereas currently all patients have to wait a number of weeks for their screening result to be issued, implementing THEIA into the NZ screening programme will enable the majority of patients to be safely issued with a 'no significant disease present' result on the day of screening (Figure 3). The remainder will be passed to the grading team for a result to be issued. As the grading team no longer has to grade all the 'normal' images, individuals will have their results issued more rapidly than was previously possible. Moreover, the design of THEIA means that all individuals who may have sight‐threatening referable disease are automatically flagged, allowing these images to be reviewed urgently by the human grading team. Ultimately, only a small number of patients will need ongoing referral to an ophthalmology clinic for review and no patient will be inconvenienced by being asked to attend for ophthalmology review on the basis of THEIA generating a false‐positive result.

Internationally, there are very few trained neural networks that have (1) used publicly available datasets for training, (2) externally validated their AI, and (3) issued a grade for both retinopathy and maculopathy. Verbraak et al. recently published results on the performance of the IDX‐DR‐EU 2.1, where they assessed multiple images, grading the eye disease into retinopathy and maculopathy, and then collapsed the grades down into a binary 'vision‐threatening' versus 'non‐vision‐threatening' matrix. The reported performance of this device is very similar to that of THEIA.
A confounding factor which makes direct comparisons between AI systems difficult is that many use highly curated data, and this ultimately risks reducing both the utility and impact of the AI system. IDX‐DR‐EU 2.1 was unable to issue a grade in approximately 20% of cases, compared to THEIA, which was able to analyse 96% of all people with diabetes in our uncurated datasets. Crucially, THEIA also includes a broad age range of people with diabetes (aged 7–104 years) of various ethnicities. Where demographic characteristics have been reported, many developers have excluded individuals who were aged <40 years, and this risks the AI being unable to deal with the highly reflective internal limiting membrane within the macula, a feature which is very prominent in retinal images obtained from young individuals.

To date, most published DR screening AI systems have only graded a single macular‐centred image. However, it is recognized that a single image cannot be relied on for accurate grading, and an AI system that is trained to read only a single image per eye risks under‐grading the peripheral retinopathy. In contrast, THEIA reads all images from any given patient. It then collates the combined maculopathy and retinopathy grades to confer the ultimate patient outcome. As previously outlined, THEIA also uses a novel post‐image analysis processing 'add‐up function' to ensure that no disease is missed. Whilst this approach will reduce the likelihood of referable disease being missed, it will reduce the specificity of the system by generating a number of false‐positives. However, as stated above, we are confident that a high false‐positive rate in an AI system that is designed to perform a specific role within an established screening programme will not ultimately disadvantage the patient.
Finally, to address the 'black box' issue of other AI systems, THEIA generates ‘attention maps’ for retinopathy and maculopathy which can be viewed by a clinician post‐diagnosis (Fig. S3).

While there has been significant interest in developing DR grading AI systems, very few have been trained to specifically grade diabetic maculopathy as a separate entity; this is despite diabetic maculopathy being the commonest reason for ophthalmology referral. One of the reasons for this anomaly lies in the intrinsically challenging nature of detecting and grading maculopathy, with surrogate markers of tissue oedema, macular exudates and perifoveal micro‐aneurysms, being used to identify disease. To improve the detection of suspected maculopathy, if THEIA registers that sight‐threatening maculopathy may be present, it prompts the technician taking the photographs to perform an OCT scan before completing the imaging sequence.

A dilemma that is inherent in all healthcare AI systems that use large prospective datasets for training and internal validation is the possibility that the original results issued at the time of grading are incorrect. Such errors will introduce systemic bias into the system, which is one of the reasons why AI systems trained on small datasets often fail to perform when tested on external datasets. Whilst the presence of a similar bias in THEIA will only be revealed by a prospective double‐blind clinical trial, we believe that the results of the present study remain valid for three reasons. Firstly, the reference standards used for the original set of training images were carefully re‐graded by the lead ophthalmologist. Secondly, to mitigate the possibility that one of the district health board's grading processes had a consistent bias, we used two very large datasets derived from district health boards that have different grading teams.
Thirdly, during the development of THEIA, all discordant results between the AI and the reference standard were constantly audited and the AI retrained accordingly. Nevertheless, despite all these safeguards, human error will still occur and thus one would still expect there to be some discordance between the final results generated by the AI system and the reference standard. The results of our internal audit suggest that the rate of these errors was low, at <0.5%.
Whilst many of the AI systems that have been developed to date have been designed to be fully automated, stand‐alone screening systems, we have taken a different approach and have instead designed a bespoke AI system that will allow the NZ diabetic eye screening programme to transition to a semi‐automated system, with the AI system playing a specific role as a rapid primary triage tool. Xie et al. have recently published a model‐based cost‐minimization analysis of three DR screening models in Singapore, comparing the current human‐centric screening model to both a semi‐automated model 'backed up' by secondary human confirmatory diagnoses, and a fully automated model without any human assessment. Their findings appear to vindicate our approach, as the authors concluded that the model utilising a semi‐automated AI system was the most cost‐effective, costing ~6% less than the fully automated model and 20% less than the manual model.
The present study has several limitations. Perhaps the most important clinical limitation of THEIA currently is that it cannot identify other common eye diseases, such as glaucomatous optic neuropathy and age‐related macular degeneration. However, from the inception of our screening programme, we have routinely recorded all non‐diabetic eye disease detected during screening.
A recent analysis of these data by the authors (manuscript submitted) revealed that, of these, only hypertensive retinopathy and macular degeneration are sufficiently important to justify detection during routine diabetic eye screening. Although, in this current iteration of THEIA, these diseases cannot be identified by name, the 'miss no disease' design of THEIA means that most of the signs of hypertensive retinopathy and most cases of macular degeneration will be picked up and flagged as potentially referable or sight‐threatening. From the perspective of an AI system, the low number of cases with referable DR is problematic, as this creates an imbalanced dataset which can affect the quality of the trained AI. The retrospective nature of the study is also an issue, but this approach is necessary when using unselected, unbiased, real‐world datasets. Finally, although this first iteration of THEIA has been developed from the two largest and most demographically diverse district health boards in NZ, its generalizability remains to be tested in a prospective study.
In conclusion, THEIA provides high sensitivity for detecting sight‐threatening retinopathy and a corresponding NPV of 99.4% in primary grading of retinal images within the context of the NZ Ministry of Health guidelines for DR screening. This first iteration of THEIA accurately identifies people with diabetes with non‐referable disease, removing such people from the manual grading workload, a group which represents over 65% of the images. As such, it will probably generate significant efficiencies for the national screening programme, and will deliver greater consistency among regions. Although this first iteration of THEIA has been developed for the NZ context, with appropriate tailoring this system has potential for application in other healthcare systems which have, or intend to develop, a structured DR screening programme.
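The relationship between sensitivity, specificity, disease prevalence, NPV and the share of people removed from manual grading can be made concrete with a short Python sketch. The sensitivity (94%) and specificity (63%) are the Auckland District Health Board referable/non‐referable figures from the abstract; the 4% prevalence is an assumed illustrative value, and the function name is hypothetical. The sketch is not expected to reproduce the 99.4% NPV quoted above, which refers to sight‐threatening retinopathy in the study data.

```python
def triage_metrics(sensitivity, specificity, prevalence):
    """Derive predictive values and triage workload from test characteristics.

    All quantities are expressed as fractions of the screened population.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    return {
        # Probability that a person triaged as non-referable truly has no referable disease.
        "npv": true_neg / (true_neg + false_neg),
        # Probability that a person flagged as referable truly has referable disease.
        "ppv": true_pos / (true_pos + false_pos),
        # Everyone triaged as non-referable leaves the manual grading queue.
        "removed_from_manual_grading": true_neg + false_neg,
    }


# Sensitivity and specificity from the abstract; prevalence is an assumption.
metrics = triage_metrics(sensitivity=0.94, specificity=0.63, prevalence=0.04)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because referable disease is uncommon, NPV stays high even at moderate specificity, whereas the fraction removed from manual grading is driven almost entirely by specificity. This is the trade‐off behind a triage tool tuned for sensitivity.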

COMPETING INTERESTS

E.V. and D.S. are co‐founders of Toku Eyes®, which is a start‐up from the University of Auckland, looking into commercialization of this artificial intelligence system (THEIA™) in NZ. The remaining authors have no conflicts of interest to declare.

