
Artificial intelligence-enabled screening for diabetic retinopathy: a real-world, multicenter and prospective study.

Yifei Zhang¹, Juan Shi¹, Ying Peng¹, Zhiyun Zhao¹, Qidong Zheng², Zilong Wang³, Kun Liu⁴, Shengyin Jiao³, Kexin Qiu³, Ziheng Zhou^3,5, Li Yan⁶, Dong Zhao⁷, Hongwei Jiang⁸, Yuancheng Dai⁹, Benli Su¹⁰, Pei Gu¹¹, Heng Su¹², Qin Wan¹³, Yongde Peng¹⁴, Jianjun Liu¹⁵, Ling Hu¹⁶, Tingyu Ke¹⁷, Lei Chen¹⁸, Fengmei Xu¹⁹, Qijuan Dong²⁰, Demetri Terzopoulos^21,22, Guang Ning¹, Xun Xu⁴, Xiaowei Ding^23,5, Weiqing Wang²⁴.

Abstract

INTRODUCTION: Early screening for diabetic retinopathy (DR) with an efficient and scalable method is highly needed to reduce blindness, due to the growing epidemic of diabetes. The aim of the study was to validate an artificial intelligence-enabled DR screening and to investigate the prevalence of DR in adult patients with diabetes in China. RESEARCH DESIGN AND METHODS: The study was prospectively conducted at 155 diabetes centers in China. A non-mydriatic, macula-centered fundus photograph per eye was collected and graded through a deep learning (DL)-based, five-stage DR classification. Images from a randomly selected one-third of participants were used for the DL algorithm validation.
RESULTS: In total, 47 269 patients (mean (SD) age, 54.29 (11.60) years) were enrolled. Images from 15 805 randomly selected participants were reviewed by a panel of specialists for DL algorithm validation. The DR grading algorithms had an 83.3% (95% CI: 81.9% to 84.6%) sensitivity and a 92.5% (95% CI: 92.1% to 92.9%) specificity for detecting referable DR. The five-stage DR classification performance (concordance: 83.0%) was comparable to the interobserver variability of specialists (concordance: 84.3%). The estimated prevalence of any DR, referable DR and vision-threatening DR detected by the DL algorithm in patients with diabetes was 28.8% (95% CI: 28.4% to 29.3%), 24.4% (95% CI: 24.0% to 24.8%) and 10.8% (95% CI: 10.5% to 11.1%), respectively. The prevalence was higher in women, in older patients, and in those with longer diabetes duration or higher glycated hemoglobin.
CONCLUSION: This study performed a nationwide, multicenter, DL-based DR screening, and the results indicate the importance and feasibility of DR screening in clinical practice with this system deployed at diabetes centers. TRIAL REGISTRATION NUMBER: NCT04240652. © Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.


Keywords: clinical study; diabetic retinopathy; diagnostic techniques and procedures; epidemiology


Year: 2020 PMID: 33087340 PMCID: PMC7580048 DOI: 10.1136/bmjdrc-2020-001596

Source DB: PubMed Journal: BMJ Open Diabetes Res Care ISSN: 2052-4897

Previous studies have indicated a high prevalence of diabetes in China; however, the reported prevalence of diabetic retinopathy (DR) has varied and a nationwide program for DR screening is lacking. The potential value of automated deep learning (DL) algorithms in DR screening has been indicated; however, their feasibility in clinical application in a highly heterogeneous population needs further investigation. We validated an artificial intelligence (AI)-enabled DR screening system in real-world practice at 155 diabetes centers, with performance comparable to that of human specialists. Our study is a large-scale nationwide DR screening program using data from representative cohorts, and it offers evidence of DR prevalence in patients with diabetes in China. It provides evidence of the efficiency and accuracy of DL-based DR screening in clinical practice through a comprehensive survey. DL-based DR screening at diabetes centers is feasible and, given the high prevalence of DR detected, it may provide a solution to this public health problem in the future.

Introduction

According to recent estimates, 451 million people aged 18–99 years had diabetes worldwide in 2017, and the number is projected to increase to 693 million by 2045.1 The diabetes epidemic is worse in China.2 3 Per the 2013 national survey, 10.9% of Chinese adults were estimated to have diabetes; among them, only 36.5% were aware of the diagnosis and 32.2% were treated.3 The higher prevalence and lower treatment rate of diabetes in China will lead to a higher incidence of diabetes-related complications nationwide.4 5 Diabetic retinopathy (DR) is one of the common chronic complications of diabetes and is the leading, although preventable, cause of blindness in the working-age group.6–8 Early screening and timely referral can delay its progress and effectively prevent vision loss.9 However, relative to the high prevalence of diabetes in China, the capacity to screen for DR is inadequate and a nationwide program for DR screening is lacking. The reasons are multifaceted, including the shortage of eye care specialists, the lack of efficient screening methods and the multidisciplinary process from image acquisition to the diagnosis of DR. In real-world clinical settings, a large proportion of patients with diabetes receive their first DR diagnosis during independent visits to ophthalmologists in the symptomatic stage of DR, rather than through earlier diagnosis at diabetes centers or referral visits to ophthalmologists in the non-symptomatic stage.10–12 In addition, strategies for managing DR in China are difficult to reproduce because of regional economic barriers and differences in living habits. Therefore, it is essential to establish a standardized system for early DR detection and management that is feasible for the whole country.
Deep learning (DL), a form of artificial intelligence (AI), has emerged and shown convincing performance in several areas, including medical science.13–15 A recent study by Ting et al16 revealed the potential value of an automated DL system for DR grading using images from multiethnic cohorts of patients with diabetes. Together with several other studies, it showed high sensitivity and specificity in identifying DR (especially referable DR), indicating that the proper use of DL technology in clinical settings may help deliver data-driven analytics for better patient outcomes.16–23 However, the evidence to confirm the clinical value of DL for DR screening in large-scale healthcare settings is insufficient, and most studies have been performed on high-quality image datasets that can hardly represent the variety of image quality and other operational limitations of real-world DR screening at diabetes centers.17 18 There are few reports regarding the practical application of AI in clinic-based DR screening; two such studies had patient cohorts of 3049 and 1415, respectively.20 24 Its feasibility and quality in real-world use must be further explored using datasets with larger sample sizes and demographic variations. Therefore, in the present study, we conducted a prospective, nationwide DR screening, using a DL algorithm, with a cohort of 47 269 patients at 155 diabetes centers in China. The operational feasibility and accuracy of the DL algorithm were validated and the prevalence of DR, referable DR (moderate non-proliferative DR (NPDR) or worse), and vision-threatening DR (VTDR, severe NPDR or worse, and/or clinically significant macular edema (CSME)) was reported.

Methods

Population

The National Metabolic Management Center (MMC) is a pilot diabetes care system in China, founded in 2016. It aims at establishing a nationwide, standard and reproducible platform based on advanced medical equipment and Internet of Things technology for the diagnosis and management of diabetes and its complications.25 The Diabetic Retinopathy Screening and Prevention Program is an MMC branch project. Its purpose is to develop an efficient workflow for the early detection, timely follow-up and management of DR, and to establish a referral system for future treatment and long-term follow-up. Between June 2018 and August 2019, a total of 47 269 consecutive patients with diabetes aged 18 years or older from 155 MMCs in China were enrolled in the present study. The participating MMCs were located in hospitals of different levels within the tiered medical service system, across 26 provinces in China. All the participants were screened for DR by the DL-based system, which labeled each fundus image with a DR stage or as ungradable due to image quality issues. Fundus images obtained from a randomly selected one-third of participants were reviewed offline in a two-stage reading performed by a panel of specialists, for the purpose of validating the DL algorithm on both DR grading and image quality assessment (figure 1).

Figure 1

Fundus image grading work flow and adjudication. DL, deep learning; DR, diabetic retinopathy.

All the participants underwent a full medical examination at the local MMCs.

Baseline data collection

The eligible participants were those with a diagnosis of diabetes according to the WHO criteria.26 Detailed inclusion and exclusion criteria are summarized in the online supplemental methods. At baseline, all data (including a standardized questionnaire and comprehensive clinical and laboratory examinations) were collected from each participant through an MMC specialized electronic medical record system.25 Data collection was conducted by trained staff according to a standard protocol. Sociodemographic characteristics, medical history and lifestyle factors were recorded. Height and body weight were measured with a height-weight scale with participants in light clothes and without shoes, and body mass index (BMI) was calculated as the weight in kilograms divided by the height in meters squared. Blood pressure and heart rate were measured with electronic blood pressure monitors after at least a 5 min rest in the seated position. Waist circumference was measured on standing participants midway between the lower edge of the costal arch and the upper edge of the iliac crest. The participants were required to undergo a standard steamed bread meal test after an overnight fast, and blood samples were collected at 0 and 2 hours during the test. Detailed data collection procedures are listed in the online supplemental methods.

Fundus photography acquisition

One standard, non-mydriatic, 45° field of view, macula-centered color and non-stereoscopic retinal fundus image was acquired from each eye of each participant. Various models of fundus cameras were used. Topcon TRC-NW400, MiiS DSC-200, Canon CR-2 PLUS AF, Canon CR-2 AF and Zeiss VISUCAM200 cameras were used in >80% of all the centers (online supplemental table 1). At all the centers, trained technicians took only non-mydriatic images, and no pupillary dilation images were additionally acquired. All the participants’ images were anonymized before grading.

Development of the DL algorithms

VoxelCloud Retina, an automated retinal disease screening system, was used to grade fundus images. The VoxelCloud Retina DR system was developed using DL techniques. Two sets of data were used to train the different deep learning networks that form the final ensemble of DR and diabetic macular edema (DME) severity classification modules. The first dataset comprises 143 626 fundus photographs of 37 231 patients obtained from 2005 to 2015 from a large private retinal image database (online supplemental tables 2 and 3). The second dataset comprises 1184 color fundus images from a public hospital in China, which were assigned a DR severity grade based on consensus from three ophthalmologists (online supplemental table 4). These data were chosen to help improve the model performance on confusing cases that could fall on the boundary between two grades. The DR and DME models are an ensemble of six neural networks (online supplemental figure 1). All the six neural networks use the state-of-the-art Inception-ResNet v2 architecture;27 however, several design differences among them are critical for the effective performance of the model ensemble. Details are presented in the online supplemental methods. In addition to the DR and DME models, the system also includes trained independent lesion models that detect the presence of lesions that contribute to DR grade, including fundus hemorrhage, hard exudates and laser scars. These independent lesion models are used to achieve improvements in the DR prediction performance, based on the DR classification rules listed in online supplemental table 5. All color fundus images are normalized to pixel intensity values between 0 and 1 and are resized to a standard resolution of 800 by 800 pixels before being processed by the system. The system was also tested on various private and public datasets. 
The testing results on the APTOS 2019 Blindness Detection dataset, a public dataset also collected in a real-world scenario close to that of the present study, are reported in the online supplemental methods (https://www.kaggle.com/c/aptos2019-blindness-detection/overview).
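The preprocessing step described above (scaling pixel intensities to between 0 and 1 and resizing to 800 by 800 pixels) can be illustrated with a minimal sketch. The function name `preprocess_fundus`, the nested-list image representation and the nearest-neighbor resampling are all assumptions made for illustration; the source does not state how the VoxelCloud Retina system performs this step, and a production pipeline would normally use an image library.

```python
def preprocess_fundus(pixels, size=800):
    """Scale 8-bit RGB pixels to [0, 1] and resize to size x size.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples with
    values in 0..255. Nearest-neighbor resampling is an assumption; the
    paper does not specify the interpolation method.
    """
    height, width = len(pixels), len(pixels[0])
    # Map every output row/column back to a source row/column index.
    rows = [r * height // size for r in range(size)]
    cols = [c * width // size for c in range(size)]
    return [
        [tuple(channel / 255.0 for channel in pixels[r][c]) for c in cols]
        for r in rows
    ]
```

Every value in the returned grid lies between 0 and 1, and the grid always has `size` rows and `size` columns regardless of the input resolution.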

DL-based DR grading

The system specified for DR screening comprises the following three modules:

Quality control module

The quality control (QC) module evaluates the quality of fundus images before the five-stage DR grading. The quality of fundus images is classified as gradable or ungradable. Those assessed as ungradable (low quality) are not sent for further DR grading. Gradable images are further sorted into excellent and adequate quality, while ungradable images are sorted into insufficient information and non-fundus images. The gradeability criteria in the model training phase were: 1) the image must cover at least 45° of the retinal area with the macula and the optic disc visible; 2) at least 80% of the retinal area must be recognizable and 3) no overexposure, underexposure or blur caused by focusing failure and motion.

DR severity classification module

The DR severity classification module provides each fundus image a five-stage DR severity classification that can be further transferred to multiple binary classifications to meet different demands. The severity classification mainly follows the International Clinical Diabetic Retinopathy (ICDR) severity scale,28 which is developed by the International Council of Ophthalmology and adopted by the American Academy of Ophthalmology.29 Slight modifications were made to adapt to the situation, considering that only a single non-mydriatic fundus image was acquired from each eye, covering the posterior pole, instead of seven mydriatic images covering all four quadrants (online supplemental table 5).30 The patient-level DR grade was based on the worse DR grade of the two eyes. If both eye images of a patient are classified as ungradable, then the patient is classified as ungradable. If only one eye image is classified as gradable, then the patient-level DR grading is based on this eye. If a patient has only one eye image, and it is classified as ungradable, then this patient is classified as ungradable.
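The patient-level rule above (worse of the two eyes, with ungradable or missing eyes ignored unless no gradable eye remains) can be sketched as follows. The integer coding of the five stages (0 to 4) and the use of `None` for an ungradable or missing eye image are assumptions made for illustration, not the system's actual data model.

```python
from typing import Optional

# Five-stage scale in ICDR order; the integer codes 0..4 are an assumption.
GRADES = ["no DR", "mild NPDR", "moderate NPDR", "severe NPDR", "PDR"]


def patient_level_grade(left: Optional[int], right: Optional[int]) -> Optional[int]:
    """Return the worse (higher) grade of the two eyes.

    None stands for an ungradable or missing eye image. If no eye is
    gradable, the patient is ungradable and None is returned.
    """
    gradable = [g for g in (left, right) if g is not None]
    return max(gradable) if gradable else None
```

For example, a patient with mild NPDR in one eye and an ungradable image of the other would be graded as mild NPDR under this rule.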

DME severity classification module

The DME severity classification module subjects each fundus image to a three-stage DME severity estimation that can be further converted to multiple binary classifications. As DME assessment, which requires retinal thickness information, is not possible in non-mydriatic fundus images, the presence of hard exudates is regarded as a presumptive diagnosis of DME (online supplemental table 6). DR was defined as the presence of mild NPDR or worse; referable DR as moderate NPDR or worse; and VTDR as severe NPDR or worse and/or CSME.
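The three binary outcomes defined above can be derived from a five-stage grade as in the sketch below. It assumes the grade is coded 0 (no DR) to 4 (PDR) and that a boolean `csme` flag is already available from the DME module; both the coding and the function name `derive_flags` are hypothetical.

```python
def derive_flags(dr_grade: int, csme: bool) -> dict:
    """Map a five-stage DR grade (assumed coding: 0 = no DR ... 4 = PDR)
    and a CSME flag to the binary outcomes defined in the text."""
    return {
        "any_dr": dr_grade >= 1,          # mild NPDR or worse
        "referable_dr": dr_grade >= 2,    # moderate NPDR or worse
        "vtdr": dr_grade >= 3 or csme,    # severe NPDR or worse and/or CSME
    }
```

Under these definitions, an eye with moderate NPDR and presumptive CSME counts as referable and vision-threatening, while the same grade without CSME is referable only.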

Expert ground truth grading

The ground truth for fundus image diagnosis was provided by a two-stage reading by specialist graders. The grading team was led by the Ophthalmology Center of the Shanghai General Hospital (National Clinical Research Center for Eye Diseases). All graders were ophthalmologists from tertiary hospitals with 3 years or more of work experience. Each grader finished two rounds of training and passed a qualification test following ICDR guidelines. Graders were divided into primary graders and reviewers (senior graders) based on their seniority and performance. The grading was conducted in two stages.

Stage 1

Two primary graders read the fundus image and gave image quality grades and DR grades independently. If the two primary graders reached a consensus on both the image quality and DR grades, the grading of this fundus image ended in stage 1 and the grades served as the ground truth.

Stage 2

A reviewer (senior grader), who could access the assessments of both primary graders, was added to the grading process if the two primary graders disagreed on either the image quality or the DR grade. The reviewer’s sole opinion served as the final grade for such cases (figure 1).
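The two-stage reading can be summarized in a short sketch. Each assessment is represented here as a `(quality, dr_grade)` tuple and the reviewer as a callable that sees both primary assessments; these representations and the function name `adjudicate` are assumptions for illustration only.

```python
def adjudicate(primary_1, primary_2, reviewer=None):
    """Return (ground_truth, stage) for one fundus image.

    primary_1 and primary_2 are (quality, dr_grade) tuples from the two
    primary graders. The reviewer callable is consulted only when they
    disagree on quality or DR grade, and its opinion is final.
    """
    if primary_1 == primary_2:
        return primary_1, 1  # consensus reached in stage 1
    if reviewer is None:
        raise ValueError("primary graders disagree; a reviewer is required")
    return reviewer(primary_1, primary_2), 2  # resolved in stage 2
```

The returned stage number makes it easy to count how many images needed senior review.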

Statistical analysis

Statistical analyses were performed with SPSS V.22.0 (Chicago, Illinois, USA). Data are presented as the mean and SD for continuous variables, or as the number and percentage for categorical variables. The prevalence (95% CIs) of DR, referable DR and VTDR was estimated overall and compared within subgroups of sex, age, categories of diabetes duration and glycated hemoglobin (HbA1c) with the χ2 test. The demographic and clinical characteristics were assessed and compared by sex with the χ2 test for categorical variables and with the Student’s t-test for continuous variables. Fundus images from a randomly selected one-third of participants were used for DL algorithm validation. The ground truth of fundus image diagnosis provided by the expert panel was considered the reference standard. The accuracy of DR grading, image quality and two-category derivatives (one DR grading or worse) of patients with diabetes was evaluated. Consistency and accuracy were analyzed between the DL algorithm and the reference standard, between the two primary graders in the expert panel, and between the primary graders and the reference standard; 2×2 tables were generated to analyze the sensitivity, specificity, negative predictive value and positive predictive value of the DL algorithm in detecting DR, referable DR and severe NPDR or worse, as well as the image quality, compared with the reference standard at the individual eye level. Consistency of the five-stage grading was also evaluated from the confusion matrix using the kappa index and quadratic weighted kappa scores. All p values were two-tailed and a p value <0.05 was considered statistically significant.
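For readers who wish to reproduce this type of analysis outside SPSS, the 2×2 table metrics and the quadratic weighted kappa can be computed as in the following sketch. It uses the standard textbook definitions; the function names are hypothetical and this is not the authors' analysis code.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def quadratic_weighted_kappa(matrix):
    """Quadratic weighted kappa from a k x k confusion matrix.

    matrix[i][j] is the number of items graded i by rater A and j by
    rater B. Disagreement weights are (i - j)^2 / (k - 1)^2.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            observed += weight * matrix[i][j]
            expected += weight * row_tot[i] * col_tot[j] / n
    return 1.0 - observed / expected
```

With five grades the weights penalize a two-stage disagreement four times as heavily as a one-stage disagreement, which is why the quadratic weighted kappa is commonly preferred for ordinal severity scales.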

Results

Clinical characteristics of all the participants

In total, 47 269 participants with diabetes from 155 centers were enrolled in the present study, among which 27 110 (57.4%) were men (table 1 and figure 2). The mean (SD) age of all the participants was 54.29 (11.60) years, the mean diabetes duration was 6.80 (6.71) years and the mean HbA1c was 9.06 (2.27) % or 75.45 (24.85) mmol/mol. Since 97.92% of the participants had type 2 diabetes (1.61% type 1 diabetes, 0.32% gestational diabetes and 0.14% others, totaling 99.99% due to rounding), no further analysis was performed based on the diabetes classification.

Table 1

Clinical characteristics of the study participants

	Total (n=47 269)	Male (n=27 110)	Female (n=20 159)	P value
Age, years	54.29±11.60	52.76±11.58	56.35±11.29	<0.001
High school education and above	17 661 (43.0)	12 301 (51.5)	5360 (31.2)	<0.001
Han Chinese ethnicity	37 457 (96.1)	21 788 (95.9)	15 669 (96.4)	0.010
Family history of diabetes	16 294 (40.1)	9271 (39.3)	7023 (41.3)	<0.001
Duration of diabetes, years	6.80±6.71	6.31±6.50	7.47±6.93	<0.001
History of hypertension	16 266 (39.8)	8835 (37.2)	7431 (43.4)	<0.001
History of dyslipidemia	11 361 (27.9)	6825 (28.8)	4536 (26.5)	<0.001
Body mass index, kg/m²	25.62±3.77	25.77±3.63	25.42±3.95	<0.001
Waist circumference, cm	91.07±10.29	92.73±9.83	88.80±10.49	<0.001
Systolic blood pressure, mm Hg	131.40±19.06	130.36±18.23	132.81±20.04	<0.001
Diastolic blood pressure, mm Hg	77.34±11.34	78.70±11.28	75.51±11.16	<0.001
HbA1c, %	9.06±2.27	9.14±2.32	8.94±2.20	<0.001
HbA1c, mmol/mol	75.45±24.85	76.36±25.34	74.18±24.09	<0.001
Fasting blood glucose, mmol/L	9.34±3.74	9.38±3.80	9.27±3.66	0.005
Postprandial blood glucose, mmol/L	16.16±5.44	16.10±5.33	16.24±5.58	0.016
Triglyceride, mmol/L	2.27±2.35	2.39±2.59	2.09±1.96	<0.001
Total cholesterol, mmol/L	4.83±1.36	4.73±1.36	4.98±1.33	<0.001
LDL cholesterol, mmol/L	2.79±1.00	2.74±0.97	2.87±1.03	<0.001
Serum creatinine, μmol/L	66.67±30.43	73.69±31.45	56.88±25.96	<0.001
Uric acid, μmol/L	317.83±96.50	338.38±96.70	289.15±88.57	<0.001

Data are given as a mean±SD or as a number and percentage in parentheses. Comparisons of mean values and proportions by sex were performed using the Student’s t-test and χ2 tests, respectively.

HbA1c, glycated hemoglobin; LDL, low-density lipoprotein.

Figure 2

Geographic distribution of the 155 metabolic management centers in China involved in this study.


DL algorithm validation

A total of 31 498 images from a randomly selected one-third of participants (n=15 805) were used for DL algorithm validation (figure 1). For image quality assessment, 26 698 (84.8%) of these images were assessed as gradable by the reference standard (online supplemental table 7). Compared with the reference standard, the QC module had a 63.3% (95% CI: 61.9% to 64.7%) sensitivity and an 85.0% (95% CI: 84.6% to 85.4%) specificity, with a positive predictive value of 43.2% (95% CI: 42.0% to 44.3%) and a negative predictive value of 92.8% (95% CI: 92.4% to 93.1%). The interobserver variability (setting one grader as the reference standard) between the two primary expert graders corresponded to a sensitivity of 69.6% (95% CI: 68.4% to 70.8%) and a specificity of 86.8% (95% CI: 86.4% to 87.2%), with a positive predictive value of 53.1% (95% CI: 51.9% to 54.2%) and a negative predictive value of 93.0% (95% CI: 92.7% to 93.3%). For DR grading, the concordance between the DL algorithm and the reference standard was 83.0% for the five-stage DR grading. The corresponding quadratic weighted kappa was 0.72 (95% CI: 0.72 to 0.72) (online supplemental table 8 and online supplemental figure 2). The DL algorithm had an 83.3% (95% CI: 81.9% to 84.6%) sensitivity and a 92.5% (95% CI: 92.1% to 92.9%) specificity for detecting referable DR. The positive and negative predictive values were 61.8% (95% CI: 60.3% to 63.3%) and 97.4% (95% CI: 97.2% to 97.7%), respectively. The Youden index was 75.8%. For two-stage manual grading, the concordance for the five-stage DR grading between the two primary graders, and between the primary graders and the reference standard, was 84.3% and 91.0%, respectively. The corresponding quadratic weighted kappa values were 0.74 (95% CI: 0.74 to 0.74) and 0.87 (95% CI: 0.87 to 0.87), respectively. The concordance between the DL algorithm and primary grader 1, primary grader 2 or one primary grader (the two primary graders combined) was 82.8%, 81.8% and 82.3%, respectively.
The corresponding quadratic weighted kappa values were 0.66 (95% CI: 0.66 to 0.66), 0.67 (95% CI: 0.67 to 0.67) and 0.67 (95% CI: 0.67 to 0.67), respectively (online supplemental table 8). Confusion matrices of the five-stage DR evaluation between the two primary graders, and between the primary graders and the reference standard, are reported in online supplemental figures 3 and 4. Typical examples of false negative and false positive cases of the DL QC and grading modules are shown in online supplemental figures 5 and 6.
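The Youden index reported above follows directly from the sensitivity and specificity for referable DR (sensitivity + specificity − 1). A small check, using only the figures given in the text:

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Reported values for detecting referable DR.
j = youden_index(0.833, 0.925)
print(f"{j:.3f}")  # 0.758, i.e. 75.8%
```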

AI-enabled DR screening

In total, 94 199 fundus images from all the participants were graded by the DL algorithm. Among all the images, 22 404 (23.8%) were assessed as high quality, 49 566 (52.6%) as medium quality and 22 229 (23.6%) as low quality (ungradable) by the QC module (online supplemental table 9). Thus, a total of 71 970 (76.4%) images from 40 665 (86.0%) participants were finally qualified for DR grading by the DL algorithm (online supplemental tables 9 and 10). The ungradable images were mainly due to small pupil size, the presence of cataracts or other rare eye diseases, and camera operation problems (online supplemental table 11).19 21 22 31–33 Participants with ungradable images were referred to the ophthalmology department for further examination. Among the 40 665 gradable participants, the estimated prevalence of DR was 28.8% (95% CI: 28.4% to 29.3%), referable DR was 24.4% (95% CI: 24.0% to 24.8%) and VTDR was 10.8% (95% CI: 10.5% to 11.1%) (table 2). When analyzed by risk factor stratification, the estimated prevalence of DR was higher in women (29.6%; 95% CI: 28.9% to 30.3%) than in men (28.3%; 95% CI: 27.7% to 28.8%) (p=0.0029). The estimated prevalence of DR increased with age and duration of diabetes (both p values for trend <0.0001). Similar results were found for referable DR and VTDR in the stratification by these risk factors. Furthermore, in the HbA1c stratification, when HbA1c was <10.0% (85.77 mmol/mol), the prevalence of DR and referable DR increased with rising HbA1c (both p values for trend <0.0001), but decreased slightly without statistical significance when the HbA1c was 10.0% or higher (both p values >0.05). The prevalence of VTDR increased consistently with rising HbA1c (p value for trend <0.0001) (table 2, online supplemental tables 12 and 13, and online supplemental figure 7).

Table 2

Prevalence of diabetic retinopathy (DR), referable DR and vision-threatening DR (VTDR) in total and among different risk factor stratification

	DR, % (95% CI)	Referable DR, % (95% CI)	VTDR (including CSME), % (95% CI)	No. of patients
Total	28.8 (28.4 to 29.3)	24.4 (24.0 to 24.8)	10.8 (10.5 to 11.1)	40 665
Gender
Male	28.3 (27.7 to 28.8)	23.5 (23.0 to 24.1)	10.0 (9.6 to 10.4)	23 686
Female	29.6 (28.9 to 30.3)	25.5 (24.9 to 26.2)	11.9 (11.4 to 12.4)	16 979
Age groups, years
18–29	18.7 (16.6 to 20.8)	12.7 (10.9 to 14.4)	6.2 (4.9 to 7.5)	1375
30–39	22.9 (21.6 to 24.3)	17.0 (15.8 to 18.2)	6.8 (6.0 to 7.6)	3775
40–49	27.9 (26.9 to 29.0)	22.2 (21.4 to 23.1)	8.9 (8.3 to 9.5)	8650
50–59	30.2 (29.5 to 31.0)	25.6 (24.8 to 26.3)	11.0 (10.5 to 11.5)	14 231
60–69	30.2 (29.3 to 31.1)	27.1 (26.3 to 28.0)	12.5 (11.9 to 13.2)	10 304
≥70	33.5 (31.6 to 35.4)	31.8 (29.9 to 33.7)	18.0 (16.5 to 19.6)	2330
Diabetic duration, years
<5	20.0 (19.4 to 20.6)	15.6 (15.0 to 16.1)	5.9 (5.5 to 6.2)	17 175
5–10	30.8 (29.9 to 31.8)	25.7 (24.7 to 26.7)	10.7 (10.0 to 11.4)	7246
10–15	41.4 (40.1 to 42.7)	35.9 (34.6 to 37.1)	16.8 (15.8 to 17.8)	5403
15–20	49.1 (47.1 to 51.1)	44.3 (42.3 to 46.2)	22.1 (20.4 to 23.7)	2426
≥20	52.5 (50.0 to 54.9)	48.0 (45.5 to 50.4)	10.7 (9.9 to 11.4)	1618
HbA1c, %
<6.5	18.4 (17.1 to 19.6)	14.9 (13.7 to 16.0)	6.4 (5.6 to 7.1)	3778
6.5–6.9	21.2 (19.8 to 22.7)	17.2 (15.8 to 18.5)	7.3 (6.4 to 8.2)	3046
7.0–7.9	26.5 (25.4 to 27.6)	21.7 (20.7 to 22.8)	9.5 (8.7 to 10.2)	6208
8.0–8.9	32.9 (31.7 to 34.2)	27.3 (26.1 to 28.4)	11.2 (10.4 to 12.0)	5685
9.0–9.9	34.0 (32.6 to 35.3)	29.4 (28.2 to 30.7)	12.4 (11.5 to 13.3)	4941
≥10.0	33.3 (32.5 to 34.2)	28.4 (27.6 to 29.2)	13.0 (12.4 to 13.6)	11 338

CSME, clinically significant macular edema; DR, diabetic retinopathy; HbA1c, glycated hemoglobin; VTDR, vision-threatening diabetic retinopathy.

The five-stage DR grading and corresponding DME classification results by the DL algorithm for 40 665 gradable participants are shown in online supplemental table 14. The percentage of ungradable images and the DR grading results based on different types of cameras are listed in online supplemental tables 15 and 16.
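The width of the reported prevalence intervals can be approximated with a normal-approximation (Wald) interval for a proportion. The source does not state which interval method was used, so the sketch below is an assumption, and because it starts from the rounded point estimate (28.8%) its bounds may differ from the published ones in the last digit.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p observed in n subjects."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# Overall DR prevalence among the 40 665 gradable participants.
low, high = wald_ci(0.288, 40665)
print(f"{low:.3f} to {high:.3f}")
```

The same function applied to a subgroup row of table 2 gives a wider interval, since the half-width grows as the number of patients in the stratum shrinks.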

Discussion

In this large multicenter, real-world DR screening program, a DL-based AI system was deployed at 155 diabetes centers. Our study demonstrated that, in Chinese adults with diabetes, the estimated prevalence of any DR, referable DR and VTDR was 28.8%, 24.4% and 10.8%, respectively. The high prevalence of DR at various stages indicates the importance and urgency of early detection of DR in China. A DL system with sensitivity and specificity comparable to those of a panel of specialists enabled efficient screening for DR at diabetes centers nationwide, and it may provide a solution to this problem.

Screening for DR in daily clinical work has not yet been well established at diabetes centers in China because of limitations in resources, infrastructure and the availability of retinal specialists. Consequently, a comprehensive survey of DR prevalence and its actual burden across the whole country has been lacking.6 Timely diagnosis and treatment of DR is in high demand at every diabetes center in order to achieve better outcomes across the widest diabetic population, regardless of geographic and economic barriers. Epidemiological studies published in the past 10 years have shown that the prevalence of DR in China ranges from 5.4% to 44.8% in patients with diabetes.6–8 34 This variability was mainly due to heterogeneity among the studies, including sample size, study design, clinical characteristics of participants, geographic region and DR classification criteria. A recent meta-analysis, which collected data from 31 community-based studies, showed that the pooled prevalence of any DR in participants with diabetes mellitus (DM) was 18.45%, for NPDR it was 15.06% and for PDR it was 0.99%.6 However, a single survey that reports the actual prevalence of DR in the whole country is lacking. In the present study, a large multicenter DR screening program, implemented with the aid of AI technology, was conducted in 26 provinces in China. 
The survey has provided the most up-to-date information on DR characteristics in adults with diabetes and has indicated a high prevalence of DR in China. In addition, stratified analysis showed that the crude prevalence of DR was higher in older age groups, which, together with societal aging, increases the burden on the healthcare system. However, the prevalence of DR was lower in subgroups with lower HbA1c, suggesting that better glycemic control may lessen eye complications.

Most DL-based DR grading studies have focused on methodology development and validation using high-quality, curated public datasets.17–19 Implementation of automated DL algorithms for DR screening in real-world practice has been rare.20 23 24 35 One example was the large community-based, nationwide DR screening program using a DL algorithm in Thailand.23 Two other examples of its use in clinical settings were reported by Gulshan et al and van der Heijden et al, respectively.20 24 The former study involved 3049 patients with diabetes in two eye care clinics in India.20 The results demonstrated sensitivities of 88.9% and 92.1%, and specificities of 92.2% and 95.2%, for the detection of moderate or worse DR in the two clinics, respectively. The latter was performed at the Hoorn diabetes center and included 1415 patients. It reported a 68.0% sensitivity and an 86.0% specificity for detecting referable DR by the IDx-DR device based on the ICDR standard, compared with a reference standard adjudicated by a panel of three experts; the averaged sensitivity and specificity of the three experts against the adjudicated reference standard were 74.7% and 99.7%, respectively. However, the quality of the fundus images collected was unsatisfactory, which may be due to the implementation of the study in a non-ophthalmic specialized clinical setting.24 These studies offered good examples and indicated the feasibility and validity of DL implementation in real-world clinical workflows. 
However, in these studies, the DL algorithms were deployed only at individual centers with small or moderate sample sizes. Whether DL-based systems can be widely deployed across multiple non-ophthalmic medical centers or healthcare systems with different resources remains unclear. Therefore, in the present study we applied a DL algorithm for DR screening at 155 diabetes care centers involving 47 269 patients with diabetes in China. A variety of fundus cameras meeting the basic requirements for photograph acquisition were used. The DL algorithms provided five-stage DR severity grading and DME detection in real time. The DL system was integrated with the various fundus camera models used in MMCs, allowing seamless, push-button image QC and DR staging onsite. None of the deep neural networks in the DL system was trained or fine-tuned using any MMC images, which demonstrates strong domain transfer and generalization capability, as well as robustness and reproducibility on unseen images. The sensitivity for detecting referable DR was 83.3% and the specificity was 92.5%, giving a Youden index of 75.8%. The performance of the DL system is comparable to the interobserver variability of specialists, who have limited availability (1.1 hours/day on average) and a long response time (1.5 days on average) in real-world practice. The high specificity (92.5%) of the DL system in detecting referable DR may support a safe, low-false-alarm autonomous referral decision; that is, all patients classified as having referable DR by the algorithms are referred to specialists without further manual review. The algorithms were trained on datasets collected from different populations and scenarios, and they show good generalization characteristics.

Furthermore, in order to evaluate the effect of the QC module of the DL system, the quality assessment results obtained by the algorithm and by the reference standard were compared. 
Although low-quality images were inevitable in non-ophthalmic clinical settings, with AI QC feedback enabled in the image acquisition phase, the proportion of qualified images could reach 92.8% of all fundus images acquired, according to the negative predictive value of the QC model. Together with strengthened training of technicians’ operating skills (ie, recognizing patients with small pupils or cataracts, and correcting image contrast or focus problems), the percentage of low-quality images should be reduced to a minimum, leading to more reliable subsequent DR grading in future work.

There are several strengths in the present study. First, it was conducted at 155 diabetes centers in China. The study results were representative because the participating MMCs were located in hospitals of different levels within the tiered medical service system and in regions with different economic and cultural backgrounds. Furthermore, the sample size was large, and the study enrolled consecutive patients with an appropriate sex ratio and a wide distribution of age, diabetes duration and metabolic control, which mirrors the characteristics of diabetes in the real world. Second, it was a large AI-enabled DR screening program with performance comparable to that of specialists. Given the markedly increased diabetes prevalence and relatively inadequate medical resources in China, the automated DL system proved to be a scalable solution for effective screening of patients at diabetes centers, which diagnose and manage the majority of patients with diabetes. In addition, the image QC module has significantly increased the validity and accuracy of DR screening, which enables regular DR screening in non-ophthalmic clinical settings.

The study has several limitations. First, since the study was conducted at multiple clinical centers, even with the large sample size, the DR prevalence observed cannot be equated with that of the general population. 
Second, the prevalence of DR (27.57%) and referable DR (16.59%) estimated by the reference standard in the randomly selected one-third of participants was lower than that estimated by the AI screening. The higher negative predictive value but lower positive predictive value might lead to an overestimate of DR prevalence by the DL algorithm. On the other hand, other factors, including the use of a single non-mydriatic fundus photograph per eye instead of multifield fundus photography, might lead the DL algorithm to underestimate DR prevalence. In addition, there were disagreements between the human graders and the DL QC model. A typical false negative was an out-of-focus image judged ungradable by the algorithm but gradable by the graders, whereas a typical false positive was an overly dark image judged ungradable by the graders but gradable by the algorithm (online supplemental figure 5). For all these reasons, the current findings should be interpreted with caution.

In conclusion, in the present study, we validated the feasibility and accuracy of an automated DL algorithm in DR screening and surveyed the prevalence of DR, referable DR and VTDR at 155 diabetes centers in China. With performance comparable to that of human specialists, together with scalability, the automated system may offer effective, cost-efficient and practical screening in routine diabetes follow-up and retinal complication management. More diabetes centers and primary care facilities are now joining the program to improve and validate the screening and referral procedures, with the aim of mitigating this public health problem.
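
The arithmetic behind two statements in the discussion, the Youden index of 75.8% and the tendency of a screening test with imperfect specificity to flag more patients than the reference standard does, can be illustrated with a minimal sketch. It is not the authors' analysis. The function names are hypothetical, and the sketch assumes that the reference-standard prevalence of referable DR (16.59%) can be treated as the true prevalence and that the reported sensitivity and specificity apply uniformly to every participant.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def apparent_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Fraction of screened patients flagged positive (true plus false positives)."""
    true_pos = sensitivity * true_prev
    false_pos = (1.0 - specificity) * (1.0 - true_prev)
    return true_pos + false_pos


def predictive_values(true_prev: float, sensitivity: float, specificity: float):
    """Return (PPV, NPV) implied by prevalence, sensitivity and specificity."""
    true_pos = sensitivity * true_prev
    false_pos = (1.0 - specificity) * (1.0 - true_prev)
    true_neg = specificity * (1.0 - true_prev)
    false_neg = (1.0 - sensitivity) * true_prev
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Values quoted in the text for referable DR.
SENS = 0.833      # sensitivity of the DL system
SPEC = 0.925      # specificity of the DL system
REF_PREV = 0.1659 # reference-standard prevalence (assumed here to be the truth)

j = youden_index(SENS, SPEC)
flagged = apparent_prevalence(REF_PREV, SENS, SPEC)
ppv, npv = predictive_values(REF_PREV, SENS, SPEC)

print(f"Youden index:        {j:.3f}")
print(f"Apparent prevalence: {flagged:.3f}")
print(f"PPV: {ppv:.3f}  NPV: {npv:.3f}")
```

Under these assumptions the Youden index comes out at 0.758, matching the figure in the text, and the flagged fraction is about 20%, above the assumed 16.59%, because false positives among the large disease-free group outnumber the missed cases. The PPV and NPV printed here are implied values only and should not be read as the figures reported in the study, and the sketch does not attempt to reproduce the 24.4% estimated in the full cohort.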

33 in total

1. On Deep Learning for Medical Image Analysis.

Authors: Lawrence Carin; Michael J Pencina
Journal: JAMA Date: 2018-09-18 Impact factor: 56.272

2. Metabolic Management Center: An innovation project for the management of metabolic diseases and complications in China.

Authors: Yifei Zhang; Weiqing Wang; Guang Ning
Journal: J Diabetes Date: 2018-10-03 Impact factor: 4.006

3. The influence of age, duration of diabetes, cataract, and pupil size on image quality in digital photographic retinal screening.

Authors: Peter Henry Scanlon; Chris Foy; Raman Malhotra; Stephen J Aldington
Journal: Diabetes Care Date: 2005-10 Impact factor: 19.112

4. Automated detection of diabetic retinopathy: barriers to translation into clinical practice.

Authors: Michael D Abramoff; Meindert Niemeijer; Stephen R Russell
Journal: Expert Rev Med Devices Date: 2010-03 Impact factor: 3.166

5. IDF Diabetes Atlas: Global estimates of diabetes prevalence for 2017 and projections for 2045.

Authors: N H Cho; J E Shaw; S Karuranga; Y Huang; J D da Rocha Fernandes; A W Ohlrogge; B Malanda
Journal: Diabetes Res Clin Pract Date: 2018-02-26 Impact factor: 5.602

6. Prevalence and Ethnic Pattern of Diabetes and Prediabetes in China in 2013.

Authors: Limin Wang; Pei Gao; Mei Zhang; Zhengjing Huang; Dudan Zhang; Qian Deng; Yichong Li; Zhenping Zhao; Xueying Qin; Danyao Jin; Maigeng Zhou; Xun Tang; Yonghua Hu; Linhong Wang
Journal: JAMA Date: 2017-06-27 Impact factor: 56.272

Review 7. Epidemiology of diabetes and diabetic complications in China.

Authors: Ronald C W Ma
Journal: Diabetologia Date: 2018-02-01 Impact factor: 10.122

8. An Automated Grading System for Detection of Vision-Threatening Referable Diabetic Retinopathy on the Basis of Color Fundus Photographs.

Authors: Zhixi Li; Stuart Keel; Chi Liu; Yifan He; Wei Meng; Jane Scheetz; Pei Ying Lee; Jonathan Shaw; Daniel Ting; Tien Yin Wong; Hugh Taylor; Robert Chang; Mingguang He
Journal: Diabetes Care Date: 2018-10-01 Impact factor: 19.112

9. Improved Automated Detection of Diabetic Retinopathy on a Publicly Available Dataset Through Integration of Deep Learning.

Authors: Michael David Abràmoff; Yiyue Lou; Ali Erginay; Warren Clarida; Ryan Amelon; James C Folk; Meindert Niemeijer
Journal: Invest Ophthalmol Vis Sci Date: 2016-10-01 Impact factor: 4.799

10. Sensitivity and specificity of nonmydriatic digital imaging in screening diabetic retinopathy in Indian eyes.

Authors: Vishali Gupta; Reema Bansal; Amod Gupta; Anil Bhansali
Journal: Indian J Ophthalmol Date: 2014-08 Impact factor: 1.848
