Paisan Raumviboonsuk1, Jonathan Krause2, Peranut Chotcomwongse1, Rory Sayres2, Lily Peng2, Dale R Webster2, Rajiv Raman3, Kasumi Widner2, Bilson J L Campana2, Sonia Phene2, Kornwipa Hemarat4, Mongkol Tadarati1, Sukhum Silpa-Archa1, Jirawut Limwattanayingyong1, Chetan Rao3, Oscar Kuruvilla5, Jesse Jung6, Jeffrey Tan7, Surapong Orprayoon8, Chawawat Kangwanwongpaisan9, Ramase Sukumalpaiboon10, Chainarong Luengchaichawang11, Jitumporn Fuangkaew12, Pipat Kongsap13, Lamyong Chualinpha14, Sarawuth Saree15, Srirut Kawinpanitan16, Korntip Mitvongsa17, Siriporn Lawanasakol18, Chaiyasit Thepchatri19, Lalita Wongpichedchai20, Greg S Corrado2.
Abstract
Deep learning algorithms have been used to detect diabetic retinopathy (DR) with specialist-level accuracy. This study aims to validate one such algorithm on a large-scale clinical population, and compare the algorithm performance with that of human graders. A total of 25,326 gradable retinal images of patients with diabetes from the community-based, nationwide screening program of DR in Thailand were analyzed for DR severity and referable diabetic macular edema (DME). Grades adjudicated by a panel of international retinal specialists served as the reference standard. Relative to human graders, for detecting referable DR (moderate NPDR or worse), the deep learning algorithm had significantly higher sensitivity (0.97 vs. 0.74, p < 0.001), and a slightly lower specificity (0.96 vs. 0.98, p < 0.001). Higher sensitivity of the algorithm was also observed for each of the categories of severe or worse NPDR, PDR, and DME (p < 0.001 for all comparisons). The quadratic-weighted kappa for determination of DR severity levels by the algorithm and human graders was 0.85 and 0.78 respectively (p < 0.001 for the difference). Across different severity levels of DR for determining referable disease, deep learning significantly reduced the false negative rate (by 23%) at the cost of slightly higher false positive rates (2%). Deep learning algorithms may serve as a valuable tool for DR screening.
Keywords: Developing world; Diabetes complications
Year: 2019 PMID: 31304372 PMCID: PMC6550283 DOI: 10.1038/s41746-019-0099-8
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
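The error-rate figures in the abstract can be related to the reported sensitivity and specificity, since the false negative rate is 1 - sensitivity and the false positive rate is 1 - specificity. The following is a minimal Python sketch of that arithmetic for referable DR. It uses only the four values quoted in the abstract; the function and variable names are illustrative and do not come from the study.

```python
def error_rates(sensitivity, specificity):
    """Return (false negative rate, false positive rate) for one grader."""
    return 1.0 - sensitivity, 1.0 - specificity


# Referable DR (moderate NPDR or worse), values as quoted in the abstract.
algo_fnr, algo_fpr = error_rates(sensitivity=0.97, specificity=0.96)
human_fnr, human_fpr = error_rates(sensitivity=0.74, specificity=0.98)

# Absolute differences, rounded to the two decimals the abstract reports.
fnr_reduction = round(human_fnr - algo_fnr, 2)
fpr_increase = round(algo_fpr - human_fpr, 2)

print(f"False negative rate: human {human_fnr:.2f}, algorithm {algo_fnr:.2f}")
print(f"False positive rate: human {human_fpr:.2f}, algorithm {algo_fpr:.2f}")
print(f"FNR reduction: {fnr_reduction:.2f}, FPR increase: {fpr_increase:.2f}")
```

On these inputs the differences are 0.23 and 0.02, which appears consistent with reading the abstract's 23% and 2% as absolute (percentage-point) differences rather than relative ones.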
Summary of patient characteristics, including breakdowns by region
| Region | All regions | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grader type | | MD | MD | MD | MD | MD | MD | Nurse | MD | Nurse | Nurse | Nurse | Nurse | Tech |
| Total patients | 7517 | 100 | 620 | 569 | 440 | 513 | 620 | 680 | 1005 | 750 | 370 | 500 | 250 | 1000 |
| Total images | 29,985 | 764 | 2467 | 2256 | 1760 | 2051 | 2424 | 2720 | 4020 | 2989 | 1582 | 1986 | 968 | 3998 |
| % No/Mild NPDR | 87.83 | 68.30 | 92.32 | 94.02 | 92.95 | 87.85 | 88.81 | 82.85 | 82.10 | 89.62 | 75.24 | 92.45 | 86.67 | 93.81 |
| % Moderate NPDR | 9.80 | 23.84 | 5.65 | 5.35 | 5.17 | 7.40 | 8.77 | 12.53 | 16.21 | 8.14 | 20.06 | 6.15 | 10.75 | 5.39 |
| % Severe NPDR | 0.81 | 3.23 | 0.60 | 0.19 | 0.77 | 0.56 | 0.75 | 1.38 | 0.57 | 1.30 | 1.16 | 0.27 | 1.34 | 0.40 |
| % PDR | 1.57 | 4.63 | 1.43 | 0.43 | 1.12 | 4.18 | 1.67 | 3.24 | 1.11 | 0.94 | 3.55 | 1.14 | 1.23 | 0.40 |
| % Referable DME | 6.23 | 17.41 | 3.08 | 3.50 | 3.30 | 4.00 | 7.47 | 8.68 | 8.81 | 6.62 | 15.42 | 3.83 | 6.20 | 2.30 |
| % Female | 69 | 66 | 61 | 67 | 68 | 65 | 69 | 69 | 77 | 71 | 64 | 75 | 63 | 68 |
| % Male | 31 | 34 | 39 | 33 | 32 | 35 | 31 | 31 | 23 | 29 | 36 | 25 | 37 | 32 |
| Age (years) | 59 (52, 66) | 58 (53, 64) | 57 (50, 63) | 59 (52, 66) | 63 (57, 70) | 62 (54, 70) | 58 (51, 65) | 62 (54, 68) | 59 (53, 66) | 56 (49, 64) | 56 (49, 64) | 63 (56, 71) | 56 (50, 61) | 59 (51, 67) |
| HbA1c (%) | 7.3 (6.5, 8.6) | 7.6 (6.9, 8.5) | 7.0 (6.4, 8.2) | 7.3 (6.4, 8.5) | 7.2 (6.5, 8.4) | 7.2 (6.3, 8.5) | 7.2 (6.5, 8.1) | 7.7 (6.8, 9.3) | 7.6 (6.5, 9.1) | 7.2 (6.3, 8.5) | 8.4 (7.3, 9.8) | 7.2 (6.5, 8.2) | 8.1 (7.1, 9.6) | 7.0 (6.3, 8.0) |
| FBS (mg/dL) | 139 (118, 169) | 130 (110, 172) | 136 (118, 168) | 138 (115, 166) | 133 (114, 156) | 150 (126, 181) | 140 (122, 175) | 140 (118, 199) | 144 (121, 172) | 133 (107, 170) | 149 (122, 188) | 131 (115, 154) | 149 (130, 186) | 136 (118, 163) |
| LDL (mg/dL) | 105 (83, 130) | 117 (107, 147) | 113 (90, 135) | 102 (79, 124) | 101 (81, 128) | 94 (75, 120) | 103 (80, 129) | 102 (93, 122) | 109 (86, 132) | 107 (86, 131) | 104 (85, 124) | 96 (73, 119) | 118 (96, 142) | 108 (88, 132) |
For blood sample measures and visual acuity, values reflect the distribution across patients at first visit. Numeric values indicate the median across a distribution; values in parentheses indicate the 25th and 75th percentiles.
MD ophthalmologist, DME diabetic macular edema, NPDR non-proliferative diabetic retinopathy, PDR proliferative diabetic retinopathy, HbA1c hemoglobin A1c, FBS fasting blood glucose, LDL low-density lipoprotein
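As a quick consistency check on the table, the four DR severity percentages in a column should sum to roughly 100, and the share at moderate NPDR or worse corresponds to the abstract's definition of referable DR. The sketch below does this for the "All regions" column only. The dictionary and variable names are illustrative, and the table does not state whether the percentages are computed over images or patients, so the result is simply the share in whatever unit the table uses.

```python
# DR severity distribution, "All regions" column of the table (percent).
all_regions = {
    "No/Mild NPDR": 87.83,
    "Moderate NPDR": 9.80,
    "Severe NPDR": 0.81,
    "PDR": 1.57,
}

# The four categories should cover the whole column, up to rounding.
total_pct = sum(all_regions.values())

# Referable DR is defined in the abstract as moderate NPDR or worse.
referable_pct = sum(
    pct for grade, pct in all_regions.items() if grade != "No/Mild NPDR"
)

print(f"Sum of severity categories: {total_pct:.2f}%")
print(f"Moderate NPDR or worse:     {referable_pct:.2f}%")
```

Referable DME is reported on a separate row and can co-occur with any DR grade, so it is deliberately left out of this sum.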
Fig. 1 Comparison of manual grading and algorithm performance. Receiver operating characteristic (ROC) curve of model (blue line) compared to grading by regional graders (red dot) for varying severities of diabetic retinopathy (DR) and diabetic macular edema (DME). The performance represented by the red dot is a combination of all of the grades from the regional graders on all gradable images, since regional graders only graded images from their own region.
Fig. 2 Comparison of algorithm and individual regional grader performance. Grader performances are represented as blue diamonds (ophthalmologists) and red dots (nurse or technician) for (a) moderate or worse non-proliferative diabetic retinopathy (NPDR), (b) diabetic macular edema (DME), and (c) severe NPDR, proliferative diabetic retinopathy (PDR), and/or DME. Analysis is performed on all gradable images.
Fig. 3 Agreement at the image level between the reference standard and regional graders. Comparison of diabetic retinopathy (DR) and diabetic macular edema (DME) performance between the reference standard and (a, c) regional graders or (b, d) the algorithm. Adjudication was performed only for images that either the regional grader or the algorithm graded as moderate or above. Thus, for DR, non-referable cases (no/mild) are combined into a single non-referable bucket.
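The abstract summarizes image-level agreement on DR severity with a quadratic-weighted kappa. For readers unfamiliar with the statistic, here is a minimal Python sketch of the standard definition, computed from a confusion matrix between two sets of ordinal grades. This is not the study's code: the function name, the integer encoding of grades, and the toy grade lists are all hypothetical and chosen for illustration only.

```python
def quadratic_weighted_kappa(grades_a, grades_b, num_levels):
    """Quadratic-weighted Cohen's kappa for two lists of ordinal grades.

    Grades are integers in 0..num_levels-1 (an assumed encoding).
    """
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("grade lists must be non-empty and of equal length")
    n = len(grades_a)

    # Observed confusion matrix: rows are grades_a, columns are grades_b.
    observed = [[0] * num_levels for _ in range(num_levels)]
    for a, b in zip(grades_a, grades_b):
        observed[a][b] += 1

    # Marginal grade counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_levels))
              for j in range(num_levels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            # Disagreement penalty grows with the squared grade distance.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one grade")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical toy data on an assumed five-level scale (0 = none ... 4 = PDR).
reference = [0, 0, 1, 2, 2, 3, 4, 4]
grader = [0, 1, 1, 2, 3, 3, 4, 2]
example_kappa = quadratic_weighted_kappa(reference, grader, num_levels=5)
print(f"Quadratic-weighted kappa on toy data: {example_kappa:.3f}")
```

Because the weights are quadratic, a disagreement of two severity levels is penalized four times as heavily as a disagreement of one level, which is why the statistic suits ordinal scales such as DR severity.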