Payal Shah1, Divyansh K Mishra1, Mahesh P Shanmugam1, Bindiya Doshi1, Hariprasad Jayaraj2, Rajesh Ramanjulu1.
Abstract
Purpose: Deep learning is a recent and advanced subfield of artificial intelligence (AI). The aim of our study is to validate a machine-based algorithm, developed using deep convolutional neural networks, as a screening tool to detect referable diabetic retinopathy (DR).
Keywords: Deep convolutional neural networks; diabetic retinopathy screening; machine learning; validation of artificial intelligence
Year: 2020 PMID: 31957737 PMCID: PMC7003578 DOI: 10.4103/ijo.IJO_966_19
Source DB: PubMed Journal: Indian J Ophthalmol ISSN: 0301-4738 Impact factor: 1.848
Figure 1: Typical convolutional network showing the sequence of transformations – convolution and pooling
Distribution of DR staging between two retina specialists for internal and external validation datasets
Internal validation set (weighted kappa 0.953; standard error 0.005; 95% CI 0.944-0.962)

| Observer B (rows) / Observer A (columns) | No DR | Mild NPDR | Mod NPDR | Severe NPDR | PDR | Total |
|---|---|---|---|---|---|---|
| No DR | 121 | 4 | 0 | 0 | 0 | 125 (8.2%) |
| Mild NPDR | 8 | 16 | 6 | 0 | 0 | 30 (2.0%) |
| Mod NPDR | 1 | 3 | 468 | 21 | 0 | 493 (32.2%) |
| Severe NPDR | 0 | 0 | 11 | 130 | 37 | 178 (11.6%) |
| PDR | 0 | 0 | 0 | 3 | 704 | 707 (46.1%) |
| Total | 130 | 23 | 485 | 154 | 741 | 1533 |

External validation set

| Observer B (rows) / Observer A (columns) | No DR | Mild NPDR | Mod NPDR | Severe NPDR | PDR | Total |
|---|---|---|---|---|---|---|
| No DR | 538 | 10 | 0 | 0 | 0 | 548 (45.7%) |
| Mild NPDR | 6 | 214 | 8 | 2 | 0 | 230 (19.2%) |
| Mod NPDR | 1 | 0 | 329 | 5 | 2 | 337 (28.1%) |
| Severe NPDR | 0 | 0 | 10 | 50 | 4 | 64 (5.3%) |
| PDR | 0 | 0 | 0 | 0 | 21 | 21 (1.8%) |
| Total | 545 | 224 | 347 | 57 | 27 | 1200 |
The percentage values in brackets indicate the proportion of cases under each stage of DR
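The weighted kappa quoted for the internal validation set can be recomputed from the agreement counts. The following is a minimal sketch, assuming linear disagreement weights |i - j| between grades (the footnotes of the AI comparison tables mention linear weights; that the same weighting applies to the inter-observer comparison is an assumption). The function and variable names are illustrative and do not come from the study.

```python
def linear_weighted_kappa(matrix):
    """Cohen's kappa with linear disagreement weights |i - j| for a square count matrix."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed disagreement (in counts).
    observed = sum(abs(i - j) * matrix[i][j] for i in range(k) for j in range(k))
    # Weighted disagreement expected by chance from the marginal totals (in counts).
    expected = sum(
        abs(i - j) * row_totals[i] * col_totals[j] for i in range(k) for j in range(k)
    ) / n
    return 1.0 - observed / expected


# Rows: Observer B, columns: Observer A; grade order No DR, Mild, Mod, Severe, PDR.
internal_observers = [
    [121, 4, 0, 0, 0],
    [8, 16, 6, 0, 0],
    [1, 3, 468, 21, 0],
    [0, 0, 11, 130, 37],
    [0, 0, 0, 3, 704],
]
external_observers = [
    [538, 10, 0, 0, 0],
    [6, 214, 8, 2, 0],
    [1, 0, 329, 5, 2],
    [0, 0, 10, 50, 4],
    [0, 0, 0, 0, 21],
]

print("internal:", round(linear_weighted_kappa(internal_observers), 3))
print("external:", round(linear_weighted_kappa(external_observers), 3))
```

With the internal counts this gives about 0.953, in line with the value in the table. No kappa is listed above for the external set, so the second figure is only a recomputation under the same assumption.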
Distribution of DR staging between Ground Truth grading and AI results for the internal validation dataset
| AI grade (rows) / Consensus grade, GT (columns) | No DR | Mild NPDR | Mod NPDR | Severe NPDR | PDR | Total |
|---|---|---|---|---|---|---|
| No DR | 132 | 3 | 1 | 0 | 0 | 136 (8.9%) |
| Mild NPDR | 1 | 11 | 12 | 0 | 1 | 25 (1.6%) |
| Mod NPDR | 1 | 7 | 189 | 11 | 10 | 218 (14.2%) |
| Severe NPDR | 0 | 0 | 273 | 131 | 269 | 673 (43.9%) |
| PDR | 0 | 0 | 4 | 9 | 468 | 481 (31.4%) |
| Total | 134 (8.7%) | 21 (1.4%) | 479 (31.2%) | 151 (9.8%) | 748 (48.8%) | 1533 |

Weighted kappa (a): standard error 0.013; 95% CI 0.663-0.713

a: Linear weights. The percentage values in brackets indicate the proportion of cases under each stage of DR
Performance of AI in comparison to Ground Truth for internal and external validation sets
| Metric | Any DR detection | 95% CI | Referable DR detection | 95% CI | Sight threatening DR detection | 95% CI |
|---|---|---|---|---|---|---|
| Internal validation set | | | | | | |
| Sensitivity | 99.71% | 99.27%-99.92% | 98.98% | 98.30%-99.44% | 97.55% | 96.32%-98.46% |
| Specificity | 98.50% | 94.71%-99.82% | 94.84% | 90.08%-97.75% | 56.31% | 52.35%-60.21% |
| AUC | 0.991 | 0.985-0.995 | 0.969 | 0.959-0.977 | 0.77 | 0.747-0.790 |
| PPV | 99.86% | 99.44%-99.96% | 99.42% | 98.86%-99.70% | 76.00% | 74.34%-77.58% |
| NPV | 97.06% | 92.53%-98.87% | 91.30% | 86.16%-94.65% | 94.20% | 91.43%-96.10% |
| κ | 0.975 | 0.956-0.995 | 0.922 | 0.890-0.954 | 0.572 | 0.532-0.612 |
| External validation set | | | | | | |
| Sensitivity | 90.37% | 87.84%-92.52% | 94.68% | 92.12%-96.60% | 91.67% | 83.58%-96.58% |
| Specificity | 91.03% | 88.31%-93.29% | 97.40% | 96.01%-98.40% | 92.92% | 91.26%-94.36% |
| AUC | 0.907 | 0.889-0.923 | 0.96 | 0.948-0.971 | 0.923 | 0.906-0.937 |
| PPV | 92.34% | 90.22%-94.04% | 95.34% | 92.99%-96.93% | 49.36% | 43.84%-54.90% |
| NPV | 88.75% | 86.17%-90.90% | 97.02% | 95.62%-97.98% | 99.33% | 98.65%-99.67% |
| κ | 0.812 | 0.779-0.845 | 0.922 | 0.899-0.945 | 0.606 | 0.531-0.680 |
Any DR=Stage 1, 2, 3, 4; Referable DR=Stage 2, 3, 4; Sight threatening DR=Stage 3 and 4
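The three detection tasks defined above correspond to collapsing the five-grade AI versus Ground Truth table at different cut-offs. The sketch below illustrates this for the internal validation counts, assuming the grades are numbered 0 (No DR) to 4 (PDR), that rows are AI grades and columns are GT grades, and that a grade at or above the cut-off counts as positive. The helper name `binary_metrics` is hypothetical, and the "single-point AUC" (mean of sensitivity and specificity) is only one possible reading of how an AUC could be obtained from a single operating point, not a statement of the authors' method.

```python
def binary_metrics(matrix, threshold):
    """Collapse an AI-by-GT count matrix at `threshold` (grade >= threshold is positive)."""
    k = len(matrix)
    pos = range(threshold, k)
    neg = range(threshold)
    tp = sum(matrix[i][j] for i in pos for j in pos)  # AI positive, GT positive
    fp = sum(matrix[i][j] for i in pos for j in neg)  # AI positive, GT negative
    fn = sum(matrix[i][j] for i in neg for j in pos)  # AI negative, GT positive
    tn = sum(matrix[i][j] for i in neg for j in neg)  # AI negative, GT negative
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Area under a ROC curve built from this one operating point only.
        "auc_single_point": (sensitivity + specificity) / 2,
    }


# Rows: AI grade, columns: consensus GT grade (internal validation set).
internal_ai_vs_gt = [
    [132, 3, 1, 0, 0],
    [1, 11, 12, 0, 1],
    [1, 7, 189, 11, 10],
    [0, 0, 273, 131, 269],
    [0, 0, 4, 9, 468],
]

tasks = {"Any DR": 1, "Referable DR": 2, "Sight threatening DR": 3}
for name, cutoff in tasks.items():
    m = binary_metrics(internal_ai_vs_gt, cutoff)
    print(name, {key: round(100 * value, 2) for key, value in m.items()})
```

Under these assumptions the recomputed sensitivity, specificity, PPV, and NPV agree with the internal validation rows of the table to within rounding, and the single-point AUC lands on the tabulated AUC values as well.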
Figure 2: AUC of internal validation set
Distribution of DR staging between Ground Truth grading and AI results for the external validation dataset
| AI grade (rows) / Consensus grade, GT (columns) | No DR | Mild NPDR | Mod NPDR | Severe NPDR | PDR | Total |
|---|---|---|---|---|---|---|
| No DR | 497 | 62 | 1 | 0 | 0 | 560 (46.7%) |
| Mild NPDR | 48 | 141 | 22 | 0 | 0 | 211 (17.6%) |
| Mod NPDR | 1 | 19 | 246 | 7 | 0 | 273 (22.8%) |
| Severe NPDR | 0 | 0 | 78 | 52 | 8 | 138 (11.5%) |
| PDR | 0 | 0 | 1 | 1 | 16 | 18 (1.5%) |
| Total | 546 (45.5%) | 222 (18.5%) | 348 (29.0%) | 60 (5.0%) | 24 (2.0%) | 1200 |

Weighted kappa (a): standard error 0.01013; 95% CI 0.80187-0.84159

a: Linear weights. The percentage values in brackets indicate the proportion of cases under each stage of DR
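The confidence interval quoted for the external validation set is consistent with a normal approximation around a linearly weighted kappa. As a rough check, the sketch below recomputes kappa from the external AI versus GT counts and forms kappa ± 1.96 × SE using the standard error given with the table. The 1.96 multiplier, the helper name, and the variable names are assumptions for illustration, and the recomputed kappa is not a figure taken from the source.

```python
def linear_weighted_kappa(matrix):
    """Cohen's kappa with linear disagreement weights |i - j|."""
    k = len(matrix)
    n = sum(map(sum, matrix))
    rows = [sum(r) for r in matrix]
    cols = [sum(c) for c in zip(*matrix)]
    observed = sum(abs(i - j) * matrix[i][j] for i in range(k) for j in range(k))
    expected = sum(abs(i - j) * rows[i] * cols[j] for i in range(k) for j in range(k)) / n
    return 1.0 - observed / expected


# Rows: AI grade, columns: consensus GT grade (external validation set).
external_ai_vs_gt = [
    [497, 62, 1, 0, 0],
    [48, 141, 22, 0, 0],
    [1, 19, 246, 7, 0],
    [0, 0, 78, 52, 8],
    [0, 0, 1, 1, 16],
]

REPORTED_SE = 0.01013  # standard error as given with the table
Z_95 = 1.96            # assumed normal quantile for a 95% interval

kappa = linear_weighted_kappa(external_ai_vs_gt)
ci_low = kappa - Z_95 * REPORTED_SE
ci_high = kappa + Z_95 * REPORTED_SE
print(f"kappa={kappa:.4f}, 95% CI {ci_low:.4f}-{ci_high:.4f}")
```

Under these assumptions the interval comes out at roughly 0.802 to 0.842, which agrees with the reported bounds to about four decimal places.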
Figure 3: AUC of external validation set
Figure 4: Fundus image with lesion annotations by the AI algorithm
Brief summary of previous studies and their comparison with our results
| Study | Dataset | Sensitivity | Specificity |
|---|---|---|---|
| Any DR | | | |
| Pratt | 5000 images | 30% | 95% |
| Gargeya and Leng | Messidor 2 (1748 images) | 93% | 87% |
| Abramoff | 819 images | 87.2% | 90.7% |
| Our study (India) | 1533 images | 99.7% | 98.5% |
| Our study (India) | Messidor 1 (1200 images) | 90.37% | 91.03% |
| Referable DR | | | |
| Gulshan | EyePACS-1 (9963) | 90.3% / 97.5% | 98.1% / 93.4% |
| Gulshan | Messidor 2 (1748) | 87% / 96.1% | 98.5% / 93.9% |
| Ting | Multiple sets of images | 90.5% | 91.6% |
| Ramachandran | ODEMS (382) | 84.6% | 79.7% |
| Ramachandran | Messidor 1 (1200) | 96% | 90% |
| Our study (India) | 1533 images | 98.98% | 94.84% |
| Our study (India) | Messidor 1 (1200) | 94.68% | 97.40% |