Paolo Fraccaro, Massimo Nicolo, Monica Bonetto, Mauro Giacomini, Peter Weller, Carlo Enrico Traverso, Mattia Prosperi, Dympna O'Sullivan.
Abstract
BACKGROUND: To investigate machine learning methods, ranging from simpler interpretable techniques to complex (non-linear) "black-box" approaches, for automated diagnosis of Age-related Macular Degeneration (AMD).
Year: 2015 PMID: 25623470 PMCID: PMC4417241 DOI: 10.1186/1471-2415-15-10
Source DB: PubMed Journal: BMC Ophthalmol ISSN: 1471-2415 Impact factor: 2.209
Population characteristics
| Parameter | M | F | Total | Missing |
|---|---|---|---|---|
| Number of patients (%) | 241 (49.5%) | 246 (50.5%) | 487 | / |
| Number of eyes (%) | 444 (48.7%) | 468 (51.3%) | 912 | / |
| Number of healthy eyes (%) | 138 (31.1%) | 150 (32.1%) | 288 (31.6%) | / |
| Age (mean+/−std) | 65.3 +/− 14.9 | 70.5 +/− 12.7 | 68 +/− 14.1 | / |
| Soft drusen positive (%) | 21 (6.4%) | 62 (17.9%) | 83 (12.4%) | 240 (26.3%) |
| Macular scar positive (%) | 19 (5.8%) | 32 (9.2%) | 51 (7.6%) | 237 (26%) |
| RPE defect/pigment mottling positive (%) | 82 (25.2%) | 118 (34.1%) | 200 (29.8%) | 240 (26.3%) |
| Depigmentation area positive (%) | 95 (29.1%) | 134 (38.7%) | 229 (34.1%) | 240 (26.3%) |
| Subretinal fluid positive (%) | 79 (21.8%) | 50 (13.4%) | 129 (17.5%) | 176 (19.3%) |
| Macular thickness (mean+/−std) | 297.4 +/− 64.8 | 277 +/− 54.5 | 286.8 +/− 60.5 | 149 (16.3%) |
| Subretinal fibrosis positive (%) | 18 (5.7%) | 26 (7.5%) | 44 (6.6%) | 248 (27.2%) |
| Subretinal hemorrhage positive (%) | 16 (5.2%) | 19 (5.9%) | 35 (5.5%) | 281 (30.8%) |
| AMD diagnosis (%) | 100 (22.5%) | 147 (31.4%) | 247 (27.1%) | / |
Percentages for each attribute are computed over the eyes with a recorded (non-missing) value for that attribute, within each stratum (male/female) and overall; the Missing column is expressed as a percentage of all 912 eyes.
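The footnote's convention can be checked against the Total column: a positive percentage uses only the eyes with a recorded value as its denominator, while the Missing column is a share of all 912 eyes. A minimal sketch of that arithmetic (the function names are illustrative, not from the source):

```python
def pct_of_observed(positive: int, total: int, missing: int) -> float:
    """Percentage of positive eyes among eyes with a recorded value."""
    return 100 * positive / (total - missing)


def pct_missing(missing: int, total: int) -> float:
    """Percentage of eyes with no recorded value for the attribute."""
    return 100 * missing / total


# Soft drusen, Total column: 83 positive, 240 missing, 912 eyes.
print(round(pct_of_observed(83, 912, 240), 1))  # 12.4
print(round(pct_missing(240, 912), 1))          # 26.3
```

Computed this way, 83 / (912 − 240) reproduces the 12.4% reported for soft drusen; dividing by all 912 eyes would instead give about 9.1%.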
Prevalence of retinal disease diagnoses in the whole population, stratified by soft drusen, depigmentation area and RPE defect/pigment mottling (each eye counted as a single case; stratified percentages are relative to the eyes with that diagnosis)
| Disease (or healthy status) | N (%) | Soft drusen positive N (%) | Depigmentation area positive N (%) | RPE defect/pigment mottling positive N (%) |
|---|---|---|---|---|
| AMD | 247 (27.1%) | 76 (30.8%) | 136 (55.1%) | 125 (50.6%) |
| Angioid streaks | 5 (0.5%) | 0 (0%) | 3 (60%) | 1 (20%) |
| Central serous chorioretinopathy | 69 (7.6%) | 0 (0%) | 21 (30.4%) | 17 (24.6%) |
| Choroidal hemangioma | 3 (0.3%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Diabetic retinopathy | 126 (13.8%) | 1 (0.8%) | 18 (14.3%) | 11 (8.7%) |
| Dystrophy | 24 (2.6%) | 3 (12.5%) | 10 (41.7%) | 9 (37.5%) |
| Epiretinal membrane | 30 (3.3%) | 0 (0%) | 4 (13.3%) | 4 (13.3%) |
| Inflammatory cystoid macular edema | 4 (0.4%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Macroaneurysm | 1 (0.1%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Pathologic myopia | 66 (7.2%) | 0 (0%) | 22 (33.3%) | 12 (18.2%) |
| Retinal artery occlusion | 3 (0.3%) | 0 (0%) | 1 (33.3%) | 1 (33.3%) |
| Retinal vein occlusion | 41 (4.5%) | 0 (0%) | 1 (2.4%) | 2 (4.9%) |
| Uveitis | 5 (0.5%) | 0 (0%) | 1 (20%) | 0 (0%) |
Performance of the machine learning methods in terms of average (+/− std. dev.) sensitivity, specificity and area under the receiver operating characteristic curve (AUC), applied to the dataset with different missing-value imputation techniques (complete cases, categorical variable encoding the missingness, mean/mode imputation, and random forest imputation)
| Type of imputation on the dataset (N = 444) | Performance | One-rule | Decision tree | Logistic regression | Random forest | AdaBoost | Support vector machine |
|---|---|---|---|---|---|---|---|
| Complete cases | AUC | 0.74+/−0.05 | 0.90+/−0.03 | 0.93+/−0.04 |  | 0.92+/−0.02 | 0.92+/−0.03 |
|  | Sensitivity | 0.87+/−0.10 | 0.88+/−0.07 | 0.92+/−0.03 | 0.90+/−0.03 | 0.91+/−0.02 |  |
|  | Specificity | 0.60+/−0.18 | 0.74+/−0.15 | 0.70+/−0.08 |  | 0.71+/−0.06 | 0.67+/−0.07 |
| Categorical variable encoding the missingness | AUC | 0.73+/−0.04 | 0.88+/−0.02 | 0.91+/−0.01 |  | 0.90+/−0.01 | 0.89+/−0.03 |
|  | Sensitivity | 0.92+/−0.07 | 0.88+/−0.07 | 0.92+/−0.03 | 0.91+/−0.02 | 0.91+/−0.03 |  |
|  | Specificity | 0.42+/−0.05 | 0.61+/−0.18 | 0.60+/−0.07 |  | 0.60+/−0.06 | 0.51+/−0.07 |
| Mean/mode imputation | AUC | 0.69+/−0.05 | 0.85+/−0.02 |  | 0.87+/−0.02 | 0.87+/−0.02 | 0.86+/−0.04 |
|  | Sensitivity | 0.94+/−0.05 | 0.92+/−0.04 | 0.94+/−0.02 | 0.93+/−0.02 | 0.93+/−0.02 |  |
|  | Specificity | 0.31+/−0.12 | 0.56+/−0.10 | 0.54+/−0.05 |  | 0.53+/−0.05 | 0.47+/−0.06 |
| Random forest imputation | AUC | 0.79+/−0.02 | 0.95+/−0.02 |  |  |  | 0.94+/−0.03 |
|  | Sensitivity |  | 0.94+/−0.04 | 0.96+/−0.02 | 0.94+/−0.02 | 0.95+/−0.01 | 0.96+/−0.01 |
|  | Specificity | 0.60+/−0.06 | 0.78+/−0.09 | 0.75+/−0.05 |  | 0.76+/−0.04 | 0.75+/−0.05 |
Results are averaged over 50 bootstrap tests using out-of-bag predictions; the best performance for each metric is shown in bold.
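Out-of-bag bootstrap evaluation, as described in the note above, can be sketched in stdlib Python: each replicate draws n rows with replacement for training and scores only the rows that were never drawn, and the metrics are then averaged over replicates. The stand-in model (`fit_rate_by_first_feature`), the 0.5 decision threshold and the toy data are hypothetical; the excerpt does not state which software or thresholds the authors used.

```python
import random
import statistics


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney): share of positive/negative pairs in
    which the positive scores higher, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold=0.5):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def fit_rate_by_first_feature(X, y):
    """Hypothetical stand-in model: the score is the positive rate among
    training rows sharing the value of feature 0 (overall rate if unseen)."""
    counts = {}
    for row, label in zip(X, y):
        pos, tot = counts.get(row[0], (0, 0))
        counts[row[0]] = (pos + label, tot + 1)
    overall = sum(y) / len(y)

    def predict(row):
        pos, tot = counts.get(row[0], (0, 0))
        return pos / tot if tot else overall

    return predict


def bootstrap_oob(X, y, fit, n_boot=50, seed=0):
    """Return {metric: (mean, std)} over bootstrap replicates scored out of bag."""
    rng = random.Random(seed)
    n = len(y)
    results = {"auc": [], "sensitivity": [], "specificity": []}
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]
        drawn = set(in_bag)
        oob = [i for i in range(n) if i not in drawn]
        oob_y = [y[i] for i in oob]
        if len(set(oob_y)) < 2:
            continue  # AUC needs both classes among the out-of-bag rows
        predict = fit([X[i] for i in in_bag], [y[i] for i in in_bag])
        scores = [predict(X[i]) for i in oob]
        sensitivity, specificity = sens_spec(scores, oob_y)
        results["auc"].append(auc(scores, oob_y))
        results["sensitivity"].append(sensitivity)
        results["specificity"].append(specificity)
    return {k: (statistics.mean(v), statistics.stdev(v)) for k, v in results.items()}


# Toy data: feature 0 fully determines the label.
X = [[i % 2] for i in range(40)]
y = [i % 2 for i in range(40)]
print(bootstrap_oob(X, y, fit_rate_by_first_feature))
```

On average about 36.8% of rows, since (1 − 1/n)^n approaches e^−1, end up out of bag in each replicate, so every replicate is scored on data its model never saw.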
Figure 1. Receiver operating characteristic curves plotting the performance of the different statistical learning methods, averaging results from 50 bootstrap tests (out-of-bag predictions, dataset imputing mean/mode for missing values).
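Three of the four missing-value strategies compared in the performance table are simple enough to sketch in a few lines of stdlib Python; random forest imputation is omitted. In this illustration, rows are dicts with `None` marking a missing value, and the column names and values are hypothetical; it is not the authors' preprocessing code.

```python
import statistics
from collections import Counter


def complete_cases(rows):
    """Drop every row that has at least one missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def encode_missing(rows, categorical):
    """Turn missingness into an explicit 'missing' level of each categorical attribute."""
    return [
        {k: ("missing" if k in categorical and v is None else v) for k, v in r.items()}
        for r in rows
    ]


def mean_mode_impute(rows, numeric, categorical):
    """Fill numeric gaps with the observed mean and categorical gaps with the mode."""
    fills = {}
    for col in numeric:
        fills[col] = statistics.mean(r[col] for r in rows if r[col] is not None)
    for col in categorical:
        fills[col] = Counter(r[col] for r in rows if r[col] is not None).most_common(1)[0][0]
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in rows
    ]


eyes = [
    {"age": 70, "soft_drusen": "pos", "macular_thickness": 300.0},
    {"age": 60, "soft_drusen": "neg", "macular_thickness": None},
    {"age": 80, "soft_drusen": None, "macular_thickness": 280.0},
    {"age": 65, "soft_drusen": "neg", "macular_thickness": 290.0},
]
print(len(complete_cases(eyes)))                                 # 2
print(encode_missing(eyes, {"soft_drusen"})[2]["soft_drusen"])   # missing
imputed = mean_mode_impute(eyes, ["macular_thickness"], ["soft_drusen"])
print(imputed[1]["macular_thickness"], imputed[2]["soft_drusen"])  # 290.0 neg
```

When imputation is combined with resampling, the fill values would normally be computed from the training rows of each replicate; the excerpt does not say at which stage the authors imputed.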
Odds ratios from fitting the LogitBoost logistic regression to the AMD diagnosis outcome (dataset with missing values imputed by the population mean/mode)
| Variable (mode) | Odds ratio | Lower 95% CI | Upper 95% CI | P-value |
|---|---|---|---|---|
| Age (per year older) | 1.09 | 1.07 | 1.11 |  |
| Gender (M vs F) | 1.05 | 0.71 | 1.57 | 0.7985 |
| Soft drusen (pos vs neg) | 19.30 | 7.82 | 47.65 |  |
| Macular scar (pos vs neg) | 1.75 | 0.57 | 5.41 | 0.329 |
| RPE defect/pigment mottling (pos vs neg) | 2.20 | 1.20 | 4.04 |  |
| Depigmentation area (pos vs neg) | 1.35 | 0.73 | 2.51 | 0.3349 |
| Subretinal fluid (pos vs neg) | 3.21 | 1.70 | 6.08 |  |
| Macular thickness (per unit increase) | 1.00 | 0.99 | 1.00 | 0.139 |
| Subretinal fibrosis (pos vs neg) | 4.60 | 1.39 | 15.24 |  |
| Subretinal hemorrhage (pos vs neg) | 5.91 | 1.49 | 23.42 |  |
Statistically significant p-values are reported in bold.
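Assuming the intervals are Wald-type (the excerpt does not say), each odds ratio is exp(β) for a logistic coefficient β, its 95% CI is exp(β ± 1.96·SE), and the two-sided p-value comes from z = β/SE. The sketch below inverts that relation to recover SE and p from a reported row; the function names are illustrative.

```python
import math

Z_975 = 1.959964  # two-sided 95% standard normal quantile


def wald_from_or_ci(odds_ratio, lower, upper, z=Z_975):
    """Recover beta = ln(OR), its standard error and a two-sided Wald p-value
    from an odds ratio and its 95% confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    z_stat = beta / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p_value


def or_ci_from_wald(beta, se, z=Z_975):
    """Forward direction: odds ratio and 95% CI from a coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# "Macular scar" row: OR 1.75, 95% CI 0.57-5.41, reported p = 0.329.
beta, se, p = wald_from_or_ci(1.75, 0.57, 5.41)
print(round(p, 2))  # 0.33
```

The reported odds ratios also sit at the geometric midpoint of their intervals (for soft drusen, √(7.82 × 47.65) ≈ 19.30), which is consistent with a Wald interval on the log-odds scale.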
Figure 2. Decision tree for the diagnosis of AMD (dataset with mean/mode imputation for missing values). The tree is traversed downwards from the root node. The p-values are calculated with a chi-square test and represent the discriminatory power of a variable within the data stratum induced by the tree partition. Each leaf node gives the probability of an AMD diagnosis, based on the prevalence in the population sub-stratum reached by following the corresponding pathway of node splits on variable values.
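For a binary split against the binary AMD outcome, a Pearson chi-square test on the resulting 2×2 table can be sketched as below. The counts are hypothetical, and the tree-growing software used by the authors may compute its split p-values differently (for example with permutation-based or multiplicity-adjusted tests).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = split branch, columns = AMD yes/no,
    together with its p-value on 1 degree of freedom."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical split: 30 of 40 eyes in one branch have AMD vs 15 of 60 in the other.
print(chi_square_2x2(30, 10, 15, 45))  # statistic about 24.24, p < 0.001
```

All row and column totals must be non-zero; a split that sends every eye to one branch has no test to perform.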
Variable importance ranking for One-rule, random forest, AdaBoost and support vector machine (mean/mode dataset)
| Ranking | One rule | Random forest | AdaBoost | Support vector machine |
|---|---|---|---|---|
| 1 | Soft drusen | Age | Age | Age |
| 2 | Age | Soft drusen | Soft drusen | Gender |
| 3 | Depigmentation area | Macular thickness | Subretinal fluid | Soft drusen |
| 4 | RPE defect/pigment mottling | Depigmentation area | Subretinal hemorrhage | Subretinal hemorrhage |
| 5 | Subretinal hemorrhage | RPE defect/pigment mottling | RPE defect/pigment mottling | Subretinal fluid |
| 6 | Subretinal fibrosis | Subretinal fibrosis | Gender | Subretinal fibrosis |
| 7 | Macular scar | Subretinal hemorrhage | Subretinal fibrosis | RPE defect/pigment mottling |
| 8 | Subretinal fluid | Subretinal fluid | Macular scar | Macular scar |
| 9 | Gender | Macular scar | Macular thickness | Macular thickness |
| 10 | Macular thickness | Gender | Depigmentation area | Depigmentation area |
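The excerpt does not say how each ranking was computed. One common model-agnostic way to rank variables for black-box models is permutation importance: shuffle one feature column, re-score the model, and record the drop in accuracy. A minimal sketch with a hypothetical predictor and toy data:

```python
import random


def accuracy(predict, X, y):
    """Share of rows whose predicted class equals the label."""
    return sum(1 for row, label in zip(X, y) if predict(row) == label) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled,
    breaking its link with the outcome while keeping its distribution."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def rank_features(names, importances):
    """Feature names ordered from most to least important (ties keep input order)."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked]


def predict_from_first_feature(row):
    """Hypothetical fitted model that only looks at feature 0."""
    return row[0]


# Toy data: feature 0 determines the label, feature 1 is unrelated noise.
X = [[i % 2, (i // 2) % 2] for i in range(20)]
y = [i % 2 for i in range(20)]
importances = permutation_importance(predict_from_first_feature, X, y)
print(rank_features(["informative", "noise"], importances))
```

A feature the model never uses always scores exactly zero here, because shuffling it cannot change any prediction.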