Tariq Mehmood Aslam, David Charles Hoyle, Vikram Puri, Goncalo Bento.
Abstract
Purpose: To investigate the potential of statistical and machine learning approaches to determine the diabetic status of patients from optical coherence tomography angiography (OCT-A) images.
Keywords: artificial intelligence; diabetes; image analysis; machine learning; optical coherence tomography
Year: 2020 PMID: 32818090 PMCID: PMC7396193 DOI: 10.1167/tvst.9.4.2
Source DB: PubMed Journal: Transl Vis Sci Technol ISSN: 2164-2591 Impact factor: 3.283
Figure 1. OCT-A 3 × 3 mm scan of the superficial capillary plexus of the left eye of a patient with known diabetic retinopathy. (A) Original scan, (B) segmentation by our algorithms. The large central area bounded by red is the FAZ. It is irregular and distorted, with numerous spaces of capillary dropout surrounding it (areas bordered by red lines). The green lines represent capillaries (small vessels) and the blue lines represent areas segmented as large vessels.
Summary Statistics by Diabetic Status of all Patients Included in Study Analyses
| Variable | Statistic | Normals, N (n = 49) | Diabetic no Retinopathy, DnR (n = 50) | Diabetic with Retinopathy, DR (n = 53) |
|---|---|---|---|---|
| Sex | | Male 32, Female 17 | Male 36, Female 14 | Male 43, Female 10 |
| Age | Mean (SD) | 57.14 (13.56) | 61.06 (12.77) | 58.38 (13.06) |
| | Median | 57.0 | 61.5 | 60 |
| BCVA | Mean (SD) | 85 (19.66) | 96.36 (7.39) | 86.17 (13.43) |
| | Median | 95 | 95 | 90 |
| Adjacent area sum | Mean (SD) | 3117.12 (3662.07) | 3613.92 (3458.93) | 12427.91 (6899.22) |
| | Median | 1953.0 | 1978.5 | 10959 |
| Circ max | Mean (SD) | 0.32 (0.16) | 0.24 (0.10) | 0.18 (0.088) |
| | Median | 0.29 | 0.22 | 0.18 |
| Ave percent sk | Mean (SD) | 12.89 (0.44) | 13.14 (0.53) | 12.58 (0.41) |
| | Median | 12.85 | 13.03 | 12.52 |
| Mean ves int | Mean (SD) | 180.65 (6.43) | 181.38 (6.04) | 179.28 (7.45) |
| | Median | 182.0 | 183.5 | 181.0 |
| Mean cap int | Mean (SD) | 98.69 (6.23) | 94.22 (5.41) | 93.47 (6.03) |
| | Median | 99 | 95 | 94 |
Image analysis metrics shown are area of ischemic zones around FAZ (adjacent area sum), FAZ circularity (circ max), skeletonized capillary percentage area (ave percent sk), mean vessel intensity (mean ves int), and mean capillary intensity (mean cap int).
Figure 2. Multiclass decision tree. At the root of each branch is the variable that the algorithm uses to decide the split at that branch. The label at each terminal node is the diabetic status that the algorithm predicts to be most probable. Where both terminal nodes of a binary split carry the same label, the split variable has separated groups with different class distributions even though the most probable label is the same.
AUC Values from Analyses of Different Binary Classifiers
| Classifier | AUC, All Diabetic vs. Normal | AUC, DR vs. DnR | Specificity, All Diabetic vs. Normal | Specificity, DR vs. DnR |
|---|---|---|---|---|
| Naïve Bayes | 0.78 (0.70, 0.86) | 0.90 (0.83, 0.95) | 0.43 (0.29, 0.59) | 0.68 (0.40, 0.88) |
| Decision Tree | 0.61 (0.52, 0.70) | 0.83 (0.75, 0.91) | 0.12 (0.04, 0.24) | 0.46 (0.23, 0.78) |
| Logistic Regression | 0.79 (0.71, 0.87) | 0.91 (0.85, 0.96) | 0.47 (0.31, 0.63) | 0.72 (0.56, 0.88) |
| Random Forest | 0.80 (0.73, 0.87) | 0.86 (0.79, 0.93) | 0.49 (0.31, 0.69) | 0.66 (0.34, 0.82) |
| Gradient Boosting | 0.75 (0.67, 0.83) | 0.84 (0.76, 0.92) | 0.41 (0.12, 0.59) | 0.62 (0.22, 0.80) |
Values for AUC for each of the machine learning and statistical methods applied are given in the first two columns for the two classification tasks: N versus DM individuals and DR versus DnR. A perfect classifier has an AUC value of 1. The rightmost two columns give the specificity estimate of each binary classifier at a fixed sensitivity of 90%. The 95% confidence intervals are shown in brackets.
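To make "specificity at a fixed 90% sensitivity" concrete, the following minimal sketch picks the strictest score threshold that still reaches the target sensitivity and reports the specificity at that threshold. The function name `specificity_at_sensitivity` and the toy scores are illustrative assumptions; they are not the study's code or data, and the paper does not state how its operating point was selected.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity=0.90):
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity reaches the target. labels: 1 = positive, 0 = negative.
    A case is called positive when its score is >= threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # Walk candidate thresholds from strictest to most lenient.
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            return threshold, sensitivity, specificity
    raise ValueError("target sensitivity not reachable")


# Toy classifier scores (invented for illustration only).
pos_scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.45, 0.25]
neg_scores = [0.72, 0.55, 0.50, 0.42, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10]
scores = pos_scores + neg_scores
labels = [1] * len(pos_scores) + [0] * len(neg_scores)

threshold, sens, spec = specificity_at_sensitivity(scores, labels, 0.90)
print(threshold, sens, spec)
```

Fixing sensitivity first and then reading off specificity reflects a screening-style priority: few diabetic cases are missed, and the specificity column shows what that costs in false positives.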
Figure 3. The left image shows the ROC curves for all classifiers for N versus DM. A ROC curve shows how the true-positive rate (sensitivity) increases as the false-positive rate (1 − specificity) increases. The ROC curve for a near-perfect classifier would start from the origin and rise steeply into the top-left quadrant. The AUC measures how close the ROC curve is to that of a perfect classifier. A perfect classifier has an AUC value of 1. The right image similarly shows the ROC curves for all classifiers for DR versus DnR.
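The relationship between a ROC curve and its AUC can be sketched in a few lines. The example below builds the ROC points by lowering the decision threshold one distinct score at a time and integrates them with the trapezoid rule. The helper names `roc_points` and `auc_trapezoid` and the toy inputs are assumptions for illustration, not the authors' implementation.

```python
def roc_points(scores, labels):
    """Return ROC points as (false-positive rate, true-positive rate) pairs,
    starting at the origin. labels: 1 = positive, 0 = negative."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Cases with tied scores cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# A classifier that ranks every positive above every negative.
perfect = auc_trapezoid(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
# A classifier that gives every case the same score (no discrimination).
uninformative = auc_trapezoid(roc_points([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))
print(perfect, uninformative)
```

The two toy cases bracket the values in the table: perfect ranking gives an area of 1, and a classifier that cannot separate the groups gives 0.5.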
Variable Importance Measures for the Random Forest Classifier when Differentiating N from DM Individuals
| Variable | Mean Decrease in Accuracy | Mean Decrease in Gini Coefficient |
|---|---|---|
| Adjacent area sum | 21.11 (21.059, 21.15) | 18.45 (18.44, 18.47) |
| Circ max | 10.21 (10.17, 10.26) | 15.27 (15.25, 15.28) |
| Ave percent sk | 6.44 (6.40, 6.48) | 11.01 (11.00, 11.017) |
| Mean ves int | 2.00 (1.96, 2.045) | 7.69 (7.68, 7.70) |
| Mean cap int | 12.81 (12.76, 12.85) | 12.99 (12.97, 13.00) |
The table shows the mean decrease in accuracy estimated from 2000 runs of the Random Forest algorithm applied to the full dataset. The more the accuracy of the Random Forest classifier decreases when breaking the link between the predictor variable and the outcome variable, the more important that predictor variable. Variables with a large mean decrease in accuracy are therefore more important for classification of the data. The Gini importance measures the mean decrease of node impurity by splits of a given variable. If the variable is useful, it tends to split mixed labeled nodes into pure single class nodes. The table shows the average across the 2000 runs of the algorithm of the mean decrease in Gini coefficient for each predictor variable. In all cases the numbers in brackets denote the range corresponding to ±1.96 standard errors around the average. Thus we can see that area of ischemic zones around FAZ (adjacent area sum) and FAZ circularity (circ max) are the most important variables.
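The two importance measures and the ±1.96 standard error range described above can be illustrated with a small standalone sketch. It is a simplification under stated assumptions: a Random Forest typically computes the accuracy decrease on out-of-bag samples and accumulates Gini decreases over every split in every tree, whereas this example uses a single fixed rule as the "model" and a single split. All names (`permutation_importance`, `gini_decrease`, `mean_with_interval`) and the toy data are hypothetical.

```python
import random
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, column, n_repeats=200, seed=0):
    """Accuracy drops after shuffling one column, which breaks the link
    between that predictor and the outcome. rows must be tuples."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + (v,) + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return drops


def gini_impurity(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted child impurity."""
    n = len(parent)
    children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - children


def mean_with_interval(values, z=1.96):
    """Mean and the range mean ± z standard errors."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))
    return m, m - z * se, m + z * se


# Toy data: column 0 determines the label, column 1 is unrelated to it.
rows = [(i, (i * 7) % 10) for i in range(20)]
labels = [1 if i >= 10 else 0 for i in range(20)]
model = lambda r: 1 if r[0] >= 10 else 0  # stand-in for a fitted classifier

print(mean_with_interval(permutation_importance(model, rows, labels, 0)))
print(mean_with_interval(permutation_importance(model, rows, labels, 1)))
print(gini_decrease(["N", "N", "DM", "DM"], ["N", "N"], ["DM", "DM"]))
```

Shuffling the informative column lowers accuracy substantially, while shuffling the unused column changes nothing. A split that turns a mixed node into two pure nodes removes all of the parent's impurity.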
Table of Coefficient Values for Logistic Regression of Image Analysis Measures Differentiating DR versus DnR
| Variable | Effect | Standard Error | P Value |
|---|---|---|---|
| Adjacent area sum | 2.56 | 0.69 | 0.00021 |
| Circ max | –0.36 | 0.36 | 0.31 |
| Ave percent sk | –1.65 | 0.48 | 0.00052 |
| Mean ves int | 0.085 | 0.45 | 0.85 |
| Mean cap int | –0.11 | 0.45 | 0.81 |
The table shows statistically significant effects for two variables: the area of ischemic zones around the FAZ (adjacent area sum) and the percentage skeletonized area (ave percent sk).
*** P Value < 0.001.
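As a rough check on how the columns of the logistic regression table relate, the sketch below computes a Wald z statistic (effect divided by standard error) and its two-sided normal P value. This assumes the reported P values come from Wald tests, which the source does not state. Because the inputs are the rounded values printed in the table, the results land near, but not exactly on, the tabulated P values. The function name `wald_p_value` is hypothetical.

```python
from math import erfc, sqrt


def wald_p_value(effect, standard_error):
    """Return (z, two-sided P value) under a standard normal reference.
    For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2))."""
    z = effect / standard_error
    p = erfc(abs(z) / sqrt(2.0))
    return z, p


# (effect, standard error) pairs copied from the table above.
coefficients = {
    "Adjacent area sum": (2.56, 0.69),
    "Circ max": (-0.36, 0.36),
    "Ave percent sk": (-1.65, 0.48),
    "Mean ves int": (0.085, 0.45),
    "Mean cap int": (-0.11, 0.45),
}

for name, (effect, se) in coefficients.items():
    z, p = wald_p_value(effect, se)
    print(f"{name}: z = {z:.2f}, p = {p:.2g}")
```

Under this assumption, only adjacent area sum and ave percent sk give P values below 0.001, in line with the footnote threshold.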