Kwanhoon Jo, Dong Jin Chang, Ji Won Min, Young-Sik Yoo, Byul Lyu, Jin Woo Kwon, Jiwon Baek.
Abstract
We sought to evaluate the performance of machine learning prediction models for identifying vision-threatening diabetic retinopathy (VTDR) in patients with type 2 diabetes mellitus using only medical data from a data warehouse. This is a multicenter electronic medical records review study. Patients with type 2 diabetes who were screened for diabetic retinopathy and followed up for 10 years were included from six referral hospitals sharing the same electronic medical record system (n = 9,102). Patient demographics, laboratory results, visual acuities (VAs), and occurrence of VTDR were collected. Prediction models for VTDR were developed using machine learning models. F1 score, accuracy, specificity, and area under the receiver operating characteristic curve (AUC) were analyzed. Machine learning models revealed F1 score, accuracy, specificity, and AUC values of up to 0.89, 0.89, 0.95, and 0.96, respectively, during training. The trained models predicted the occurrence of VTDR at 10 years with F1 score, accuracy, and specificity of up to 0.81, 0.70, and 0.66, respectively, on the test set. Important predictors included baseline VA, duration of diabetes treatment, serum levels of glycated hemoglobin and creatinine, estimated glomerular filtration rate, and blood pressure. The models could predict the long-term occurrence of VTDR with fair performance. Although the lack of funduscopic findings may be a limitation, prediction models trained using medical data can facilitate proper referral of subjects at high risk for VTDR from primary care to an ophthalmologist.
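The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `classification_metrics` and `auc_by_ranking`, the 0.5 threshold, and the toy labels and scores are all assumptions made for illustration. Precision, recall, F1, accuracy, and specificity come from the confusion-matrix counts, and AUC is computed as the fraction of positive/negative pairs that the score ranks correctly.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics for binary labels (1 = VTDR, 0 = non-VTDR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, len(pairs)),
        "specificity": ratio(tn, tn + fp),
    }


def auc_by_ranking(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_by_ranking(y_true, scores)
print(metrics, auc)
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for a small illustration; a rank-based formula would be the usual choice for a cohort of several thousand patients.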
Year: 2022 PMID: 35589921 PMCID: PMC9119940 DOI: 10.1038/s41598-022-12369-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Summary of clinical features (continuous variables) of type 2 diabetic patients with and without vision-threatening diabetic retinopathy in datasets of 10-year VTDR prediction.
| Clinical features | Non-VTDR (n = 2,924), mean | Non-VTDR (n = 2,924), SD | VTDR (n = 6,187), mean | VTDR (n = 6,187), SD | P-value | Missing data (%) |
|---|---|---|---|---|---|---|
| Age (years) | 54.77 | 11.32 | 56.69 | 12.13 | < 0.001 | 0 |
| ALT (IU/L) | 25.81 | 17.62 | 24.48 | 18.64 | 0.002 | 5.8 |
| AST (IU/L) | 23.79 | 12.10 | 23.65 | 14.72 | 0.659 | 5.8 |
| BUN (mg/dL) | 17.41 | 9.11 | 20.10 | 11.82 | < 0.001 | 9.7 |
| Serum creatinine (mg/dL) | 1.09 | 1.18 | 1.35 | 1.52 | < 0.001 | 8.5 |
| eGFR (mL/min/1.73 m²) | 75.15 | 21.54 | 71.50 | 28.94 | < 0.001 | 13.2 |
| Serum glucose (mg/dL) | 147.98 | 66.85 | 170.89 | 84.72 | < 0.001 | 1.2 |
| HbA1c (%) | 7.29 | 1.48 | 8.01 | 1.98 | < 0.001 | 1.3 |
| Height (cm) | 158.91 | 7.71 | 160.79 | 8.46 | < 0.001 | 5.5 |
| Weight (kg) | 60.56 | 9.27 | 61.00 | 10.04 | < 0.001 | 5.3 |
| BP, diastolic (mmHg) | 72.72 | 9.10 | 73.36 | 10.62 | 0.005 | 4.1 |
| BP, systolic (mmHg) | 131.36 | 16.09 | 133.84 | 19.19 | < 0.001 | 4.1 |
| Low VA (logMAR) | 0.80 | 0.24 | 0.67 | 0.29 | < 0.001 | 0.7 |
| Mean VA (logMAR) | 0.72 | 0.26 | 0.54 | 0.28 | < 0.001 | 0.7 |
| MAP (mmHg) | 111.79 | 11.96 | 113.61 | 14.28 | < 0.001 | 4.1 |
| BMI (kg/m²) | 23.98 | 3.09 | 23.59 | 3.20 | < 0.001 | 5.5 |
| Diabetes treatment duration (days) | 1334.70 | 1192.81 | 1268.80 | 1395.27 | < 0.001 | 0 |
VTDR vision-threatening diabetic retinopathy, SD standard deviation, ALT alanine transaminase, AST aspartate transaminase, BUN blood urea nitrogen, eGFR estimated glomerular filtration rate, HbA1c glycated hemoglobin, VA visual acuity, BP blood pressure, MAP mean arterial pressure, BMI body mass index.
P-value: Independent t-test between non-VTDR and VTDR.
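As a rough illustration of how a two-group comparison can be checked from the reported means and SDs, the sketch below computes a t statistic from summary statistics. Several things here are assumptions rather than the authors' procedure: the function name `welch_t_from_summary` is hypothetical, the unequal-variance (Welch) form is used because the source does not say whether variances were pooled, the p-value uses a normal approximation that is only reasonable for large groups, and the full group sizes are used even though missing data reduce the effective n for some variables. The output is therefore not expected to match the tabulated P-values exactly.

```python
import math
from typing import Tuple


def welch_t_from_summary(
    mean1: float, sd1: float, n1: int,
    mean2: float, sd2: float, n2: int,
) -> Tuple[float, float]:
    """Return (t, two-sided p) for two independent groups from summary statistics.

    Uses the Welch standard error and a normal approximation for the p-value,
    which is an assumption suited to large samples only.
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean2 - mean1) / standard_error
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail of the standard normal
    return t, p


# Summary values copied from the table rows for age and AST.
t_age, p_age = welch_t_from_summary(54.77, 11.32, 2924, 56.69, 12.13, 6187)
t_ast, p_ast = welch_t_from_summary(23.79, 12.10, 2924, 23.65, 14.72, 6187)

print(f"Age: t = {t_age:.2f}, p = {p_age:.3g}")
print(f"AST: t = {t_ast:.2f}, p = {p_ast:.3g}")
```

Under these assumptions the age difference is clearly significant and the AST difference is not, which agrees in direction with the table.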
Summary of clinical features (categorical variables) of type 2 diabetic patients with and without vision-threatening diabetic retinopathy in datasets of 10-year VTDR prediction.
| Clinical features | Non-VTDR (n = 2,924) | VTDR (n = 6,187) | P-value |
|---|---|---|---|
| Sex, male (%) | 46.40 | 58.10 | < 0.001 |
| Chronic kidney disease (%) | 22.23 | 27.58 | < 0.001 |
| Hypertension (%) | 82.39 | 73.16 | < 0.001 |
| Cerebrovascular disease (%) | 30.47 | 26.04 | < 0.001 |
| Cardiovascular disease (%) | 33.65 | 26.87 | < 0.001 |
| Smoking (%) | 12.10 | 16.80 | < 0.001 |
| Aspirin use (%) | 54.86 | 46.36 | < 0.001 |
| Insulin use (%) | 60.50 | 72.47 | < 0.001 |
| Clopidogrel use (%) | 24.56 | 24.46 | 0.919 |
VTDR vision-threatening diabetic retinopathy.
P-value: Chi-square test between non-VTDR and VTDR.
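A chi-square test on a 2 × 2 table can be sketched from the reported percentages and group sizes. The following is an illustration under stated assumptions, not the authors' analysis: the function name `chi2_from_percentages` is hypothetical, counts are reconstructed by rounding percentage × n, and the Pearson statistic is used without a continuity correction, since the source does not say which variant was applied. For one degree of freedom the p-value has the closed form erfc(sqrt(χ²/2)).

```python
import math
from typing import Tuple


def chi2_from_percentages(pct1: float, n1: int, pct2: float, n2: int) -> Tuple[float, float]:
    """Pearson chi-square (df = 1, no continuity correction) for two proportions."""
    # Reconstruct approximate counts from the reported percentages (an assumption).
    a = round(pct1 / 100 * n1)  # group 1, feature present
    b = n1 - a                  # group 1, feature absent
    c = round(pct2 / 100 * n2)  # group 2, feature present
    d = n2 - c                  # group 2, feature absent
    n = n1 + n2
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with df = 1
    return chi2, p


# Percentages copied from the table rows for male sex and clopidogrel use.
chi2_sex, p_sex = chi2_from_percentages(46.40, 2924, 58.10, 6187)
chi2_clop, p_clop = chi2_from_percentages(24.56, 2924, 24.46, 6187)

print(f"Male sex:    chi2 = {chi2_sex:.1f}, p = {p_sex:.3g}")
print(f"Clopidogrel: chi2 = {chi2_clop:.3f}, p = {p_clop:.3f}")
```

Because the counts are rounded reconstructions, small deviations from the tabulated P-values are to be expected.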
Performance parameters of the trained models on the validation set for prediction of VTDR at 10 years.
| Method | Variant | Precision | Recall (sensitivity) | F1 | Accuracy | Specificity | AUC |
|---|---|---|---|---|---|---|---|
| Decision tree | Fine | 0.755 | 0.587 | 0.661 | 0.719 | 0.698 | 0.77 |
| Logistic regression | | 0.696 | 0.639 | 0.666 | 0.701 | 0.705 | 0.76 |
| SVM | Fine Gaussian | 0.834 | 0.958 | 0.892 | 0.892 | 0.958 | 0.96 |
| Naïve Bayes | Gaussian | 0.705 | 0.541 | 0.612 | 0.680 | 0.666 | 0.74 |
| Naïve Bayes | Kernel | 0.713 | 0.333 | 0.454 | 0.626 | 0.602 | 0.83 |
| Ensemble decision tree | Boosted tree | 0.805 | 0.625 | 0.703 | 0.754 | 0.725 | 0.91 |
| Ensemble decision tree | Bagged | 0.854 | 0.762 | 0.806 | 0.828 | 0.810 | 0.78 |
| Ensemble decision tree | RUSBoosted tree | 0.712 | 0.651 | 0.680 | 0.714 | 0.716 | 0.82 |
| Neural network | Narrow | 0.798 | 0.733 | 0.764 | 0.789 | 0.782 | 0.83 |
| Neural network | Wide | 0.809 | 0.735 | 0.770 | 0.795 | 0.785 | 0.84 |
| Neural network | Bilayered | 0.758 | 0.654 | 0.702 | 0.741 | 0.730 | 0.82 |
| Neural network | Trilayered | 0.748 | 0.649 | 0.695 | 0.734 | 0.725 | 0.80 |
VTDR vision-threatening diabetic retinopathy, AUC area under curve of receiver operating characteristics, SVM support vector machine.
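Since F1 is the harmonic mean of precision and recall, the F1 column of the validation results can be recomputed from the other two columns. The short sketch below does this for three rows; the helper name `f1_from_pr` and the 0.002 tolerance (to absorb rounding of the published three-decimal values) are assumptions introduced for illustration.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from three rows of the validation table.
reported = {
    "SVM, fine Gaussian": (0.834, 0.958, 0.892),
    "Ensemble decision tree, bagged": (0.854, 0.762, 0.806),
    "Logistic regression": (0.696, 0.639, 0.666),
}

TOLERANCE = 0.002  # assumed allowance for rounding in the published values

checks = {}
for name, (precision, recall, f1_reported) in reported.items():
    f1_computed = f1_from_pr(precision, recall)
    checks[name] = abs(f1_computed - f1_reported) < TOLERANCE
    print(f"{name}: computed F1 = {f1_computed:.3f}, reported F1 = {f1_reported:.3f}")
```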
Performance parameters of the trained models on the test set for prediction of VTDR at 10 years.
| Method | Variant | Precision | Recall (sensitivity) | F1 | Accuracy | Specificity |
|---|---|---|---|---|---|---|
| Decision tree | Fine | 0.826 | 0.556 | 0.665 | 0.623 | 0.455 |
| Logistic regression | | 0.826 | 0.627 | 0.712 | 0.660 | 0.487 |
| SVM | Fine Gaussian | 0.703 | 0.958 | 0.811 | 0.700 | 0.664 |
| Naïve Bayes | Gaussian | 0.823 | 0.551 | 0.660 | 0.619 | 0.451 |
| Naïve Bayes | Kernel | 0.914 | 0.096 | 0.173 | 0.386 | 0.346 |
| Ensemble decision tree | Boosted tree | 0.840 | 0.613 | 0.709 | 0.661 | 0.489 |
| Ensemble decision tree | Bagged | 0.797 | 0.758 | 0.777 | 0.707 | 0.548 |
| Ensemble decision tree | RUSBoosted tree | 0.809 | 0.636 | 0.712 | 0.654 | 0.480 |
| Neural network | Narrow | 0.825 | 0.672 | 0.741 | 0.684 | 0.513 |
| Neural network | Wide | 0.762 | 0.747 | 0.754 | 0.673 | 0.500 |
| Neural network | Bilayered | 0.815 | 0.680 | 0.741 | 0.681 | 0.509 |
| Neural network | Trilayered | 0.810 | 0.618 | 0.702 | 0.646 | 0.473 |
VTDR vision-threatening diabetic retinopathy, SVM support vector machine.
Performance parameters of the trained models on the data set including loss-to-follow-up cases.
| Method | Variant | Precision | Recall (sensitivity) | F1 | Accuracy | Specificity |
|---|---|---|---|---|---|---|
| Decision tree | Fine | 0.715 | 0.813 | 0.761 | 0.743 | 0.780 |
| Logistic regression | | 0.683 | 0.679 | 0.681 | 0.680 | 0.676 |
| SVM | Fine Gaussian | 0.975 | 0.912 | 0.943 | 0.944 | 0.917 |
| Naïve Bayes | Gaussian | 0.673 | 0.561 | 0.612 | 0.642 | 0.619 |
| Naïve Bayes | Kernel | 0.784 | 0.612 | 0.688 | 0.720 | 0.678 |
| Ensemble decision tree | Boosted tree | 0.738 | 0.848 | 0.789 | 0.772 | 0.819 |
| Ensemble decision tree | Bagged | 0.910 | 0.896 | 0.903 | 0.903 | 0.896 |
| Ensemble decision tree | RUSBoosted tree | 0.673 | 0.738 | 0.704 | 0.688 | 0.706 |
| Neural network | Narrow | 0.712 | 0.731 | 0.722 | 0.716 | 0.720 |
| Neural network | Wide | 0.762 | 0.747 | 0.754 | 0.673 | 0.500 |
| Neural network | Bilayered | 0.708 | 0.746 | 0.726 | 0.717 | 0.727 |
| Neural network | Trilayered | 0.703 | 0.753 | 0.727 | 0.716 | 0.730 |
VTDR vision-threatening diabetic retinopathy, SVM support vector machine.
Figure 1. Feature importance analysis. (Left) High-weighted features for VTDR prediction using neighborhood component. (Right) Important predictors revealed by the predictor importance analysis for the bagged ensemble decision tree model.
Figure 2. Dataset used in development, validation, and test of diabetic retinopathy risk prediction. This flowchart shows the process of obtaining and cleaning the dataset.