Ein Oh, Tae Keun Yoo, Eun-Cheol Park.
Abstract
BACKGROUND: Blindness due to diabetic retinopathy (DR) is the major disability in diabetic patients. Although early management has been shown to prevent vision loss, diabetic patients have a low rate of routine ophthalmologic examination. Hence, we developed and validated sparse learning models with the aim of identifying the risk of DR in diabetic patients.
Year: 2013 PMID: 24033926 PMCID: PMC3847617 DOI: 10.1186/1472-6947-13-106
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Figure 1. Dataset used in the development and validation of diabetic retinopathy risk prediction. This flowchart shows the process of training, internal validation, external validation, and validation in the newly diagnosed diabetic patients. KNHANES, Korean National Health and Nutrition Examination Survey; LASSO, least absolute shrinkage and selection operator; LR-BS, logistic regression with backward stepwise selection; OLR, ordinary logistic regression; ROC, receiver operating characteristic.
Characteristics of the patients with diabetes mellitus in the development dataset (KNHANES V-1)
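| Category | Characteristic | Total (n = 490) | Without DR (n = 406) | With DR (n = 84) | p-value |
|---|---|---|---|---|---|
| Demographics | | | | | |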
| | Sex (male : female) | 253 : 237 | 211 : 195 | 42 : 42 | 0.742† |
| | Age (years) | 60.8 ± 11.7 | 60.7 ± 11.7 | 61.4 ± 11.6 | 0.463¶ |
| | Current smoke | 107 (21.8) | 84 (20.7) | 23 (27.4) | 0.177† |
| | Alcohol (>1 serving/week) | 196 (40.0) | 166 (40.9) | 30 (35.7) | 0.378† |
| | Physical activity (MET h/week) | 14.6 ± 13.0 | 14.2 ± 12.01 | 16.4 ± 16.6 | 0.673¶ |
| | Waist circumference (cm) | 87.1 ± 9.7 | 87.4 ± 9.9 | 85.5 ± 8.3 | 0.112‡ |
| | BMI (kg/m2) | 25.0 ± 3.3 | 25.2 ± 3.3 | 23.8 ± 3.3 | 0.001‡ |
| Medical history | | | | | |
| | Duration of diabetes (years) | 6.2 ± 7.4 | 5.2 ± 6.7 | 10.6 ± 8.7 | <0.001¶ |
| | Diagnosed diabetes | 331 (67.6) | 252 (62.1) | 79 (94.1) | <0.001† |
| | Insulin therapy | 30 (6.1) | 15 (3.7) | 15 (17.9) | <0.001† |
| | Anti-diabetic drug | 297 (60.6) | 233 (57.4) | 64 (76.2) | <0.001† |
| | Nondrug anti-diabetic therapy | 331 (67.6) | 252 (62.1) | 79 (94.1) | <0.001† |
| | Diagnosed hypertension | 267 (54.5) | 228 (56.2) | 39 (46.4) | 0.103† |
| | Drug for hypertension | 254 (51.8) | 216 (53.2) | 38 (45.2) | 0.184† |
| | Diagnosed hyperlipidemia | 145 (29.6) | 118 (29.1) | 27 (18.6) | 0.574† |
| | Drug for hyperlipidemia | 101 (20.6) | 84 (20.7) | 17 (20.2) | 0.926† |
| Blood pressure | | | | | |
| | Systolic BP (mmHg) | 126.2 ± 16.4 | 126.5 ± 16.1 | 124.9 ± 17.8 | 0.413‡ |
| | Diastolic BP (mmHg) | 75.3 ± 9.8 | 76.0 ± 9.7 | 72.1 ± 9.9 | 0.001¶ |
| Blood test | | | | | |
| | HbA1c (%) | 7.3 ± 1.5 | 7.1 ± 1.5 | 7.9 ± 1.5 | <0.001¶ |
| | FPG (mg/dL) | 139.3 ± 42.5 | 136.8 ± 40.5 | 151.2 ± 49.5 | 0.008¶ |
| | AST (IU/L) | 25.4 ± 12.8 | 25.3 ± 12.5 | 25.8 ± 14.4 | 0.408¶ |
| | ALT (IU/L) | 26.3 ± 16.4 | 26.3 ± 16.1 | 25.9 ± 17.6 | 0.297¶ |
| | Hemoglobin (g/dL) | 14.1 ± 1.5 | 14.2 ± 1.5 | 13.6 ± 1.7 | 0.003¶ |
| | Cholesterol (mg/dL) | 186.3 ± 40.8 | 185.9 ± 39.3 | 187.9 ± 47.5 | 0.836¶ |
| | HDL (mg/dL) | 47.8 ± 12.1 | 48.0 ± 12.1 | 46.9 ± 12.1 | 0.370¶ |
| | LDL (mg/dL) | 109.6 ± 34.3 | 109.6 ± 33.8 | 109.7 ± 36.8 | 0.827¶ |
| | TG (mg/dL) | 180.5 ± 172.7 | 172.9 ± 124.8 | 217.4 ± 313.0 | 0.218¶ |
| | BUN (mg/dL) | 15.8 ± 5.0 | 15.7 ± 4.9 | 16.2 ± 5.3 | 0.543¶ |
| | Serum creatinine (mg/dL) | 0.87 ± 0.27 | 0.86 ± 0.26 | 0.89 ± 0.30 | 0.709¶ |
| Urine test | | | | | |
| | Protein* (+) | 64 (13.1) | 44 (10.8) | 20 (23.8) | 0.001† |
| | Glucose* (+) | 123 (25.1) | 90 (22.2) | 33 (39.3) | 0.001† |
| | Ketone* (+) | 59 (12.0) | 48 (11.8) | 11 (13.1) | 0.744† |
| | Bilirubin* (+) | 58 (11.8) | 45 (11.1) | 13 (15.5) | 0.257† |
| | Blood* (+) | 169 (34.5) | 133 (32.8) | 36 (42.9) | 0.076† |
| | Urobilinogen* (+) | 6 (1.2) | 4 (1.0) | 2 (0.4) | 0.274§ |
| | Urine creatinine (mg/L) | 123.3 ± 69.9 | 125.9 ± 70.8 | 110.8 ± 64.2 | 0.051¶ |
| | Urine sodium (mmol/day) | 124.5 ± 47.2 | 126.6 ± 48.1 | 114.3 ± 41.5 | 0.037¶ |
*by Dipstick (0, negative; 1, positive).
p-values were obtained by †Chi-squared test, §Fisher’s exact test, ¶Mann–Whitney test, and ‡Student t-test.
Table values are given as mean ± standard deviation or number (%) unless otherwise indicated.
ALT Alanine aminotransferase, AST Aspartate aminotransferase, BMI Body mass index, BP Blood pressure, BUN Blood urea nitrogen, FPG Fasting plasma glucose, HbA1c Glycated hemoglobin, HDL High-density lipoprotein, LDL Low-density lipoprotein, TG Triglyceride.
Figure 2. Performance (AUC) of the penalized logistic regression models using 5-fold cross validation. The penalized logistic regression models included ridge (A), elastic net (B), and LASSO (C). To optimize λ, we investigated the AUC during the 5-fold cross validation as λ increased. The λ that gave the highest AUC was chosen for the final training condition.
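The selection rule in this caption can be sketched as follows: compute an AUC on each held-out fold for every candidate λ, average across the five folds, and keep the λ with the highest mean. This is a minimal sketch, not the authors' code. The function names `auc` and `select_lambda` and the per-fold AUC values in `cv_results` are hypothetical, and the model fitting that would produce the fold scores is omitted.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pair-counting form."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_lambda(fold_aucs_by_lambda):
    """Return (lambda, mean AUC) for the lambda with the highest mean fold AUC."""
    means = {lam: mean(aucs) for lam, aucs in fold_aucs_by_lambda.items()}
    best = max(means, key=means.get)
    return best, means[best]


# Hypothetical AUCs from 5 folds for three candidate penalties (not from the paper).
cv_results = {
    0.001: [0.74, 0.71, 0.78, 0.70, 0.73],
    0.01: [0.79, 0.76, 0.81, 0.75, 0.78],
    0.1: [0.72, 0.70, 0.75, 0.69, 0.71],
}
best_lambda, best_auc = select_lambda(cv_results)
print(f"selected lambda = {best_lambda}, mean CV AUC = {best_auc:.3f}")
```

In practice each list in `cv_results` would be filled by fitting the penalized model on four folds and calling `auc` on the predictions for the fifth.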
Diabetic retinopathy risk predictors identified by LASSO
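| Category | Predictor | Scenario 1: β* | Scenario 1: standardized β† | Scenario 2: β* | Scenario 2: standardized β† | Scenario 3: β* | Scenario 3: standardized β† | Ref. |
|---|---|---|---|---|---|---|---|---|
| Demographics | | | | | | | | |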
| | Sex (female) | 0.199 | 0.099 | | | | | [ |
| | Age (years) | −0.022 | −0.260 | −0.007 | −0.077 | −0.012 | −0.146 | [ |
| | Current smoke | 0.216 | 0.088 | 0.532 | 0.218 | 0.469 | 0.192 | [ |
| | Alcohol (>1 serving/week) | −0.039 | −0.019 | −0.159 | −0.077 | −0.137 | −0.066 | [ |
| | Physical activity (MET h/week) | 0.005 | 0.058 | | | | | [ |
| | BMI (kg/m2) | −0.058 | −0.197 | −0.059 | −0.202 | −0.082 | −0.281 | [ |
| Medical history | | | | | | | | |
| | Duration of diabetes (years) | 0.054 | 0.376 | 0.027 | 0.186 | 0.027 | 0.187 | [ |
| | Diagnosed diabetes | 0.242 | 0.100 | 0.592 | 0.244 | 0.427 | 0.176 | [ |
| | Insulin therapy | 1.012 | 0.237 | 1.117 | 0.261 | 0.956 | 0.224 | [ |
| | Diagnosed hypertension | −0.228 | −0.227 | | | | | [ |
| | Drug for hyperlipidemia | −0.036 | −0.014 | | | −0.028 | −0.011 | [ |
| Blood pressure | | | | | | | | |
| | Diastolic BP (mmHg) | −0.007 | −0.067 | −0.007 | −0.066 | −0.004 | −0.041 | [ |
| Blood test | | | | | | | | |
| | HbA1c (%) | | | 0.103 | 0.155 | 0.054 | 0.081 | [ |
| | FPG (mg/dL) | | | 0.009 | 0.402 | 0.008 | 0.339 | [ |
| | Hemoglobin (g/dL) | | | −0.230 | −0.036 | −0.256 | −0.040 | [ |
| | TG (mg/dL) | | | 0.002 | 0.298 | 0.002 | 0.322 | [ |
| | HDL (mg/dL) | | | −0.003 | −0.030 | | | [ |
| | BUN (mg/dL) | | | 0.037 | 0.177 | 0.037 | 0.181 | [ |
| Urine test | | | | | | | | |
| | Protein (+) | | | | | 0.141 | 0.148 | [ |
| | Glucose (+) | | | | | 0.442 | 0.191 | [ |
| | Ketone (+) | | | | | −0.111 | −0.036 | |
| | Bilirubin (+) | | | | | 0.118 | 0.041 | |
| | Blood (+) | | | | | 0.096 | 0.046 | |
*Regression coefficient of logit operator.
†Standardized regression coefficient of logit operator.
BMI Body mass index, BP Blood pressure, BT Blood test, BUN Blood urea nitrogen, DE Demographics, FPG Fasting plasma glucose, HbA1c Glycated hemoglobin, HDL High-density lipoprotein, MH Medical history, TG Triglyceride, UT Urine test.
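The two coefficient types in the footnotes can be related numerically. A common convention, assumed here rather than stated in the source, is that the standardized coefficient equals the raw coefficient multiplied by the predictor's standard deviation. The sketch below checks that assumption for three predictors, pairing the first pair of coefficient columns above with the standard deviations from the patient characteristics table. Those standard deviations describe the whole development dataset, while the model was presumably fit on a training subset, so only approximate agreement is expected. The helper names are hypothetical.

```python
import math


def standardized_coef(beta, sd):
    """Standardized coefficient under the assumption beta_std = beta * SD(x)."""
    return beta * sd


def binary_sd(n_positive, n_total):
    """Standard deviation of a 0/1 predictor with the given positive count."""
    p = n_positive / n_total
    return math.sqrt(p * (1 - p))


# (raw beta, SD of predictor, standardized beta as reported in the table)
checks = {
    "Age (years)": (-0.022, 11.7, -0.260),
    "BMI (kg/m2)": (-0.058, 3.3, -0.197),
    "Insulin therapy": (1.012, binary_sd(30, 490), 0.237),
}

for name, (beta, sd, reported) in checks.items():
    computed = standardized_coef(beta, sd)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

Each computed value falls within about 0.01 of the reported one, which is consistent with the assumed convention but does not prove it.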
Figure 3. Performance comparison of the prediction models in the internal validation group. BP, blood pressure; BT, blood test; DE, demographics; LASSO, least absolute shrinkage and selection operator; LR-BS, logistic regression with backward stepwise selection; MH, medical history; OLR, ordinary logistic regression; RMS, root mean square; UT, urine test.
Figure 4. ROC curves for diabetic retinopathy prediction. The prediction models were tested in the internal (A) and external (B) validation groups. The LASSO and LR-BS models were trained in scenario 3. FPG, fasting plasma glucose; HbA1c, glycated hemoglobin; LASSO, least absolute shrinkage and selection operator; LR-BS, logistic regression with backward stepwise selection.
Diagnostic performance of prediction models in the internal and external validation groups
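| Group | Model | AUC (95% CI) | Accuracy, % (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % | NPV, % |
|---|---|---|---|---|---|---|---|
| (A) Internal validation group (N = 163) | | | | | | | |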
| | LASSO† | 0.81 (0.74-0.86) | 73.6 (66.0-80.1) | 77.4 (70.1-83.5) | 72.7 (65.1-79.3) | 40.0 | 93.2 |
| | LR-BS† | 0.79 (0.72-0.85) | 64.4* (56.5-71.7) | 83.9 (77.1-89.1) | 59.8 (51.9-67.4) | 32.9 | 94.0 |
| | HbA1c | 0.69* (0.62-0.76) | 66.3* (58.4-73.4) | 77.4 (70.1-83.5) | 63.6 (55.7-70.9) | 33.3 | 92.3 |
| | FPG | 0.54* (0.46-0.62) | 57.7* (49.7-65.3) | 61.3 (53.3-68.7) | 56.8 (48.8-64.5) | 25.0 | 86.2 |
| | Duration of diabetes | 0.72* (0.66-0.79) | 57.1* (49.1-64.7) | 87.1 (80.7-91.7) | 50.0 (42.1-57.9) | 29.0 | 94.3 |
| (B) External validation group (N = 562) | | | | | | | |
| | LASSO† | 0.82 (0.78-0.85) | 75.2 (71.3-78.7) | 72.1 (68.0-75.8) | 76.0 (72.1-79.5) | 43.7 | 91.3 |
| | LR-BS† | 0.79 (0.75-0.83) | 68.7* (64.6-72.6) | 82.0 (78.4-85.1) | 65.3 (61.1-69.3) | 37.9 | 93.3 |
| | HbA1c | 0.69* (0.65-0.73) | 63.7* (59.5-67.7) | 70.3 (66.2-74.1) | 62.0 (57.7-66.1) | 32.4 | 89.0 |
| | FPG | 0.65* (0.60-0.69) | 68.3* (64.1-72.0) | 57.7 (53.4-61.8) | 73.4 (69.4-77.1) | 36.0 | 87.0 |
| | Duration of diabetes | 0.73* (0.69-0.77) | 69.6* (66.5-74.3) | 64.9 (60.7-68.9) | 72.0 (68.0-75.7) | 37.5 | 88.8 |
*AUC or accuracy is significantly different from the LASSO at the level of p < 0.05.
†The LASSO and LR-BS models were trained in scenario 3.
AUC Area under the receiver operating characteristic curve, CI Confidence interval, FPG Fasting plasma glucose, HbA1c Glycated hemoglobin, LASSO Least absolute shrinkage and selection operator, LR-BS Logistic regression with backward stepwise selection, NPV Negative predictive value, PPV Positive predictive value.
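The five percentages that follow the AUC in each row are consistent with accuracy, sensitivity, specificity, PPV, and NPV, in that order. As a worked check, the sketch below computes these from a 2×2 confusion matrix. The counts (24 true positives, 7 false negatives, 96 true negatives, 36 false positives) are not reported in the source. They were back-calculated as one set of integers that sums to N = 163 and reproduces the LASSO row of the internal validation group, so treat them as an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic measures from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Assumed counts, back-calculated from the reported percentages (N = 163).
metrics = diagnostic_metrics(tp=24, fn=7, tn=96, fp=36)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these counts the function returns 73.6, 77.4, 72.7, 40.0, and 93.2 percent, matching the reported LASSO values for the internal validation group.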
Diagnostic performance of prediction models in the newly-diagnosed diabetic patients in the total validation group
| Model | AUC (95% CI) | Accuracy, % (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % | NPV, % |
|---|---|---|---|---|---|---|
| LASSO* | 0.90 (0.84-0.95) | 89.2 (82.8-93.6) | 75.0 (67.1-81.6) | 89.6 (83.2-93.9) | 16.7 | 99.2 |
| LR-BS* | 0.85 (0.79-0.91) | 72.3 (64.2-79.2) | 100.0 (96.8-100.0) | 71.5 (63.4-78.5) | 8.9 | 100.0 |
| HbA1c | 0.64 (0.55-0.72) | 69.6 (61.4-76.8) | 62.5 (54.2-70.8) | 70.1 (62.0-77.3) | 4.4 | 98.1 |
| FPG | 0.73 (0.65-0.80) | 65.5 (57.2-73.1) | 75.0 (67.1-81.6) | 65.3 (57.0-72.8) | 5.7 | 98.9 |
*The LASSO and LR-BS models were trained in scenario 3.
AUC Area under the receiver operating characteristic curve, CI Confidence interval, FPG Fasting plasma glucose, HbA1c Glycated hemoglobin, LASSO Least absolute shrinkage and selection operator, LR-BS Logistic regression with backward stepwise selection, NPV Negative predictive value, PPV Positive predictive value.
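The low PPV and very high NPV in the newly diagnosed group, despite sensitivity and specificity similar to the other validation groups, are what Bayes' rule predicts when the condition is rare. The sketch below computes PPV and NPV from sensitivity, specificity, and prevalence. The prevalence of 0.027 is an assumption back-calculated from the reported LASSO row, and 0.19 is an illustrative higher prevalence. Neither figure is stated in the source.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Reported LASSO sensitivity and specificity in the newly diagnosed group.
sens, spec = 0.75, 0.896

# Assumed prevalences: 0.027 (rare, back-calculated) and 0.19 (illustrative).
ppv_low, npv_low = predictive_values(sens, spec, prevalence=0.027)
ppv_high, npv_high = predictive_values(sens, spec, prevalence=0.19)

print(f"prevalence 0.027: PPV {100 * ppv_low:.1f}%, NPV {100 * npv_low:.1f}%")
print(f"prevalence 0.19:  PPV {100 * ppv_high:.1f}%, NPV {100 * npv_high:.1f}%")
```

At the lower prevalence the same test characteristics give a PPV near 17% and an NPV above 99%, in line with the LASSO row above. At the higher prevalence the PPV rises substantially.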
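Diagnostic performance of the commonly used algorithms in the literature
| Group | Model | AUC (95% CI) | Accuracy, % (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % | NPV, % |
|---|---|---|---|---|---|---|---|
| (A) Internal validation group (N = 163) | | | | | | | |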
| | SVM (RBF kernel) | 0.83 (0.76-0.88) | 74.8 (67.3-81.2) | 71.0 (63.3-77.7) | 75.8 (68.3-82.0) | 40.7 | 91.7 |
| | ANN | 0.79 (0.72-0.85) | 71.2 (63.5-77.9) | 80.6 (73.6-86.3) | 68.9 (61.2-75.9) | 37.9 | 93.8 |
| | Random Forest | 0.80 (0.73-0.85) | 72.4 (64.8-79.0) | 87.1 (80.7-91.7) | 68.9 (61.2-75.9) | 39.7 | 95.8 |
| | Naïve Bayes | 0.76 (0.69-0.82) | 74.2 (66.7-80.7) | 74.2 (66.6-80.6) | 74.2 (66.7-80.7) | 40.4 | 92.5 |
| | | 0.52 (0.45-0.59) | 71.2 (63.5-77.9) | 16.1 (11.0-22.8) | 84.1 (77.4-89.3) | 19.2 | 81.0 |
| (B) External validation group (N = 562) | | | | | | | |
| | SVM | 0.81 (0.78-0.84) | 74.1 (70.1-77.7) | 75.7 (71.8-79.2) | 73.7 (69.7-77.3) | 42.6 | 92.1 |
| | ANN | 0.79 (0.76-0.83) | 71.9 (67.8-75.6) | 81.1 (77.5-84.3) | 69.5 (65.4-73.3) | 40.7 | 93.4 |
| | Random Forest | 0.76 (0.72-0.79) | 71.1 (67.1-74.9) | 69.4 (65.3-73.2) | 71.6 (67.5-75.3) | 38.7 | 90.0 |
| | Naïve Bayes | 0.73 (0.69-0.77) | 70.6 (66.5-74.3) | 69.4 (65.3-73.2) | 70.9 (66.8-74.6) | 38.1 | 89.9 |
| | | 0.52 (0.48-0.57) | 73.7 (69.7-77.3) | 16.2 (13.3-19.6) | 88.6 (85.5-91.1) | 26.9 | 80.3 |
*The models were trained and validated in scenario 3 without feature selection. The optimal conditions of each method were obtained in the 5-fold cross validation.
ANN Artificial neural network, AUC Area under the receiver operating characteristic curve, CI Confidence interval, NPV Negative predictive value, PPV Positive predictive value, RBF Radial basis function, SVM Support vector machine.