Cheng Yang1, Qingyang Liu2, Haike Guo3,4, Min Zhang2, Lixin Zhang5, Guanrong Zhang6, Jin Zeng1, Zhongning Huang1, Qianli Meng1, Ying Cui1.
Abstract
Purpose: To develop and validate machine learning-based classifiers that use simple non-ocular metrics to detect referable diabetic retinopathy (RDR) in a large-scale Chinese population-based survey.
Keywords: XGBoost; classifier; diabetic retinopathy; machine learning; population-based study
Year: 2021 PMID: 34977075 PMCID: PMC8717406 DOI: 10.3389/fmed.2021.773881
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
A list of the variables that were used for modeling in this study.
Figure 1. Machine learning flowchart of this study. ML, machine learning; XGBoost, extreme gradient boosting; ANN, artificial neural network; AdaBoost, adaptive boosting; GBM, gradient boosting machine.
Characteristics of the included participants.
| Characteristic | Non-RDR | RDR | P value |
|---|---|---|---|
| No. of subjects | 1,336 | 82 | |
| Age, year | 60.0 (11.0) | 61.1 (10.7) | 0.389 |
| Male, n (%) | 572 (42.81) | 42 (51.22) | 0.136 |
| Current smoker, n (%) | 350 (26.20) | 21 (25.61) | 0.906 |
| Body mass index, kg/m² | 26.35 (4.19) | 25.90 (3.51) | 0.367 |
| Waist-to-hip ratio | 0.91 (0.07) | 0.91 (0.06) | 0.355 |
| Systolic blood pressure, mmHg | 141.7 (20.1) | 147.1 (19.8) | |
| Diastolic blood pressure, mmHg | 78.8 (11.2) | 79.1 (10.9) | 0.787 |
| Duration of diabetes, year | 1.33 (2.82) | 5.1 (5.5) | |
| Fasting blood glucose, mmol/L | 7.80 (5.22) | 9.54 (4.48) | |
| HbA1c, % | 7.01 (1.67) | 8.03 (2.05) | |
| Blood urea nitrogen, mmol/L | 6.44 (14.81) | 6.39 (2.59) | 0.979 |
| Serum creatinine, μmol/L | 78.13 (43.93) | 83.61 (30.32) | 0.307 |
| Triglyceride, mmol/L | 2.10 (2.12) | 2.13 (2.70) | 0.919 |
| Total cholesterol, mmol/L | 5.45 (1.26) | 5.59 (1.32) | 0.374 |
| Use of insulin, n (%) | 11 (0.82) | 2 (2.44) | 0.136 |
| Anti-hypertension medication, n (%) | 354 (26.50) | 25 (30.49) | 0.428 |
| History of hypertension, n (%) | 477 (35.70) | 35 (42.68) | 0.202 |
| History of hyperlipidemia, n (%) | 162 (12.13) | 12 (14.63) | 0.502 |
| Family history of hypertension, n (%) | 364 (27.25) | 24 (29.27) | 0.690 |
| Family history of diabetes, n (%) | 165 (12.35) | 20 (24.39) | |
RDR, referable diabetic retinopathy. Bold values indicate statistical significance.
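As an illustration of how a P value for a categorical row in this table could be obtained, the sketch below runs a Pearson chi-square test on the "Male" row (572 of 1,336 vs. 42 of 82). The choice of test, the omission of a continuity correction, and the function name `chi_square_2x2` are assumptions made here for illustration; the record does not state which tests the authors used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Hypothetical helper for illustration; not taken from the source study.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square survival function reduces to erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# "Male" row: 572 of 1,336 in the larger group, 42 of 82 in the smaller group.
chi2, p = chi_square_2x2(572, 1336 - 572, 42, 82 - 42)
print(f"chi2={chi2:.3f}, p={p:.3f}")
```

Under these assumptions the P value comes out at about 0.136, which agrees with the value listed for that row.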
Figure 2. Feature importance for each machine learning model. (A) XGBoost. (B) Random forest. (C) Naïve Bayes. (D) KNN.
Figure 3. Venn diagram showing the most important features in each model for detecting referable diabetic retinopathy.
The performance of machine learning models for diagnosing referable diabetic retinopathy.
| Model | AUC | Accuracy | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|
| XGBoost | 0.816 (0.033) | 0.796 (0.064) | 0.796 (0.132) | 0.799 (0.073) | 0.179 (0.039) | 0.981 (0.007) |
| Logistic regression | 0.766 (0.083) | 0.797 (0.054) | 0.683 (0.159) | 0.808 (0.062) | 0.174 (0.053) | 0.972 (0.008) |
| AdaBoost | 0.754 (0.087) | 0.755 (0.044) | 0.743 (0.114) | 0.761 (0.046) | 0.137 (0.035) | 0.974 (0.010) |
| Naïve Bayes | 0.753 (0.090) | 0.788 (0.037) | 0.689 (0.126) | 0.799 (0.033) | 0.159 (0.049) | 0.972 (0.010) |
| Random forest | 0.705 (0.070) | 0.776 (0.080) | 0.622 (0.204) | 0.768 (0.106) | 0.151 (0.058) | 0.965 (0.009) |
| Light GBM | 0.640 (0.098) | 0.941 (0.012) | 0.358 (0.249) | 0.901 (0.084) | - | 0.956 (0.012) |
| KNN | 0.577 (0.048) | 0.930 (0.025) | 0.316 (0.197) | 0.839 (0.116) | - | 0.946 (0.016) |
| ANN | 0.475 (0.041) | 0.584 (0.215) | 0.570 (0.241) | 0.588 (0.242) | 0.054 (0.012) | 0.958 (0.015) |
AUC, area under ROC curve; PPV, positive predictive value; NPV, negative predictive value; XGBoost, extreme gradient boosting; KNN, k-nearest neighbor; ANN, artificial neural network; AdaBoost, adaptive boosting; GBM, gradient boosting machine.
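The low PPV alongside the high NPV in this table follows largely from the low prevalence of RDR in the sample (82 of 1,418 participants). The sketch below applies Bayes' rule to show the relationship. It assumes the XGBoost row's third and fourth metric values (0.796 and 0.799) are sensitivity and specificity, which is an inference about column order, and `predictive_values` is a hypothetical helper name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Prevalence from the participant counts in the characteristics table.
prevalence = 82 / (1336 + 82)

# Assumed reading of the XGBoost row: sensitivity 0.796, specificity 0.799.
ppv, npv = predictive_values(0.796, 0.799, prevalence)
print(f"prevalence={prevalence:.3f}, PPV={ppv:.3f}, NPV={npv:.3f}")
```

With these inputs the PPV is close to 0.20 and the NPV close to 0.98, in line with the reported 0.179 and 0.981. An exact match is not expected, because the tabulated values are means across repeated evaluations rather than a single pooled estimate.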
Figure 4. Receiver operating characteristic curves of five algorithms for detecting referable diabetic retinopathy based on the top-10 important variables.