Li-Ying Huang1, Fang-Yu Chen1, Mao-Jhen Jhou2, Chun-Heng Kuo1, Chung-Ze Wu3,4, Chieh-Hua Lu5, Yen-Lin Chen6, Dee Pei1, Yu-Fang Cheng7, Chi-Jie Lu2,8,9.
Abstract
The urine albumin-creatinine ratio (uACR) is a warning sign of deteriorating renal function in type 2 diabetes (T2D), so the early detection of an abnormal uACR has become an important issue. Multiple linear regression (MLR) has traditionally been used to explore the relationships between risk factors and endpoints. Recently, machine learning (ML) methods have been widely applied in medicine. In the present study, four ML methods were used to predict the uACR in a T2D cohort. We hypothesized that (1) ML outperforms traditional MLR and (2) different importance rankings of the risk factors will be obtained. A total of 1147 patients with T2D were followed up for four years. MLR and four ML methods (classification and regression tree, random forest, stochastic gradient boosting, and eXtreme gradient boosting) were used. Our findings show that the prediction errors of the ML methods are smaller than those of MLR, which indicates that ML is more accurate. The six most important factors were baseline creatinine level, systolic and diastolic blood pressure, high-density lipoprotein cholesterol, glycated hemoglobin, and fasting plasma glucose. In conclusion, ML might be more accurate than traditional MLR in predicting the uACR in a T2D cohort, and the baseline creatinine level is the most important predictor, followed by systolic and diastolic blood pressure, high-density lipoprotein cholesterol, glycated hemoglobin, and fasting plasma glucose in Chinese patients with T2D.
Keywords: machine learning; nephropathy; type 2 diabetes; urine albumin-creatinine ratio
Year: 2022 PMID: 35806944 PMCID: PMC9267784 DOI: 10.3390/jcm11133661
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.964
Figure 1. Flowchart of sample selection from the Cardinal Tien Hospital Diabetes Study Cohort.
Variable definition.
| Variables | Description | Unit |
|---|---|---|
| Sex | Male/Female | - |
| Age | Patient age | year |
| Body mass index | Body mass index | kg/m² |
| Duration of diabetes | Duration of diabetes | year |
| Smoking | No/Yes | - |
| Alcohol | No/Yes | - |
| Baseline fasting plasma glucose | Fasting plasma glucose baseline | mg/dL |
| Baseline glycated hemoglobin | HbA1c (Glycated hemoglobin) baseline | % |
| Baseline triglyceride | Triglyceride baseline | mg/dL |
| Baseline high-density lipoprotein cholesterol | High-density lipoprotein cholesterol baseline | mg/dL |
| Baseline low-density lipoprotein cholesterol | Low-density lipoprotein cholesterol baseline | mg/dL |
| Baseline alanine aminotransferase | Alanine aminotransferase baseline | U/L |
| Baseline creatinine | Creatinine baseline | mg/dL |
| Baseline systolic blood pressure | Systolic blood pressure baseline | mmHg |
| Baseline diastolic blood pressure | Diastolic blood pressure baseline | mmHg |
| uACR at the end of follow-up | Urine albumin-to-creatinine ratio = urine albumin (mg/dL)/urine creatinine (mg/dL), measured at the 4-year follow-up | mg/g |
uACR: urine albumin–creatinine ratio.
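The unit in the last row can be made concrete with a small sketch. It assumes that both urine concentrations are reported in mg/dL, as in the table, and that the ratio is then rescaled from mg per mg to mg per g of creatinine; the function name and the sample values are hypothetical and are not taken from the study.

```python
def uacr_mg_per_g(urine_albumin_mg_dl: float, urine_creatinine_mg_dl: float) -> float:
    """Return the urine albumin-creatinine ratio in mg albumin per g creatinine.

    Assumption: both inputs are spot-urine concentrations in mg/dL.
    """
    if urine_creatinine_mg_dl <= 0:
        raise ValueError("urine creatinine must be positive")
    # mg/dL divided by mg/dL gives mg albumin per mg creatinine;
    # the factor 1000 (mg per g) re-expresses it per gram of creatinine.
    return urine_albumin_mg_dl * 1000.0 / urine_creatinine_mg_dl


# Hypothetical sample: 3 mg/dL albumin and 100 mg/dL creatinine.
print(uacr_mg_per_g(3.0, 100.0))  # 30.0
```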
Figure 2. Proposed ML prediction scheme.
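Equations of the performance metrics.
| Metrics | Description | Calculation |
|---|---|---|
| MAPE | Mean Absolute Percentage Error | $\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert\frac{y_i - \hat{y}_i}{y_i}\right\rvert$ |
| SMAPE | Symmetric Mean Absolute Percentage Error | $\mathrm{SMAPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert y_i - \hat{y}_i\rvert}{(\lvert y_i\rvert + \lvert\hat{y}_i\rvert)/2}$ |
| RAE | Relative Absolute Error | $\mathrm{RAE} = \frac{\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert}{\sum_{i=1}^{n}\lvert y_i - \bar{y}\rvert}$ |
where $\hat{y}_i$ and $y_i$ represent the predicted and actual values, respectively; $\bar{y}$ is the mean of the actual values; and $n$ is the number of instances.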
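The three metrics can be sketched in plain Python as follows. The formulas used here are the common textbook definitions, which is an assumption about what the study computed; in particular, the optional scaling by 100 is omitted and RAE is taken relative to a predictor that always outputs the mean of the actual values. The function names and the toy data are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error (as a fraction; undefined if an actual value is 0)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE; with this definition the value lies between 0 and 2."""
    return sum(
        abs(a - p) / ((abs(a) + abs(p)) / 2) for a, p in zip(actual, predicted)
    ) / len(actual)


def rae(actual, predicted):
    """Relative absolute error versus always predicting the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum(abs(a - p) for a, p in zip(actual, predicted))
    denominator = sum(abs(a - mean_actual) for a in actual)
    return numerator / denominator


# Toy values for illustration only (not study data).
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
print(mape(actual, predicted), smape(actual, predicted), rae(actual, predicted))
```

Under this definition, an RAE above 1 means the model's absolute error is larger than that of the mean-only predictor, and a value below 1 means it is smaller.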
Participant demographics.
| Variables | Mean ± SD | N |
|---|---|---|
| Age | 63.82 ± 11.49 | 1123 |
| BMI | 26.45 ± 3.95 | 1134 |
| Duration of diabetes | 14.13 ± 7.65 | 1137 |
| Baseline fasting plasma glucose | 149.84 ± 42.80 | 1146 |
| Baseline glycated hemoglobin | 7.74 ± 1.49 | 1140 |
| Baseline triglyceride | 142.99 ± 94.55 | 1144 |
| Baseline high-density lipoprotein cholesterol | 44.87 ± 12.00 | 845 |
| Baseline low-density lipoprotein cholesterol | 98.82 ± 27.73 | 1129 |
| Baseline alanine aminotransferase | 29.38 ± 21.48 | 1134 |
| Baseline creatinine | 0.90 ± 0.37 | 1093 |
| Baseline systolic blood pressure | 131.13 ± 14.07 | 969 |
| Baseline diastolic blood pressure | 75.91 ± 11.66 | 969 |
| uACR at the end of follow-up | 195.30 ± 711.98 | 1147 |

| Variables | n (%) |
|---|---|
| Sex | N = 1147 |
| Male | 608 (53.01%) |
| Female | 539 (46.99%) |
| Smoking | N = 716 |
| No | 430 (60.06%) |
| Yes | 286 (39.94%) |
| Alcohol | N = 789 |
| No | 715 (90.62%) |
| Yes | 74 (9.38%) |
BMI: body mass index. uACR: urine albumin–creatinine ratio.
The average performance of the MLR, RF, SGB, CART, and XGBoost methods.
| Method | MAPE | SMAPE | RAE |
|---|---|---|---|
| MLR | 18.245 (4.79) | 1.545 (0.04) | 1.126 (0.17) |
| RF | 16.174 (4.82) | 1.266 (0.05) | 1.072 (0.19) |
| SGB | 14.850 (3.09) | 1.522 (0.07) | 1.040 (0.16) |
| CART | 9.528 (1.76) | 1.312 (0.06) | 0.841 (0.10) |
| XGBoost | 11.872 (2.80) | 1.274 (0.06) | 0.915 (0.11) |
MLR: multiple linear regression; RF: random forest; SGB: stochastic gradient boosting; CART: classification and regression tree; XGBoost: eXtreme gradient boosting; MAPE: mean absolute percentage error; SMAPE: symmetric mean absolute percentage error; RAE: relative absolute error.
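To make the comparison in this table easier to read, the following sketch computes each method's relative error reduction against MLR and picks the method with the lowest value per metric. The numbers are the mean values copied from the table above; the helper name `relative_reduction` is hypothetical, and the calculation ignores the spread reported in parentheses.

```python
# Mean values from the performance table (lower is better for all three metrics).
performance = {
    "MLR": {"MAPE": 18.245, "SMAPE": 1.545, "RAE": 1.126},
    "RF": {"MAPE": 16.174, "SMAPE": 1.266, "RAE": 1.072},
    "SGB": {"MAPE": 14.850, "SMAPE": 1.522, "RAE": 1.040},
    "CART": {"MAPE": 9.528, "SMAPE": 1.312, "RAE": 0.841},
    "XGBoost": {"MAPE": 11.872, "SMAPE": 1.274, "RAE": 0.915},
}


def relative_reduction(method: str, metric: str) -> float:
    """Fraction by which `method` lowers `metric` compared with MLR."""
    baseline = performance["MLR"][metric]
    return (baseline - performance[method][metric]) / baseline


# Method with the smallest mean error for each metric.
best = {
    metric: min(performance, key=lambda m: performance[m][metric])
    for metric in ("MAPE", "SMAPE", "RAE")
}

print(best)
for method in ("RF", "SGB", "CART", "XGBoost"):
    print(method, {k: round(relative_reduction(method, k), 3) for k in best})
```

No single ML method is best on every metric in this table, which is one reason to report several metrics side by side.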
Wilcoxon signed-rank test between the four ML methods and the MLR method.
| | RF | SGB | CART | XGBoost |
|---|---|---|---|---|
| MLR | 41.736 (0.001) ** | 20.814 (0.001) ** | 30.680 (0.001) ** | 44.489 (0.001) ** |
The numbers in parentheses are the corresponding p-values; **: p < 0.05.
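For readers unfamiliar with the test, here is a minimal pure-Python sketch of a Wilcoxon signed-rank comparison between two sets of paired errors. It is not the study's procedure: the record does not say which quantities were paired or how the reported statistic was computed, so the paired values below are hypothetical, and the z-score uses the plain normal approximation without tie or continuity correction.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, z) for paired samples x and y."""
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation of W_plus under the null hypothesis.
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, w_minus, (w_plus - mean_w) / sd_w


# Hypothetical paired errors from five repeated runs (not study data).
mlr_errors = [18.0, 20.5, 17.2, 19.1, 16.8]
cart_errors = [9.5, 10.0, 11.2, 8.9, 9.9]
print(wilcoxon_signed_rank(mlr_errors, cart_errors))
```

In practice a library routine such as `scipy.stats.wilcoxon` would normally be used, since it also returns a p-value and handles small samples exactly.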
Importance ranking of each risk factor using the four ML methods.
| Variables | RF | SGB | CART | XGBoost | Average |
|---|---|---|---|---|---|
| Sex | 11.3 | 14.9 | 15.0 | 13.7 | 13.7 |
| Age | 4.8 | 9.0 | 9.5 | 5.4 | 7.2 |
| Body mass index | 14.9 | 11.8 | 12.0 | 9.8 | 12.1 |
| Duration of diabetes | 8.8 | 7.0 | 10.7 | 8.4 | 8.7 |
| Smoking | 10.8 | 14.4 | 15.0 | 14.7 | 13.7 |
| Alcohol | 11.6 | 13.6 | 15.0 | 14.6 | 13.7 |
| Baseline fasting plasma glucose | 5.4 | 6.3 | 10.9 | 5.3 | 7.0 |
| Baseline glycated hemoglobin | 5.8 | 5.0 | 10.3 | 6.1 | 6.8 |
| Baseline triglyceride | 11.9 | 10.2 | 12.7 | 13.1 | 12.0 |
| Baseline high-density lipoprotein cholesterol | 7.7 | 2.8 | 5.8 | 6.8 | 5.8 |
| Baseline low-density lipoprotein cholesterol | 5.8 | 10.9 | 11.2 | 7.5 | 8.9 |
| Baseline alanine aminotransferase | 9.6 | 8.3 | 12.4 | 12.6 | 10.7 |
| Baseline creatinine | 1.3 | 1.1 | 1.8 | 1.1 | 1.3 |
| Baseline systolic blood pressure | 5.0 | 4.9 | 4.3 | 3.9 | 4.5 |
| Baseline diastolic blood pressure | 5.3 | 4.1 | 4.1 | 4.7 | 4.6 |
Note: In the original table, shades of blue mark the rank-value bands 1.0~1.4, 1.5~2.4, 2.5~3.4, 3.5~4.4, 4.5~5.4, and 5.5 or above. The darker the blue color, the more important the risk factor.
Figure 3. Integrated importance ranking of all risk factors. Note: The darker color indicates the first six important risk factors of this study.
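The integrated ranking can be reproduced from the importance-ranking table with a few lines of Python. This sketch assumes that the "Average" column is the unweighted mean of the four per-method rank values and that a lower average means a more important factor; the variable names are hypothetical, and the numbers are copied from the table in RF, SGB, CART, XGBoost order.

```python
# Rank values per method (RF, SGB, CART, XGBoost) from the importance table.
rank_values = {
    "Sex": (11.3, 14.9, 15.0, 13.7),
    "Age": (4.8, 9.0, 9.5, 5.4),
    "Body mass index": (14.9, 11.8, 12.0, 9.8),
    "Duration of diabetes": (8.8, 7.0, 10.7, 8.4),
    "Smoking": (10.8, 14.4, 15.0, 14.7),
    "Alcohol": (11.6, 13.6, 15.0, 14.6),
    "Fasting plasma glucose": (5.4, 6.3, 10.9, 5.3),
    "Glycated hemoglobin": (5.8, 5.0, 10.3, 6.1),
    "Triglyceride": (11.9, 10.2, 12.7, 13.1),
    "HDL cholesterol": (7.7, 2.8, 5.8, 6.8),
    "LDL cholesterol": (5.8, 10.9, 11.2, 7.5),
    "Alanine aminotransferase": (9.6, 8.3, 12.4, 12.6),
    "Creatinine": (1.3, 1.1, 1.8, 1.1),
    "Systolic blood pressure": (5.0, 4.9, 4.3, 3.9),
    "Diastolic blood pressure": (5.3, 4.1, 4.1, 4.7),
}

# Unweighted mean rank per factor; a lower value means higher importance.
average_rank = {name: sum(v) / len(v) for name, v in rank_values.items()}

# The six factors with the lowest average rank.
top_six = sorted(average_rank, key=average_rank.get)[:6]

for name in top_six:
    print(f"{name}: {average_rank[name]:.2f}")
```

Averaging ranks rather than raw importance scores keeps the four methods on a common scale, since each method reports importance in its own units.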