Wenzhu Song1, Xiaoshuang Zhou2, Qi Duan3, Qian Wang3, Yaheng Li3, Aizhong Li3, Wenjing Zhou4, Lin Sun5, Lixia Qiu1, Rongshan Li2,3, Yafeng Li2,3,6,7.
Abstract
Objectives: Chronic kidney disease (CKD) is a common chronic condition with high incidence and insidious onset. Glomerular injury (GI) and tubular injury (TI) are early manifestations of CKD and can indicate the risk of its development. In this study, we aimed to classify GI and TI using three machine learning algorithms to promote their early diagnosis and slow the progression of CKD.
Keywords: auxiliary diagnosis; glomerular injury; machine learning; random forest; tubular injury
Year: 2022 PMID: 35966858 PMCID: PMC9366016 DOI: 10.3389/fmed.2022.911737
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
FIGURE 1 Response variables for GI and TI before and after SMOTE. SMOTE (Synthetic Minority Over-Sampling Technique) is a widely used method for handling imbalanced data; it was run with the parameters k = 5, C.perc = “balance,” and dist = “Overlap.” (A) GI before SMOTE; (B) TI before SMOTE; (C) GI after SMOTE; (D) TI after SMOTE.
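The C.perc and dist parameters in the caption are from the R implementation the authors used. The core interpolation step of SMOTE can be sketched in Python as follows; this is a minimal illustration on synthetic data with a hypothetical `smote` helper, not the authors' code:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples: pick a minority point,
    pick one of its k nearest minority neighbors, and interpolate."""
    rng = rng if rng is not None else np.random.default_rng(0)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]   # random one of the k neighbors
        gap = rng.random()                   # position along the segment
        new.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(new)

rng = np.random.default_rng(1)
X_min = rng.normal(size=(20, 2))             # stand-in minority-class samples
X_new = smote(X_min, n_new=30, k=5, rng=rng)
print(X_new.shape)                           # (30, 2)
```

Because each synthetic point is a convex combination of two real minority samples, SMOTE never extrapolates outside the minority class's range, which is why it is preferred over naive duplication for balancing response variables like GI and TI here.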
FIGURE 2 Workflow of the model construction.
Clinical parameters of study subjects (quantitative ones).
| Variable | Mean ± SD | Variable | Mean ± SD |
| Age (y) | 58.75 ± 9.49 | TC (mmol/L) | 4.43 ± 0.95 |
| LDL (mmol/L) | 2.35 ± 0.84 | TG (mmol/L) | 1.73 ± 0.82 |
| HDL (mmol/L) | 1.30 ± 0.37 | Hcy (μmol/L) | 22.98 ± 14.26 |
| FPG (mmol/L) | 4.97 ± 1.36 | SBP (mmHg) | 136.10 ± 18.39 |
| GHb (mmol/L) | 5.54 ± 1.09 | DBP (mmHg) | 82.84 ± 10.76 |
Clinical parameters of study subjects (qualitative ones).
| Variables | Percentage (%) | Variables | Percentage (%) | Variables | Percentage (%) |
| Education | | BMI category | | Income | |
| ≤Primary | 32.7 | Underweight | 1.7 | <5k | 41.8 |
| ≤Junior | 50.9 | Normal | 39.5 | 5k–10k | 25.5 |
| ≤Senior | 11.9 | Overweight | 42.6 | 10k–20k | 10.3 |
| ≥Bachelor | 4.5 | Obesity | 16.3 | >20k | 22.4 |
| Taste | | | | Diet type | |
| Light | 26.3 | Rarely | 84.7 | Vegetable | 33.5 |
| Moderate | 60.5 | Sometimes | 13.2 | Balanced | 61.9 |
| Salty | 13.1 | Always | 2.1 | Meat | 4.6 |
| Exercise | | | | Sex | |
| Regular | 41.7 | No | 76.2 | Male | 42.4 |
| None or a little | 58.3 | Yes | 23.8 | Female | 57.6 |
FIGURE 3 Results of feature selection using LASSO. Features with nonzero coefficients at the minimum lambda were retained for model construction (14 features for GI and 15 for TI). (A) Feature selection for GI; (B) feature selection for TI.
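The caption describes keeping the features whose LASSO coefficients remain nonzero at the cross-validated optimum of the penalty. A minimal sketch of that selection step, assuming scikit-learn's L1-penalized logistic regression on synthetic data (C is the inverse of lambda; the data and counts here are illustrative, not the study's):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic stand-in for the clinical predictors (not the study's data).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

# L1-penalized logistic regression with cross-validation over C; the best C
# plays the role of the "minimum lambda" in the caption.
lasso = LogisticRegressionCV(Cs=10, penalty="l1", solver="liblinear",
                             cv=5, random_state=0).fit(X, y)

# Features whose coefficients survive the penalty are retained.
selected = np.flatnonzero(lasso.coef_.ravel())
print(f"kept {selected.size} of {X.shape[1]} features:", selected)
```

The L1 penalty drives uninformative coefficients exactly to zero, so the surviving index set is the feature subset passed on to model construction.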
Performance evaluation of the three classifiers on the training set (GI/TI).
| Model | Accuracy (%) | Sensitivity (%) | Specificity (%) |
| RF | 99.90/99.92 | 99.96/99.94 | 99.84/99.90 |
| NB | 65.39/67.06 | 52.08/54.26 | 78.71/79.65 |
| LR | 66.40/68.52 | 64.90/66.94 | 67.90/70.08 |
Performance evaluation of the three classifiers on the testing set (GI/TI).
| Model | Accuracy (%) | Sensitivity (%) | Specificity (%) |
| RF | 78.14/80.49 | 82.00/84.60 | 74.29/76.09 |
| NB | 65.17/65.68 | 52.23/53.34 | 78.10/78.86 |
| LR | 66.87/67.51 | 64.23/66.06 | 69.51/69.04 |
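The accuracy, sensitivity, and specificity columns in the two tables above can all be derived from a 2x2 confusion matrix. A minimal sketch of that evaluation, assuming a 70/30 split and default scikit-learn classifiers on synthetic data (the models mirror RF, NB, and LR, but the numbers will not match the study's):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score

def report(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from a 2x2 confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return ((tp + tn) / (tp + tn + fp + fn),  # accuracy
            tp / (tp + fn),                   # sensitivity
            tn / (tn + fp))                   # specificity

# Synthetic stand-in for the study's data; 70/30 split as in the paper.
X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {"RF": RandomForestClassifier(random_state=0),
          "NB": GaussianNB(),
          "LR": LogisticRegression(max_iter=1000)}
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc, sens, spec = report(y_te, model.predict(X_te))
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    results[name] = (acc, sens, spec, auc)
    print(f"{name}: acc={acc:.3f} sens={sens:.3f} spec={spec:.3f} auc={auc:.3f}")
```

Note the characteristic pattern in the tables: the random forest is near-perfect on the training set but drops on the testing set, which is the usual signature of a low-bias, high-variance learner and why test-set metrics are the ones that matter for comparison.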
FIGURE 4 Comparison of the ROC curve areas of the three classifiers. For model construction, 70% of the samples were randomly assigned to the training set and the remaining 30% to the testing set. The area under the ROC curve (AUC) was used to evaluate the performance of the three classifiers. (A) AUC of GI in the training set; (B) AUC of GI in the testing set; (C) AUC of TI in the training set; (D) AUC of TI in the testing set.
FIGURE 5 Contributions of explanatory variables to the random forest model. “%IncMSE” is the increase in the model's mean squared error when the values of a given predictor are randomly permuted: the more important the variable, the more the prediction error rises when it is scrambled. A larger value therefore indicates a more important variable.
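%IncMSE comes from R's randomForest package; the same permute-and-rescore idea is available in scikit-learn as permutation importance. A hedged sketch on synthetic regression data (the features and values are illustrative, not the study's variables):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data: only 3 of the 8 features carry signal.
X, y = make_regression(n_samples=400, n_features=8, n_informative=3,
                       random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Permute one column at a time and record the rise in squared error --
# the same idea behind randomForest's %IncMSE.
imp = permutation_importance(rf, X, y, scoring="neg_mean_squared_error",
                             n_repeats=10, random_state=0)
order = np.argsort(imp.importances_mean)[::-1]
for i in order:
    print(f"feature {i}: MSE increase = {imp.importances_mean[i]:.1f}")
```

Permuting an uninformative column barely changes the error, so its importance sits near zero, while scrambling an informative column degrades predictions sharply, reproducing the ranking shown in the figure.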