Xuefei Lin1,2,3, Yongfang Liu2,3, Yizhen Chen1, Xiaodan Huang1, Jundu Li1, Yuansheng Hou1, Miaoying Shen1, Zaoqiang Lin1,4, Ronglin Zhang1,5, Haifeng Yang6, Songlin Hong7, Xusheng Liu8, Chuan Zou8,9.
Abstract
BACKGROUND AND OBJECTIVES: Immunoglobulin A nephropathy (IgAN) is the most common primary glomerular disease worldwide. Its clinical manifestations and the severity of its pathological changes vary, crescent formation is a common complication that occurs in differing proportions, and clinical outcomes are highly heterogeneous between individuals. We therefore aimed to develop a machine learning (ML)-based model for predicting the prognosis of IgAN with focal crescent formation and without obvious chronic renal lesions (glomerulosclerosis <25%).
MATERIALS: We retrospectively reviewed biopsy-proven IgAN patients in our hospital and cooperative hospital from 2005 to 2017. Random forest (RF) feature importance was used to identify the variables most closely related to the prognosis of focal crescent IgAN. Multiple ML algorithms were tried to build the prediction models. The area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) were used to evaluate predictive performance via three-fold cross-validation (two folds for training and one for validation).
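The evaluation scheme described in the abstract (three-fold cross-validation scored by AUROC and AUPRC) can be sketched in plain Python. This is a minimal illustration and not the authors' code: the helper names `stratified_folds`, `auroc` and `average_precision`, the stratification by class, and the toy labels and scores are all assumptions, and average precision is used here as one common estimator of AUPRC.

```python
from collections import defaultdict

def stratified_folds(labels, k=3):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)  # deal indices round-robin within each class
    return folds

def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision: mean of the precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy data (hypothetical): 1 = reached the endpoint, 0 = did not.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.8, 0.7, 0.9, 0.6]

folds = stratified_folds(labels, k=3)
for held_out in range(3):
    train = [i for f in range(3) if f != held_out for i in folds[f]]
    valid = folds[held_out]
    # A real pipeline would fit a model on `train` and score `valid` here.
    print(f"fold {held_out}: train={len(train)} valid={len(valid)}")

print(round(auroc(labels, scores), 4), round(average_precision(labels, scores), 4))
```

With an outcome as imbalanced as the one in this cohort (66 of 374 patients reached the endpoint), keeping the class ratio similar across folds is a common choice. The source does not say whether the folds were stratified.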
Year: 2022 PMID: 35263356 PMCID: PMC8906594 DOI: 10.1371/journal.pone.0265017
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. All ML models were evaluated with 3-fold cross-validation.
Fig 2. Enrollment of IgAN patients in our cohort.
Baseline cohort characteristics.
| Factors | Overall (N = 374) | Non-endpoint (N = 308) | Endpoint (N = 66) | P value |
|---|---|---|---|---|
| Male, n (%) | 175(46.8) | 153(49.68) | 22(33.33) | 0.016 |
| Age(years) | 31(26–40) | 31(26–38.75) | 31.5(25–46) | 0.502 |
| Follow-up, months | 32.99(25.86–54.68) | 33.13(26.04–55.71) | 31.62(24.73–50.19) | 0.288 |
| Disease course, months | 7(1–24) | 7.5(1–24) | 5.5(1–24) | 0.821 |
| eGFR, mL/min/1.73m2 | 108.3±39.49 | 106.92±35.76 | 114.75±53.46 | 0.259 |
| Serum creatinine, μmol/L | 80.7±29.74 | 81.14±28.47 | 78.66±35.28 | 0.54 |
| Proteinuria, g/24h | 0.85(0.43–1.57) | 0.8(0.43–1.49) | 1.03(0.48–2.32) | 0.055 |
| Hematuria (red blood cells/high-power field) | 51(22.75–146.4) | 51(23–145.5) | 51(19.95–173.75) | 0.998 |
| BUN, mmol/L | 4.77(4–5.78) | 4.8(4.04–5.77) | 4.57(3.69–5.8) | 0.455 |
| Uric acid, μmol/L | 341.5(280–414.25) | 343.5(280–416) | 335(278–406) | 0.752 |
| Cholesterol, mmol/L | 4.6(4–5.39) | 4.6(4–5.38) | 4.69(4.11–5.46) | 0.562 |
| Triglyceride, mmol/L | 1.2(0.9–1.76) | 1.2(0.89–1.7) | 1.33(0.88–2.31) | 0.224 |
| HDL-C, mmol/L | 1.27(1.05–1.53) | 1.24(1.03–1.55) | 1.35(1.09–1.53) | 0.338 |
| LDL-C, mmol/L | 2.82(2.32–3.5) | 2.89(2.34–3.48) | 2.76(2.22–3.51) | 0.752 |
| Blood glucose | 4.85(4.42–5.1) | 4.85(4.4–5.1) | 4.79(4.44–5.11) | 0.731 |
| TP, g/L | 67(62–71.53) | 67.3(62.98–71.98) | 65.5(59.38–70.35) | 0.031 |
| Serum albumin, g/L | 40.9(37.18–43.9) | 41.1(37.7–44.15) | 38.8(34.28–42.7) | 0.005 |
| Serum IgA, g/L | 3.05(2.46–3.5) | 3.05(2.45–3.54) | 3.05(2.47–3.49) | 0.959 |
| Serum C3, g/L | 1.02(0.9–1.11) | 1.02(0.9–1.1) | 1.02(0.93–1.14) | 0.585 |
| SBP, mmHg | 120(110–130) | 120(110–130) | 121.5(112.75–134) | 0.149 |
| DBP, mmHg | 79.5(70–86) | 80(70–85) | 78.5(70–89.25) | 0.419 |
| MAP, mmHg | 93.17(84.25–100) | 92.84(83.67–100) | 93.67(86.5–102.84) | 0.239 |
| Hypertension (%) | 116(31) | 93(30.19) | 23(34.85) | 0.458 |
| Diabetes (%) | 6(1.6) | 4(1.3) | 2(3.03) | 0.287 |
| Hepatitis (%) | 28(7.5) | 26(8.44) | 2(3.03) | 0.195 |
| CVD (%) | 1(0.3) | 1(0.32) | 0(0) | 1 |
| Smoke (%) | 15(4) | 12(3.9) | 3(4.55) | 0.735 |
| Alcohol (%) | 10(2.7) | 9(2.92) | 1(1.52) | 1 |
| M1 (%) | 308(82.4) | 249(80.84) | 59(89.39) | 0.098 |
| E1 (%) | 50(13.4) | 39(12.66) | 11(16.67) | 0.386 |
| S1 (%) | 199(53.2) | 157(50.97) | 42(63.63) | 0.061 |
| T1 (%) | 46(12.3) | 35(11.36) | 11(16.67) | 0.209 |
| T2 (%) | 5(1.3) | 3(0.97) | 2(3.03) | 0.197 |
| C1 (%) | 105(28.1) | 85(27.6) | 20(30.3) | 0.556 |
| C2 (%) | 14(3.7) | 10(3.25) | 4(6.06) | 0.42 |
| RAAS blockade (%) | 247(66) | 206(66.9) | 41(62.1) | 0.458 |
| Immunosuppressant (%) | 140(37.4) | 108(35.1) | 32(48.5) | 0.041 |
Demographic, clinical, laboratory and treatment data of the IgAN patients. C3, complement 3; TP, total protein; MAP, mean arterial pressure; eGFR, estimated glomerular filtration rate; LDL-C, low density lipoprotein cholesterol; HDL-C, high density lipoprotein cholesterol; BUN, blood urea nitrogen; SBP, systolic blood pressure; DBP, diastolic blood pressure; CVD, cardiovascular disease; RAAS, renin-angiotensin-aldosterone system. Immunosuppressants include steroids, cyclophosphamide, ciclosporin, mycophenolate mofetil and others.
* P < 0.05
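As a worked check on how a P value for a categorical row can arise, the sketch below applies a Pearson chi-square test to the counts in the "Male" row. The source does not state which test was used, so the choice of test, the absence of a continuity correction, and the function name `chi_square_2x2` are assumptions. Under them the result comes out close to the reported 0.016.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for observed, row, col in ((a, row1, col1), (b, row1, col2),
                               (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts from the "Male" row: 153 of 308 without the endpoint, 22 of 66 with it.
chi2, p = chi_square_2x2(153, 308 - 153, 22, 66 - 22)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```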
Fig 3. Contribution of the included features to the combined event in IgAN patients.
HDL-C, high density lipoprotein cholesterol; LDL-C, low density lipoprotein cholesterol; TP, total serum protein.
Predictors selected using random forest and the corresponding feature importance score.
| Features | Importance score |
|---|---|
| Baseline eGFR, ml/min per 1.73m2 | 0.066177 |
| Serum creatinine, μmol/L | 0.059347 |
| Serum triglycerides, mmol/L | 0.054830 |
| Proteinuria, g/d | 0.049275 |
| MAP, mm Hg | 0.043798 |
| Hematuria (red blood cells/high-power field) | 0.043790 |
| Serum C3, g/L | 0.043743 |
| Age at biopsy, years | 0.036900 |
| Crescent proportion of glomeruli, % | 0.013346 |
| Global crescent proportion of glomeruli, % | 0.006574 |
eGFR, estimated glomerular filtration rate; MAP, mean arterial pressure; C3, complement 3.
Fig 4. Receiver operating characteristic (ROC) curves of the three candidate models for the prognosis of IgAN.
AUC, area under the curve.
Fig 5. Precision-recall curves of the three candidate models for the prognosis of IgAN.
Fig 6. Receiver operating characteristic (ROC) curves of the three candidate models for the prognosis of IgAN without ‘Crescent proportion’ and ‘Global crescent proportion’.
Fig 7. Precision-recall curves of the three candidate models for the prognosis of IgAN without ‘Crescent proportion’ and ‘Global crescent proportion’.
Comparison of model performance for IgAN prognosis with ‘Crescent proportion’ and ‘Global crescent proportion’ included.
| Prediction model | Precision | Recall | F1-score | Accuracy | AUROC | AUPRC |
|---|---|---|---|---|---|---|
| Support Vector Machine | 0.77 | 0.77 | 0.73 | 0.77 | 0.7957 | 0.765 |
| Random Forest | 0.69 | 0.70 | 0.61 | 0.70 | 0.6443 | 0.472 |
| Naïve Bayes | 0.74 | 0.74 | 0.69 | 0.74 | 0.7078 | 0.637 |
Comparison of model performance for IgAN prognosis without ‘Crescent proportion’ and ‘Global crescent proportion’.
| Prediction model | Precision | Recall | F1-score | Accuracy | AUROC | AUPRC |
|---|---|---|---|---|---|---|
| Support Vector Machine | 0.78 | 0.68 | 0.56 | 0.68 | 0.831 | 0.716 |
| Random Forest | 0.65 | 0.68 | 0.63 | 0.68 | 0.7041 | 0.567 |
| Naïve Bayes | 0.70 | 0.72 | 0.70 | 0.72 | 0.5959 | 0.567 |
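In both performance tables, recall equals accuracy in every row, and F1 is not the harmonic mean of the listed precision and recall. That pattern is what support-weighted averaging of per-class metrics produces, because weighted recall is algebraically identical to accuracy. The source does not state the averaging scheme, so the sketch below is an illustration under that assumption. The function name `weighted_metrics` and the confusion matrix are hypothetical.

```python
def weighted_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision = recall = f1 = 0.0
    for k in range(n_classes):
        tp = confusion[k][k]
        support = sum(confusion[k])  # true members of class k
        predicted = sum(confusion[i][k] for i in range(n_classes))
        p_k = tp / predicted if predicted else 0.0
        r_k = tp / support if support else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if p_k + r_k else 0.0
        weight = support / total  # each class weighted by its share of samples
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k
    accuracy = sum(confusion[k][k] for k in range(n_classes)) / total
    return precision, recall, f1, accuracy

# Hypothetical validation fold: 100 patients without the endpoint, 25 with it.
confusion = [[95, 5],
             [20, 5]]
precision, recall, f1, accuracy = weighted_metrics(confusion)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

Under this kind of averaging the majority class dominates every column except AUROC and AUPRC, which is one reason to read those two threshold-free columns alongside the others.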
Fig 8. The lift curve of the Support Vector Machine model.
“Class 0” indicates IgAN patients without progression to the combined endpoint, and “Class 1” indicates IgAN patients with progression to the combined endpoint.
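For readers unfamiliar with lift curves, the following minimal sketch computes cumulative lift for the positive class ("Class 1"): the endpoint rate among the highest-scored fraction of patients divided by the overall endpoint rate. The function name `lift_curve`, the binning into five equal fractions, and the toy scores are assumptions for illustration and do not reproduce the figure.

```python
def lift_curve(labels, scores, n_bins=5):
    """Cumulative lift: positive rate in the top-scored fraction / overall positive rate."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: -t[0])]
    base_rate = sum(ranked) / len(ranked)
    points = []
    for b in range(1, n_bins + 1):
        top = ranked[: round(len(ranked) * b / n_bins)]
        points.append((b / n_bins, (sum(top) / len(top)) / base_rate))
    return points

# Hypothetical scores for ten patients; 1 = combined endpoint reached.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
points = lift_curve(labels, scores)
for fraction, lift in points:
    print(f"top {fraction:.0%}: lift = {lift:.2f}")
```

A lift above 1 in the top fractions means the model concentrates endpoint cases among its highest scores. At 100% of the sample the lift is 1 by construction.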
Fig 9. Calibration plots of the three candidate models for the prognosis of IgAN with ‘Crescent proportion’ and ‘Global crescent proportion’.
Fig 10. Calibration plots of the three candidate models for the prognosis of IgAN without ‘Crescent proportion’ and ‘Global crescent proportion’.