Zvi Segal1, Dan Kalifa1, Kira Radinsky1, Bar Ehrenberg1, Guy Elad1, Gal Maor1, Maor Lewis1, Muhammad Tibi1, Liat Korn2, Gideon Koren3.
Abstract
BACKGROUND: End stage renal disease (ESRD) describes the most severe stage of chronic kidney disease (CKD), when patients need dialysis or renal transplant. There is often a delay in recognizing, diagnosing, and treating the various etiologies of CKD. The objective of the present study was to employ machine learning algorithms to develop a prediction model for progression to ESRD based on a large-scale multidimensional database.
Keywords: Algorithm; End stage renal disease; Machine learning; Prediction model
Year: 2020 PMID: 33246427 PMCID: PMC7693522 DOI: 10.1186/s12882-020-02093-0
Source DB: PubMed Journal: BMC Nephrol ISSN: 1471-2369 Impact factor: 2.388
Fig. 1 Natural Language Processing with Word2vec algorithm for feature embedding
Comparison of calculated features between ESRD positive and ESRD negative patients (performed on all the data)
| Feature | Control group | Case group | P value |
|---|---|---|---|
| Acute kidney injury (AKI) episodes (per year) | 0.43[±2.01] | 0.88[±2.83] | < 0.001 |
| Electrolyte imbalance events (per year) | 0.07[±0.41] | 0.14[±0.72] | < 0.001 |
| Fluid retention events (per year) | 0.01[±0.30] | 0.04[±0.56] | < 0.001 |
| Urinalysis exams (per year) | 1.39[±1.99] | 1.72[±2.50] | < 0.001 |
| Kidney biopsies (per year) | 0.001[±0.029] | 0.004[±0.050] | < 0.001 |
| Days under ACEi treatment | 66.83[±107.48] | 55.97[±93.83] | < 0.001 |
| Hospitalizations | 1.10[±2.40] | 1.45[±2.83] | < 0.001 |
| Hypertensive crisis episodes (per year) | 0.44[±0.99] | 1.39[±2.37] | < 0.001 |
| Loop diuretics prescriptions (per year) | 0.70[±1.86] | 0.97[±1.97] | < 0.001 |
| Lab proteinuria | 0.11[±0.51] | 0.20[±0.69] | < 0.001 |
| Hyperparathyroidism | 0.05[±0.33] | 0.13[±0.56] | < 0.001 |
| Phosphorus abnormalities | 0.0008[±0.020] | 0.0127[±0.215] | < 0.001 |
| Chronic nephritic syndrome | 0.001[±0.031] | 0.012[±0.19] | < 0.001 |
| Non-nephrogenic complications of diabetes | 0.58[±2.16] | 0.81[±2.75] | < 0.001 |
| CHF (percent positives) | 19.8% | 27.3% | < 0.001 |
| Stroke (percent positives) | 8.3% | 10.5% | 0.003 |
| Ischemic heart disease (percent positive) | 31.9% | 39.3% | < 0.001 |
| Myocardial infarction | 36.4% | 56.1% | < 0.001 |
| Anemia | 35.0% | 45.8% | < 0.001 |
| Obesity | 18.5% | 12.3% | < 0.001 |
| Sex–Female | 50% | 47.1% | 0.08 |
| Sex–Male | 50% | 52.9% | 0.08 |
| Age (years) | 70.72[±13.12] | 70.00[±11.95] | 0.83 |
| CKD Stage 1, 6 months before index date | 3.67% | 0.82% | < 0.001 |
| CKD Stage 2, 6 months before index date | 10.00% | 3.09% | < 0.001 |
| CKD Stage 3, 6 months before index date | 47.04% | 35.01% | < 0.001 |
| CKD Stage 4, 6 months before index date | 8.14% | 52.55% | < 0.001 |
Fig. 2 a: ROC curve. Summary of the results of the XGBoost model. The C-statistic for the model was 0.93 (95% confidence interval 0.916–0.943), with a sensitivity of 0.715 and a specificity of 0.958. Positive predictive value (PPV) was 0.517 and negative predictive value (NPV) was 0.981. b: Precision-recall curve
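The predictive values in this caption depend on how common ESRD is in the evaluation set, not only on sensitivity and specificity. The following is a minimal sketch of that relationship using Bayes' rule. The function names are illustrative and do not come from the paper, and the prevalence it derives is an inference from the rounded figures in the caption, not a value the authors report.

```python
def implied_prevalence(sensitivity, specificity, ppv):
    """Prevalence at which the given sensitivity and specificity yield the given PPV.

    Derived by solving PPV = s*p / (s*p + (1 - spec)*(1 - p)) for p.
    """
    fpr = 1.0 - specificity  # false positive rate
    return ppv * fpr / (sensitivity * (1.0 - ppv) + ppv * fpr)


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    tp = sensitivity * prevalence                  # true positive fraction
    fn = (1.0 - sensitivity) * prevalence          # false negative fraction
    tn = specificity * (1.0 - prevalence)          # true negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Figures taken from the caption (rounded as reported).
SENS, SPEC, PPV = 0.715, 0.958, 0.517

prevalence = implied_prevalence(SENS, SPEC, PPV)
ppv, npv = predictive_values(SENS, SPEC, prevalence)
print(f"implied prevalence: {prevalence:.3f}")
print(f"PPV: {ppv:.3f}  NPV: {npv:.3f}")
```

Under these assumptions the implied prevalence is roughly 6%, and the NPV recomputed at that prevalence agrees with the reported 0.981 to within rounding. This suggests the four reported metrics are mutually consistent. It also shows why the PPV is modest despite high specificity: ESRD cases are a small fraction of the evaluated patients.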
Subgroup analysis. Patients were divided into subgroups by CKD stage (early, Stages 1–2, or late, Stages 3–4), age (younger, under 60, or older, over 60), and gender, so that each patient was assigned to exactly one of eight subgroups. The final trained model was applied to each subgroup
| Subgroup | Subgroup size | Positive cases | C-statistic | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|---|
| Males, CKD S3/S4, Age 60+ | 1784 | 164 | 0.919 | 0.756 | 0.931 | 0.528 | 0.974 |
| Males, CKD S3/S4, Age 60- | 348 | 44 | 0.878 | 0.659 | 0.908 | 0.509 | 0.948 |
| Males, CKD S1/S2, Age 60+ | 1061 | 16 | 0.925 | 0.625 | 0.983 | 0.357 | 0.995 |
| Males, CKD S1/S2, Age 60- | 559 | 5 | 0.968 | 0.600 | 0.982 | 0.231 | 0.996 |
| Females, CKD S3/S4, Age 60+ | 1862 | 152 | 0.918 | 0.711 | 0.944 | 0.529 | 0.973 |
| Females, CKD S3/S4, Age 60- | 230 | 34 | 0.891 | 0.765 | 0.913 | 0.605 | 0.957 |
| Females, CKD S1/S2, Age 60+ | 1064 | 13 | 0.906 | 0.765 | 0.991 | 0.526 | 0.997 |
| Females, CKD S1/S2, Age 60- | 426 | 10 | 0.918 | 0.300 | 0.993 | 0.500 | 0.983 |
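Because several subgroups contain only a handful of positive cases, it can help to translate the reported rates back into patient counts. The sketch below reconstructs an approximate confusion matrix for one row from its subgroup size, number of positive cases, sensitivity, and specificity. The helper names are hypothetical, and the counts are estimates recovered by rounding, not figures published in the table.

```python
def confusion_counts(n, positives, sensitivity, specificity):
    """Estimate (TP, FN, FP, TN) from a subgroup's size and reported rates.

    Counts are recovered by rounding, so they are approximate whenever
    the published rates were themselves rounded.
    """
    negatives = n - positives
    tp = round(sensitivity * positives)  # positives the model flagged
    tn = round(specificity * negatives)  # negatives the model cleared
    return tp, positives - tp, negatives - tn, tn


def predictive_values_from_counts(tp, fn, fp, tn):
    """Return (PPV, NPV) from confusion-matrix counts.

    Assumes at least one predicted positive and one predicted negative.
    """
    return tp / (tp + fp), tn / (tn + fn)


# Row for females with CKD S1/S2 under age 60:
# 426 patients, 10 positive cases, sensitivity 0.300, specificity 0.993.
tp, fn, fp, tn = confusion_counts(426, 10, 0.300, 0.993)
ppv, npv = predictive_values_from_counts(tp, fn, fp, tn)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"PPV={ppv:.3f} NPV={npv:.3f}")
```

For this row the reconstruction gives 3 true positives, 7 false negatives, 3 false positives, and 413 true negatives, which reproduces the tabulated PPV of 0.500 and NPV of 0.983. With so few positive cases, a single reclassified patient would move the sensitivity by 0.1, so the early-stage subgroup estimates should be read with that granularity in mind.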
Feature importance analysis
| Feature | Feature importance |
|---|---|
| Age | 0.030 |
| CKD stage | 0.018 |
| Hypertensive crisis events per year | 0.016 |
| Recently diagnosed hypertension | 0.013 |
| Total drug prescriptions per year | 0.010 |
| Total cost of outpatient specialist visits per year | 0.007 |
| Annual medication costs | 0.006 |
| Hypertensive nephropathy | 0.006 |
| Recently diagnosed hyperlipidemia | 0.006 |
| Time gap between last CKD stage diagnosis to most recent | 0.004 |
| Number of urinalysis tests per year | 0.004 |
| Ever diagnosis of hypertension | 0.004 |
| Total cost of ER and inpatient visits per year | 0.003 |
| Total annual claims costs | 0.003 |
| Acute kidney injury events per year | 0.002 |
| Anemia of CKD | 0.002 |
| Recently diagnosed diabetes | 0.002 |
This analysis, performed on the final trained model, showed age to be the most important differentiating factor, followed by the highest CKD stage diagnosed during the eligibility period, the annual count of hypertensive crisis diagnoses, and the presence of newly diagnosed (within the past year) hypertension.