Victoria Garcia-Montemayor, Alejandro Martin-Malo, Carlo Barbieri, Francesco Bellocchio, Sagrario Soriano, Victoria Pendon-Ruiz de Mier, Ignacio R Molina, Pedro Aljama, Mariano Rodriguez.
Abstract
BACKGROUND: Besides classic logistic regression analysis, non-parametric methods based on machine learning techniques, such as random forest, are now used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients.
Keywords: haemodialysis; machine learning; mortality; predictive models; random forest
Year: 2020 PMID: 34221370 PMCID: PMC8247746 DOI: 10.1093/ckj/sfaa126
Source DB: PubMed Journal: Clin Kidney J ISSN: 2048-8505
FIGURE 1: Cohort selection flow chart. Mortality models were compared in nine cohorts, shown in the dashed boxes. The numbers on the left give the patients included for predicting mortality at 6 months, 1 year and 2 years. For each prediction period, three separate analyses were run according to the minimum period after the first haemodialysis (HD) session used to collect baseline data (input variables): 30, 60 or 90 days.
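The caption does not state the inclusion rule for each of the nine cohorts. The counts in the AUC comparison table are consistent with one plausible rule: in every prediction period, moving from the 30-day to the 60-day window removes 24 patients and 24 deaths, and moving from 60 to 90 days removes 13 patients and 13 deaths, which is what happens if patients who died during the baseline window are excluded and the horizon is counted from the first HD session. The sketch below builds the 3 × 3 cohort grid under that assumption. The `Patient` record, the day counts chosen for each horizon (182, 365, 730) and the rule of excluding patients censored before the horizon are all hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record; all days are counted from the first HD session.
    pid: int
    death_day: Optional[int]  # None if no death was observed during follow-up
    followup_days: int        # days observed after the first HD session


HORIZONS = {"6 months": 182, "1 year": 365, "2 years": 730}  # assumed day counts
WINDOWS = (30, 60, 90)  # baseline data-collection windows from Figure 1


def build_cohort(patients, horizon, window):
    """Return (pid, died_within_horizon) pairs for one horizon/window cohort.

    Assumed rule: the patient must be alive at the end of the baseline
    window, and must either die within the horizon or be observed for
    the whole horizon (otherwise the outcome is unknown).
    """
    cohort = []
    for p in patients:
        if p.death_day is not None and p.death_day <= window:
            continue  # died while baseline data were still being collected
        died = p.death_day is not None and p.death_day <= horizon
        if died or p.followup_days >= horizon:
            cohort.append((p.pid, died))
    return cohort


def build_all_cohorts(patients):
    """Map every (horizon label, window) pair to its cohort: nine in total."""
    return {
        (label, window): build_cohort(patients, days, window)
        for label, days in HORIZONS.items()
        for window in WINDOWS
    }
```

Under this rule a patient who dies on day 45 counts as a death in the 30-day-window cohorts but is absent from the 60- and 90-day-window cohorts, which reproduces the matching drops in patients and deaths seen in the table.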
Baseline comorbidities and biochemistry obtained during the first 30 days of dialysis
| Baseline characteristic | Value |
|---|---|
| Gender (male/female), n (%) | 953 (61)/618 (39) |
| Age (years), mean ± standard deviation (SD) | 62.33 ± 15.89 |
| Comorbidities, n (%) | |
| Diabetes mellitus | 482 (31) |
| Cardiac failure | 319 (20) |
| COPD | 144 (9) |
| Tumoral disease (non-metastatic) | 131 (8) |
| Myocardial infarction | 102 (6) |
| Hepatopathy (non-cirrhotic) | 68 (4) |
| Stroke | 9 (1) |
| Charlson comorbidity index (mean) | 6 |
| Biochemical parameters, mean ± SD | |
| Haemoglobin (g/dL) | 10.08 ± 2.79 |
| Ferritin (ng/mL) | 290.1 ± 362.64 |
| TSI (%) | 18.73 ± 10.32 |
| Creatinine (mg/dL) | 7.3 ± 4.4 |
| Albumin (g/dL) | 3.54 ± 0.55 |
| CRP (mg/L), median (IQR) | 8.8 (IQR: 19.5) |
| Calcium (mg/dL) | 9.04 ± 3.88 |
| Phosphorus (mg/dL) | 5.04 ± 1.66 |
| PTH (pg/mL) | 288.35 ± 297.72 |
| Alkaline phosphatase (IU/L) | 124.88 ± 108.64 |
| Potassium (mEq/L) | 4.91 ± 0.89 |
| Magnesium (mg/dL) | 2.22 ± 0.45 |
| β2-microglobulin (µg/L) | 19.44 ± 8.61 |
| Others | |
| BMI, mean ± SD | 27.1 ± 5.41 |
| Residual diuresis (mL), mean ± SD | 631.73 ± 730.6 |
| Vascular access (catheter), n (%) | 830 (53) |
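The table mixes three summary formats: n (%) for categorical rows, mean ± SD for most continuous rows, and median with IQR for CRP. The helpers below are a small sketch of how such rows can be produced with Python's standard `statistics` module. The function names are illustrative, and treating the single IQR figure as the width Q3 − Q1 is an assumption. As a check, the male/female counts give a cohort of 953 + 618 = 1571 patients, and 953/1571 ≈ 60.7%, which rounds to the 61% shown.

```python
import statistics


def n_pct(count, total):
    """Format a categorical row as 'n (%)', percentage rounded to an integer."""
    return f"{count} ({round(100 * count / total)})"


def mean_sd(values):
    """Format a continuous row as 'mean ± SD' using the sample SD (n - 1)."""
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"


def median_iqr(values):
    """Format a skewed variable as 'median (IQR: Q3 - Q1)'.

    Assumes the single IQR figure reported in the table is the width Q3 - Q1.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return f"{statistics.median(values):.1f} (IQR: {q3 - q1:.1f})"


total = 953 + 618  # cohort size implied by the gender row
print(n_pct(953, total))  # male row
print(n_pct(830, total))  # catheter row
```

`statistics.quantiles` defaults to the "exclusive" interpolation method; other software may use a different quartile definition, so IQR values can differ slightly between tools.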
FIGURE 2: Prediction of mortality by random forest analysis. The dashed line marks the AUC of the mortality prediction ROC curve obtained by the random forest model. Each dot shows the AUC obtained when the effect of the specified variable is switched off by randomly changing its values in the test set, so the distance between a dot and the dashed line reflects that variable's influence on the AUC. Panels show prediction of mortality at 6 months (A–C), 1 year (D–F) and 2 years (G–I); within each prediction period, baseline data were collected over a minimum of 30 days (A, D, G), 60 days (B, E, H) or 90 days (C, F, I) after the first HD session.
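The procedure in Figure 2 is a form of permutation importance: the model is left unchanged, one input column in the test set is randomized, and the AUC is recomputed. The sketch below assumes the randomization is a shuffle of the column (the caption only says the values are "randomly changed") and computes AUC as the probability that a patient who died receives a higher score than a survivor. `predict` is a hypothetical stand-in for the fitted random forest; any function from a feature row to a risk score works.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a death scores above a survivor (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_auc(predict, X_test, y_test, seed=0):
    """Return the baseline AUC and the AUC after shuffling each column in turn.

    predict: hypothetical scoring function standing in for the fitted model.
    X_test: list of feature rows (lists); y_test: 1 = died, 0 = survived.
    """
    rng = random.Random(seed)
    baseline = auc([predict(row) for row in X_test], y_test)
    permuted = {}
    for j in range(len(X_test[0])):
        column = [row[j] for row in X_test]
        rng.shuffle(column)  # break the link between variable j and the outcome
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_test, column)]
        permuted[j] = auc([predict(row) for row in X_perm], y_test)
    return baseline, permuted
```

A variable the model ignores leaves the AUC exactly at the baseline, while an informative variable usually pulls it down. Because one shuffle is noisy, practical implementations typically average the permuted AUC over several repeats.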
FIGURE 3: Influence of variables on mortality prediction by logistic regression. The dashed line marks the AUC of the mortality prediction ROC curve obtained by the logistic regression model. Each dot shows the AUC obtained when the effect of the specified variable is removed. Panels show prediction of mortality at 6 months (A–C), 1 year (D–F) and 2 years (G–I); within each prediction period, baseline data were collected over a minimum of 30 days (A, D, G), 60 days (B, E, H) or 90 days (C, F, I) after the first HD session.
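The caption does not say how a variable's effect is removed from the logistic regression model; refitting without the variable is one option. The sketch below shows a simpler alternative, assumed here for illustration: keep the fitted coefficients and set the coefficient of variable j to zero, which switches off its contribution to the linear predictor. Because AUC depends only on the ranking of predicted risks, the intercept and the logistic transform do not change it. The coefficients and data in any usage are hypothetical.

```python
import math


def auc(scores, labels):
    """Rank-based AUC: chance that a death scores above a survivor (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def logistic_risk(coefs, intercept, row):
    """Predicted probability of death from a fitted logistic model."""
    eta = intercept + sum(b * x for b, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-eta))


def drop_one_auc(coefs, intercept, X_test, y_test):
    """Baseline AUC and AUC with each coefficient zeroed in turn (assumed method)."""
    baseline = auc([logistic_risk(coefs, intercept, r) for r in X_test], y_test)
    dropped = {}
    for j in range(len(coefs)):
        reduced = coefs[:j] + [0.0] + coefs[j + 1:]  # switch off variable j
        dropped[j] = auc([logistic_risk(reduced, intercept, r) for r in X_test], y_test)
    return baseline, dropped
```

If the zeroed variable was the only informative one, every patient receives the same risk and the AUC falls to 0.5, the value of an uninformative model.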
Comparisons of AUCs obtained by random forest and logistic regression
| Prediction period | Days after first HD for baseline data collection | Number of patients | Deaths | Random forest AUC (%) | Random forest 95% CI | Logistic regression AUC (%) | Logistic regression 95% CI | Difference in AUC, RF − LR (percentage points) | P-value |
|---|---|---|---|---|---|---|---|---|---|
| 6 months | 30 | 1456 | 80 | 70.14 | 67.95–72.33 | 69.01 | 66.8–71.21 | 1.13 | 0.32 |
| | 60 | 1432 | 56 | 67.55 | 64.88–70.22 | 66.84 | 64.15–69.52 | 0.71 | 0.61 |
| | 90 | 1419 | 43 | 71.75 | 68.84–74.65 | 67.15 | 64.18–70.13 | 4.60 | 0.18 |
| 1 year | 30 | 1336 | 166 | 73.31 | 71.8–74.82 | 71.16 | 69.62–72.7 | 2.15 | 0.01* |
| | 60 | 1312 | 142 | 73.19 | 71.56–74.81 | 71.22 | 69.57–72.87 | 1.97 | 0.02* |
| | 90 | 1299 | 129 | 72.82 | 71.12–74.52 | 71.94 | 70.22–73.65 | 0.88 | 0.32 |
| 2 years | 30 | 1244 | 271 | 72.59 | 71.37–73.81 | 68.73 | 67.47–69.99 | 3.86 | <0.001 |
| | 60 | 1220 | 247 | 72.42 | 71.14–73.7 | 68.64 | 67.33–69.96 | 3.78 | <0.001 |
| | 90 | 1207 | 234 | 72.06 | 70.75–73.37 | 69.78 | 68.45–71.12 | 2.28 | <0.001 |
For each mortality prediction period (6 months, 1 year and 2 years), baseline variables were collected over a minimum of 30, 60 or 90 days after the first HD session. HD, haemodialysis; RF, random forest; LR, logistic regression; AUC, area under the ROC curve; CI, confidence interval.
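The table reports, for each cohort, both models' AUCs with 95% CIs and the difference RF − LR in percentage points (for example, 73.31 − 71.16 = 2.15 at 1 year with a 30-day window), together with a P-value. This section does not say how the CIs and P-values were obtained. The sketch below uses a paired bootstrap, a swapped-in technique chosen only for illustration (DeLong's test is another common choice). Both models are scored on the same resampled test patients, so the comparison accounts for their shared data. Function names and defaults are hypothetical.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a death scores above a survivor (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_diff(scores_a, scores_b, labels, n_boot=2000, seed=0):
    """Paired bootstrap of AUC(A) - AUC(B) on the same test patients.

    Returns (observed difference, 95% percentile CI, approximate two-sided p).
    """
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need at least one death and one survivor")
    rng = random.Random(seed)
    observed = auc(scores_a, labels) - auc(scores_b, labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # same patients for both models
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples containing only one outcome class
            a = auc([scores_a[i] for i in idx], y)
            b = auc([scores_b[i] for i in idx], y)
            diffs.append(a - b)
    diffs.sort()
    ci = (diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1])
    # Approximate p-value: twice the share of resampled differences on the far side of 0.
    below = sum(d <= 0 for d in diffs)
    above = sum(d >= 0 for d in diffs)
    p = min(1.0, 2 * min(below, above) / n_boot)
    return observed, ci, p
```

Multiplying the returned difference and CI bounds by 100 puts them on the percentage-point scale used in the table.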