Junhyug Noh, Kyung Don Yoo, Wonho Bae, Jong Soo Lee, Kangil Kim, Jang-Hee Cho, Hajeong Lee, Dong Ki Kim, Chun Soo Lim, Shin-Wook Kang, Yong-Lim Kim, Yon Su Kim, Gunhee Kim, Jung Pyo Lee.
Abstract
Herein, we aim to assess mortality risk prediction in peritoneal dialysis patients using machine-learning algorithms. A total of 1,730 peritoneal dialysis patients enrolled in the CRC for ESRD prospective cohort from 2008 to 2014 were included in this study. Classification algorithms, including neural networks, were used to predict N-year mortality. Survival hazard ratios were estimated by machine-learning algorithms using survival statistics and compared with those from conventional algorithms. A survival-tree algorithm yielded the most accurate prediction model and outperformed a conventional method, Cox regression (concordance index 0.769 vs 0.745). Across the survival decision-tree models, the modified Charlson Comorbidity index (mCCI) was selected as the best predictor of mortality. For peritoneal dialysis patients with a high mCCI (>4) who were aged ≥70.5 years, the predicted survival hazard ratio was 4.61 relative to the overall study population. Among the algorithms using longitudinal data, logistic regression reached an AUC of 0.804, and the deep neural network significantly improved performance to 0.841. In the final machine learning-based model, mCCI and age were interrelated as notable risk factors for mortality in Korean peritoneal dialysis patients.
Year: 2020 PMID: 32366838 PMCID: PMC7198502 DOI: 10.1038/s41598-020-64184-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Baseline characteristics of study patients for peritoneal dialysis.
| Variables | All (N = 1730)(%) | Non-survivor (N = 343)(%) | Survivor (N = 1387)(%) | P |
|---|---|---|---|---|
| Age (years) | 52.7 ± 12.6 | 62.3 ± 10.8 | 50.3 ± 11.8 | <0.001 |
| Sex (male) | 991 (57.3) | 217 (63.3) | 774 (55.8) | 0.012 |
| BMI (kg/m2) | 23.3 ± 3.2 | 23.5 ± 3.2 | 23.2 ± 3.2 | 0.133 |
| Primary renal disease | | | | <0.001 |
| Diabetes | 617 (35.7) | 179 (52.2) | 438 (31.6) | |
| Hypertension | 382 (22.1) | 61 (17.8) | 321 (23.1) | |
| Glomerulonephritis | 292 (16.9) | 30 (8.7) | 262 (18.9) | |
| Cystic kidney disease | 32 (1.8) | 6 (1.7) | 26 (1.9) | |
| Unknown | 117 (6.8) | 29 (8.5) | 88 (6.3) | |
| Others | 290 (16.8) | 38 (11.1) | 252 (18.2) | |
| History of CVD | 461 (26.6) | 149 (43.4) | 312 (22.5) | <0.001 |
| History of DM | 712 (41.2) | 209 (60.9) | 503 (36.3) | <0.001 |
| Dialysis duration (months) | 59.1 ± 46.3 | 59.9 ± 46.2 | 58.9 ± 46.3 | 0.727 |
| Smoking history (%) | 151 (8.7) | 18 (5.2) | 133 (9.6) | 0.011 |
| Modified CCI | 4.4 ± 2.0 | 5.9 ± 2.0 | 4.0 ± 1.8 | <0.001 |
| Use of RAAS blockade | 963 (55.7) | 191 (55.7) | 772 (55.7) | 0.993 |
| Systolic BP (mmHg) | 133 ± 21 | 132 ± 21 | 134 ± 21 | 0.134 |
| Diastolic BP (mmHg) | 79 ± 22 | 76 ± 13 | 80 ± 23 | 0.005 |
| Hemoglobin (g/dL) | 10.1 ± 3.4 | 10.3 ± 1.5 | 10.1 ± 3.7 | 0.344 |
| BUN | 64.5 ± 31.3 | 55.8 ± 28.7 | 66.5 ± 31.6 | <0.001 |
| Creatinine | 8.9 ± 4.5 | 7.7 ± 3.9 | 9.1 ± 4.5 | <0.001 |
| Calcium | 8.4 ± 0.9 | 8.5 ± 0.9 | 8.3 ± 1.0 | 0.070 |
| Phosphorus | 5.2 ± 1.6 | 4.7 ± 1.5 | 5.3 ± 1.6 | <0.001 |
| Uric acid | 7.4 ± 2.0 | 7.0 ± 1.9 | 7.5 ± 2.0 | <0.001 |
| Total cholesterol | 171 ± 44 | 168 ± 43 | 172 ± 44 | 0.195 |
| Albumin | 3.69 ± 2.19 | 3.67 ± 3.43 | 3.69 ± 1.79 | 0.908 |
| Intact-PTH | 290.6 ± 297.7 | 247.8 ± 282.1 | 300.8 ± 300.3 | 0.016 |
| β2-microglobulin | 51.2 ± 82.3 | 48.9 ± 70.6 | 51.8 ± 85.3 | 0.697 |
CVD, cardiovascular disease; DM, diabetes mellitus; mCCI, modified Charlson comorbidity index; RAAS, renin-angiotensin-aldosterone system; SBP, systolic blood pressure; DBP, diastolic blood pressure; BUN, blood urea nitrogen; PTH, parathyroid hormone.
*Values are presented as n (%) for categorical variables and as mean ± standard deviation for continuous variables.
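The P values for categorical rows can be sanity-checked from the counts alone. The sketch below recomputes the "Sex (male)" comparison, assuming a Pearson chi-square test without continuity correction; the source does not state which test was used, so this is an assumption, and `chi_square_2x2` is a hypothetical helper name. Under that assumption the result lands close to the reported 0.012.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts from the "Sex (male)" row: 217 of 343 non-survivors, 774 of 1387 survivors.
male_dead, male_alive = 217, 774
female_dead, female_alive = 343 - 217, 1387 - 774

# Rows are non-survivors and survivors; columns are male and female.
chi2, p = chi_square_2x2(male_dead, female_dead, male_alive, female_alive)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Applying Yates' continuity correction instead would give a slightly larger P value, so agreement here suggests, but does not prove, that an uncorrected test was used.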
Figure 1. Patients' follow-up after peritoneal dialysis initiation. (A) Number of patients at the year of follow-up at 1 year after peritoneal dialysis (PD) initiation. (B) Ratio of non-survivors to survivors at 1 year after PD initiation.
Figure 2. Model structure.
Performance of the 5-year classification model without weighting methods in PD patients.
| Validation method | Validation ratio | Test set size | Main algorithm | Hyperparameters* | Test performance |
|---|---|---|---|---|---|
| One validation | 0.285 | 95 | Bagging | nbagg = 10 | 0.6863 |
| Cross-validation | | 95 | Bagging | nbagg = 70 | 0.7407 |
| Cross-validation | | 95 | Decision tree | cp = -1 | 0.7222 |
| One validation | 0.285 | 95 | Lasso | lambda = 2e-04 | 0.8105 |
| Cross-validation | | 95 | Lasso | lambda = 0.05 | 0.7979 |
| | | 95 | Logistic regression | None | 0.8219 |
| One validation | 0.285 | 95 | Random forest | ntree = 800 | 0.7258 |
| Cross-validation | | 95 | Random forest | ntree = 1000 | 0.7535 |
| One validation | 0.285 | 95 | Ridge | lambda = 2e-04 | 0.8105 |
| Cross-validation | | 95 | Ridge | lambda = 0.09 | 0.8164 |
The test ratio was fixed at 0.3, and test performance is presented as AUC.
*The hyperparameters are explained in the supplementary material.
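For readers unfamiliar with the metric in this table, AUC can be read as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that pairwise definition; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical toy data: 1 = died within 5 years, 0 = survived.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

The double loop is quadratic in the number of patients, which is acceptable for a test set of 95 but would normally be replaced by a rank-based computation on larger data.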
Figure 3. The 5-year mortality prediction after PD initiation using a decision tree (DT) model. The predicted 5-year mortality rate is reported as a percentage (%). Decision tree for the training, test and validation data sets, after stratified sampling, with ‘Y’ indicating a positive conclusion and ‘N’ a negative conclusion.
Performance of the prediction models for mortality by survival statistics without imputation methods in PD patients.
| Validation method | Validation ratio | Test set size | Main algorithm | Hyperparameters* | Test performance |
|---|---|---|---|---|---|
| Cross-validation | | 357 | Survival tree | cp = 0.016 | 0.7914 |
| Cross-validation | | 357 | Survival ridge | | 0.7610 |
| One validation set | 0.285 | 357 | Survival ridge | | 0.7593 |
| Cross-validation | | 357 | Survival random forest | splitrule = logrank | 0.7479 |
| One validation set | 0.285 | 357 | Survival random forest | splitrule = logrank | 0.7477 |
| Cross-validation | | 357 | Survival Lasso | lambda = 0.05 | 0.7599 |
| One validation set | 0.285 | 357 | Survival Lasso | lambda = 0.07 | 0.7576 |
| Cross-validation | | 357 | Survival bagging | nbagg = 150 | 0.7631 |
| One validation set | 0.285 | 357 | Survival bagging | nbagg = 10 | 0.7505 |
| | | 357 | Cox regression | None | 0.7458 |
The test ratio was fixed at 0.3, and test performance is presented as the concordance index. *The hyperparameters are explained in the supplementary material.
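The concordance index reported above measures how often a model ranks two patients' risks in the same order as their observed survival. A minimal sketch of Harrell's version follows, assuming right-censored data and a simplified rule that skips pairs with tied follow-up times; the function name and the toy data are hypothetical, and this is not the implementation used in the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i has the strictly shorter
    follow-up time and an observed event (events[i] == 1). Higher risk
    should mean shorter survival. Tied risk scores count as half-concordant.
    Pairs with tied follow-up times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: follow-up in months, 1 = death observed, 0 = censored.
times = [12, 30, 45, 60, 60]
events = [1, 1, 0, 1, 0]
risk = [2.1, 0.8, 1.5, 0.4, 0.3]
c_index = concordance_index(times, events, risk)
print(round(c_index, 4))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the survival tree and Cox regression rows are compared.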
Figure 4. The patients’ survival prediction after PD initiation using survival hazard ratio modeling. The relative mortality risk is presented as a survival hazard ratio (HR). The survival decision tree for the training, test and validation data sets, after stratified sampling, with ‘Y’ indicating a positive conclusion and ‘N’ a negative conclusion.
Cox regression analysis for the mortality according to the modified Charlson comorbidity index (mCCI) group in the high risk patients.
| Subgroup | mCCI group | Univariate HR | Univariate 95% CI | Univariate P | Multivariate HR | Multivariate 95% CI | Multivariate P |
|---|---|---|---|---|---|---|---|
| Overall group | Low | Ref. | | | Ref. | | |
| | Moderate | 6.135 | (3.318, 11.342) | <0.001 | 2.700 | (0.911, 7.997) | 0.073 |
| | High | 17.708 | (9.632, 32.557) | <0.001 | 4.618 | (1.398, 15.256) | 0.012 |
| Male group | Low | Ref. | | | Ref. | | |
| | Moderate | 3.456 | (1.742, 6.905) | <0.001 | 1.549 | (0.509, 4.713) | 0.441 |
| | High | 9.236 | (4.691, 18.186) | <0.001 | 2.565 | (0.735, 8.957) | 0.140 |
| Age ≥ 60 group | Low | Ref. | | | Ref. | | |
| | Moderate | 0.528 | (0.073, 3.804) | 0.526 | 0.417 | (0.050, 3.496) | 0.420 |
| | High | 0.988 | (0.138, 7.068) | 0.990 | 1.031 | (0.116, 9.138) | 0.978 |
| History of DM group | Low | Ref. | | | Ref. | | |
| | Moderate | 1.506 | (0.208, 10.888) | 0.685 | 0.522 | (0.067, 4.075) | 0.535 |
| | High | 4.063 | (0.568, 29.057) | 0.162 | 0.757 | (0.096, 5.998) | 0.757 |
| Low albumin group (≤3.5 g/dl) | Low | Ref. | | | Ref. | | |
| | Moderate | 2.792 | (0.974, 8.004) | 0.056 | 1.726 | (0.468, 6.364) | 0.412 |
| | High | 9.182 | (3.342, 25.229) | <0.001 | 3.114 | (0.744, 13.027) | 0.120 |
mCCI group: Low, mCCI score 0–2; Moderate, mCCI score 3–5; High, mCCI score ≥6.
Multivariate analysis was adjusted for confounders including age, sex, primary renal disease, smoking history, dialysis duration, BUN, systolic BP, BMI, Hb, calcium, history of DM, CVD, use of RAAS blockade, and serum albumin.
CVD, cardiovascular disease; DM, diabetes mellitus; RAAS, renin-angiotensin-aldosterone system; SBP, systolic blood pressure; DBP, diastolic blood pressure; BUN, blood urea nitrogen; PTH, parathyroid hormone.
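The hazard ratios, confidence intervals and P values in the Cox table can be cross-checked against one another. The sketch below assumes the intervals are Wald intervals of the form exp(log HR ± 1.96·SE), which is conventional for Cox regression but is not stated in the source; `wald_check` is a hypothetical helper name. Under that assumption, the standard error, z statistic and two-sided P value can be recovered from a single row, and the geometric mean of the bounds should sit close to the reported HR.

```python
import math

def wald_check(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover SE, z, and two-sided p from a hazard ratio and its 95% CI.

    Assumes a Wald interval: exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    midpoint = math.sqrt(ci_low * ci_high)      # geometric mean of the bounds
    return se, z, p_value, midpoint

# Multivariate row, overall group, moderate mCCI: HR 2.700 (0.911, 7.997), P = 0.073.
se, z, p, mid = wald_check(2.700, 0.911, 7.997)
print(f"SE = {se:.3f}, z = {z:.3f}, p = {p:.3f}, CI midpoint = {mid:.3f}")
```

For this row the recovered P value and midpoint agree with the table to within rounding, which is consistent with the Wald assumption. Rows where the two disagree noticeably would be candidates for a transcription check.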
Figure 5. The 5-year mortality prediction after PD initiation using a decision tree (DT) model with repeatedly measured data. The repeatedly measured data include 24-hour urine volume, RAAS blockade use, and dialysis efficiency (weekly Kt/V). The predicted 5-year mortality rate is reported as a percentage (%). Decision tree for the training, test and validation data sets, after stratified sampling, with ‘Y’ indicating a positive conclusion and ‘N’ a negative conclusion.
Performance of the 5-year prediction model by deep neural network with autoencoder tree using repeatedly measured parameters.
| Imputation method | Validation method | Validation ratio | Test set size | Main algorithm | Hyperparameters* | Test performance |
|---|---|---|---|---|---|---|
| MICE/CART | | | 174 | Logistic Regression | None | 0.8045 |
| MICE/CART | One validation set | 0.1 | 174 | Ridge | | 0.8236 |
| MICE/CART | One validation set | 0.1 | 174 | Lasso | lambda = 0.02 | 0.8193 |
| MICE/CART | One validation set | 0.1 | 174 | Bagging | nbagg = 30 | 0.8332 |
| MICE/CART | One validation set | 0.1 | 174 | Random Forest | ntree = 500 | 0.8381 |
| MICE/CART | One validation set | 0.1 | 174 | Neural Networks | hunits = [8] | 0.8066 |
| Autoencoder | One validation set | 0.1 | 174 | Neural Networks | AE hunits = [16, 16]/FC hunits = [16] | 0.8419 |
| Autoencoder | One validation set | 0.1 | 174 | Long Short-Term Memory networks | LSTM hunits = 16/AE hunits = [16, 16]/FC hunits = [16] | 0.8582 |
The test ratio was fixed at 0.3 and the training ratio at 0.6, and test performance is presented as AUC.
*The hyperparameters are explained in the supplementary material.
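The ratios in this table (training 0.6, validation 0.1, test 0.3) together with the stratified sampling mentioned in the figure captions suggest a three-way stratified split. The sketch below shows one way such a split could be built; it is an illustration, not the authors' procedure. The function name, the seed, the cohort size of 580 (chosen only because 0.3 × 580 = 174, the listed test set size) and the 20% event rate are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.6, 0.1, 0.3), seed=0):
    """Return (train, validation, test) index lists, stratified by label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(ratios[0] * len(indices))
        n_val = round(ratios[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Hypothetical cohort: 580 patients, 116 non-survivors (1) and 464 survivors (0).
labels = [1] * 116 + [0] * 464
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting within each outcome class keeps the proportion of non-survivors roughly equal across the three sets, which matters when the event rate is low.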