Fen Miao1, Yun-Peng Cai2, Yu-Xiao Zhang3, Ye Li2, Yuan-Ting Zhang4.
Abstract
Existing models for predicting mortality based on the traditional Cox proportional hazards (CPH) approach often have low prediction accuracy. This paper aims to develop a clinical risk model with good accuracy for predicting 1-year mortality in patients with cardiac arrhythmias using random survival forest (RSF), a robust approach for survival analysis. A total of 10,488 patients with cardiac arrhythmias in the public MIMIC II clinical database were investigated, with 3,452 deaths occurring within 1 year of follow-up. Forty risk factors, including demographics, clinical and laboratory information, and antiarrhythmic agents, were analyzed as potential predictors of all-cause mortality. RSF was used to build a comprehensive survival model and a simplified risk model composed of the 14 top risk factors. The comprehensive model achieved a c-statistic of 0.81 under 10-fold cross-validation, and the simplified risk model achieved 0.799. Both results outperformed traditional CPH, which achieved a c-statistic of 0.733 for the comprehensive model and 0.718 for the simplified model. Moreover, various factors were observed to have a nonlinear impact on cardiac arrhythmia prognosis. As a result, the RSF-based model, which takes nonlinearity into account, significantly outperformed the traditional Cox proportional hazards model and has great potential to be a more effective approach for survival analysis.
Year: 2015 PMID: 26379761 PMCID: PMC4562335 DOI: 10.1155/2015/303250
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Baseline characteristics of dead and alive patients during one-year followup.
| Characteristics | Dead | Alive | p value |
|---|---|---|---|
| n | 3452 | 7036 | |
| Demographics | | | |
| Age, years | 75.32 (12.84) | 71.94 (23.596) | <0.001 |
| Gender, male | 59% | 57% | 0.1 |
| BMI, kg/m2 | 22.76 (102) | 33.48 (2588) | <0.001 |
| Clinical variables | | | |
| Arrhythmia type | | | |
| CA | 518 (15%) | 352 (5%) | <0.001 |
| VF | 138 (4%) | 211 (3%) | 0.459 |
| VT | 390 (11%) | 844 (12%) | 0.095 |
| AF | 2727 (79%) | 5488 (78%) | 0.281 |
| Slow arrhythmias | 276 (8%) | 1055 (15%) | <0.001 |
| HF | 1484 (43%) | 2110 (30%) | <0.001 |
| Myocardial infarction | 621 (18%) | 1266 (18%) | 0.867 |
| Bundle branch block | 18 (0.5%) | 38 (0.6%) | 0.168 |
| Valvular heart diseases | 517 (15%) | 915 (13%) | 0.088 |
| Stroke | 424 (12%) | 633 (9%) | 0.01 |
| Hypertension | 932 (27%) | 2251 (32%) | <0.001 |
| Acute pulmonary heart disease | 79 (2%) | 141 (2%) | 0.238 |
| Chronic pulmonary heart disease | 178 (5%) | 352 (5%) | 0.277 |
| Uncomplicated diabetes | 690 (20%) | 1547 (22%) | 0.001 |
| Complicated diabetes | 242 (7%) | 352 (5%) | 0.001 |
| Hypothyroidism | 345 (10%) | 704 (10%) | 0.143 |
| Renal failure | 483 (14%) | 422 (6%) | <0.001 |
| Liver disease | 138 (4%) | 141 (2%) | <0.001 |
| Laboratory variables | | | |
| K, mEq/L | 4.81 (0.89) | 4.96 (0.94) | <0.001 |
| NA, mEq/L | 139.21 (4.25) | 138.85 (2.93) | <0.001 |
| WBC, K/ | 19.79 (13.88) | 16.1 (8.13) | <0.001 |
| RBC, K/ | 4.09 (0.75) | 4.15 (0.58) | <0.001 |
| ALT, IU/L | 185.57 (659) | 87.54 (372) | <0.001 |
| AST, IU/L | 345.31 (1388) | 125.49 (631) | <0.001 |
| CKPK, IU/L | 711.67 (4944) | 532.99 (2061) | 0.017 |
| SCR, mg/dL | 2.49 (2.03) | 1.8 (1.76) | <0.001 |
| BUN, mg/dL | 56.64 (34.15) | 38.68 (26.22) | <0.001 |
| Glucose, mg/dL | 196.91 (85.41) | 187.57 (58.71) | <0.001 |
| PT, seconds | 22.97 (15.88) | 20.78 (12.69) | <0.001 |
| INR | 3.29 (5.49) | 2.49 (2.88) | <0.001 |
| PTT, seconds | 75.79 (47.99) | 69.10 (43.64) | <0.001 |
| BR, mg/dL | 2.08 (4.66) | 1.09 (2.02) | <0.001 |
| Medications | | | |
| Class I agents | 86 (2.4%) | 112 (1.6%) | 0.006 |
| Class II agents | 2481 (72%) | 5366 (76%) | <0.001 |
| Class III agents | 153 (4.4%) | 781 (11%) | <0.001 |
| Class IV agents | 810 (23%) | 1205 (17%) | <0.001 |
| Class V agents | 2255 (65%) | 4622 (66%) | 0.716 |
C-statistics for comprehensive model and simplified model with different methods.
| Model | RSF | CPH | p value |
|---|---|---|---|
| Comprehensive model | 0.810 | 0.733 | <0.01 |
| Simplified model | 0.799 | 0.718 | |
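The c-statistics reported above measure how often a model ranks the patient who dies earlier as the higher-risk one. The following is a minimal sketch of Harrell's concordance index for right-censored data, written in plain Python for illustration. The function name `concordance_index` and its arguments are hypothetical, tied follow-up times are not treated specially, and this is not the authors' implementation.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's c-statistic for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when patient i had an observed event and a
    shorter follow-up time than patient j. The pair is concordant when
    patient i also has the higher predicted risk; ties in predicted risk
    count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which 0.810 (RSF) and 0.733 (CPH) are compared.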
Figure 1. Minimal depth from RSF analysis. The horizontal line is the threshold; variables below the line are considered predictive. The diameter of each circle is proportional to the forest-averaged number of maximal subtrees for that variable: 1: cardiac arrest, 2: log of BUN, 3: log of BMI, 4: log of AST, 5: log of age, 6: log of SCR, 7: log of BR, 8: log of K, 9: log of WBC, 10: log of ALT, 11: log of NA, 12: log of CKPK, 13: class II agents, 14: log of glucose, 15: log of INR, 16: CHF, 17: renal failure, 18: log of RBC, 19: log of PTT, 20: class V agents, 21: log of PT, 22: stroke, 23: sex, 24: AF, 25: class IV agents, 26: myocardial infarction, 27: hypertension, 28: uncomplicated diabetes, 29: valvular heart disease, 30: slow arrhythmias, 31: VT, 32: VF, 33: hypothyroidism, 34: complicated diabetes, 35: class III agents, 36: liver disease, 37: chronic pulmonary heart disease, 38: acute pulmonary heart disease, 39: class I agents, 40: bundle branch block.
Figure 2. Error rates with simplified RSF for the ensemble cumulative hazard function, and variable importance (VIMP) for the predictors. ClsII indicates class II agents. log∗ indicates the log of the variable.
Figure 3. (a) Ensemble survival function for each individual. The red line is the overall ensemble survival, while the green line is the Nelson-Aalen estimator. (b) Comparison of the population ensemble survival function and the Nelson-Aalen estimator.
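The Nelson-Aalen estimator used as the reference curve in Figure 3 accumulates, at each observed death time, the number of deaths divided by the number of patients still at risk. The sketch below shows that calculation in plain Python. The function names `nelson_aalen` and `survival_from_hazard` are hypothetical, and the conversion S(t) = exp(-H(t)) is the usual way to turn a cumulative hazard into a survival curve, assumed here rather than taken from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def nelson_aalen(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (time, cumulative hazard) at each distinct observed death time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # deaths and censorings leaving the risk set at each time
    at_risk = len(times)
    cumulative = 0.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            cumulative += d / at_risk  # hazard increment: deaths / number at risk
            curve.append((t, cumulative))
        at_risk -= exits[t]  # remove everyone whose follow-up ended at time t
    return curve


def survival_from_hazard(cumulative_hazard: float) -> float:
    """Convert a cumulative hazard H(t) into a survival probability exp(-H(t))."""
    return math.exp(-cumulative_hazard)
```

For example, with follow-up times `[1, 2, 2, 3, 4]` and event indicators `[1, 1, 0, 1, 0]`, the increments are 1/5, 1/4, and 1/2 at times 1, 2, and 3.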
Cox proportional hazard model with comprehensive risk factors.
| Predictors | Coefficient | p value | HR | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|
| Demographics | | | | | |
| log age | | | | | |
| log BMI | | | | | |
| Clinical risk factors | | | | | |
| Cardiac arrest | | | | | |
| Slow arrhythmias | −0.420 | <0.001 | 0.657 | 0.572 | 0.755 |
| CHF | 0.119 | 0.002 | 1.126 | 1.042 | 1.216 |
| Myocardial infarction | 0.168 | 0.001 | 1.182 | 1.069 | 1.307 |
| Stroke | 0.340 | <0.001 | 1.405 | 1.266 | 1.560 |
| Renal failure | 0.243 | <0.001 | 1.275 | 1.132 | 1.437 |
| Laboratory risk factors | | | | | |
| log K | | | | | |
| log WBC | 0.266 | <0.001 | 1.305 | 1.210 | 1.408 |
| log RBC | −0.443 | 0.001 | 0.642 | 0.494 | 0.834 |
| log BUN | | | | | |
| log glucose | 0.187 | 0.002 | 1.205 | 1.071 | 1.356 |
| log CKPK | −0.106 | <0.001 | 0.900 | 0.872 | 0.928 |
| log AST | 0.379 | <0.001 | 1.460 | 1.357 | 1.572 |
| log ALT | −0.255 | <0.001 | 0.775 | 0.722 | 0.832 |
| log PT | −0.447 | <0.001 | 0.640 | 0.523 | 0.782 |
| log INR | 0.334 | <0.001 | 1.397 | 1.228 | 1.590 |
| log BR | 0.107 | <0.001 | 1.113 | 1.062 | 1.165 |
| Medications | | | | | |
| Class I agents | 0.376 | 0.002 | 1.456 | 1.147 | 1.849 |
| Class II agents | −0.316 | <0.001 | 0.729 | 0.670 | 0.793 |
| Class III agents | | | | | |
| Class V agents | −0.203 | <0.001 | 0.816 | 0.754 | 0.883 |
HR: hazard ratio; CI: confidence interval; log∗ indicates the log of the variable.
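In a Cox model the hazard ratio is the exponential of the coefficient, so the HR column can be checked directly against the coefficient column. The sketch below does that check and also backs out an approximate standard error from a reported interval. The helper names are hypothetical, and the assumption that the intervals are symmetric Wald intervals on the log scale (z = 1.96) is ours, not stated in the table.

```python
import math


def hazard_ratio(coefficient: float) -> float:
    """Hazard ratio implied by a Cox regression coefficient: HR = exp(beta)."""
    return math.exp(coefficient)


def standard_error_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate SE of the coefficient from the HR's confidence limits.

    Assumes a Wald interval, exp(beta - z*SE) to exp(beta + z*SE), so the
    width of the interval on the log scale equals 2 * z * SE.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Slow arrhythmias row: coefficient -0.420, HR 0.657, 95% CI 0.572 to 0.755
hr = hazard_ratio(-0.420)
se = standard_error_from_ci(0.572, 0.755)
print(round(hr, 3), round(se, 3))
```

A hazard ratio below 1 (negative coefficient) indicates lower estimated mortality risk for patients with that factor, and a ratio above 1 indicates higher risk.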
Figure 4. Ensemble mortality against given continuous variables. Mortality is presented in terms of total death number. Blue points correspond to events, while black points correspond to censored observations. log∗ indicates the log of the variable.
Figure 5. RSF-estimated 1-year mortality as a function of BUN, AST, and BMI. Smoothed curves are computed based on the estimated mortality for each patient.
Figure 6. Estimated error rate with comprehensive RSF for different numbers of grown trees.