
Regression trees for predicting mortality in patients with cardiovascular disease: what improvement is achieved by using ensemble-based methods?

Peter C Austin, Douglas S Lee, Ewout W Steyerberg, Jack V Tu.

Abstract

In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999-2001 and 2004-2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease.
© 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Year:  2012        PMID: 22777999      PMCID: PMC3470596          DOI: 10.1002/bimj.201100251

Source DB:  PubMed          Journal:  Biom J        ISSN: 0323-3847            Impact factor:   2.207


1 Introduction

Predicting the probability of a binary outcome or event is of key importance in many areas of clinical and health services research. Accurate prediction of the probability of patient outcomes, such as mortality, allows for effective risk stratification of subjects and for the comparison of health care outcomes across different providers. Logistic regression is the most commonly used method for prediction in the biomedical literature. Many clinical investigators are also interested in the use of regression trees to predict the probability of the occurrence of an event. Despite studies highlighting the inferior predictive accuracy of regression trees compared to that of logistic regression (Ennis et al., 1998; Austin, 2007), some authors continue to express enthusiasm for the use of regression trees (Young and Andrews, 2008). In the data mining and machine learning literature, extensions of classical regression trees have been developed. Many of these methods aggregate predictions over an ensemble of regression trees; they include bootstrap aggregated (bagged) regression trees, random forests, and boosted regression trees. However, there is a paucity of research into the comparative performance of these methods for predicting clinical outcomes. The objective of the current study was to compare the performance of regression trees, ensemble-based methods, and logistic regression for predicting short-term mortality in population-based samples of patients hospitalized with cardiovascular disease.

2 Methods

2.1 Data sources

The Enhanced Feedback for Effective Cardiac Treatment (EFFECT) Study is an initiative to improve the quality of care for patients with cardiovascular disease in Ontario (Tu et al., 2004, 2009). During the first phase (referred to as the EFFECT Baseline sample), detailed clinical data were collected on patients hospitalized with acute myocardial infarction (AMI) and congestive heart failure (CHF) between April 1, 1999 and March 31, 2001 at 86 hospital corporations in Ontario, Canada, by retrospective chart review. During the second phase (referred to as the EFFECT Follow-up sample), data were abstracted on patients hospitalized with these conditions between April 1, 2004 and March 31, 2005 at 81 Ontario hospital corporations. Data on patient demographics, vital signs and physical examination at presentation, medical history, and results of laboratory tests were collected for these samples. In the EFFECT study, data were available on 11,506 and 7889 patients hospitalized with a diagnosis of AMI during the first and second phases of the study, respectively (9945 and 8339 for CHF, respectively). After excluding subjects with missing data on key variables, 9298 and 6932 subjects were available from the first and second phases, respectively (8240 and 7608 for CHF, respectively), for inclusion in the current study. In the current study, the outcome was a binary variable denoting whether the patient died within 30 days of hospital admission. Candidate predictor variables were those variables described in the tables in the appendices.

2.2 Statistical methods for predicting cardiovascular outcomes

We used conventional regression trees, bagged regression trees, random forests, and boosted regression trees to predict the probability of 30-day mortality for patients hospitalized with cardiovascular disease. Readers are referred elsewhere for details on these tree-based methods (Clark and Pregibon, 1993; Freund and Schapire, 1996; Breiman et al., 1998; Friedman et al., 2000; Breiman, 2001; Hastie et al., 2001; McCaffrey et al., 2004; Bühlmann and Hothorn, 2007). For bagged regression trees, a regression tree was grown in each of 100 bootstrap samples. For random forests, 500 regression trees were grown. When fitting random forests of regression trees, we set the number of randomly selected predictor variables considered at each binary split to $\lfloor p/3 \rfloor$, where p denotes the total number of predictor variables and $\lfloor \cdot \rfloor$ denotes the floor function (this is the default in the R implementation of random forests). For boosted regression trees, we considered four different base regression models: regression trees of depth one through four (which have also been referred to as regression trees with interaction depths one through four). For boosted regression trees, we considered sequences of 10,000 regression trees. For all methods, we used implementations available in R statistical software (R version 2.11.1, R Foundation for Statistical Computing, Vienna, Austria). We grew conventional regression trees using the rpart package (version 3.1-46). The optimal size of each regression tree was determined using cross-validation, and the trees were then pruned to that size. For bagging, random forests, and boosted regression trees, we used the ipred package (version 0.8-8), the randomForest package (version 4.5-36), and the gbm package (version 1.6-3.1), respectively.
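As a rough illustration of bootstrap aggregation, the following Python sketch grows one small regression tree in each of 100 bootstrap samples and averages the predictions. It is not the R code used in the study: the one-split tree (`fit_stump`), the single predictor, and the toy data are all assumptions made to keep the example short.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single predictor by least squares."""
    best = None
    for cut in sorted(set(xs))[:-1]:  # candidate split points (right side never empty)
        left = [y for x, y in zip(xs, ys) if x <= cut]
        right = [y for x, y in zip(xs, ys) if x > cut]
        lo, hi = mean(left), mean(right)
        sse = sum((y - lo) ** 2 for y in left) + sum((y - hi) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, cut, lo, hi)
    if best is None:  # predictor is constant: fall back to the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, cut, lo, hi = best
    return lambda x: lo if x <= cut else hi

def bagged_stumps(xs, ys, n_trees=100, seed=0):
    """Grow one stump per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n subjects with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(t(x) for t in trees)

# Hypothetical toy data: age as the only predictor, death as a 0/1 outcome.
ages = list(range(40, 100))
died = [1 if (a >= 75 and a % 2 == 0) else 0 for a in ages]
predict = bagged_stumps(ages, died)
p50, p90 = predict(50), predict(90)
print(p50, p90)
```

Because the outcome is coded 0/1, each tree's terminal-node mean is an estimated probability, so the bagged average also lies between 0 and 1. A single stump gives a step function in age, while the average over bootstrap samples can vary more gradually because the chosen cut point differs between samples.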
We used two different logistic regression models to predict the probability of 30-day mortality, both of which consisted of only main effects. In the first logistic regression model, all continuous covariates were assumed to have a linear relationship with the log-odds of death. The second logistic regression model used restricted cubic smoothing splines with four knots and three degrees of freedom to model the relationship between continuous covariates and the log-odds of death (Harrell, 2001). For both logistic regression models, all candidate predictors were included, and no variable reduction was used. Both models were estimated in R; the model that incorporated restricted cubic smoothing splines was estimated using functions from the Design library (version 2.3-0).
For comparison, we also assessed the predictive performance of previously developed disease-specific mortality prediction models. The GRACE (Global Registry of Acute Coronary Events) score was derived and validated for predicting mortality in patients hospitalized with acute coronary syndromes (Granger et al., 2003). The score comprises the following variables: Killip Class, systolic blood pressure, heart rate, age, and creatinine level. In the AMI sample, 30-day mortality was regressed on the GRACE score using a univariable logistic regression model (instead of entering the components of the score separately). We used the GRACE score because a recent systematic review showed that it predicts mortality in patients with acute coronary syndromes more accurately than other scores (D'Ascenzo et al., 2012). The EFFECT-HF mortality prediction model is a logistic regression model that has been derived and validated for predicting 30-day and one-year mortality in patients hospitalized with CHF (Lee et al., 2003). The model for predicting 30-day mortality uses the following variables: age, systolic blood pressure, respiratory rate, sodium, urea, history of stroke or transient ischemic attack, dementia, chronic obstructive pulmonary disease, cirrhosis of the liver, and cancer. In the CHF sample, 30-day mortality was regressed on the individual variables in the EFFECT-HF model.
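To make the phrase "four knots and three degrees of freedom" concrete, the sketch below builds a restricted cubic spline basis in the truncated-power form described by Harrell (2001): with k knots it yields the covariate itself plus k - 2 nonlinear terms, and the fitted curve is constrained to be linear beyond the outer knots. The knot locations and the function name `rcs_basis` are hypothetical; the study's actual knot placement is not stated here.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x.

    Returns k - 1 columns for k knots: x itself plus k - 2 nonlinear terms,
    each of which is zero below the first knot and linear above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    pos3 = lambda u: max(u, 0.0) ** 3      # truncated cubic (u)_+^3
    scale = (t[-1] - t[0]) ** 2            # keeps nonlinear terms on the scale of x
    d = t[-1] - t[-2]
    cols = [float(x)]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / d
                + pos3(x - t[-1]) * (t[-2] - t[j]) / d)
        cols.append(term / scale)
    return cols

# Hypothetical knots for age (in years); four knots give three basis columns.
age_knots = [45.0, 62.0, 74.0, 88.0]
basis_at_70 = rcs_basis(70.0, age_knots)
print(len(basis_at_70), basis_at_70)
```

Each continuous covariate would contribute its three basis columns to the logistic regression in place of a single linear term, which is how the spline model can represent plateaus that a purely linear term cannot.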

2.3 Determining the predictive ability of different regression methods

We examined both the in-sample and out-of-sample predictive accuracy of each method. First, each model was estimated in the EFFECT Baseline sample. Using the fitted model, predictions for each subject were used to calculate three measures of predictive accuracy: the area under the receiver operating characteristic (ROC) curve (abbreviated as the AUC, which is equivalent to the c-statistic; Harrell, 2001; Steyerberg, 2009), the Scaled Brier's Score (Brier's Score scaled by its maximum possible score), and the generalized R² index (Harrell, 2001; Steyerberg, 2009; Steyerberg et al., 2010). We used bootstrap methods, with 100 bootstrap samples, to calculate an optimism-corrected estimate of each measure of predictive accuracy (Efron and Tibshirani, 1993; Steyerberg, 2009). Second, we assessed model performance using the EFFECT Baseline sample as the derivation sample and the EFFECT Follow-up sample as the validation sample.
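The following Python sketch illustrates two of these measures and the bootstrap optimism correction. It makes several assumptions: the scaled Brier score is taken as 1 - Brier/Brier_max, with Brier_max obtained from a constant prediction at the observed event rate; the optimism is the average difference between a refitted model's performance in its own bootstrap sample and in the original sample; and `band_rate_model` and the simulated data are purely hypothetical stand-ins for the models and cohorts in the study.

```python
import math
import random

def auc(y, p):
    """c-statistic: share of (event, non-event) pairs ranked correctly; ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def scaled_brier(y, p):
    """1 - Brier / Brier_max, where Brier_max comes from predicting the event rate for everyone."""
    brier = sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)
    rate = sum(y) / len(y)
    return 1.0 - brier / (rate * (1.0 - rate))

def optimism_corrected(fit, metric, X, y, n_boot=100, seed=0):
    """Return (apparent, optimism, corrected) for a model-fitting function and a metric."""
    rng = random.Random(seed)
    n = len(y)
    model = fit(X, y)
    apparent = metric(y, [model(x) for x in X])
    gaps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip degenerate samples with a single outcome class
            continue
        m = fit(Xb, yb)
        boot_perf = metric(yb, [m(x) for x in Xb])  # performance where the model was fitted
        orig_perf = metric(y, [m(x) for x in X])    # performance in the original sample
        gaps.append(boot_perf - orig_perf)
    optimism = sum(gaps) / len(gaps)
    return apparent, optimism, apparent - optimism

def band_rate_model(X, y):
    """Hypothetical toy model: predicted risk = observed event rate in each 10-year age band."""
    overall = sum(y) / len(y)
    events, counts = {}, {}
    for x, yi in zip(X, y):
        band = x // 10
        events[band] = events.get(band, 0) + yi
        counts[band] = counts.get(band, 0) + 1
    return lambda x: events[x // 10] / counts[x // 10] if x // 10 in counts else overall

# Simulated data (not from the EFFECT study): risk of death rises with age.
rng = random.Random(1)
ages = [rng.randint(40, 95) for _ in range(200)]
died = [int(rng.random() < 1 / (1 + math.exp(-(a - 85) / 8))) for a in ages]
apparent, optimism, corrected = optimism_corrected(band_rate_model, auc, ages, died)
print(f"apparent={apparent:.3f} optimism={optimism:.3f} corrected={corrected:.3f}")
```

The same relationship underlies the accuracy tables: the optimism-corrected column is the apparent performance minus the bootstrap estimate of optimism, up to rounding.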

2.4 Assessing calibration

We assessed the calibration of predictions obtained in the EFFECT Follow-up sample (the validation sample) using models developed in the EFFECT Baseline sample (the derivation sample) in three different ways. First, the mean predicted probability of death in the validation sample was compared with the observed probability of death in the validation sample to indicate calibration-in-the-large (Steyerberg, 2009). Second, we determined the calibration slope (deviation of the calibration slope from unity denotes miscalibration) (Steyerberg, 2009). The calibration slope assesses deviation between observed and expected probabilities of mortality across the range of predicted risk. It may be used to indicate whether there is a need to shrink predicted probabilities. Third, using the subjects from the validation sample, we used a lowess scatterplot smoother to graphically describe the relationship between observed and predicted mortality (Harrell, 2001; Steyerberg, 2009). Deviation of this calibration plot from a diagonal line with unit slope indicates miscalibration.
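A minimal Python sketch of the first two calibration checks follows. It assumes that the calibration slope is the coefficient obtained when the observed outcome is regressed on the logit of the predicted probability in a logistic model, with the intercept estimated jointly; whether the study's calibration intercept was estimated this way is not stated here. The Newton-Raphson fitter and the simulated, deliberately overfitted predictions are illustrative only.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def calibration_in_the_large(y, p):
    """Return (mean predicted probability, observed event rate)."""
    return sum(p) / len(p), sum(y) / len(y)

def calibration_slope(y, p, n_iter=25):
    """Fit logit P(y = 1) = a + b * logit(p) by Newton-Raphson; return (a, b)."""
    z = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        mu = [expit(a + b * zi) for zi in z]
        w = [m * (1.0 - m) for m in mu]
        g0 = sum(yi - m for yi, m in zip(y, mu))                  # score for intercept
        g1 = sum((yi - m) * zi for yi, m, zi in zip(y, mu, z))    # score for slope
        h00 = sum(w)                                              # information matrix
        h01 = sum(wi * zi for wi, zi in zip(w, z))
        h11 = sum(wi * zi * zi for wi, zi in zip(w, z))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

# Simulated validation data: the predictions are too extreme (logit doubled),
# so the calibration slope should come out near 0.5.
rng = random.Random(0)
true_logit = [rng.uniform(-3.0, 1.0) for _ in range(4000)]
y = [int(rng.random() < expit(t)) for t in true_logit]
p_overfit = [expit(2.0 * t) for t in true_logit]
intercept, slope = calibration_slope(y, p_overfit)
mean_pred, obs_rate = calibration_in_the_large(y, p_overfit)
print(f"slope={slope:.2f} intercept={intercept:.2f} mean_pred={mean_pred:.3f} observed={obs_rate:.3f}")
```

A slope well below one, as in this simulated case, is the pattern that suggests predicted probabilities should be shrunk toward the average risk; a slope above one indicates predictions that are not extreme enough.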

2.5 The relationship between continuous predictor variables and the log-odds of mortality

A potential limitation to the use of regression trees is their dichotomization of continuous predictor variables. We examined the relationship between five continuous predictor variables (age, systolic blood pressure, heart rate, glucose, and creatinine) and the log-odds of 30-day mortality in the EFFECT-AMI Baseline sample. For age, we created a synthetic dataset in which age was allowed to take on the percentiles of the distribution of age in the EFFECT Baseline sample, with the value of all the other covariates in this synthetic dataset being set to the sample median in the EFFECT Baseline sample. We used each of the prediction models that were developed in the EFFECT Baseline sample to estimate the log-odds of 30-day mortality for each subject in this synthetic dataset. We repeated this process for the other four continuous variables.
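The construction of the synthetic dataset can be sketched as follows. This is an illustration under assumptions: the 1st to 99th percentiles are used as the grid, covariates are stored as dictionaries, and the `model` passed to `log_odds_curve` is any hypothetical function that returns a predicted probability strictly between 0 and 1 for one row.

```python
import math
from statistics import median, quantiles

def synthetic_profile(rows, vary):
    """Rows in which `vary` takes its 1st..99th percentiles and every
    other covariate is held at its sample median."""
    names = list(rows[0].keys())
    medians = {k: median(r[k] for r in rows) for k in names}
    grid = quantiles([r[vary] for r in rows], n=100)  # 99 percentile cut points
    return [{**medians, vary: v} for v in grid]

def log_odds_curve(model, profile, vary):
    """Pairs of (covariate value, predicted log-odds) along the synthetic profile."""
    return [(row[vary], math.log(model(row) / (1.0 - model(row)))) for row in profile]

# Hypothetical sample with two covariates.
sample = [{"age": 40 + (i * 7) % 55, "sbp": 100 + (i * 13) % 80} for i in range(500)]
profile = synthetic_profile(sample, "age")

# Hypothetical fitted model: risk depends on age and systolic blood pressure.
toy_model = lambda r: 1.0 / (1.0 + math.exp(-(-6.0 + 0.07 * r["age"] - 0.01 * r["sbp"])))
curve = log_odds_curve(toy_model, profile, "age")
print(curve[0], curve[-1])
```

Holding the other covariates at their medians explains why a regression tree can show a flat curve for a variable: the synthetic subjects may all fall in a branch of the tree that never splits on that variable.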

3 Results

3.1 AMI sample

The percentage of patients who died within 30 days of admission did not differ between the EFFECT Baseline sample (10.9%) and the EFFECT Follow-up sample (10.5%) (p = 0.427, Appendices A and B).
Appendix A

Comparison of baseline characteristics between AMI patients who died within 30 days of admission and those who survived for 30 days subsequent to admission in the EFFECT Baseline and Follow-up samples

Variable | Baseline, death within 30 days: No (N = 8288) | Baseline, death within 30 days: Yes (N = 1010) | Baseline p-value | Follow-up, death within 30 days: No (N = 6206) | Follow-up, death within 30 days: Yes (N = 726) | Follow-up p-value
Age | 68.0 (56.0–77.0) | 80.0 (73.0–86.0) | <.001 | 68.0 (56.0–78.0) | 82.0 (74.0–87.0) | <.001
Female sex | 2837 (34.2%) | 496 (49.1%) | <.001 | 2209 (35.6%) | 354 (48.8%) | <.001
Cardiogenic shock | 50 (0.6%) | 92 (9.1%) | <.001 | 6 (0.1%) | 14 (1.9%) | <.001
Acute congestive heart failure/pulmonary edema | 390 (4.7%) | 136 (13.5%) | <.001 | 369 (5.9%) | 110 (15.2%) | <.001
Systolic blood pressure | 148.0 (129.0–170.0) | 128.5 (106.0–150.0) | <.001 | 144.0 (125.0–165.0) | 123.0 (104.0–146.0) | <.001
Diastolic blood pressure | 83.0 (71.0–96.0) | 72.5 (60.0–88.0) | <.001 | 80.0 (70.0–93.0) | 70.0 (58.0–84.0) | <.001
Heart rate | 80.0 (67.0–96.0) | 90.0 (72.0–111.0) | <.001 | 81.0 (68.0–97.0) | 90.0 (74.0–109.0) | <.001
Respiratory rate | 20.0 (18.0–22.0) | 22.0 (20.0–28.0) | <.001 | 20.0 (18.0–21.0) | 20.0 (18.0–26.0) | <.001
Diabetes | 2094 (25.3%) | 339 (33.6%) | <.001 | 1683 (27.1%) | 249 (34.3%) | <.001
Hypertension | 3793 (45.8%) | 493 (48.8%) | 0.067 | 3599 (58.0%) | 450 (62.0%) | 0.039
Current smoker | 2815 (34.0%) | 195 (19.3%) | <.001 | 1777 (28.6%) | 100 (13.8%) | <.001
Dyslipidemia | 2676 (32.3%) | 183 (18.1%) | <.001 | 2821 (45.5%) | 266 (36.6%) | <.001
Family history of CAD | 2693 (32.5%) | 135 (13.4%) | <.001 | 2096 (33.8%) | 91 (12.5%) | <.001
Cerebrovascular disease/TIA | 761 (9.2%) | 188 (18.6%) | <.001 | 673 (10.8%) | 183 (25.2%) | <.001
Angina | 2715 (32.8%) | 358 (35.4%) | 0.086 | 1823 (29.4%) | 276 (38.0%) | <.001
Cancer | 234 (2.8%) | 51 (5.0%) | <.001 | 94 (1.5%) | 22 (3.0%) | 0.003
Dementia | 239 (2.9%) | 129 (12.8%) | <.001 | 265 (4.3%) | 126 (17.4%) | <.001
Peptic ulcer disease | 459 (5.5%) | 56 (5.5%) | 0.993 | 285 (4.6%) | 62 (8.5%) | <.001
Previous AMI | 1863 (22.5%) | 280 (27.7%) | <.001 | 1430 (23.0%) | 242 (33.3%) | <.001
Asthma | 452 (5.5%) | 62 (6.1%) | 0.368 | 384 (6.2%) | 43 (5.9%) | 0.779
Depression | 571 (6.9%) | 105 (10.4%) | <.001 | 593 (9.6%) | 102 (14.0%) | <.001
Peripheral vascular disease | 593 (7.2%) | 119 (11.8%) | <.001 | 488 (7.9%) | 107 (14.7%) | <.001
Previous revascularization | 770 (9.3%) | 78 (7.7%) | 0.102 | 775 (12.5%) | 81 (11.2%) | 0.302
Congestive heart failure | 326 (3.9%) | 132 (13.1%) | <.001 | 312 (5.0%) | 102 (14.0%) | <.001
Hyperthyroidism | 96 (1.2%) | 20 (2.0%) | 0.026 | 18 (0.3%) | ≤5 (0.1%) | 0.458
Aortic stenosis | 118 (1.4%) | 41 (4.1%) | <.001 | 101 (1.6%) | 37 (5.1%) | <.001
Hemoglobin | 141.0 (129.0–151.0) | 128.0 (114.0–143.0) | <.001 | 141.0 (127.0–152.0) | 125.0 (111.0–138.0) | <.001
White blood count | 9.4 (7.6–11.8) | 11.6 (9.1–15.0) | <.001 | 9.6 (7.7–12.1) | 11.7 (8.9–15.6) | <.001
Sodium | 139.0 (137.0–141.0) | 139.0 (136.0–141.0) | <.001 | 139.0 (137.0–141.0) | 138.0 (135.0–141.0) | <.001
Potassium | 4.0 (3.7–4.4) | 4.2 (3.9–4.7) | <.001 | 4.0 (3.7–4.4) | 4.3 (3.9–4.8) | <.001
Glucose | 7.7 (6.3–10.5) | 9.8 (7.3–14.3) | <.001 | 7.5 (6.3–9.9) | 9.0 (6.8–12.3) | <.001
Urea | 6.3 (5.0–8.2) | 9.3 (6.6–14.4) | <.001 | 6.4 (5.0–8.5) | 10.2 (7.3–15.2) | <.001
Creatinine | 91.0 (77.0–110.0) | 120.0 (92.0–171.0) | <.001 | 92.0 (79.0–113.0) | 127.0 (95.0–181.0) | <.001

Note: Continuous variables are reported as median (25th percentile–75th percentile); dichotomous variables are reported as N (%).

The Kruskal–Wallis test and the Chi-squared test were used to compare continuous and categorical baseline characteristics, respectively, between patients who died within 30 days of admission and those who did not in each of the EFFECT Baseline and EFFECT Follow-up samples.

Appendix B

Comparison of baseline covariates between AMI patients in the EFFECT Baseline sample and the EFFECT Follow-up sample

Variable | EFFECT Baseline sample (N = 9298) | EFFECT Follow-up sample (N = 6932) | p-value
Death within 30 days of admission | 1010 (10.9%) | 726 (10.5%) | 0.427
Age | 69.0 (57.0–78.0) | 71.0 (58.0–80.0) | <.001
Female sex | 3333 (35.8%) | 2563 (37.0%) | 0.14
Cardiogenic shock | 142 (1.5%) | 20 (0.3%) | <.001
Acute congestive heart failure/pulmonary edema | 526 (5.7%) | 479 (6.9%) | 0.001
Systolic blood pressure | 146.0 (126.0–168.0) | 143.0 (122.0–164.0) | <.001
Diastolic blood pressure | 82.0 (70.0–95.0) | 80.0 (68.0–92.0) | <.001
Heart rate | 80.0 (68.0–98.0) | 82.0 (69.0–99.0) | 0.005
Respiratory rate | 20.0 (18.0–22.0) | 20.0 (18.0–22.0) | <.001
Diabetes | 2433 (26.2%) | 1932 (27.9%) | 0.015
Hypertension | 4286 (46.1%) | 4049 (58.4%) | <.001
Current smoker | 3010 (32.4%) | 1877 (27.1%) | <.001
Dyslipidemia | 2859 (30.7%) | 3087 (44.5%) | <.001
Family history of CAD | 2828 (30.4%) | 2187 (31.5%) | 0.122
Cerebrovascular disease/TIA | 949 (10.2%) | 856 (12.3%) | <.001
Angina | 3073 (33.1%) | 2099 (30.3%) | <.001
Cancer | 285 (3.1%) | 116 (1.7%) | <.001
Dementia | 368 (4.0%) | 391 (5.6%) | <.001
Peptic ulcer disease | 515 (5.5%) | 347 (5.0%) | 0.134
Previous AMI | 2143 (23.0%) | 1672 (24.1%) | 0.111
Asthma | 514 (5.5%) | 427 (6.2%) | 0.088
Depression | 676 (7.3%) | 695 (10.0%) | <.001
Peripheral vascular disease | 712 (7.7%) | 595 (8.6%) | 0.032
Previous revascularization | 848 (9.1%) | 856 (12.3%) | <.001
Congestive heart failure | 458 (4.9%) | 414 (6.0%) | 0.003
Hyperthyroidism | 116 (1.2%) | 19 (0.3%) | <.001
Aortic stenosis | 159 (1.7%) | 138 (2.0%) | 0.187
Hemoglobin | 140.0 (127.0–151.0) | 139.0 (124.0–151.0) | 0.024
White blood count | 9.6 (7.7–12.2) | 9.8 (7.8–12.4) | 0.004
Sodium | 139.0 (137.0–141.0) | 139.0 (137.0–141.0) | <.001
Potassium | 4.1 (3.7–4.4) | 4.1 (3.8–4.4) | 0.828
Glucose | 7.8 (6.4–10.9) | 7.6 (6.3–10.3) | <.001
Urea | 6.5 (5.0–8.6) | 6.6 (5.1–9.1) | <.001
Creatinine | 93.0 (78.0–115.0) | 94.0 (80.0–119.0) | <.001

Note: Continuous variables are reported as median (25th percentile–75th percentile); dichotomous variables are reported as N (%).

The Kruskal–Wallis test and the Chi-squared test were used to compare continuous and categorical baseline characteristics, respectively, between patients in the EFFECT Baseline sample and the EFFECT Follow-up sample.

3.1.1 Comparison of predictive ability of different methods

Regression trees resulted in predicted probabilities of 30-day mortality with the lowest accuracy (Table 1). In the EFFECT Baseline sample, the use of boosted regression trees of depth four resulted in predictions with the greatest accuracy when using the AUC and the Scaled Brier's Score to assess model performance. However, a logistic regression model that incorporated restricted cubic smoothing splines resulted in the greatest out-of-sample predictive accuracy when using the EFFECT Follow-up sample as the validation sample.
Table 1

Measures of predictive accuracy in the AMI samples

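Measure | Model | Apparent performance (EFFECT Baseline) | Optimism (bootstrap estimate) | Optimism-corrected performance (EFFECT Baseline) | EFFECT Follow-up
AUC | Regression tree | 0.768 | 0.013 | 0.755 | 0.767
AUC | Bagged trees | 0.807 | −0.005 | 0.812 | 0.820
AUC | Random forests | 0.823 | −0.003 | 0.826 | 0.843
AUC | Boosted trees (depth one) | 0.850 | 0.009 | 0.841 | 0.841
AUC | Boosted trees (depth two) | 0.864 | 0.013 | 0.851 | 0.848
AUC | Boosted trees (depth three) | 0.870 | 0.016 | 0.854 | 0.851
AUC | Boosted trees (depth four) | 0.875 | 0.019 | 0.855 | 0.852
AUC | Logistic regression | 0.853 | 0.005 | 0.848 | 0.852
AUC | Logistic regression (splines) | 0.862 | 0.009 | 0.854 | 0.858
AUC | Logistic regression (GRACE score) | 0.828 | 0.001 | 0.827 | 0.826
R² | Regression tree | 0.215 | 0.028 | 0.186 | 0.203
R² | Bagged trees | 0.254 | −0.001 | 0.254 | 0.257
R² | Random forests | 0.288 | −0.003 | 0.291 | 0.304
R² | Boosted trees (depth one) | 0.324 | 0.021 | 0.304 | 0.295
R² | Boosted trees (depth two) | 0.349 | 0.034 | 0.316 | 0.301
R² | Boosted trees (depth three) | 0.367 | 0.046 | 0.320 | 0.305
R² | Boosted trees (depth four) | 0.383 | 0.059 | 0.324 | 0.307
R² | Logistic regression | 0.332 | 0.012 | 0.320 | 0.315
R² | Logistic regression (splines) | 0.354 | 0.021 | 0.332 | 0.330
R² | Logistic regression (GRACE score) | 0.280 | 0.001 | 0.279 | 0.259
Scaled Brier's score | Regression tree | 0.147 | 0.028 | 0.119 | 0.119
Scaled Brier's score | Bagged trees | 0.168 | 0.001 | 0.167 | 0.119
Scaled Brier's score | Random forests | 0.103 | −0.039 | 0.142 | 0.134
Scaled Brier's score | Boosted trees (depth one) | 0.212 | 0.014 | 0.198 | 0.186
Scaled Brier's score | Boosted trees (depth two) | 0.246 | 0.027 | 0.219 | 0.197
Scaled Brier's score | Boosted trees (depth three) | 0.264 | 0.039 | 0.225 | 0.198
Scaled Brier's score | Boosted trees (depth four) | 0.280 | 0.051 | 0.229 | 0.197
Scaled Brier's score | Logistic regression | 0.228 | 0.012 | 0.216 | 0.198
Scaled Brier's score | Logistic regression (splines) | 0.246 | 0.021 | 0.225 | 0.211
Scaled Brier's score | Logistic regression (GRACE score) | 0.183 | 0.002 | 0.182 | 0.149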
The three logistic regression models, random forests, and boosted regression trees of depth four resulted in calibration slopes closest to one (Table 2). The two logistic regression models fitted to the full set of predictors (with and without splines) had very similar calibration to one another (Fig. 1). The calibration of the GRACE risk score model deviated from that of the other two logistic regression models in the upper range of predicted risk. The regression tree resulted in predictions that displayed the greatest degree of miscalibration. Apart from boosted regression trees of depth one, the remaining prediction methods resulted in some overestimation of the risk of death among subjects with a higher predicted probability of death. Of the four boosted regression trees, the use of trees of depth two resulted in predictions with the best calibration. No method had uniformly superior calibration compared to the other approaches. Logistic regression (with or without splines) demonstrated good concordance between observed and predicted probabilities among subjects with a lower predicted probability of death. In contrast, bagged regression trees and random forests resulted in predictions with a good concordance between observed and predicted probabilities among subjects with a higher predicted probability of death. To a certain extent, the use of boosted regression trees of depth two resulted in reasonable performance across the range of predicted values.
Table 2

Measures of model calibration in the EFFECT Follow-up samples

Model | AMI cohort: calibration intercept | AMI cohort: calibration slope | CHF cohort: calibration intercept | CHF cohort: calibration slope
Logistic regression | −0.171 | 1.000 | −0.091 | 1.032
Logistic regression (GRACE score/EFFECT-HF model) | 0.158 | 1.045 | −0.118 | 1.029
Logistic regression (splines) | −0.181 | 0.985 | −0.189 | 0.985
Regression tree | −0.395 | 0.896 | −0.343 | 0.890
Bagged regression tree | 0.073 | 1.174 | 0.273 | 1.215
Random forest | −0.287 | 1.022 | −0.360 | 0.950
Boosted trees (depth one) | 0.505 | 1.410 | 0.612 | 1.407
Boosted trees (depth two) | 0.029 | 1.144 | 0.270 | 1.230
Boosted trees (depth three) | −0.098 | 1.074 | 0.117 | 1.149
Boosted trees (depth four) | −0.155 | 1.040 | 0.042 | 1.108
Figure 1

Calibration plot in EFFECT2 AMI cohort.


3.1.2 Continuous predictor variables and the log-odds of mortality

The relationship between age and the log-odds of death was approximately linear according to the restricted cubic smoothing splines (Fig. 2). The regression tree modeled a single step function to relate age to the log-odds of the outcome. The ensemble-based methods described a flat relationship between age and the log-odds of the outcome until approximately age 70 years, at which point, the log-odds of death increased with increasing age. For each of the four other covariates, the regression tree modeled a flat or null relationship between the covariate and the log-odds of death. Either the covariate was not used in the regression tree, or it was used in only a branch of the tree that was different from that branch of the tree that described the subject whose covariates were set to the sample median. Furthermore, for some of the covariates (e.g., heart rate and creatinine), the logistic regression model that incorporated restricted cubic splines described a relationship that was approximately flat at the lower range of the distribution of the covariate and/or was approximately flat at the higher range of the distribution of the covariate. Several of the ensemble-based methods approximated these plateau-like relationships.
Figure 2

Relationship between key continuous variables and log-odds of death.


3.1.3 The distributions of predicted risks

We report nonparametric estimates of the distribution of the predicted probability of 30-day death for each subject in the validation sample using each of the different prediction methods (Fig. 3). Since the fitted regression tree had eight terminal nodes, there were only eight different predicted probabilities of 30-day death. Apart from regression trees and bagged regression trees, the other predictive models provided unimodal distributions of predicted risk. Furthermore, the distributions were, as would be expected clinically, positively skewed. Logistic regression resulted in predicted probabilities of 30-day death that ranged from 0.001 to 0.964 (0.001–0.961 when smoothing splines were incorporated into the model). When a conventional regression tree was used, the range of predicted probabilities was 0.040–0.546. With boosted regression trees of depth four, the range was 0.023–0.907.
Figure 3

Distribution of predicted probabilities of death in AMI sample.


3.2 CHF sample

The percentage of subjects who died within 30 days of admission did not differ between the EFFECT Baseline sample (10.8%) and the EFFECT Follow-up sample (9.9%) (p = 0.083, Appendices C and D).
Appendix C

Comparison of baseline characteristics between CHF patients who died within 30 days of admission and those who survived for 30 days subsequent to admission in the EFFECT Baseline and Follow-up samples

Variable | Baseline, death within 30 days: No (N = 7353) | Baseline, death within 30 days: Yes (N = 887) | Baseline p-value | Follow-up, death within 30 days: No (N = 6853) | Follow-up, death within 30 days: Yes (N = 755) | Follow-up p-value
Age | 77.0 (69.0–83.0) | 82.0 (74.0–88.0) | <.001 | 78.0 (70.0–84.0) | 83.0 (77.0–88.0) | <.001
Female sex | 3692 (50.2%) | 465 (52.4%) | 0.213 | 3478 (50.8%) | 408 (54.0%) | 0.086
Systolic blood pressure | 148.0 (128.0–172.0) | 130.0 (112.0–152.0) | <.001 | 146.0 (126.0–169.0) | 128.0 (109.0–148.0) | <.001
Heart rate | 92.0 (76.0–110.0) | 94.0 (78.0–110.0) | 0.208 | 90.0 (73.0–108.0) | 93.0 (76.0–111.0) | 0.008
Respiratory rate | 24.0 (20.0–30.0) | 25.0 (20.0–32.0) | <.001 | 24.0 (20.0–28.0) | 24.0 (20.0–30.0) | <.001
Neck vein distension | 4062 (55.2%) | 455 (51.3%) | 0.026 | 4161 (60.7%) | 435 (57.6%) | 0.098
S3 | 728 (9.9%) | 57 (6.4%) | <.001 | 435 (6.3%) | 31 (4.1%) | 0.015
S4 | 284 (3.9%) | 18 (2.0%) | 0.006 | 192 (2.8%) | 9 (1.2%) | 0.009
Rales >50% of lung field | 752 (10.2%) | 151 (17.0%) | <.001 | 841 (12.3%) | 131 (17.4%) | <.001
Pulmonary edema | 3766 (51.2%) | 452 (51.0%) | 0.884 | 4151 (60.6%) | 452 (59.9%) | 0.707
Cardiomegaly | 2652 (36.1%) | 292 (32.9%) | 0.065 | 3043 (44.4%) | 329 (43.6%) | 0.664
Diabetes | 2594 (35.3%) | 280 (31.6%) | 0.028 | 2619 (38.2%) | 239 (31.7%) | <.001
Cerebrovascular disease/TIA | 1161 (15.8%) | 213 (24.0%) | <.001 | 1217 (17.8%) | 184 (24.4%) | <.001
Previous AMI | 2714 (36.9%) | 307 (34.6%) | 0.18 | 2505 (36.6%) | 269 (35.6%) | 0.617
Atrial fibrillation | 2139 (29.1%) | 264 (29.8%) | 0.677 | 2417 (35.3%) | 297 (39.3%) | 0.027
Peripheral vascular disease | 950 (12.9%) | 132 (14.9%) | 0.102 | 915 (13.4%) | 111 (14.7%) | 0.303
Chronic obstructive pulmonary disease | 1211 (16.5%) | 194 (21.9%) | <.001 | 1518 (22.2%) | 229 (30.3%) | <.001
Cirrhosis | 52 (0.7%) | 11 (1.2%) | 0.085 | 52 (0.8%) | ≤5 (0.4%) | 0.266
Cancer | 814 (11.1%) | 136 (15.3%) | <.001 | 749 (10.9%) | 131 (17.4%) | <.001
Left bundle branch block | 1082 (14.7%) | 150 (16.9%) | 0.083 | 934 (13.6%) | 99 (13.1%) | 0.694
Hemoglobin | 125.0 (111.0–138.0) | 120.0 (105.0–136.0) | <.001 | 123.0 (109.0–137.0) | 118.0 (105.0–132.0) | <.001
White blood count | 8.9 (7.0–11.4) | 10.0 (7.5–12.9) | <.001 | 8.8 (7.0–11.4) | 9.8 (7.6–13.0) | <.001
Sodium | 139.0 (136.0–141.0) | 138.0 (135.0–141.0) | <.001 | 139.0 (136.0–142.0) | 138.0 (135.0–142.0) | 0.001
Potassium | 4.2 (3.9–4.6) | 4.4 (4.0–4.9) | <.001 | 4.2 (3.8–4.6) | 4.4 (4.0–4.9) | <.001
Glucose | 7.5 (6.0–10.7) | 7.7 (6.2–10.9) | 0.02 | 7.3 (6.0–10.1) | 7.5 (6.1–10.2) | 0.158
Urea | 8.1 (6.0–11.8) | 11.7 (8.1–17.4) | <.001 | 8.2 (6.0–11.6) | 11.4 (7.8–18.3) | <.001

Note: Continuous variables are reported as median (25th percentile–75th percentile); dichotomous variables are reported as N (%).

The Kruskal–Wallis test and the Chi-squared test were used to compare continuous and categorical baseline characteristics, respectively, between patients who died within 30 days of admission and those who did not in each of the EFFECT Baseline and EFFECT Follow-up samples.

Appendix D

Comparison of baseline covariates between CHF patients in the EFFECT Baseline sample and the EFFECT Follow-up sample

Variable | EFFECT Baseline sample (N = 8240) | EFFECT Follow-up sample (N = 7608) | p-value
Death within 30 days of admission | 887 (10.8%) | 755 (9.9%) | 0.083
Age | 77.0 (70.0–84.0) | 79.0 (70.0–85.0) | <.001
Female sex | 4157 (50.4%) | 3886 (51.1%) | 0.429
Systolic blood pressure | 146.0 (126.0–170.0) | 144.0 (124.0–167.5) | <.001
Heart rate | 92.0 (76.0–110.0) | 90.0 (73.0–109.0) | <.001
Respiratory rate | 24.0 (20.0–30.0) | 24.0 (20.0–28.0) | <.001
Neck vein distension | 4517 (54.8%) | 4596 (60.4%) | <.001
S3 | 785 (9.5%) | 466 (6.1%) | <.001
S4 | 302 (3.7%) | 201 (2.6%) | <.001
Rales >50% of lung field | 903 (11.0%) | 972 (12.8%) | <.001
Pulmonary edema | 4218 (51.2%) | 4603 (60.5%) | <.001
Cardiomegaly | 2944 (35.7%) | 3372 (44.3%) | <.001
Diabetes | 2874 (34.9%) | 2858 (37.6%) | <.001
Cerebrovascular disease/TIA | 1374 (16.7%) | 1401 (18.4%) | 0.004
Previous AMI | 3021 (36.7%) | 2774 (36.5%) | 0.793
Atrial fibrillation | 2403 (29.2%) | 2714 (35.7%) | <.001
Peripheral vascular disease | 1082 (13.1%) | 1026 (13.5%) | 0.511
Chronic obstructive pulmonary disease | 1405 (17.1%) | 1747 (23.0%) | <.001
Cirrhosis | 63 (0.8%) | 55 (0.7%) | 0.761
Cancer | 950 (11.5%) | 880 (11.6%) | 0.941
Left bundle branch block | 1232 (15.0%) | 1033 (13.6%) | 0.014
Hemoglobin | 124.0 (110.0–138.0) | 123.0 (109.0–137.0) | 0.001
White blood count | 9.0 (7.1–11.6) | 8.9 (7.0–11.5) | 0.062
Sodium | 139.0 (136.0–141.0) | 139.0 (136.0–142.0) | 0.028
Potassium | 4.2 (3.9–4.6) | 4.2 (3.9–4.6) | 0.105
Glucose | 7.5 (6.1–10.7) | 7.3 (6.0–10.1) | <.001
Urea | 8.4 (6.1–12.4) | 8.4 (6.2–12.2) | 0.635

Note: Continuous variables are reported as median (25th percentile–75th percentile); dichotomous variables are reported as N (%).

The Kruskal–Wallis test and the Chi-squared test were used to compare continuous and categorical baseline characteristics, respectively, between patients in the EFFECT Baseline sample and the EFFECT Follow-up sample.

3.2.1 Comparison of predictive ability of different regression methods

For all three measures of predictive accuracy, regression trees resulted in predicted probabilities of 30-day mortality with both the lowest in-sample and out-of-sample accuracy (Table 3). In the EFFECT Baseline sample, the use of boosted regression trees of depth four resulted in predictions with the greatest accuracy when assessing performance using the AUC and the Scaled Brier's Score. A logistic regression model that incorporated restricted cubic smoothing splines resulted in the greatest out-of-sample predictive accuracy when using the EFFECT Follow-up sample as the validation sample.
Table 3

Measures of accuracy in CHF samples

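Measure | Model | Apparent performance (EFFECT Baseline) | Optimism (bootstrap estimate) | Optimism-corrected performance (EFFECT Baseline) | EFFECT Follow-up
AUC | Regression tree | 0.674 | 0.012 | 0.662 | 0.661
AUC | Bagged trees | 0.713 | −0.011 | 0.724 | 0.725
AUC | Random forests | 0.752 | −0.003 | 0.755 | 0.764
AUC | Boosted trees (depth one) | 0.769 | 0.012 | 0.757 | 0.760
AUC | Boosted trees (depth two) | 0.788 | 0.021 | 0.767 | 0.770
AUC | Boosted trees (depth three) | 0.801 | 0.029 | 0.772 | 0.774
AUC | Boosted trees (depth four) | 0.811 | 0.036 | 0.776 | 0.777
AUC | Logistic regression | 0.773 | 0.008 | 0.765 | 0.781
AUC | Logistic regression (splines) | 0.786 | 0.013 | 0.773 | 0.786
AUC | Logistic regression (EFFECT-HF) | 0.762 | 0.003 | 0.759 | 0.775
R² | Regression tree | 0.096 | 0.018 | 0.079 | 0.077
R² | Bagged trees | 0.119 | −0.003 | 0.122 | 0.117
R² | Random forests | 0.164 | −0.007 | 0.171 | 0.170
R² | Boosted trees (depth one) | 0.187 | 0.019 | 0.168 | 0.163
R² | Boosted trees (depth two) | 0.220 | 0.040 | 0.180 | 0.175
R² | Boosted trees (depth three) | 0.244 | 0.060 | 0.184 | 0.178
R² | Boosted trees (depth four) | 0.266 | 0.079 | 0.187 | 0.180
R² | Logistic regression | 0.194 | 0.012 | 0.182 | 0.194
R² | Logistic regression (splines) | 0.216 | 0.022 | 0.194 | 0.203
R² | Logistic regression (EFFECT-HF) | 0.174 | 0.004 | 0.170 | 0.179
Scaled Brier's score | Regression tree | 0.058 | 0.016 | 0.043 | 0.039
Scaled Brier's score | Bagged trees | 0.071 | −0.001 | 0.071 | 0.039
Scaled Brier's score | Random forests | 0.097 | −0.021 | 0.118 | 0.087
Scaled Brier's score | Boosted trees (depth one) | 0.106 | 0.010 | 0.096 | 0.091
Scaled Brier's score | Boosted trees (depth two) | 0.139 | 0.026 | 0.113 | 0.104
Scaled Brier's score | Boosted trees (depth three) | 0.161 | 0.040 | 0.121 | 0.106
Scaled Brier's score | Boosted trees (depth four) | 0.179 | 0.054 | 0.126 | 0.107
Scaled Brier's score | Logistic regression | 0.125 | 0.010 | 0.115 | 0.113
Scaled Brier's score | Logistic regression (splines) | 0.142 | 0.018 | 0.124 | 0.119
Scaled Brier's score | Logistic regression (EFFECT-HF) | 0.106 | 0.004 | 0.103 | 0.098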
Boosted regression trees of depth four resulted in the mean predicted log-odds of death being the closest to the observed log-odds of death in the validation sample (Table 2). The three logistic regression models resulted in calibration slopes closest to one. As in the AMI sample, no method had uniformly superior calibration to the other methods (Fig. 4). Logistic regression (with or without splines) and random forests resulted in predictions with a good concordance between observed and predicted probabilities among subjects with a lower predicted probability of death.
Figure 4

Calibration plot in EFFECT2 CHF cohort.
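A calibration slope of the kind referred to in the text is commonly obtained by regressing the observed binary outcome on the logit of the predicted probability; a slope of one indicates agreement, and a slope below one suggests predictions that are too extreme. The sketch below is a plain-Python illustration under that assumption, not the code used in the study. The function name `calibration_slope`, the Newton-Raphson fitting loop, and the toy data are all hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def calibration_slope(y, p, n_iter=25):
    """Fit logit(P(y = 1)) = a + b * logit(p) by Newton-Raphson.

    y: observed outcomes (0 or 1); p: predicted probabilities in (0, 1).
    Returns (a, b): calibration intercept and calibration slope.
    """
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Toy data: predictions of 0.1 and 0.9, but observed rates of 0.2 and 0.8,
# so the predictions are too extreme and the fitted slope falls below one.
y_toy = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
p_toy = [0.1] * 10 + [0.9] * 10
intercept, slope = calibration_slope(y_toy, p_toy)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

With real data, the predicted probabilities would come from the model fitted in the derivation sample and the outcomes from the validation sample.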

4 Discussion

We examined the ability of ensemble-based methods to predict the probability of 30-day mortality in patients who were hospitalized with either an AMI or CHF. Our primary finding was that logistic regression models that incorporated restricted cubic smoothing splines had the greatest out-of-sample predictive accuracy, in both the AMI and CHF populations. Our derivation and validation samples consisted of population-based samples of unselected patients with either AMI or CHF from temporally distinct periods (1999–2001 vs. 2004–2005, respectively). Patients in the validation sample tended to be older and modestly sicker than patients in the derivation sample. For these reasons, the estimates of out-of-sample performance are likely to be generalizable to other current settings.

Several secondary findings should be highlighted from the current study. First, ensemble-based methods offer substantially greater predictive accuracy than conventional regression trees for predicting short-term mortality in patients hospitalized with cardiovascular disease. Second, for predicting short-term cardiovascular mortality, ensemble-based methods did not offer a clear advantage over conventional logistic regression. Third, logistic regression resulted in the greatest range of predicted probabilities of 30-day death in the validation sample. Logistic regression thus permitted the greatest degree of separation of patients according to predicted probability.

In the current study, we have focused on predicting outcomes rather than on describing the nature of the relationship between specific covariates and the outcome. While the latter is of interest in clinical medicine and epidemiology, prediction is also of great importance. First, it allows clinicians to make treatment decisions informed by global patient prognosis instead of multiple potential clinical factors that may have variable impacts on mortality risk.
It has been previously demonstrated that without the guidance of global risk scores, the prescription of drug therapies demonstrates a risk-treatment mismatch, such that higher-risk patients are less likely to receive potentially life-saving treatment (Lee et al., 2005). Ideally, prognostic data should guide treatment decisions because: (a) some treatments should be restricted to patients with a poor prognosis, considering side effects of treatment and financial costs (e.g., coronary artery bypass graft surgery); (b) conversely, patients with a poor prognosis may not be candidates for other therapies (e.g., implantable cardiac defibrillators); (c) the timing of different treatment options versus end-of-life care is dependent on prognosis; and (d) admission to hospital is ideally reserved for patients who have a worse prognosis (Lee et al., 2010).

When assessing prognosis, multivariate risk scores such as the GRACE score or the EFFECT-HF model have several potential advantages for clinicians, administrators, and researchers. They allow physicians to synthesize information from multiple clinical characteristics (e.g., demographics, vital signs, laboratory measurements, presenting signs and symptoms) to make global predictions about prognosis, rather than being overly influenced by subjective interpretation of specific patient characteristics in isolation. Thus, the models developed in this study synthesize information to improve the accuracy of the prediction of patient prognosis. Furthermore, risk models are essential for risk adjustment when comparing quality of care and outcomes among different health care plans and providers (i.e., hospital report cards). Finally, the design and analysis of randomized controlled trials may benefit from stratification by prognosis (Steyerberg, 2009).
While the extent of clinical use is not definitively known, the GRACE score appears to be commonly used as a research tool for formally determining patient risk in the context of research studies, rather than as a tool for clinical decision making. Widespread adoption by clinicians of these risk scores, and of models similar to those developed in the current study, could improve the ability of physicians to make estimates of patients’ prognosis, rather than relying on a subjective interpretation of specific clinical characteristics.

Some limitations of our study need to be acknowledged. We applied only a selection of modern modeling methods. Regression models did not include shrinkage or penalized estimation methods. We did not consider neural networks, support vector machine techniques, or the recently proposed “superlearner”, which may be relevant alternative approaches in some circumstances (van der Laan and Rose, 2011).

We conclude that bagged regression trees, random forests, and boosted regression trees may result in superior prediction of 30-day mortality in AMI and CHF patients compared to conventional regression trees. However, ensemble-based prediction methods may not offer improvements over logistic regression models that incorporated flexible functions to model nonlinear relationships between continuous covariates and the log-odds of the outcome.
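To make the phrase "flexible functions to model nonlinear relationships" concrete, the sketch below builds a restricted cubic spline basis for a single continuous covariate, using the standard truncated power form that is constrained to be linear beyond the boundary knots. It is an illustration only: the function name `rcs_basis`, the choice of four knots, and the knot locations are assumptions and are not taken from the study.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for one covariate value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k ordered knots. Each s_j is a
    combination of truncated cubics whose cubic and quadratic terms cancel
    beyond the last knot, so the fitted curve is linear in both tails.
    """
    k = len(knots)
    if k < 3:
        raise ValueError("need at least three knots")
    t_last, t_prev = knots[-1], knots[-2]
    width = t_last - t_prev

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise zero.
        return u ** 3 if u > 0 else 0.0

    basis = [x]
    for t_j in knots[:-2]:
        term = (cube_plus(x - t_j)
                - cube_plus(x - t_prev) * (t_last - t_j) / width
                + cube_plus(x - t_last) * (t_prev - t_j) / width)
        basis.append(term)
    return basis


# Hypothetical knots for a covariate such as age in years (illustrative only).
example_knots = [40, 60, 75, 90]
for value in (30, 65, 100):
    print(value, rcs_basis(value, example_knots))
```

The resulting columns would enter a logistic regression model in place of the single linear term for the covariate, at a cost of k − 2 additional coefficients for k knots.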
References

1. Daniel F McCaffrey; Greg Ridgeway; Andrew R Morral. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods. 2004-12.

2. M Ennis; G Hinton; D Naylor; M Revow; R Tibshirani. A comparison of statistical learning methods on the Gusto database. Stat Med. 1998-11-15.

3. Fabrizio D'Ascenzo; Giuseppe Biondi-Zoccai; Claudio Moretti; Mario Bollati; Pierluigi Omedè; Filippo Sciuto; Davide G Presutti; Maria Grazia Modena; Mauro Gasparini; Matthew J Reed; Imad Sheiban; Fiorenzo Gaita. TIMI, GRACE and alternative risk scores in Acute Coronary Syndromes: a meta-analysis of 40 derivation studies on 216,552 patients and of 42 validation studies on 31,625 patients. Contemp Clin Trials. 2012-01-11. (Review)

4. Peter C Austin. A comparison of regression trees, logistic regression, generalized additive models, and multivariate adaptive regression splines for predicting AMI mortality. Stat Med. 2007-07-10.

5. Douglas S Lee; Jack V Tu; David N Juurlink; David A Alter; Dennis T Ko; Peter C Austin; Alice Chong; Therese A Stukel; Daniel Levy; Andreas Laupacis. Risk-treatment mismatch in the pharmacotherapy of heart failure. JAMA. 2005-09-14.

6. Ewout W Steyerberg; Andrew J Vickers; Nancy R Cook; Thomas Gerds; Mithat Gonen; Nancy Obuchowski; Michael J Pencina; Michael W Kattan. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010-01.

7. Jack V Tu; Linda R Donovan; Douglas S Lee; Julie T Wang; Peter C Austin; David A Alter; Dennis T Ko. Effectiveness of public report cards for improving the quality of cardiac care: the EFFECT study: a randomized trial. JAMA. 2009-11-18.

8. Christopher B Granger; Robert J Goldberg; Omar Dabbous; Karen S Pieper; Kim A Eagle; Christopher P Cannon; Frans Van De Werf; Alvaro Avezum; Shaun G Goodman; Marcus D Flather; Keith A A Fox. Predictors of hospital mortality in the global registry of acute coronary events. Arch Intern Med. 2003-10-27.

9. Douglas S Lee; Michael J Schull; David A Alter; Peter C Austin; Andreas Laupacis; Alice Chong; Jack V Tu; Thérèse A Stukel. Early deaths in patients with heart failure discharged from the emergency department: a population-based analysis. Circ Heart Fail. 2010-01-27.

10. Neil H Young; Peter J D Andrews. Developing a prognostic model for traumatic brain injury--a missed opportunity? PLoS Med. 2008-08-05.
