Shyam Visweswaran, Antonio Ferreira, Guilherme A Ribeiro, Alexandre C Oliveira, Gregory F Cooper.
Abstract
Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model, called a decision-path model, that takes advantage of the particular features of a given person of interest. We introduce three methods that derive personalized decision-path models. We compared the performance of these methods with that of Classification And Regression Tree (CART), a population decision tree method, in predicting seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better than CART on area under the ROC curve (AUC) and Brier skill score. Learning personalized decision-path models is a new approach to predictive modeling that can perform better than a population approach.
Year: 2015 PMID: 26098570 PMCID: PMC4476684 DOI: 10.1371/journal.pone.0131022
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. An example population decision tree and a personalized decision path.
Panel (a) gives the names of the 21 variables and panel (b) gives their values for a test (current) patient whose outcome we want to predict. Panel (c) shows a population decision tree (derived by CART) and the path used for performing inference, and panel (d) shows a personalized decision path (derived by the DP-BAY method that is described later) for the patient in (b).
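To make the contrast between panels (c) and (d) concrete, here is a minimal sketch of one way a path could be grown for a single patient. At each step it considers only tests that match the patient's own values and greedily keeps the variable whose matching subset most reduces outcome entropy. This is an illustrative simplification, not the DP-BAY, DP-AUC, or DP-IG procedure. The function names, the `max_depth` and `min_rows` parameters, and the toy records are all hypothetical.

```python
import math

def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 outcome labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def grow_decision_path(train, patient, target, max_depth=3, min_rows=2):
    """Greedily grow a path of (variable, value) tests that match `patient`.

    Hypothetical sketch: discrete variables only, entropy-reduction scoring.
    Returns the path and the outcome rate among the matching training rows.
    """
    rows = list(train)
    path = []
    unused = [v for v in patient if v != target]
    for _ in range(max_depth):
        base = entropy([r[target] for r in rows])
        best = None
        for var in unused:
            # Keep only the training rows that share the patient's value.
            subset = [r for r in rows if r[var] == patient[var]]
            if len(subset) < min_rows:
                continue
            gain = base - entropy([r[target] for r in subset])
            if best is None or gain > best[0]:
                best = (gain, var, subset)
        if best is None or best[0] <= 0:
            break  # no remaining variable sharpens the prediction
        _, var, rows = best
        path.append((var, patient[var]))
        unused.remove(var)
    prob = sum(r[target] for r in rows) / len(rows)
    return path, prob

# Toy training records (made up for illustration only).
train = [
    {"fever": 1, "age_gt_65": 1, "death": 1},
    {"fever": 1, "age_gt_65": 1, "death": 1},
    {"fever": 1, "age_gt_65": 0, "death": 1},
    {"fever": 1, "age_gt_65": 0, "death": 0},
    {"fever": 0, "age_gt_65": 1, "death": 0},
    {"fever": 0, "age_gt_65": 0, "death": 0},
    {"fever": 0, "age_gt_65": 0, "death": 0},
    {"fever": 0, "age_gt_65": 1, "death": 0},
]
patient = {"fever": 1, "age_gt_65": 1}
path, prob = grow_decision_path(train, patient, target="death")
```

Unlike a population tree, nothing is learned for branches the patient would never follow, so the whole search budget goes to the single path that applies to this patient.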
Table 1. Brief descriptions of the datasets.
| Dataset | # Variables (cnt + dsc = total) | Target variable | Sample size | Positive outcome count (percent) |
|---|---|---|---|---|
| pneumonia | 38 + 120 = 158 | dire outcome | 2,287 | 261 (11.4%) |
| sepsis-d | 7 + 14 = 21 | death | 1,673 | 189 (11.3%) |
| sepsis-s | 7 + 14 = 21 | severe sepsis | 1,673 | 478 (28.6%) |
| heart failure-d | 11 + 10 = 21 | death | 11,178 | 500 (4.5%) |
| heart failure-c | 11 + 10 = 21 | complications including death | 11,178 | 1,255 (11.2%) |
| HIT | 50 + 9 = 59 | HIT | 549 | 76 (13.8%) |
| Alzheimer | 0 + 100 = 100 | Alzheimer’s disease | 1,411 | 861 (61.0%) |
The # Variables column gives the number of continuous (cnt) and discrete (dsc) predictor variables, as well as the total number of variables (which excludes the target variable). The target variables are all binary and denote the presence or absence of a clinical outcome.
Fig 2. Pseudocode for the DP-BAY method.
Fig 3. Pseudocode for the DP-AUC method.
AUCs for the datasets and outcomes shown in Table 1.
| Dataset | CART | DP-AUC | DP-IG | DP-BAY |
|---|---|---|---|---|
| pneumonia | 0.659 (0.611, 0.707) | 0.654 (0.626, 0.683) | 0.732 (0.699, 0.765) | |
| sepsis-d | 0.669 (0.620, 0.718) | 0.746 (0.715, 0.776) | | 0.757 (0.723, 0.792) |
| sepsis-s | 0.658 (0.629, 0.687) | 0.670 (0.644, 0.696) | 0.710 (0.688, 0.732) | |
| heart failure-d | 0.682 (0.653, 0.710) | 0.707 (0.683, 0.730) | 0.734 (0.708, 0.761) | |
| heart failure-c | 0.653 (0.636, 0.670) | 0.644 (0.630, 0.658) | 0.699 (0.687, 0.712) | |
| HIT | 0.818 (0.771, 0.864) | 0.847 (0.811, 0.883) | 0.830 (0.790, 0.870) | |
| Alzheimer | 0.624 (0.589, 0.658) | 0.598 (0.575, 0.621) | 0.650 (0.648, 0.692) | |
| Mean | 0.680 (0.634, 0.727) | 0.693 (0.629, 0.756) | 0.734 (0.694, 0.773) | |
For each method the table gives the mean AUC obtained from 20-fold cross-validation and 95% confidence intervals. The last row gives the mean AUC and 95% confidence intervals over all datasets. Highest mean AUC for each outcome is in bold.
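As a rough illustration of how such a summary can be produced, the sketch below computes a rank-based AUC for one fold and then a mean with a 95% confidence interval across folds. The source does not say how its intervals were computed. This sketch assumes a normal approximation, and a t-based interval would be slightly wider for 20 folds. The function names and fold values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_with_ci(values, level=0.95):
    """Mean and normal-approximation confidence interval of per-fold values."""
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width

# One fold: three of the four positive/negative pairs are ranked correctly.
fold_auc = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])

# Made-up per-fold AUCs standing in for cross-validation output.
m, lo, hi = mean_with_ci([0.70, 0.74, 0.72, 0.76])
```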
Two-sided paired-samples t test comparing the pairwise performance of the four methods on AUC.
| | CART | DP-AUC | DP-IG | DP-BAY |
|---|---|---|---|---|
| CART | | -0.851 (0.427) | -5.290 ( | -5.460 ( |
| DP-AUC | | | -3.090 ( | -3.260 ( |
| DP-IG | | | | -1.860 (0.113) |
In each cell in the table, the first number is the mean difference between the method in the row and the method in the column, and the number in parentheses is the corresponding p value. The mean difference is negative when the method in the row has a lower AUC than the method in the column. Results in bold indicate p values of 0.05 or smaller.
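For readers unfamiliar with the test, a paired-samples t statistic can be sketched as follows. The helper name `paired_t` and the input values are hypothetical and are not taken from the tables in this paper.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    if se == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return d_bar, d_bar / se, len(diffs) - 1

# Made-up paired scores for two methods on the same four tasks.
method_a = [0.60, 0.65, 0.70, 0.62]
method_b = [0.66, 0.70, 0.73, 0.70]
d_bar, t_stat, df = paired_t(method_a, method_b)
```

The two-sided p value is then read from the Student t distribution with `df` degrees of freedom, for example as `2 * scipy.stats.t.sf(abs(t_stat), df)` if SciPy is available.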
BS and BSS for the datasets and outcomes shown in Table 1.
| Dataset | CART | DP-AUC | DP-IG | DP-BAY |
|---|---|---|---|---|
| pneumonia | 0.76 / 0.09 | 0.77 / 0.10 | 0.80 / 0.15 | |
| sepsis-d | 0.79 / 0.11 | 0.81 / 0.13 | | 0.82 / 0.15 |
| sepsis-s | 0.70 / 0.06 | 0.71 / 0.08 | 0.74 / 0.10 | |
| heart failure-d | 0.73 / 0.08 | 0.74 / 0.08 | 0.79 / 0.11 | |
| heart failure-c | 0.67 / 0.02 | 0.66 / 0.02 | 0.71 / 0.06 | |
| HIT | 0.84 / 0.16 | 0.86 / 0.19 | 0.86 / 0.19 | |
| Alzheimer | 0.62 / 0.07 | 0.61 / 0.05 | 0.66 / 0.12 | |
| Mean | 0.73 / 0.08 | 0.74 / 0.09 | 0.77 / 0.13 | |
For each method the table gives the mean 1 - BS and BSS obtained from 20-fold cross-validation. The last row gives the mean 1 - BS and BSS over all datasets. Highest mean 1 - BS and BSS for each outcome are in bold.
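The Brier score (BS) and Brier skill score (BSS) can be sketched under a common textbook definition, in which BS is the mean squared error of the predicted probabilities and BSS compares it with a reference forecast that always predicts the observed base rate. The definitions and reference forecast used in the paper may differ. The function names and example values here are hypothetical.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

def brier_skill_score(y_true, probs):
    """1 - BS / BS_ref, where BS_ref always predicts the observed base rate."""
    base_rate = sum(y_true) / len(y_true)
    bs_ref = brier_score(y_true, [base_rate] * len(y_true))
    if bs_ref == 0:
        raise ValueError("reference score is zero; all outcomes are identical")
    return 1 - brier_score(y_true, probs) / bs_ref

# Made-up outcomes and predicted probabilities for four individuals.
y = [1, 0, 0, 0]
p = [0.8, 0.2, 0.1, 0.1]
bs = brier_score(y, p)
bss = brier_skill_score(y, p)
```

Under this definition a BSS of 0 means no improvement over the base-rate forecast and a BSS of 1 means perfect predictions, which is why larger values of both 1 - BS and BSS indicate better performance.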
Two-sided paired-samples t test comparing the pairwise performance of the four methods on BSS.
| | CART | DP-AUC | DP-IG | DP-BAY |
|---|---|---|---|---|
| CART | | -0.851 (0.427) | -5.290 ( | -5.460 ( |
| DP-AUC | | | -3.090 ( | -3.260 ( |
| DP-IG | | | | -1.860 (0.113) |
In each cell in the table, the first number is the mean difference between the method in the row and the method in the column, and the number in parentheses is the corresponding p value. The mean difference is negative when the method in the row has a lower BSS than the method in the column. Results in bold indicate p values of 0.05 or smaller.
Proportion of test individuals for which the decision-path model differs from the path in the CART model.
| Dataset | DP-AUC | DP-IG | DP-BAY |
|---|---|---|---|
| pneumonia | 0.31 | 0.17 | 0.23 |
| sepsis-d | 0.29 | 0.15 | 0.16 |
| sepsis-s | 0.24 | 0.18 | 0.20 |
| heart failure-d | 0.19 | 0.17 | 0.16 |
| heart failure-c | 0.26 | 0.18 | 0.19 |
| HIT | 0.21 | 0.12 | 0.14 |
| Alzheimer | 0.25 | 0.18 | 0.22 |
| Mean | 0.25 | 0.16 | 0.18 |
For each method the table gives the mean proportion obtained from 20-fold cross-validation. The last row gives the mean proportion over all datasets.