Chuang Wu¹, Roni Rosenfeld¹, Gilles Clermont².
Abstract
Prediction of patient-centered outcomes in hospitals is useful for performance benchmarking, resource allocation, and guidance regarding active treatment and withdrawal of care. Yet the use of such predictions by clinicians is limited by the complexity of available tools and the amount of data they require. We propose Disjunctive Normal Forms as a novel approach to predicting hospital and 90-day mortality from instance-based patient data, comprising demographic, genetic, and physiologic information, in a large cohort of patients admitted with severe community-acquired pneumonia. We develop two algorithms to efficiently learn Disjunctive Normal Forms, which yield easy-to-interpret rules that explicitly map data to the outcome of interest. Disjunctive Normal Forms achieve higher prediction performance than a set of state-of-the-art machine learning models and unveil insights unavailable with standard methods. They constitute an intuitive set of prediction rules that could easily be implemented to predict outcomes and to guide criteria-based clinical decision making and clinical trial execution, and they are thus of greater practical usefulness than currently available prediction tools. The Java implementation of the tool, JavaDNF, will be made publicly available.
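The abstract does not describe the two learning algorithms themselves. To illustrate what "learning a DNF" means, here is a minimal Python sketch of greedy sequential covering, a standard rule-learning technique swapped in for illustration; it is not the authors' JavaDNF algorithms. It assumes every feature has already been discretized into integer value groups (0 = missing) and that a literal is a hypothetical "value group >= threshold" test; `learn_dnf`, `clause_true`, `max_clauses`, and `max_literals` are illustrative names.

```python
from typing import Dict, List, Optional, Tuple

Literal = Tuple[str, int]   # hypothetical (feature, threshold): true when value group >= threshold
Clause = List[Literal]      # conjunction (AND) of literals
Patient = Dict[str, int]    # feature name -> integer value group (0 = missing)


def clause_true(clause: Clause, patient: Patient) -> bool:
    """A clause fires only if every one of its literals is satisfied."""
    return all(patient.get(f, 0) >= t for f, t in clause)


def learn_dnf(patients: List[Patient], labels: List[int],
              max_clauses: int = 3, max_literals: int = 3) -> List[Clause]:
    """Greedy sequential covering: grow one clause at a time, then drop the deaths it covers."""
    # Threshold 0 would also match missing values, so only thresholds >= 1 are candidates.
    candidates = sorted({(f, v) for p in patients for f, v in p.items() if v > 0})
    negatives = [p for p, y in zip(patients, labels) if y == 0]
    uncovered = [p for p, y in zip(patients, labels) if y == 1]
    dnf: List[Clause] = []
    while uncovered and len(dnf) < max_clauses:
        clause: Clause = []
        for _ in range(max_literals):
            best: Optional[Literal] = None
            best_score = 0
            for lit in candidates:
                if lit in clause:
                    continue
                trial = clause + [lit]
                tp = sum(clause_true(trial, p) for p in uncovered)
                fp = sum(clause_true(trial, p) for p in negatives)
                # Require at least one uncovered death; score = deaths caught - survivors caught.
                if tp > 0 and (best is None or tp - fp > best_score):
                    best, best_score = lit, tp - fp
            if best is None:
                break
            clause.append(best)
            if not any(clause_true(clause, p) for p in negatives):
                break  # clause already covers no survivors; stop adding literals
        if not clause:
            break
        dnf.append(clause)
        uncovered = [p for p in uncovered if not clause_true(clause, p)]
    return dnf
```

Each clause must cover at least one still-uncovered death, so the loop always terminates; the resulting list of clauses is read as "predict death if any clause fires".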
Year: 2014 PMID: 24699007 PMCID: PMC3974677 DOI: 10.1371/journal.pone.0089053
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Predictors (features) included in the different models.
| Model | Features included |
| Model 1 | Demographics (age, sex, race, chronic disease), Macrophysiology (APACHE III score, number of organ system failures on day 1) |
| Model 2 | Demographics, physiology, day 1 cytokines |
| Model 3 | Demographics, physiology, SNP profile |
| Model 4 | Demographics, physiology, day 1 cytokines, SNP profile |
| Model 5 | Demographics, physiology, day 1 cytokines, SNP profile, coagulation data |
| Model 6 | Demographics, physiology, FACS |
| Model 7 | Demographics, physiology, day 1 cytokines, SNP profile, coagulation data, FACS |
| Model 8 | Demographics, physiology, all available cytokines, SNP profile, coagulation data, FACS |
Figure 1. Availability of data across physiologic domains.
Of the 1815 patients with cytokine data on day 1, far fewer had single nucleotide polymorphism (SNP) profiles, fluorescence-activated cell sorting (FACS) measurements of surface markers, or full coagulation studies (Coags) performed.
Figure 2. Prediction performance of DNF learning on hospital and 90-day mortality data.
Ten-fold cross-validation is applied to assess the prediction performance of DNF learning on hospital and 90-day mortality, and to compare performance when using the whole feature set (Model 8; see Table 1), day 1 cytokines only (Model 7), and day 1 plus day 2 cytokines (Model 7 + day 2 cytokines).
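The cross-validation protocol can be sketched as follows. This is a minimal, plain (unstratified) k-fold split in Python; the names `k_fold_indices` and `cross_validate`, the shuffling seed, and the `fit`/`predict` callables are hypothetical, and the paper may have used stratified folds or a library implementation instead.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) and deal it into k folds; each index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def cross_validate(X: Sequence, y: Sequence[int], fit: Callable, predict: Callable,
                   k: int = 10, seed: int = 0) -> List[float]:
    """Collect out-of-fold scores: each patient is scored by a model trained without it."""
    scores = [0.0] * len(y)
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = predict(model, X[i])
    return scores
```

Pooling the out-of-fold predictions and computing one score over all patients is one common design choice; averaging a score computed separately within each fold is another.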
Comparative performance of models on predicting 90-day mortality.
| Model | NB | SVM | NN | LOG | BL | RT | RF | DNF |
| Model 1 | .740 | .716 | .746 | .748 |
| Model 2 | .733 | .697 | .690 | .747 |
| Model 3 | .709 | .683 | .742 |
| Model 4 | .745 | .714 | .733 |
| Model 5 | .770 | .718 | .728 |
| Model 6 | .739 | .689 | .696 | .728 | .739 | .690 | .699 |
| Model 7 | .783 | .747 | .751 | .785 | .766 | .701 | .715 |
| Model 8 | .704 | .744 | .756 | .723 | .768 | .575 | .628 |
NB-Naive Bayes, SVM-Support vector machine, NN-neural network, LOG-Logistic regression, BL-Boosted logistic regression, RT-Random tree, RF-Random forest, DNF-Disjunctive Normal Form learning.
Comparative performance of models on predicting 90-day mortality.
| Score | NB | SVM | NN | LOG | BL | RT | RF | DNF |
| ROC AUC | .747 | .752 | .757 | .738 | .748 | .655 | .698 |
| 1 - Brier Score | .712 | .750 | .844 | .822 | .867 | .792 | .874 |
NB-Naive Bayes, SVM-Support vector machine, NN-neural network, LOG-Logistic regression, BL-Boosted logistic regression, RT-Random tree, RF-Random forest, DNF-Disjunctive Normal Form learning.
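The two scores in this table can be computed as below. This is a minimal sketch, assuming that "ROC" denotes the area under the ROC curve and that each model outputs a predicted probability of death; the function names `roc_auc` and `one_minus_brier` are hypothetical.

```python
from typing import Sequence


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random death is scored above a random survivor (ties count 1/2)."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def one_minus_brier(y: Sequence[int], p: Sequence[float]) -> float:
    """1 minus the mean squared gap between predicted probability and 0/1 outcome; higher is better."""
    return 1.0 - sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)
```

For `y = [1, 0, 1, 0]` and `p = [0.9, 0.2, 0.4, 0.6]`, three of the four death/survivor pairs are ordered correctly, giving an AUC of 0.75, and 1 - Brier is 1 - 0.77/4 = 0.8075. For a classifier that outputs hard 0/1 predictions, such as a DNF rule, each squared error is 0 or 1, so 1 - Brier reduces to plain accuracy; how the paper scored the DNF is not stated here.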
Explanation of DNF literals.
| Literal | Meaning | Value type | Number of value groups |
|  | Presence of some organ dysfunction on day 1 | integer | 5 |
|  | Quartile of procalcitonin | integer | 5 |
|  | Quartile of the inflammatory marker IL-6 on the second day of admission | integer | 5 |
|  | Genetic polymorphism of IL-1 receptor antagonist protein | Gene | 3 |
|  | Quartile of age | integer | 5 |
|  | Quartile of the inflammatory marker IL-10 on the day of admission | integer | 5 |
|  | Genetic polymorphism of IL-1 receptor antagonist protein | Gene | 5 |
|  | Quartile of APACHE III score | integer | 5 |
|  | Burden of chronic illness, as determined by the Charlson index | integer | 5 |
|  | Quartile of coagulation Factor IX activity | integer | 5 |
Note.
*: When missing values are present in the data, they are treated as a literal, but that literal is never selected during DNF learning.
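One plausible reading of the five value groups listed for the quartile literals is four quartiles plus a separate group for missing values, consistent with the note above. A minimal sketch of that encoding follows; the nearest-rank cut points, the use of `None` for missing, and the name `quartile_groups` are all assumptions, since the paper's exact binning is not given here.

```python
from typing import List, Optional, Sequence


def quartile_groups(values: Sequence[Optional[float]]) -> List[int]:
    """Map each observed value to quartile group 1..4; missing values (None) map to group 0."""
    observed = sorted(v for v in values if v is not None)
    n = len(observed)
    if n == 0:
        return [0] * len(values)
    # Nearest-rank cut points at the 25th, 50th and 75th percentiles (a simplifying assumption).
    cuts = [observed[min(n - 1, (n * q) // 4)] for q in (1, 2, 3)]
    # Group = 1 + number of cut points the value reaches.
    return [0 if v is None else 1 + sum(v >= c for c in cuts) for v in values]
```

For example, `quartile_groups([1, 2, 3, 4, 5, 6, 7, 8, None])` yields `[1, 1, 2, 2, 3, 3, 4, 4, 0]`. Keeping group 0 as its own value lets a learner represent missingness explicitly while simply never proposing it as a literal.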
Figure 3. Interpreting DNF models on three patients.
The prediction procedure of the DNF is represented in three layers: the top layer is the DNF itself, the middle layer is the clause level, and the bottom layer is the final outcome. Red rectangles indicate that the patient's value is above the threshold, so a severity condition is met; green rectangles indicate that the value is below the threshold, so the condition is not met. Three example patients are shown. For patient A, , and are all above the threshold, resulting in a positive Clause 2, so the predicted outcome is mortality. For patient B, Clause 2 is negative because of the low (procalcitonin in the lowest quartile); however, high turns on Clause 1 and also predicts mortality. Patient C has high , but this is not sufficient to turn on either Clause 1 or Clause 2, so she is predicted to survive.
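The three-layer reading of Figure 3 amounts to an AND over the literals inside each clause followed by an OR across clauses. A minimal sketch follows; the literal names, thresholds, and patient values are hypothetical, chosen only to reproduce the pattern of patients A to C (procalcitonin belonging to Clause 2 is the only detail taken from the caption), and `explain` is an illustrative name.

```python
from typing import Dict, List, Tuple

Clause = List[Tuple[str, int]]  # (feature, threshold): literal is met ("red") when value >= threshold

# HYPOTHETICAL DNF for illustration only; not the model learned in the paper.
DNF: List[Clause] = [
    [("feature_d", 4)],                                            # Clause 1
    [("feature_a", 3), ("feature_b", 3), ("procalcitonin_q", 2)],  # Clause 2
]


def explain(dnf: List[Clause], patient: Dict[str, int]) -> Tuple[List[bool], bool]:
    """Middle layer: each clause is an AND of its literals. Bottom layer: OR of the clauses."""
    clause_layer = [all(patient.get(f, 0) >= t for f, t in clause) for clause in dnf]
    return clause_layer, any(clause_layer)


patients = {
    "A": {"feature_a": 4, "feature_b": 4, "procalcitonin_q": 4, "feature_d": 1},
    "B": {"feature_a": 4, "feature_b": 4, "procalcitonin_q": 1, "feature_d": 4},
    "C": {"feature_a": 4, "feature_b": 1, "procalcitonin_q": 1, "feature_d": 2},
}
for name, data in patients.items():
    clauses, dies = explain(DNF, data)
    print(name, clauses, "mortality" if dies else "survival")
```

Patient A fires only Clause 2, patient B fires only Clause 1 (low procalcitonin blocks Clause 2), and patient C fires neither, so only C is predicted to survive. Because the clause layer is returned alongside the outcome, the rule that triggered each prediction can be shown to a clinician directly.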
DNFs learned for patient mortality.
| Hospital mortality (Model 8) |
| Hospital mortality (Model 7) |
| 90-day mortality (Model 8) |
| 90-day mortality (Model 7) |