Sean Barnes, Suchi Saria, Scott Levin.
Abstract
Widespread adoption of electronic health records (EHR) and objectives for meaningful use have increased opportunities for data-driven predictive applications in healthcare. These decision support applications are often fueled by large-scale, heterogeneous, and multilevel (i.e., defined at hierarchical levels of specificity) patient data that challenge the development of predictive models. Our objective is to develop and evaluate an approach for optimally specifying multilevel patient data for prediction problems. We present a general evolutionary computational framework that optimally specifies multilevel data to predict individual patient outcomes. We evaluate this method both for flattening the predictor structure (a single level) and for retaining the hierarchical predictor structure (multiple levels), using data collected to predict critical outcomes for emergency department patients across five populations. Both the flattened and the hierarchical predictor structures improve upon baseline models that use only a single level of predictor, either more general or more specific (p < 0.001). Our framework for optimizing the specificity of multilevel data improves upon more traditional single-level predictor structures and can readily be adapted to similar problems in healthcare and other domains.
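The abstract describes choosing, for each multilevel predictor, the level of specificity at which it enters the model, either as a single flattened level or with the hierarchy retained. A minimal sketch of how a binary chromosome could encode that choice for chief complaints follows, assuming a two-level complaint-to-category hierarchy. The complaint names, the `flattened_feature` and `hierarchical_features` helpers, and the rule that the hierarchical encoding always keeps the category indicator are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical two-level hierarchy: specific chief complaint -> complaint category.
# Names and categories are illustrative, not taken from the study data.
COMPLAINT_CATEGORY = {
    "chest pain": "cardiovascular",
    "palpitations": "cardiovascular",
    "shortness of breath": "respiratory",
    "cough": "respiratory",
}

# Fix the gene order so that bit i always refers to the same specific complaint.
COMPLAINTS = sorted(COMPLAINT_CATEGORY)
GENE_INDEX = {complaint: i for i, complaint in enumerate(COMPLAINTS)}


def flattened_feature(complaint: str, chromosome: list[int]) -> str:
    """Single-level encoding: bit 1 keeps the specific complaint, bit 0 rolls it up."""
    if chromosome[GENE_INDEX[complaint]] == 1:
        return complaint
    return COMPLAINT_CATEGORY[complaint]


def hierarchical_features(complaint: str, chromosome: list[int]) -> set[str]:
    """Multilevel encoding (assumed): the category indicator is always active, and
    the specific complaint is added as a second indicator only when its bit is 1."""
    features = {COMPLAINT_CATEGORY[complaint]}
    if chromosome[GENE_INDEX[complaint]] == 1:
        features.add(complaint)
    return features


chromosome = [1, 0, 0, 1]  # genes follow the sorted COMPLAINTS order
for complaint in COMPLAINTS:
    print(complaint, "->", flattened_feature(complaint, chromosome),
          sorted(hierarchical_features(complaint, chromosome)))
```

Under this encoding, a genetic algorithm only has to search over bit strings; each candidate string defines one feature set that a predictive model can then be trained and scored on.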
Year: 2018 PMID: 29744026 PMCID: PMC5878885 DOI: 10.1155/2018/7174803
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Common multilevel predictor data available in electronic health records.
| Multilevel predictors | Description | Examples |
|---|---|---|
| Reasons for visit | Descriptors of the reason for the healthcare system encounter | Ambulatory care chief complaints; inpatient admission diagnoses |
| Diagnoses | Descriptors of patients' differential or final diagnoses upon departing the healthcare system | International Classification of Diseases codes (e.g., ICD-10); Read codes |
| Medical history | Descriptors of previous medical history and chronic conditions | EHR problem lists (e.g., diabetes, previous coronary artery bypass graft (CABG), hypertension) |
| Diagnostic and therapeutic procedures | Descriptors of diagnostic and therapeutic courses of action taken | ICD-10 Procedure Coding System (ICD-10-PCS), surgical procedures, rehabilitation |
| Diagnostic exams | Descriptors of medical tests conducted | Laboratory exams, imaging exams, physical exams |
| Medication | Descriptors of medications administered | US Food and Drug Administration Drug Class (e.g., opioids and hydrocodone) |
| Administrative | Descriptors of the administrative status of patients | Inpatient, outpatient, observation |
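Diagnosis codes are a convenient illustration of multilevel data because the hierarchy is built into the code itself: in ICD-10, the first three characters identify the category and any further characters add specificity. The sketch below rolls codes up to that category level. The `icd10_category` helper and the example codes are hypothetical illustrations; the study itself works with chief complaints and complaint categories rather than ICD-10 codes.

```python
from collections import Counter


def icd10_category(code: str) -> str:
    """Roll an ICD-10 code up to its three-character category, e.g. 'I21.4' -> 'I21'."""
    cleaned = code.strip().upper().replace(".", "")
    if len(cleaned) < 3:
        raise ValueError(f"not a valid ICD-10 code: {code!r}")
    return cleaned[:3]


# Illustrative ICD-10-CM-style codes (not study data): two codes share category E11.
codes = ["I21.4", "E11.9", "E11.65", "J45.909"]
print(Counter(icd10_category(c) for c in codes))
```

Keeping the full code corresponds to the more specific level, and keeping only the category corresponds to the more general level; the same trade-off applies to every row of the table above.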
Summary of categorical predictor variables (abnormal ranges indicated in bold).
| Predictor | Categories | Ranges/categories |
|---|---|---|
| Age | 8 | 18–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80–89, >90 |
| Gender | 2 | Male, female |
| Arrival mode | 2 | Via ambulance, walk-in |
| Temperature (°F)∗ | 6 | |
| Pulse (bpm)∗ | 8 | |
| Respiratory rate (bpm)∗ | 6 | |
| Blood pressure (mmHg)∗ | 6 | |
| Oxygen saturation (%)∗ | 4 | |
∗Each vital sign also includes an additional category for missing data.
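The table discretizes continuous predictors into ranges, and the footnote adds a separate level for missing vital signs. The vital-sign cut points are not recoverable on this page, so the sketch below demonstrates the binning on the age ranges from the table; the same function would apply to a vital sign once its thresholds are supplied. The `bin_value` helper is hypothetical, and treating the table's ">90" bin as "90 and older" is an assumption.

```python
import bisect
import math
from typing import Optional, Sequence

# Age bins from the table; each lower edge is inclusive.
# Assumption: the table's ">90" bin is treated as "90 and older".
AGE_EDGES = [18, 30, 40, 50, 60, 70, 80, 90]
AGE_LABELS = ["18–29", "30–39", "40–49", "50–59", "60–69", "70–79", "80–89", ">90"]


def bin_value(value: Optional[float], edges: Sequence[float], labels: Sequence[str]) -> str:
    """Map a numeric value to a categorical label, with a separate 'missing' level."""
    if value is None or math.isnan(value):
        return "missing"
    if value < edges[0]:
        raise ValueError(f"{value} is below the lowest bin edge {edges[0]}")
    # bisect_right counts the lower edges <= value; subtract 1 to get the bin index.
    return labels[bisect.bisect_right(edges, value) - 1]


print([bin_value(v, AGE_EDGES, AGE_LABELS) for v in (18, 45, 90, None)])
```

Each resulting label can then be one-hot encoded, so every predictor in the table, including the missing-data level, becomes a set of binary indicators.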
Patient population summary.
| | ACAD | COMM | BRAZIL | UAE | NAT |
|---|---|---|---|---|---|
| Sample size | 104.5 K | 144.9 K | 94.8 K | 103.5 K | 74.6 K |
| Unique complaints | 686 | 616 | 358 | 288 | 649 |
| Critical outcome prevalence | 3.45% | 3.48% | 3.00% | 1.68% | 3.05% |
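The prevalence row implies a strong class imbalance. Multiplying each sample size by its prevalence gives an approximate count of patients with a critical outcome; because the table rounds sample sizes to 0.1 K, these counts are estimates, not the study's exact figures. The `approx_events` helper is a hypothetical name used only for this arithmetic.

```python
# Sample sizes (rounded to 0.1 K in the table) and critical outcome prevalence.
POPULATIONS = {
    "ACAD": (104_500, 0.0345),
    "COMM": (144_900, 0.0348),
    "BRAZIL": (94_800, 0.0300),
    "UAE": (103_500, 0.0168),
    "NAT": (74_600, 0.0305),
}


def approx_events(n: int, prevalence: float) -> int:
    """Approximate number of patients with a critical outcome."""
    return round(n * prevalence)


for name, (n, prevalence) in POPULATIONS.items():
    events = approx_events(n, prevalence)
    ratio = (n - events) / events
    print(f"{name}: ~{events:,} critical outcomes, "
          f"~{ratio:.0f} non-critical visits per critical one")
```

For example, the academic hospital population works out to roughly 3,600 critical outcomes among about 104,500 visits.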
Figure 1: Genetic algorithm representation and recombination operators.
Summary table of genetic algorithm control parameters and operators.
| Parameter | Setting |
|---|---|
| Population size | 40 |
| Number of generations | 100 |
| Selection | Tournament |
| Crossover operation | Uniform |
| Crossover rate | 0.6 |
| Mixing ratio | 0.2 |
| Mutation operation | Bit flip |
| Mutation rate | 0.2 |
| Bit flip rate | 0.05 |
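The operators in the table can be combined into a single generation step. A minimal sketch follows, under stated assumptions: the tournament size (truncated in the table) is set to a hypothetical k = 2; the mixing ratio is read as the per-gene swap probability in uniform crossover; the mutation rate is read as the probability that an offspring is mutated at all; and the bit flip rate is read as the per-gene flip probability within a mutated offspring. The toy OneMax fitness (count of 1 bits) stands in for a model-performance score such as cross-validated AUC, and all function names are hypothetical.

```python
import random
from typing import Sequence

Chromosome = list[int]


def tournament_select(population: Sequence[Chromosome], fitness: Sequence[float],
                      rng: random.Random, k: int = 2) -> Chromosome:
    """Pick k distinct individuals at random and return the fittest."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitness[i])]


def uniform_crossover(a: Chromosome, b: Chromosome, rng: random.Random,
                      mixing_ratio: float = 0.2) -> tuple[Chromosome, Chromosome]:
    """Swap each gene between the two parents with probability mixing_ratio."""
    child_a, child_b = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < mixing_ratio:
            child_a[i], child_b[i] = b[i], a[i]
    return child_a, child_b


def bit_flip(chrom: Chromosome, rng: random.Random, bit_flip_rate: float = 0.05) -> Chromosome:
    """Flip each bit independently with probability bit_flip_rate."""
    return [1 - g if rng.random() < bit_flip_rate else g for g in chrom]


def next_generation(population, fitness, rng, crossover_rate=0.6, mixing_ratio=0.2,
                    mutation_rate=0.2, bit_flip_rate=0.05, k=2):
    """Build a new population of the same size from the current one."""
    offspring: list[Chromosome] = []
    while len(offspring) < len(population):
        p1 = tournament_select(population, fitness, rng, k)
        p2 = tournament_select(population, fitness, rng, k)
        if rng.random() < crossover_rate:
            c1, c2 = uniform_crossover(p1, p2, rng, mixing_ratio)
        else:
            c1, c2 = list(p1), list(p2)
        for child in (c1, c2):
            if rng.random() < mutation_rate:
                child = bit_flip(child, rng, bit_flip_rate)
            offspring.append(child)
    return offspring[: len(population)]


# Toy run with the table's population size (40) and number of generations (100).
rng = random.Random(0)
n_genes = 30  # hypothetical number of specific complaints
population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(40)]
for _ in range(100):
    fitness = [sum(ind) for ind in population]  # OneMax stand-in for a model score
    population = next_generation(population, fitness, rng)
best = max(population, key=sum)
```

In a real run, evaluating the fitness of each chromosome means training and validating a model on the feature set it encodes, which is why the number of fitness evaluations (population size times generations) drives the training times reported in the results table.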
Figure 2: Bullseye performance for baseline models (with specific complaints only and complaint categories only, respectively) and the flattened genetic algorithm for the academic hospital. Overall performance is indicated outside of the bullseye. Statistical significance of the difference in 5-fold cross-validated AUC (using DeLong's method) between the flattened genetic algorithm approach and the corresponding baseline models is indicated by ∗∗∗ for p < 0.001, ∗∗ for p < 0.01, and ∗ for p < 0.05.
Figure 3: Bullseye performance for baseline models (with specific complaints only and complaint categories only, respectively) and the hierarchical genetic algorithm for the academic hospital. Overall performance is indicated outside of the bullseye. Statistical significance of the difference in 5-fold cross-validated AUC (using DeLong's method) between the hierarchical genetic algorithm approach and the corresponding baseline models is indicated by ∗∗∗ for p < 0.001, ∗∗ for p < 0.01, and ∗ for p < 0.05.
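Figures 2 and 3 compare AUCs with DeLong's method, which tests the difference between two correlated AUCs measured on the same patients. A compact NumPy sketch of that test is below. It uses the straightforward O(mn) pairwise formulation for clarity (faster variants exist), operates on a single set of predictions rather than reproducing the study's 5-fold cross-validation pipeline, and its helper names are hypothetical.

```python
import math

import numpy as np


def _structural_components(scores, labels):
    """DeLong placement values: V10 per positive case, V01 per negative case."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    # psi = 1 if the positive outranks the negative, 0.5 for ties, 0 otherwise.
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for the difference between two correlated AUCs."""
    v10_a, v01_a = _structural_components(scores_a, labels)
    v10_b, v01_b = _structural_components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    auc_a, auc_b = v10_a.mean(), v10_b.mean()  # AUC equals the mean placement value
    # 2x2 covariance matrices across positives (S10) and negatives (S01).
    s = np.cov(np.vstack([v10_a, v10_b])) / m + np.cov(np.vstack([v01_a, v01_b])) / n
    var_diff = s[0, 0] + s[1, 1] - 2 * s[0, 1]
    if var_diff <= 0:
        raise ValueError("zero variance for the AUC difference; the test is undefined")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return float(auc_a), float(auc_b), float(z), p_value
```

In the study's setting, `scores_a` and `scores_b` would be the predicted probabilities of a genetic algorithm model and a baseline model for the same patients, and `labels` would mark critical outcomes.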
Figure 4: Histogram summary of differences in predicted probabilities of the hierarchical approach relative to the baseline models: (a, b, c) baseline model with specific complaints only; (d, e, f) baseline model with complaint categories only. Panels show the inner (a, d), middle (b, e), and outer (c, f) subgroups of patients. Note that the y-axis displays the frequency (count) of patients on a logarithmic scale.
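The histograms in Figure 4 summarize per-patient differences between two models' predicted probabilities within a subgroup. Because the definitions of the inner, middle, and outer subgroups are not given in this section, the sketch below takes an arbitrary boolean subgroup mask; the `difference_histogram` helper, the synthetic predictions, and the example subgroup are hypothetical.

```python
import numpy as np


def difference_histogram(p_new, p_baseline, subgroup_mask, n_bins=40):
    """Histogram counts of (p_new - p_baseline) for patients selected by subgroup_mask."""
    p_new = np.asarray(p_new, dtype=float)
    p_baseline = np.asarray(p_baseline, dtype=float)
    mask = np.asarray(subgroup_mask, dtype=bool)
    diffs = p_new[mask] - p_baseline[mask]
    # A difference between two probabilities always lies in [-1, 1].
    return np.histogram(diffs, bins=n_bins, range=(-1.0, 1.0))


# Synthetic predictions for illustration only.
rng = np.random.default_rng(0)
p_base = rng.uniform(0, 1, size=1_000)
p_hier = np.clip(p_base + rng.normal(0, 0.02, size=1_000), 0, 1)
inner = p_base > 0.5  # hypothetical subgroup definition
counts, edges = difference_histogram(p_hier, p_base, inner)
```

When plotting `counts` against `edges` with Matplotlib, calling `ax.set_yscale("log")` reproduces the logarithmic frequency axis described in the caption, which keeps the rare large differences visible next to the many near-zero ones.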
| | ACAD | COMM | BRAZIL | UAE | NHAMCS |
|---|---|---|---|---|---|
| Flattened | | | | | |
| Overall AUC | 0.8431 | 0.8361 | 0.8261 | 0.8820 | 0.8429 |
| Training time (hr) | 42.47 | 78.67 | 19.89 | 15.00 | 29.06 |
| Selected complaints (%) | 48.3 | 52.8 | 53.4 | 59.0 | 49.9 |
| Hierarchical | | | | | |
| Overall AUC | 0.8433 | 0.8364 | 0.8260 | 0.8819 | 0.8436 |
| Training time (hr) | 4.93 | 8.91 | 3.46 | 3.27 | 3.09 |
| Selected complaints (%) | 49.3 | 64.6 | 55.6 | 55.6 | 46.4 |
| Comparison | | | | | |
| Difference in overall AUC (p value) | 0.6144 | 0.2210 | 0.7022 | 0.3622 | 0.2579 |
| Jointly selected complaints (%) | 28.1 | 33.1 | 32.4 | 37.5 | 27.5 |
| Jointly excluded complaints (%) | 30.6 | 27.6 | 23.5 | 22.9 | 31.2 |
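The comparison rows can be derived from the two approaches' selection masks, one bit per specific complaint. The sketch below assumes that "selected" means a complaint is kept at the specific level (bit = 1); the `selection_overlap` helper and the example masks are hypothetical. By inclusion–exclusion, jointly excluded = 100 − flattened selected − hierarchical selected + jointly selected, which gives a quick consistency check on the rounded percentages in the table.

```python
def selection_overlap(flat_mask: list[int], hier_mask: list[int]) -> dict[str, float]:
    """Percentages of complaints selected by each approach, jointly, and by neither."""
    if len(flat_mask) != len(hier_mask):
        raise ValueError("masks must cover the same complaints")
    n = len(flat_mask)
    pairs = list(zip(flat_mask, hier_mask))
    return {
        "flattened_selected": 100 * sum(flat_mask) / n,
        "hierarchical_selected": 100 * sum(hier_mask) / n,
        "jointly_selected": 100 * sum(1 for f, h in pairs if f and h) / n,
        "jointly_excluded": 100 * sum(1 for f, h in pairs if not f and not h) / n,
    }


overlap = selection_overlap([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
print(overlap)
```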