Sara Bersche Golas1, Takuma Shibahara2, Stephen Agboola3,4,5, Hiroko Otaki2, Jumpei Sato2, Tatsuya Nakae2, Toru Hisamitsu2, Go Kojima2, Jennifer Felsted3, Sujay Kakarmath3,4,5, Joseph Kvedar3,4,5, Kamal Jethwani3,4,5.
Abstract
BACKGROUND: Heart failure is one of the leading causes of hospitalization in the United States. Advances in big data solutions allow for storage, management, and mining of large volumes of structured and semi-structured data, such as complex healthcare data. Applying these advances to complex healthcare data has led to the development of risk prediction models to help identify patients who would benefit most from disease management programs in an effort to reduce readmissions and healthcare cost, but the results of these efforts have been varied. The primary aim of this study was to develop a 30-day readmission risk prediction model for heart failure patients discharged from a hospital admission.
Keywords: Deep learning; Deep unified networks; Heart failure; Machine learning; Readmission reduction; Value-based care
Year: 2018 PMID: 29929496 PMCID: PMC6013959 DOI: 10.1186/s12911-018-0620-z
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Target periods and value expressions of each data type
| Data Type | Data Source | Target periods | Value expression |
|---|---|---|---|
| Demographics | EDW | | Binary (0 or 1); Continuous/discrete value |
| Admissions | EDW | Admission date to discharge date | Continuous/discrete value |
| Diagnoses | EDW | 2 years pre-discharge date to discharge date | Binary of occurrence in the target period |
| Labs | EDW | Admission date to 1 week after admission date; 1 week before discharge date to discharge date | Number of occurrences in the target period a |
| Medications | EDW/RPDR | Admission date to discharge date | Number of occurrences in the target period |
| Procedures | EDW | Admission date to discharge date | Number of occurrences in the target period |
| Notes | RPDR | Admission date to discharge date b | Binary of occurrence in the target period c |
a abnormal occurrences, b except Social History = penultimate to admission date, c except Allergies = number of occurrences in the target period
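The two main value expressions in the table, "binary of occurrence" and "number of occurrences" within a target period, can be illustrated with a minimal sketch. The function names, event codes, and dates below are hypothetical and do not come from the study, and "2 years" is approximated as 730 days. The code only shows how the two expressions differ for a given period.

```python
from datetime import date, timedelta

def in_period(event_date, start, end):
    """True if the event falls inside the inclusive target period."""
    return start <= event_date <= end

def count_occurrences(events, code, start, end):
    """'Number of occurrences in the target period' for one code."""
    return sum(1 for c, d in events if c == code and in_period(d, start, end))

def binary_occurrence(events, code, start, end):
    """'Binary of occurrence in the target period' for one code (0 or 1)."""
    return int(count_occurrences(events, code, start, end) > 0)

# Hypothetical index admission and (code, date) events; illustrative only.
admission = date(2015, 3, 1)
discharge = date(2015, 3, 8)
events = [
    ("med_A", date(2015, 3, 2)),
    ("med_A", date(2015, 3, 5)),
    ("dx_B", date(2014, 1, 15)),
    ("dx_B", date(2012, 6, 1)),  # more than 2 years before discharge
]

# Medications: admission date to discharge date, expressed as a count.
med_a_count = count_occurrences(events, "med_A", admission, discharge)

# Diagnoses: 2 years pre-discharge to discharge date, expressed as a binary flag.
# Assumption: "2 years" is taken as 730 days.
two_years_before = discharge - timedelta(days=730)
dx_b_flag = binary_occurrence(events, "dx_B", two_years_before, discharge)

print(med_a_count, dx_b_flag)
```

The same helpers would apply to the other rows by changing the start and end dates, for example admission date to one week after admission for the first lab window.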
Fig. 1 Network architectures of deep neural network and deep unified networks. Demonstrates the network architecture of deep unified networks (right side) compared to deep neural networks (left side)
Fig. 2 Patient selection flowchart. Summarizes the patient selection process
Patient Cohort Demographics
| Patient Characteristics | Whole Cohort | Patients with readmissions | Patients without readmissions | p-value |
|---|---|---|---|---|
| Age Distribution | | | | 0.75 |
| Median (1st, 3rd quartile) | 75.7 (64.3, 84.7) | 76.0 (64.4, 84.6) | 75.5 (64.2, 84.7) | |
| Gender, n (%) | | | | 0.74 |
| Male | 6073 (52.8) | 1839 (52.5) | 4234 (52.9) | |
| Race, n (%) | | | | 0.16 |
| White | 9490 (84.5) | 2872 (83.8) | 6618 (84.9) | |
| Black or African American | 889 (7.9) | 297 (8.7) | 592 (7.6) | |
| Hispanic or Latino (all races) | 423 (3.8) | 141 (4.1) | 282 (3.6) | |
| Asian | 221 (2.0) | 61 (1.8) | 160 (2.1) | |
| Other, or more than one race | 202 (1.8) | 58 (1.7) | 144 (1.8) | |
| Marital status, n (%) | | | | 0.07 |
| Married / Partnered | 5125 (45.6) | 1516 (44.3) | 3609 (46.2) | |
| Widow | 2360 (21.0) | 718 (21) | 1642 (21) | |
| Single | 2274 (20.2) | 715 (20.9) | 1559 (19.9) | |
| Divorced / Separated | 1069 (9.5) | 323 (9.4) | 746 (9.5) | |
| Other | 412 (3.7) | 148 (4.3) | 264 (3.4) | |
| Highest educational attainment, n (%) | | | | < 0.01* |
| Some High School or Less | 1117 (12.7) | 392 (14.2) | 725 (12) | |
| High School Graduate/GED | 3418 (38.7) | 1127 (40.7) | 2291 (37.8) | |
| Some College/Vocational/Technical Program | 492 (5.6) | 134 (4.8) | 358 (5.9) | |
| Graduate of College or Postgraduate School | 2911 (33.0) | 864 (31.2) | 2047 (33.8) | |
| Other | 887 (10.1) | 253 (9.1) | 634 (10.5) | |
| Employment status, n (%) | | | | < 0.01* |
| Retired | 4338 (57.3) | 1329 (57.7) | 3009 (57.1) | |
| Employed a | 2067 (27.3) | 556 (24.1) | 1511 (28.7) | |
| Disability | 656 (8.7) | 259 (11.2) | 397 (7.5) | |
| Unemployed | 467 (6.2) | 149 (6.5) | 318 (6) | |
| Other | 49 (0.6) | 12 (0.5) | 37 (0.7) | |
| Number of comorbidities, n (%) b | | | | < 0.01* |
| 0 | 2619 (22.8) | 578 (16.5) | 2041 (25.5) | |
| 1 | 3032 (26.3) | 840 (24) | 2192 (27.4) | |
| 2 | 2684 (23.3) | 841 (24) | 1843 (23) | |
| ≥ 3 | 3175 (27.6) | 1243 (35.5) | 1932 (24.1) | |
a Includes part-time and self-employment
b The following list of comorbidities was selected based on a literature review of comorbidities frequently found in patients with heart failure [27–33], in addition to clinical opinion of study staff physicians. Please see Additional file 3: Appendix B for a complete list of ICD-9 codes used to identify each condition. Each condition evaluated is listed here with the percentage of the study population who presented to a PHS facility with the condition as the principal diagnosis for either an inpatient or outpatient encounter at least once between 2014 and 2015. Hypertension (37.3%), cardiovascular disease (32.5%), chronic kidney disease / renal insufficiency (26.1%), non-secondary diabetes mellitus (22.8%), anemia (19.2%), chronic obstructive pulmonary disease (9.8%), osteoarthritis (9.1%), mental health conditions (7.2%), back pain (3.3%), osteoporosis (3.2%), obesity (2.3%)
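The percentages in the cohort table are column percentages (the share of each group falling in a stratum). A row-wise view, the share of each comorbidity stratum that was readmitted, is not reported in the source but can be derived from the counts. The following sketch does that arithmetic; the variable and function names are illustrative, and the counts are copied from the "Number of comorbidities" rows.

```python
# stratum -> (patients with readmissions, patients without readmissions),
# copied from the "Number of comorbidities" rows of the cohort table.
comorbidity_counts = {
    "0": (578, 2041),
    "1": (840, 2192),
    "2": (841, 1843),
    ">=3": (1243, 1932),
}

def readmission_rate(readmitted, not_readmitted):
    """Share of a stratum that was readmitted (a derived, row-wise quantity)."""
    return readmitted / (readmitted + not_readmitted)

rates = {k: readmission_rate(*v) for k, v in comorbidity_counts.items()}

total_readmitted = sum(r for r, _ in comorbidity_counts.values())
total_patients = sum(r + n for r, n in comorbidity_counts.values())
overall_rate = total_readmitted / total_patients

for stratum, rate in rates.items():
    print(f"{stratum} comorbidities: {rate:.1%} readmitted")
print(f"overall: {overall_rate:.1%} of {total_patients} patients")
```

With these counts the derived rate rises steadily with the number of comorbidities, from roughly 22% in the zero-comorbidity stratum to roughly 39% in the stratum with three or more, which is consistent with the significant p-value reported for that row.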
Description of contributing variables and results of variable reduction post-processing, by data type
| Data Type | Major Variables | Number of feature categories | Variable reduction from ➔ to |
|---|---|---|---|
| Demographics | Marital status, education, gender, language | 2 | 39 ➔ 15 |
| Admissions | Total cost of index admission, age at admission, cumulative number of 30-day readmissions, length of stay | 2 | 217 ➔ 53 |
| Diagnoses | ICD-9 codes, WKF | 4 | 8101 ➔ 1297 |
| Labs | WKF at admission, WKF at discharge | 2 | 94 ➔ 58 |
| Medications | RXCUI, Medication name, WKF | 7 | 16,779 ➔ 1107 |
| Procedures | ICD-9 codes | 1 | 1833 ➔ 95 |
| Notes | Words from: social history, hospital course, hospital reason, allergies | 7 | 7558 ➔ 887 |
Hyper-parameters of Gradient Boosting (XGBoost), Maxout networks, and DUNs
Gradient boosting (XGBoost)
| Parameter name | Distribution and search range | Best parameter |
|---|---|---|
| learning_rate | Log-uniform [−5.0, −0.5] | 0.007 |
| max_depth | Discrete uniform [3, 25] | 5 |
| min_child_weight | Discrete uniform [1, 10] | 1 |
| n_estimators | Discrete uniform [100, 1000] | 398 |
| gamma | Log-uniform [−10, 0] | 0.042 |
| alpha | Log-uniform [−10, 0] | 0.0003 |
| lambda | Log-uniform [−10, 0] | 0.116 |
| subsample | Discrete uniform (units of 0.05) [0.5, 1.0] | 0.70 |
| colsample_bytree | Discrete uniform (units of 0.05) [0.5, 1.0] | 0.80 |
Maxout networks and DUNs
| Parameter name | Distribution and search range | Best parameter (Maxout networks) | Best parameter (DUNs) |
|---|---|---|---|
| Number of epochs | Discrete uniform [20, 100] | 22 | 100 |
| Number of inner layers | Discrete uniform [2, 5] | 3 | 5 |
| Number of inner neurons | Discrete uniform [100, 1000] | 914 | 759 |
| Number of maxout | Discrete uniform [2, 5] | 5 | – |
| Activation function | Random choice from: sigmoid, tanh, softplus, softsign | Sigmoid | Sigmoid |
| Dropout rate of input layer | Uniform [0.001, 0.5] | 0.446 | 0.397 |
| Dropout rate of inner layers | Uniform [0.001, 0.5] | 0.394 | 0.433 |
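The search distributions listed for XGBoost can be illustrated by drawing one candidate configuration from them. This is a minimal sketch under stated assumptions: the source does not say which search procedure was used, and the code assumes the log-uniform ranges are base-10 exponents, which the table does not confirm. The helper names are hypothetical; the dictionary keys follow the parameter names in the table.

```python
import random

def log_uniform(rng, low_exp, high_exp, base=10.0):
    """Draw base**u with u ~ Uniform[low_exp, high_exp].
    Assumption: the ranges in the table are exponents of base 10."""
    return base ** rng.uniform(low_exp, high_exp)

def discrete_uniform(rng, low, high):
    """Draw an integer uniformly from [low, high], both ends included."""
    return rng.randint(low, high)

def stepped_uniform(rng, low, high, step):
    """Draw uniformly from {low, low + step, ..., high}."""
    n_steps = round((high - low) / step)
    return round(low + step * rng.randint(0, n_steps), 10)

def sample_xgboost_params(rng):
    """One candidate configuration drawn from the ranges in the table."""
    return {
        "learning_rate": log_uniform(rng, -5.0, -0.5),
        "max_depth": discrete_uniform(rng, 3, 25),
        "min_child_weight": discrete_uniform(rng, 1, 10),
        "n_estimators": discrete_uniform(rng, 100, 1000),
        "gamma": log_uniform(rng, -10, 0),
        "alpha": log_uniform(rng, -10, 0),
        "lambda": log_uniform(rng, -10, 0),
        "subsample": stepped_uniform(rng, 0.5, 1.0, 0.05),
        "colsample_bytree": stepped_uniform(rng, 0.5, 1.0, 0.05),
    }

rng = random.Random(0)  # fixed seed so the draw is reproducible
params = sample_xgboost_params(rng)
for name, value in params.items():
    print(name, value)
```

Sampling on a log scale spreads candidates evenly across orders of magnitude, which suits parameters such as the learning rate and the regularization terms, whose best values in the table span several orders of magnitude.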
10-fold CV Results
| Model | AUC mean ± sd | Accuracy mean ± sd | Precision mean ± sd | Recall mean ± sd | f1 mean ± sd |
|---|---|---|---|---|---|
| Logistic regression | 0.664 ± 0.015 | 0.626 ± 0.020 | 0.336 ± 0.014 | 0.616 ± 0.029 | 0.435 ± 0.012 |
| Gradient boosting | 0.650 ± 0.011 | 0.612 ± 0.013 | 0.325 ± 0.008 | 0.615 ± 0.032 | 0.425 ± 0.010 |
| Maxout networks | 0.695 ± 0.016 | 0.645 ± 0.016 | 0.354 ± 0.016 | 0.631 ± 0.016 | 0.454 ± 0.016 |
| DUNs (proposed) | 0.705 ± 0.015 | 0.646 ± 0.018 | 0.360 ± 0.015 | 0.652 ± 0.036 | 0.464 ± 0.013 |
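As a consistency check on the table, the F1 score can be recomputed from the mean precision and recall using F1 = 2PR / (P + R). The mean of per-fold F1 scores is not in general equal to the F1 of the mean precision and recall, so this is only an approximate check; the names below are illustrative, and the values are copied from the table.

```python
# model -> (mean precision, mean recall, reported mean f1), from the CV table.
cv_means = {
    "Logistic regression": (0.336, 0.616, 0.435),
    "Gradient boosting": (0.325, 0.615, 0.425),
    "Maxout networks": (0.354, 0.631, 0.454),
    "DUNs (proposed)": (0.360, 0.652, 0.464),
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for model, (p, r, reported_f1) in cv_means.items():
    recomputed = f1_score(p, r)
    print(f"{model}: F1 from means = {recomputed:.3f}, reported = {reported_f1:.3f}")
```

For all four models the recomputed value agrees with the reported mean F1 to within 0.001, so the precision, recall, and F1 columns are mutually consistent.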
Fig. 3 ROC curves of 10-fold CV. Demonstrates the ROC curve for each predictive modeling technique
Fig. 4 Layer importance of DUNs. Shows the boxplot distributions of the attention unit output (Y axis) against each layer number (X axis)
Feature importance ranking of logistic regression
| Rank | Feature description |
|---|---|
| 1 | Cumulative number of |
| 2 | Presence of |
| 3 | Number of |
| 4 | Presence of |
| 5 | Number of |
| 6 | Presence of |
| 7 | Presence of |
| 8 | Presence of |
| 9 | Presence of |
| 10 | Presence of |
| 11 | Presence of |
| 12 | Number of |
| 13 | Presence of |
| 14 | Presence of |
| 15 | Presence of |
Fig. 5 Projected net savings from readmission reduction by using prediction models to select CCCP enrollees. Shows the net savings from readmission reduction, calculated by changing the number of CCCP enrollees along with the classification threshold, using the ROC curves of 10-fold CV of 1) DUNs, 2) maxout networks, 3) logistic regression, and 4) gradient boosting as shown in Fig. 3
Maximum net savings and corresponding accuracies
| Model | Max net savings ($M) | Accuracy |
|---|---|---|
| DUNs (proposed) | 3.403 ± 0.536 | 0.764 ± 0.014 |
| Maxout networks | 3.241 ± 0.561 | 0.754 ± 0.014 |
| Logistic regression | 2.173 ± 0.357 | 0.750 ± 0.018 |
| Gradient boosting | 1.787 ± 0.428 | 0.739 ± 0.016 |
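The Fig. 5 caption describes net savings as a function of the classification threshold along an ROC curve. The study's actual cost model is not given in this record, so the following is only a generic sketch of that idea. Every number in it is hypothetical: the ROC points, the cohort size, the prevalence, the cost per readmission, the cost per enrollee, and the fraction of readmissions prevented are all made up for illustration, and none reproduces the values in the table above.

```python
def net_savings(tpr, fpr, n_patients, prevalence,
                cost_per_readmission, cost_per_enrollee, prevented_fraction):
    """Net savings at one ROC operating point under a simple, assumed cost model:
    every patient flagged by the model is enrolled, and a fixed fraction of
    readmissions among correctly flagged patients is prevented."""
    positives = n_patients * prevalence
    negatives = n_patients - positives
    true_pos = tpr * positives
    false_pos = fpr * negatives
    enrolled = true_pos + false_pos
    savings = true_pos * prevented_fraction * cost_per_readmission
    return savings - enrolled * cost_per_enrollee

# Hypothetical ROC operating points as (false positive rate, true positive rate).
roc_points = [(0.0, 0.0), (0.1, 0.3), (0.3, 0.6), (0.6, 0.85), (1.0, 1.0)]

# Hypothetical cost-model parameters; not taken from the study.
cost_params = dict(
    n_patients=10_000,
    prevalence=0.3,
    cost_per_readmission=10_000.0,
    cost_per_enrollee=1_000.0,
    prevented_fraction=0.5,
)

results = [(fpr, tpr, net_savings(tpr, fpr, **cost_params)) for fpr, tpr in roc_points]
best_fpr, best_tpr, best_savings = max(results, key=lambda row: row[2])

for fpr, tpr, value in results:
    print(f"fpr={fpr:.2f} tpr={tpr:.2f} net savings={value:,.0f}")
```

Under a model of this shape, lowering the threshold enrolls more patients, which raises both the prevented readmissions and the program cost, so net savings peak at an intermediate operating point rather than at either extreme.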