Ahmed Allam, Mate Nagy, George Thoma, Michael Krauthammer.
Abstract
Heart failure (HF) is one of the leading causes of hospital admissions in the US. Readmission within 30 days after a HF hospitalization is both a recognized indicator for disease progression and a source of considerable financial burden to the healthcare system. Consequently, the identification of patients at risk for readmission is a key step in improving disease management and patient outcomes. In this work, we used a large administrative claims dataset to (1) explore the systematic application of neural network-based models versus logistic regression for predicting 30-day all-cause readmission after discharge from a HF admission, and (2) examine the additive value of patients' hospitalization timelines on prediction performance. Based on data from 272,778 (49% female) patients with a mean (SD) age of 73 (14) years and 343,328 HF admissions (67% of total admissions), we trained and tested our predictive readmission models following a stratified 5-fold cross-validation scheme. Among the deep learning approaches, a recurrent neural network (RNN) combined with a conditional random fields (CRF) model (RNNCRF) achieved the best performance in readmission prediction with 0.642 AUC (95% CI, 0.640-0.645). Other models, such as those based on RNN, convolutional neural networks and CRF alone, had lower performance, with a non-timeline-based model (MLP) performing worst. A competitive model based on logistic regression with LASSO achieved a performance of 0.643 AUC (95% CI, 0.640-0.646). We conclude that data from patient timelines improve 30-day readmission prediction, that logistic regression with LASSO performs on par with the best neural network model, and that the use of administrative data results in competitive performance compared to published approaches based on richer clinical datasets.
Year: 2019 PMID: 31243311 PMCID: PMC6595068 DOI: 10.1038/s41598-019-45685-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Overview of HF dataset.
| Variables | HF Dataset (n = 272,778) |
|---|---|
| | |
| Age, mean (SD) | 72.89 (14) |
| Gender female, count (%) | 133765 (49%) |
| | |
| Medicare | 391535 (76.4%) |
| Private insurance | 47327 (9.23%) |
| Medicaid | 47095 (9.19%) |
| Self-pay | 13115 (2.55%) |
| Other | 11859 (2.31%) |
| No charge | 1514 (0.29%) |
| | |
| HF events, count (%) | 343328 (66.94%) |
| 30-day all-cause readmission, count (%) | 81087 (23.61%) |
| Timeline length, mean (SD) | 1.88 (1.4) |
| | |
| Congestive heart failure; non-hypertensive | 777047 (10.29%) |
| Coronary atherosclerosis and other heart disease | 547890 (7.25%) |
| Residual codes | 305406 (4.04%) |
| Cardiac dysrhythmias | 298823 (3.95%) |
| Chronic kidney disease | 254593 (3.37%) |
| | |
| Diagnostic cardiac catheterization; coronary arteriography | 106428 (14.95%) |
| Respiratory intubation and mechanical ventilation | 57202 (8.03%) |
| Blood transfusion | 52251 (7.34%) |
| Diagnostic ultrasound of heart (echocardiogram) | 41076 (5.77%) |
| Hemodialysis | 38083 (5.35%) |
Trained models’ performance based on the area under the ROC curve (AUC). CI: confidence interval.
| Model name | AUC | CI - low | CI - high |
|---|---|---|---|
| CNN | 0.619 | 0.616 | 0.622 |
| CNN-Wide | 0.632 | 0.629 | 0.635 |
| RNN (Convex_HF_lastHF) | 0.635 | 0.632 | 0.638 |
| RNN (LastHF) | 0.636 | 0.633 | 0.638 |
| RNN (Uniform_HF) | 0.631 | 0.628 | 0.634 |
| RNN (Convex_HF_NonHF) | 0.627 | 0.624 | 0.630 |
| RNNSS (Convex_HF_lastHF) | 0.621 | 0.618 | 0.624 |
| RNNSS (LastHF) | 0.625 | 0.623 | 0.628 |
| RNNSS (Uniform_HF) | 0.617 | 0.614 | 0.619 |
| RNNSS (Convex_HF_NonHF) | 0.625 | 0.622 | 0.628 |
| Neural CRF (Pairwise) | 0.634 | 0.631 | 0.637 |
| Neural CRF (Unary) | 0.631 | 0.629 | 0.634 |
| CRF Only (Pairwise) | 0.628 | 0.625 | 0.631 |
| CRF Only (Unary) | 0.630 | 0.627 | 0.633 |
| RNNCRF (Pairwise) | 0.642 | 0.640 | 0.645 |
| RNNCRF (Unary) | 0.638 | 0.635 | 0.641 |
| MLP | 0.628 | 0.625 | 0.631 |
| Logistic regression | 0.637 | 0.634 | 0.640 |
| Logistic regression (LASSO) | 0.643 | 0.640 | 0.646 |
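
The AUC values in the table summarize how well each model ranks readmitted admissions above non-readmitted ones. As a minimal sketch of that metric (not the authors' evaluation code), the function below computes AUC from ranks, counting ties as one half. The function name and the toy labels and scores are illustrative assumptions. The confidence interval procedure is not described in this record, so it is not reproduced here.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: probability that a random positive outscores a random negative.

    labels: list of 0/1 outcomes (1 = readmitted within 30 days).
    scores: list of predicted risk scores, same length as labels.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    # Assign 1-based ranks by ascending score; tied scores share their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Mann-Whitney U statistic of the positives, scaled to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy data (hypothetical): three of the four positive/negative pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a 5-fold scheme such as the one described in the abstract, this function would be applied to each held-out fold and the per-fold values averaged.
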
Figure 1. Performance analysis of the tested models. Panels A–E report the average ROC curve of the best models. The optimal cutoff is based on the average Youden index of each model across all 5 folds. The standard deviation of the optimal cutoff position is reported on the graph. Panel F reports the cumulative average AUC performance as a function of patients’ timeline length.
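
The Youden index used for the optimal cutoff is J = sensitivity + specificity - 1, and the cutoff is the threshold that maximizes J. The sketch below shows one way to compute it per fold and then average across folds. How the authors aggregated the folds is not detailed in this record, so `average_cutoff` and the toy fold data are assumptions made for illustration.

```python
from statistics import mean, stdev


def youden_cutoff(labels, scores):
    """Return (threshold, J) maximizing J = TPR - FPR, predicting positive when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    best_threshold, best_j = None, float("-inf")
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # strict comparison keeps the lowest threshold among ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


def average_cutoff(folds):
    """folds: list of (labels, scores) pairs, one per held-out fold (assumed aggregation)."""
    results = [youden_cutoff(labels, scores) for labels, scores in folds]
    cutoffs = [t for t, _ in results]
    spread = stdev(cutoffs) if len(cutoffs) > 1 else 0.0
    return mean(cutoffs), spread, mean(j for _, j in results)


# Two hypothetical folds of held-out predictions.
toy_folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.45]),
]
mean_cutoff, cutoff_sd, mean_j = average_cutoff(toy_folds)
```
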
Figure 2. Top-50 features in LASSO models contributing to the increase of log-odds of readmission. The coefficients were normalized using the maximum absolute value of the models’ trained weights.
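
The normalization described in this caption divides each coefficient by the largest absolute coefficient, so values fall in [-1, 1] and positive values indicate higher log-odds of readmission. A minimal sketch follows, assuming normalization is done per model. The feature names and weights are hypothetical and do not come from the paper.

```python
def normalize_by_max_abs(weights):
    """Scale a {feature: coefficient} mapping by its largest absolute coefficient."""
    max_abs = max(abs(w) for w in weights.values())
    if max_abs == 0:
        return {name: 0.0 for name in weights}
    return {name: w / max_abs for name, w in weights.items()}


def top_positive_features(weights, k=50):
    """Return the k features with the largest positive normalized coefficients."""
    normalized = normalize_by_max_abs(weights)
    positive = [(name, w) for name, w in normalized.items() if w > 0]
    return sorted(positive, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical coefficients, for illustration only.
toy_weights = {"feature_a": 2.0, "feature_b": -4.0, "feature_c": 1.0}
toy_top = top_positive_features(toy_weights, k=2)
```
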
Figure 3. A toy example of a patient’s timeline with 30-day all-cause readmission labeling.
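
A labeling of this kind can be sketched as follows: each HF admission in a timeline is labeled 1 if the patient's next admission, for any cause, starts within 30 days of discharge, and 0 otherwise. The exact rules used in the paper (for example, how transfers or same-day admissions are handled) are not stated in this record, so the `Admission` type, the treatment of non-HF admissions, and the dates below are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Admission:
    admit: date
    discharge: date
    is_hf: bool  # True when the admission is a heart failure event


def label_readmissions(timeline, window_days=30):
    """Label each admission in chronological order.

    Returns 1 or 0 for HF admissions (readmitted within the window or not)
    and None for non-HF admissions, which are kept only as timeline context.
    """
    events = sorted(timeline, key=lambda a: a.admit)
    following = events[1:] + [None]
    labels = []
    for current, nxt in zip(events, following):
        if not current.is_hf:
            labels.append(None)
            continue
        readmitted = (
            nxt is not None
            and (nxt.admit - current.discharge).days <= window_days
        )
        labels.append(1 if readmitted else 0)
    return labels


# Hypothetical patient timeline with three admissions.
toy_timeline = [
    Admission(date(2015, 1, 1), date(2015, 1, 5), is_hf=True),
    Admission(date(2015, 1, 20), date(2015, 1, 22), is_hf=False),
    Admission(date(2015, 3, 15), date(2015, 3, 20), is_hf=True),
]
toy_labels = label_readmissions(toy_timeline)
```

Here the first HF admission is followed by another admission 15 days after discharge, so it is labeled 1. The last HF admission has no later admission in the timeline and is labeled 0.
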