Lei Wang (1, 2), Liping Tong (3), Darcy Davis (4), Tim Arnold (5), Tina Esposito (4)
Abstract
BACKGROUND: The main goal of this study is to explore how well features generated by an autoencoder, an unsupervised deep learning algorithm, to represent patient-level electronic health record (EHR) data perform in predictive modeling. Because autoencoder features are learned without reference to any particular outcome, this paper focuses on their value as a general lower-dimensional representation of EHR information across a wide variety of predictive tasks.
Keywords: Autoencoder; Enhanced Reg; Important response-specific predictors; LASSO; Predictive model; Predictive performance
Year: 2020 PMID: 32101147 PMCID: PMC7043035 DOI: 10.1186/s12874-020-00923-1
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Descriptive statistics of important variables for Readmit30. For binary variables such as Acuity, the figures are the number of positive cases and the corresponding percentage of the sample (in parentheses). Categorical variables such as Discharge Disposition are reported the same way, one row per category. For numeric variables such as Length of Stay, the figures are sample means and the corresponding standard deviations (in parentheses).
| Variables | Overall Index Admissions | Readmit30 = YES (11.70%) | Readmit30 = NO (88.30%) |
|---|---|---|---|
| 1. Length of Stay | 4.45 (4.45) | 5.61 (5.47) | 4.30 (4.27) |
| 2. Acuity | 81,048 (77.63) | 10,641 (87.14) | 70,407 (76.37) |
| 3. Number of ER Encounters in Last Six Months | 0.36 (0.91) | 0.58 (1.28) | 0.33 (0.85) |
| 4. Number of Inpatient Encounters in Last Year | 6.01 (12.84) | 12.04 (19.83) | 5.21 (11.36) |
| 5. Polypharmacy | 18.88 (8.94) | 19.19 (8.41) | 18.84 (9.01) |
| 6. Number of Inpatient Encounters in Last Six Months | 0.62 (1.30) | 1.17 (1.74) | 0.55 (1.21) |
| 7. Discharge Disposition | |||
| Home/Self Care | 45,931 (44.00) | 3816 (31.25) | 42,115 (45.68) |
| Home Care | 23,290 (22.31) | 3560 (29.15) | 19,730 (21.40) |
| SNF | 25,669 (24.59) | 3668 (30.04) | 22,001 (23.87) |
| Rehab | 2948 (2.82) | 367 (3.01) | 2581 (2.80) |
| LTC, Federal Hospital | 1783 (1.71) | 463 (3.79) | 1320 (1.43) |
| AMA | 551 (0.53) | 99 (0.81) | 452 (0.49) |
| Others | 4226 (4.05) | 238 (1.95) | 3988 (4.33) |
| 8. Mean Albumin Level | |||
| < 3.4 g/dl | 54,177 (51.89) | 8087 (66.23) | 46,090 (50.00) |
| 3.4–5.0 g/dl | 20,535 (19.67) | 1829 (14.98) | 18,706 (20.29) |
| > 5.0 g/dl | 134 (0.13) | 11 (0.09) | 123 (0.13) |
| Unknown | 29,552 (28.31) | 2284 (18.70) | 27,268 (29.58) |
| 9. Leukemia Current | 297 (0.28) | 64 (0.52) | 233 (0.25) |
| 10. Leukemia History | 1272 (1.22) | 246 (2.01) | 1026 (1.11) |
| 11. Malignancy Current | 5043 (4.83) | 847 (6.94) | 4196 (4.55) |
| 12. Malignancy History | 26,620 (25.50) | 3924 (32.13) | 22,696 (24.62) |
| 13. RF without Hemo Current | 14,061 (13.47) | 2348 (19.23) | 11,713 (12.71) |
| 14. History of Alcohol Substance Abuse | 20,641 (19.77) | 2954 (24.19) | 17,687 (19.19) |
| 15. Dementia Current | 3305 (3.17) | 398 (3.26) | 2907 (3.15) |
| 16. Dementia History | 15,559 (14.90) | 2143 (17.55) | 13,416 (14.55) |
| 17. Trauma Current | 7900 (7.57) | 949 (7.77) | 6951 (7.54) |
| 18. Trauma History | 50,428 (48.30) | 6995 (57.28) | 43,433 (47.11) |
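The two summary conventions in the caption, count (percent) for binary variables and mean (SD) for numeric ones, reported overall and within each Readmit30 group, can be sketched as follows. This is a minimal illustration assuming each admission is a Python dict; the field names `readmit30`, `acuity`, and `los`, the helper names, and the toy rows are all hypothetical, and the use of the sample standard deviation is an assumption rather than something stated in the source.

```python
from statistics import mean, stdev

def binary_summary(values):
    """'count (percent)' for a 0/1 variable; percent is of the group size."""
    n_pos = sum(values)
    return f"{n_pos:,} ({100 * n_pos / len(values):.2f})"

def numeric_summary(values):
    """'mean (SD)', using the sample standard deviation (an assumption)."""
    return f"{mean(values):.2f} ({stdev(values):.2f})"

def by_outcome(rows, var, outcome, summarize):
    """Summarize `var` overall, then among outcome == 1 and outcome == 0."""
    overall = [r[var] for r in rows]
    yes = [r[var] for r in rows if r[outcome] == 1]
    no = [r[var] for r in rows if r[outcome] == 0]
    return summarize(overall), summarize(yes), summarize(no)

# Hypothetical toy admissions (not data from the study).
rows = [
    {"readmit30": 1, "acuity": 1, "los": 6},
    {"readmit30": 1, "acuity": 1, "los": 8},
    {"readmit30": 0, "acuity": 1, "los": 3},
    {"readmit30": 0, "acuity": 0, "los": 2},
    {"readmit30": 0, "acuity": 0, "los": 4},
]
print(by_outcome(rows, "acuity", "readmit30", binary_summary))
# ('3 (60.00)', '2 (100.00)', '1 (33.33)')
print(by_outcome(rows, "los", "readmit30", numeric_summary))
# ('4.60 (2.41)', '7.00 (1.41)', '3.00 (1.00)')
```

Percentages are taken within each column's own group, which matches how the YES and NO columns in the table use their group sizes rather than the overall sample as the denominator.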
Definitions of true positive, false positive, true negative, and false negative, and of the measures derived from them
| | | Predicted Value: 1 | Predicted Value: 0 | Measures | |
|---|---|---|---|---|---|
| True Value | 1 | true positive (a) | false negative (c) | Sensitivity: a/(a + c) | Recall: a/(a + c) |
| | 0 | false positive (b) | true negative (d) | Specificity: d/(d + b) | |
| Measures | | PPV: a/(a + b) | NPV: d/(c + d) | | |
| | | Precision: a/(a + b) | | | |
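The cell labels a, b, c, and d in the table map directly onto code. The sketch below uses hypothetical names (`Confusion` and `confusion_from_labels` are illustrative, not from the paper) to count the four cells from 0/1 labels and evaluate each measure with exactly the formulas in the table; recall is the same quantity as sensitivity, and precision is the same quantity as PPV.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    a: int  # true positives
    b: int  # false positives
    c: int  # false negatives
    d: int  # true negatives

    def sensitivity(self):  # also called recall
        return self.a / (self.a + self.c)

    def specificity(self):
        return self.d / (self.d + self.b)

    def ppv(self):  # also called precision
        return self.a / (self.a + self.b)

    def npv(self):
        return self.d / (self.c + self.d)

def confusion_from_labels(y_true, y_pred):
    """Count the four cells from parallel lists of 0/1 true and predicted values."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        a=pairs.count((1, 1)),
        b=pairs.count((0, 1)),
        c=pairs.count((1, 0)),
        d=pairs.count((0, 0)),
    )

# Hypothetical toy labels (not data from the study).
cm = confusion_from_labels([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                           [1, 1, 1, 0, 1, 1, 0, 0, 0, 0])
print(cm)  # Confusion(a=3, b=2, c=1, d=4)
print(cm.sensitivity(), cm.ppv(), cm.npv())  # 0.75 0.6 0.8
```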
Simulation study results. Mean and coefficient of variation (in parentheses) of precision (when recall = 0.70), PPV (when NPV = 0.95), AUC, and NO. (number of features in the predictive model) for five prediction models on the testing set across 100 repetitions
| Scenarios | Prediction Models | Precision (%) (Recall = 0.70) | PPV (%) (NPV = 0.95) | AUC | NO. |
|---|---|---|---|---|---|
| 1. Raw Data | Autoencoder | 24.23 (0.18) | 19.93 (0.07) | 0.749 (0.01) | 50 (0.00) |
| | LASSO ( | 28.25 (0.17) | 25.09 (0.05) | 0.788 (0.01) | 300 (0.06) |
| | Random Forest | 25.63 (0.18) | 21.93 (0.06) | 0.767 (0.01) | 100 (0.00) |
| | Simple Reg | 20.96 (0.20) | 15.73 (0.11) | 0.708 (0.02) | 12 (0.00) |
| | Enhanced Reg | 24.62 (0.18) | 20.45 (0.07) | 0.754 (0.01) | 57 (0.03) |
| 2. Correct Categories | Autoencoder | 25.07 (0.18) | 21.45 (0.07) | 0.757 (0.03) | 50 (0.00) |
| | LASSO ( | 26.25 (0.17) | 22.94 (0.05) | 0.771 (0.01) | 132 (0.02) |
| | Random Forest | 24.93 (0.18) | 21.57 (0.06) | 0.759 (0.01) | 136 (0.00) |
| | Simple Reg | 21.36 (0.18) | 17.10 (0.09) | 0.713 (0.01) | 16 (0.00) |
| | Enhanced Reg | 25.77 (0.17) | 22.32 (0.06) | 0.766 (0.01) | 60 (0.06) |
| 3. Incorrect Categories | Autoencoder | 22.73 (0.18) | 18.82 (0.08) | 0.732 (0.01) | 60 (0.00) |
| | LASSO ( | 24.07 (0.17) | 20.25 (0.06) | 0.748 (0.01) | 132 (0.02) |
| | Random Forest | 22.70 (0.18) | 18.67 (0.07) | 0.733 (0.01) | 136 (0.00) |
| | Simple Reg | 19.83 (0.19) | 15.31 (0.12) | 0.690 (0.02) | 16 (0.00) |
| | Enhanced Reg | 23.61 (0.18) | 19.69 (0.07) | 0.743 (0.01) | 69 (0.03) |
| 4. Incorrect Categories and Missing Data | Autoencoder | 24.16 (0.18) | 20.45 (0.07) | 0.748 (0.03) | 60 (0.00) |
| | LASSO ( | 25.32 (0.17) | 21.67 (0.06) | 0.761 (0.01) | 175 (0.08) |
| | Random Forest | 23.61 (0.18) | 19.92 (0.07) | 0.745 (0.01) | 226 (0.00) |
| | Simple Reg | 20.92 (0.19) | 16.31 (0.10) | 0.706 (0.02) | 28 (0.00) |
| | Enhanced Reg | 24.89 (0.17) | 21.25 (0.07) | 0.756 (0.02) | 81 (0.04) |
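The simulation caption fixes one operating point per metric: precision is read off where recall reaches 0.70, and PPV where NPV reaches 0.95. The source does not spell out how that threshold was located, so the sketch below assumes one plausible rule: scan candidate thresholds from the strictest (highest score) downward, treat `score >= t` as a positive prediction, and stop at the first threshold that meets the target. The function names and toy scores are hypothetical.

```python
def confusion_at(scores, labels, t):
    """Return (a, b, c, d) = (TP, FP, FN, TN) when score >= t is predicted positive."""
    a = b = c = d = 0
    for s, y in zip(scores, labels):
        if s >= t:
            if y == 1:
                a += 1
            else:
                b += 1
        elif y == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d

def precision_at_recall(scores, labels, target=0.70):
    """Precision a/(a + b) at the highest threshold whose recall a/(a + c) reaches target.

    Assumes at least one positive label.
    """
    for t in sorted(set(scores), reverse=True):
        a, b, c, d = confusion_at(scores, labels, t)
        if a / (a + c) >= target:
            return a / (a + b)
    return None  # not reached when at least one positive exists

def ppv_at_npv(scores, labels, target=0.95):
    """PPV a/(a + b) at the highest threshold whose NPV d/(c + d) reaches target."""
    for t in sorted(set(scores), reverse=True):
        a, b, c, d = confusion_at(scores, labels, t)
        if c + d > 0 and d / (c + d) >= target:
            return a / (a + b)
    return None  # target NPV is never reached

# Hypothetical toy scores (not data from the study).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_recall(scores, labels))                # 0.75
print(round(ppv_at_npv(scores, labels, target=0.90), 4))  # 0.5714
```

Rescanning all labels at every threshold keeps the sketch easy to check but costs quadratic time; a practical version would sort once and update the four counts incrementally. Because NPV is not guaranteed to change monotonically with the threshold, "first threshold that meets the target" is itself a design choice.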
Real data results. Mean and coefficient of variation (in parentheses) of precision (when recall = 0.70), PPV (when NPV = 0.95 for Readmit30 and 0.99 for the other responses), AUC, and NO. (number of features in the predictive model) for five prediction models on the testing set across 100 repetitions
| Response | Prediction Models | Precision (%) | PPV (%) | AUC | NO. |
|---|---|---|---|---|---|
| Readmit30 | Autoencoder | 19.04 (0.02) | 16.88 (0.02) | 0.707 (0.01) | 200 (0.00) |
| | LASSO ( | 19.70 (0.02) | 17.79 (0.02) | 0.719 (0.00) | 162 (0.10) |
| | Random Forest | 18.48 (0.02) | 16.50 (0.02) | 0.707 (0.01) | 469 (0.00) |
| | Simple Reg | 18.70 (0.02) | 16.06 (0.02) | 0.700 (0.01) | 25 (0.00) |
| | Enhanced Reg | 19.69 (0.02) | 17.68 (0.02) | 0.717 (0.00) | 144 (0.10) |
| COPD | Autoencoder | 55.90 (0.02) | 42.16 (0.03) | 0.961 (0.00) | 200 (0.00) |
| | LASSO ( | 58.02 (0.02) | 44.50 (0.02) | 0.963 (0.00) | 266 (0.04) |
| | Random Forest | 56.19 (0.02) | 40.45 (0.03) | 0.956 (0.00) | 469 (0.00) |
| | Simple Reg | 51.51 (0.02) | 35.53 (0.03) | 0.952 (0.00) | 21 (0.00) |
| | Enhanced Reg | 57.06 (0.02) | 43.62 (0.02) | 0.962 (0.00) | 161 (0.08) |
| AMI | Autoencoder | 57.40 (0.04) | 68.80 (0.03) | 0.985 (0.00) | 200 (0.00) |
| | LASSO ( | 58.57 (0.04) | 70.10 (0.04) | 0.986 (0.00) | 64 (0.59) |
| | Random Forest | 56.32 (0.03) | 65.90 (0.03) | 0.982 (0.00) | 469 (0.00) |
| | Simple Reg | 52.24 (0.04) | 56.43 (0.06) | 0.984 (0.00) | 11 (0.00) |
| | Enhanced Reg | 59.26 (0.04) | 70.66 (0.03) | 0.986 (0.00) | 129 (0.14) |
| Heart Failure | Autoencoder | 61.48 (0.02) | 43.94 (0.02) | 0.961 (0.00) | 200 (0.00) |
| | LASSO ( | 63.15 (0.02) | 45.88 (0.02) | 0.964 (0.00) | 195 (0.08) |
| | Random Forest | 60.67 (0.02) | 42.56 (0.02) | 0.958 (0.00) | 469 (0.00) |
| | Simple Reg | 57.81 (0.02) | 38.50 (0.02) | 0.954 (0.00) | 18 (0.00) |
| | Enhanced Reg | 62.37 (0.02) | 45.09 (0.02) | 0.962 (0.00) | 158 (0.10) |
| Pneumonia | Autoencoder | 40.17 (0.03) | 34.56 (0.03) | 0.955 (0.00) | 200 (0.00) |
| | LASSO ( | 42.18 (0.03) | 35.94 (0.02) | 0.958 (0.00) | 204 (0.09) |
| | Random Forest | 38.27 (0.03) | 32.44 (0.03) | 0.951 (0.00) | 469 (0.00) |
| | Simple Reg | 32.44 (0.02) | 28.76 (0.02) | 0.942 (0.00) | 11 (0.00) |
| | Enhanced Reg | 41.39 (0.03) | 35.54 (0.02) | 0.957 (0.00) | 173 (0.08) |
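Both results tables summarize each metric by its mean and coefficient of variation over 100 repetitions on the testing set. A minimal sketch of those two summaries follows. It assumes AUC is computed as the Mann-Whitney probability that a randomly chosen positive case scores above a randomly chosen negative one (ties counted as one half), and that the coefficient of variation is the sample standard deviation divided by the mean. The function names, toy scores, and per-repetition values are hypothetical.

```python
from statistics import mean, stdev

def auc_mann_whitney(scores, labels):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_and_cv(values):
    """Mean and coefficient of variation (sample SD / mean) across repetitions."""
    m = mean(values)
    return m, stdev(values) / m

# Hypothetical toy scores and per-repetition values (not data from the study).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(round(auc_mann_whitney(scores, labels), 4))  # 0.8333
print(mean_and_cv([18, 20, 22]))                   # (20, 0.1)
```

A feature count that never changes across repetitions, such as the fixed autoencoder and random forest dimensions in the NO. column, has a standard deviation of zero and therefore a coefficient of variation of 0.00.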