Dongdong Zhang, Changchang Yin, Jucheng Zeng, Xiaohui Yuan, Ping Zhang.
Abstract
BACKGROUND: The broad adoption of electronic health records (EHRs) provides great opportunities to conduct health care research and solve various clinical problems in medicine. With recent advances and success, methods based on machine learning and deep learning have become increasingly popular in medical informatics. However, while many research studies use temporal structured data for predictive modeling, they typically neglect potentially valuable information in unstructured clinical notes. Integrating heterogeneous data types across EHRs through deep learning techniques may help improve the performance of prediction models.
Keywords: Data fusion; Deep learning; Electronic health records; Time series forecasting
Year: 2020 PMID: 33121479 PMCID: PMC7596962 DOI: 10.1186/s12911-020-01297-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Label statistics and characteristics of 3 prediction tasks. SD represents standard deviation
| Category | Variable | In-hospital mortality: Yes | In-hospital mortality: No | 30-day readmission: Yes | 30-day readmission: No | Long length of stay: Yes | Long length of stay: No |
|---|---|---|---|---|---|---|---|
| | # of admissions (%) | 3771 (9.6) | 35658 (90.4) | 2237 (5.7) | 37192 (94.3) | 19689 (49.9) | 19740 (50.1) |
| | Length of hospital stay (SD) | 12.1 (14.4) | 10.1 (10.3) | 12.8 (13.2) | 10.1 (10.6) | 16.3 (12.5) | 4.2 (1.6) |
| Demographics | Age (SD) | 68.1 (14.8) | 61.7 (16.6) | 63.1 (16.4) | 62.2 (16.6) | 63.5 (15.9) | 61.1 (17.1) |
| Demographics | Gender (male) | 2102 | 20699 | 1315 | 21486 | 11370 | 11431 |
| Admission type | EMERGENCY | 3512 | 29164 | 1987 | 30689 | 16493 | 16183 |
| Admission type | ELECTIVE | 142 | 5661 | 213 | 5590 | 2648 | 3155 |
| Admission type | URGENT | 117 | 833 | 37 | 913 | 548 | 402 |
| Vital signs (SD) | Heart rate | 89.9 (20.2) | 84.3 (17.6) | 85.5 (17.6) | 84.8 (17.9) | 87.0 (18.5) | 83.2 (17.2) |
| Vital signs (SD) | SysBP | 116.3 (23.0) | 120.0 (21.1) | 119.2 (22.8) | 119.7 (21.3) | 119.6 (21.9) | 119.7 (20.9) |
| Vital signs (SD) | DiasBP | 58.9 (14.4) | 61.9 (14.2) | 61.6 (15.7) | 61.6 (14.2) | 61.1 (14.1) | 62.0 (14.4) |
| Vital signs (SD) | MeanBP | 76.5 (15.9) | 79.0 (15.0) | 78.2 (16.4) | 78.8 (15.1) | 78.7 (15.4) | 78.7 (14.9) |
| Vital signs (SD) | Respiratory rate | 20.6 (6.0) | 18.5 (5.2) | 19.0 (5.5) | 18.7 (5.3) | 19.1 (5.5) | 18.5 (5.1) |
| Vital signs (SD) | Temperature | 36.8 (1.1) | 36.9 (0.8) | 36.8 (0.8) | 36.9 (0.8) | 36.9 (0.9) | 36.8 (0.8) |
| Vital signs (SD) | SpO2 | 97.0 (4.1) | 97.3 (2.7) | 97.3 (2.9) | 97.3 (2.9) | 97.4 (2.9) | 97.2 (2.9) |
| Lab tests (SD) | Anion gap | 16.2 (5.0) | 14.0 (3.6) | 14.4 (3.9) | 14.2 (3.9) | 14.4 (3.8) | 14.0 (3.9) |
| Lab tests (SD) | Albumin | 2.8 (0.6) | 3.2 (0.6) | 3.0 (0.6) | 3.1 (0.6) | 3.0 (0.6) | 3.2 (0.6) |
| Lab tests (SD) | Bands | 10.6 (12.0) | 10.0 (9.9) | 9.7 (9.5) | 10.2 (10.4) | 10.2 (10.4) | 10.0 (10.4) |
| Lab tests (SD) | Bicarbonate | 21.8 (5.7) | 23.7 (4.7) | 24.1 (5.4) | 23.5 (4.8) | 23.4 (4.9) | 23.6 (4.8) |
| Lab tests (SD) | Bilirubin | 4.1 (7.3) | 1.7 (3.7) | 2.3 (5.1) | 2.1 (4.5) | 2.3 (4.8) | 1.8 (4.0) |
| Lab tests (SD) | Creatinine | 1.8 (1.6) | 1.4 (1.7) | 1.9 (2.1) | 1.5 (1.7) | 1.6 (1.8) | 1.4 (1.6) |
| Lab tests (SD) | Chloride | 104.9 (7.6) | 105.4 (6.4) | 104.2 (6.9) | 105.4 (6.5) | 105.2 (6.7) | 105.6 (6.3) |
| Lab tests (SD) | Glucose | 150.8 (79.3) | 141.7 (70.6) | 142.5 (74.3) | 142.7 (71.6) | 143.8 (69.1) | 141.5 (74.5) |
| Lab tests (SD) | Hematocrit | 31.2 (5.8) | 31.5 (5.4) | 30.4 (5.4) | 31.6 (5.5) | 31.3 (5.5) | 31.7 (5.4) |
| Lab tests (SD) | Hemoglobin | 10.5 (2.0) | 10.9 (1.9) | 10.3 (1.8) | 10.8 (1.9) | 10.7 (1.9) | 10.9 (1.9) |
| Lab tests (SD) | Lactate | 4.0 (3.5) | 2.4 (1.8) | 2.5 (1.8) | 2.7 (2.2) | 2.7 (2.1) | 2.6 (2.4) |
| Lab tests (SD) | Platelet | 193.3 (126.6) | 212.2 (110.7) | 209.7 (122.9) | 210.2 (111.9) | 209.1 (120.1) | 211.4 (103.6) |
| Lab tests (SD) | Potassium | 4.2 (0.8) | 4.1 (0.7) | 4.2 (0.7) | 4.1 (0.7) | 4.1 (0.7) | 4.1 (0.7) |
| Lab tests (SD) | PTT | 45.2 (28.1) | 40.8 (25.2) | 42.9 (26.5) | 41.2 (25.5) | 42.4 (25.9) | 39.9 (25.1) |
| Lab tests (SD) | INR | 1.8 (1.4) | 1.5 (0.9) | 1.6 (1.1) | 1.5 (1.0) | 1.6 (1.1) | 1.5 (0.9) |
| Lab tests (SD) | PT | 18.3 (8.8) | 15.9 (6.7) | 17.4 (9.0) | 16.1 (6.9) | 16.5 (7.4) | 15.8 (6.5) |
| Lab tests (SD) | Sodium | 138.7 (6.5) | 138.8 (5.0) | 138.5 (5.3) | 138.8 (5.2) | 138.7 (5.4) | 138.8 (5.0) |
| Lab tests (SD) | BUN | 37.1 (27.2) | 25.3 (21.7) | 31.9 (24.9) | 26.2 (22.5) | 28.7 (23.9) | 24.3 (21.0) |
| Lab tests (SD) | WBC | 14.4 (21.4) | 11.6 (10.7) | 12.2 (20.1) | 11.9 (11.6) | 12.3 (13.4) | 11.5 (10.9) |
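
The percentages in the "# of admissions" row can be recomputed from the raw counts, which also shows that all three tasks are labeled on the same set of admissions. The sketch below is only an arithmetic check; the dictionary keys and the helper name `positive_rate` are illustrative and do not come from the paper.

```python
# Admission counts (Yes, No) copied from the "# of admissions" row above.
counts = {
    "in_hospital_mortality": (3771, 35658),
    "30_day_readmission": (2237, 37192),
    "long_length_of_stay": (19689, 19740),
}


def positive_rate(yes: int, no: int) -> float:
    """Percentage of admissions with a positive label, rounded to one decimal."""
    return round(100.0 * yes / (yes + no), 1)


# Positive-class percentage and cohort size for each task.
rates = {task: positive_rate(*yes_no) for task, yes_no in counts.items()}
totals = {task: sum(yes_no) for task, yes_no in counts.items()}

print(rates)
print(totals)
```

Every task sums to 39,429 admissions. Mortality and readmission are heavily imbalanced, while long length of stay is nearly balanced, which is worth keeping in mind when comparing F1 and AUPRC across tasks.
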
Fig. 1 Architecture of CNN-based Fusion-CNN. Fusion-CNN uses document embeddings, a 2-layer CNN, and max-pooling to model sequential clinical notes. Similarly, a 2-layer CNN and max-pooling are used to model temporal signals. The final patient representation is the concatenation of the latent representations of the sequential clinical notes and the temporal signals with the static information vector. The final patient representation is then passed to output layers to make predictions
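
The data flow in the Fusion-CNN caption (two convolutions, max-pooling over time, then concatenation) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the kernel width, channel counts, constant weights, input values, and the helper `constant_kernels` are all hypothetical, and the output layers are omitted.

```python
from typing import List

Vector = List[float]
Series = List[Vector]         # one vector per note or per time step
Kernels = List[List[Vector]]  # kernels[k][j]: weights of channel k at window offset j


def conv1d_relu(seq: Series, kernels: Kernels) -> Series:
    """Valid 1-D convolution along the time axis, followed by ReLU."""
    width = len(kernels[0])
    out = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        out.append([
            max(0.0, sum(w * x for wv, xv in zip(kernel, window) for w, x in zip(wv, xv)))
            for kernel in kernels
        ])
    return out


def max_pool_over_time(seq: Series) -> Vector:
    """Keep the largest activation of each channel across all steps."""
    return [max(channel) for channel in zip(*seq)]


def encode(seq: Series, layer_1: Kernels, layer_2: Kernels) -> Vector:
    """Two stacked convolutions, then max-pooling, as described in the caption."""
    return max_pool_over_time(conv1d_relu(conv1d_relu(seq, layer_1), layer_2))


def constant_kernels(out_channels: int, width: int, in_dim: int, value: float) -> Kernels:
    """Hypothetical stand-in for learned weights: every weight equals `value`."""
    return [[[value] * in_dim for _ in range(width)] for _ in range(out_channels)]


# Toy inputs (made up): 4 note embeddings of size 2, 5 time steps of 3 signals,
# and a static vector of 3 features.
notes = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4], [0.2, 0.2]]
signals = [[0.5, 0.1, 0.9], [0.6, 0.2, 0.8], [0.4, 0.3, 0.7], [0.7, 0.1, 0.9], [0.5, 0.2, 0.8]]
static = [0.68, 1.0, 0.0]

notes_repr = encode(notes, constant_kernels(2, 2, 2, 0.5), constant_kernels(2, 2, 2, 0.5))
signals_repr = encode(signals, constant_kernels(2, 2, 3, 0.1), constant_kernels(2, 2, 2, 0.1))

# Final patient representation: concatenation of the three parts.
patient = notes_repr + signals_repr + static
print(len(patient))
```

Max-pooling collapses sequences of different lengths into fixed-size vectors, which is what makes the simple concatenation with the static vector possible.
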
Fig. 2 Architecture of LSTM-based Fusion-LSTM. Fusion-LSTM uses document embeddings, a BiLSTM layer, and a max-pooling layer to model sequential clinical notes. 2-layer LSTMs are used to model temporal signals. The concatenated patient representation is passed to output layers to make predictions
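
A minimal pure-Python sketch of the Fusion-LSTM data flow follows: a bidirectional LSTM with max-pooling for the notes, two stacked LSTMs for the temporal signals, and concatenation with the static vector. Several details are assumptions rather than facts from the caption: the hidden size, the random untrained weights, the toy inputs, and in particular the use of the last hidden state of the second LSTM layer as the signal representation.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Gate = Tuple[List[Vector], Vector]  # (weight matrix, bias) acting on [x; h]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights: List[Vector], bias: Vector, inputs: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def init_params(input_dim: int, hidden: int, rng: random.Random) -> Dict[str, Gate]:
    """Small random weights for the four LSTM gates (illustrative, untrained)."""
    def gate() -> Gate:
        rows = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden)]
                for _ in range(hidden)]
        return rows, [0.0] * hidden
    return {name: gate() for name in ("input", "forget", "output", "cell")}


def lstm_step(x: Vector, h: Vector, c: Vector, params: Dict[str, Gate]) -> Tuple[Vector, Vector]:
    """One standard LSTM update of the hidden state h and cell state c."""
    xh = x + h
    i = [sigmoid(v) for v in affine(*params["input"], xh)]
    f = [sigmoid(v) for v in affine(*params["forget"], xh)]
    o = [sigmoid(v) for v in affine(*params["output"], xh)]
    g = [math.tanh(v) for v in affine(*params["cell"], xh)]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(seq: List[Vector], params: Dict[str, Gate], hidden: int) -> List[Vector]:
    """Return the hidden state at every step of the sequence."""
    h, c = [0.0] * hidden, [0.0] * hidden
    states = []
    for x in seq:
        h, c = lstm_step(x, h, c, params)
        states.append(h)
    return states


def bilstm_max_pool(seq, fwd, bwd, hidden: int) -> Vector:
    """Concatenate forward and backward states per step, then max-pool over time."""
    forward = run_lstm(seq, fwd, hidden)
    backward = run_lstm(seq[::-1], bwd, hidden)[::-1]
    joined = [f + b for f, b in zip(forward, backward)]
    return [max(channel) for channel in zip(*joined)]


rng = random.Random(0)
hidden = 3
notes = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4]]  # toy document embeddings
signals = [[0.5, 0.1, 0.9], [0.6, 0.2, 0.8], [0.4, 0.3, 0.7], [0.7, 0.1, 0.9]]
static = [0.68, 1.0]

notes_repr = bilstm_max_pool(notes, init_params(2, hidden, rng), init_params(2, hidden, rng), hidden)
layer_1 = run_lstm(signals, init_params(3, hidden, rng), hidden)
layer_2 = run_lstm(layer_1, init_params(hidden, hidden, rng), hidden)
signals_repr = layer_2[-1]  # assumption: last hidden state summarizes the signals

patient = notes_repr + signals_repr + static
print(len(patient))
```

The BiLSTM branch yields a vector of twice the hidden size because forward and backward states are concatenated before pooling.
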
In-hospital mortality prediction on MIMIC-III. U, T, and S represent unstructured data, temporal signals, and static information, respectively
| Group | Model | Model inputs | F1 | AUROC | AUPRC | P value |
|---|---|---|---|---|---|---|
| Baseline models | LR | | 0.341 (0.325, 0.357) | 0.805 (0.799, 0.811) | 0.188 (0.173, 0.203) | 1 |
| Baseline models | LR | | 0.373 (0.358, 0.388) | 0.825 (0.817, 0.833) | 0.210 (0.200, 0.220) | < 0.001 |
| Baseline models | LR | | 0.395 (0.380, 0.410) | 0.862 (0.859, 0.865) | 0.230 (0.217, 0.243) | < 0.001 |
| Baseline models | RF | | 0.349 (0.325, 0.373) | 0.735 (0.720, 0.750) | 0.181 (0.157, 0.205) | < 0.001 |
| Baseline models | RF | | 0.255 (0.236, 0.274) | 0.665 (0.657, 0.673) | 0.134 (0.126, 0.142) | < 0.001 |
| Baseline models | RF | | 0.349 (0.331, 0.367) | 0.735 (0.724, 0.746) | 0.181 (0.163, 0.199) | < 0.001 |
| Deep models | Fusion-CNN | | 0.346 (0.330, 0.362) | 0.827 (0.823, 0.831) | 0.194 (0.184, 0.204) | < 0.001 |
| Deep models | Fusion-CNN | | 0.358 (0.341, 0.375) | 0.826 (0.825, 0.827) | 0.201 (0.198, 0.204) | < 0.001 |
| Deep models | Fusion-CNN | | 0.398 (0.378, 0.418) | 0.870 (0.866, 0.874) | 0.233 (0.220, 0.246) | < 0.001 |
| Deep models | Fusion-LSTM | | 0.374 (0.365, 0.383) | 0.837 (0.834, 0.840) | 0.211 (0.207, 0.215) | < 0.001 |
| Deep models | Fusion-LSTM | | 0.372 (0.352, 0.392) | 0.828 (0.824, 0.832) | 0.209 (0.207, 0.211) | < 0.001 |
| Deep models | Fusion-LSTM | | | | | < 0.001 |
Bold values indicate the maximum for each evaluation metric
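
For readers less familiar with the three reported metrics, the sketch below computes each one on a tiny made-up example. It is illustrative only: the source does not state how the metrics or their intervals were computed, so the 0.5 decision threshold for F1 and the use of average precision as the AUPRC estimator are assumptions, and the labels and scores are invented.

```python
from typing import List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: List[int], scores: List[float]) -> float:
    """Step-wise area under the precision-recall curve."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits if hits else 0.0


# Made-up labels and predicted probabilities for eight admissions.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

f1 = f1_score(y_true, y_pred)
roc = auroc(y_true, scores)
ap = average_precision(y_true, scores)
print(round(f1, 3), round(roc, 3), round(ap, 3))  # 0.857 0.867 0.806
```

AUROC depends only on how positives rank against negatives, whereas F1 and AUPRC are sensitive to the share of positive cases.
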
Long length of stay prediction on MIMIC-III. U, T, and S represent unstructured data, temporal signals, and static information, respectively
| Group | Model | Model inputs | F1 | AUROC | AUPRC | P value |
|---|---|---|---|---|---|---|
| Baseline models | LR | | 0.668 (0.658, 0.678) | 0.735 (0.732, 0.738) | 0.615 (0.611, 0.619) | 1 |
| Baseline models | LR | | 0.686 (0.683, 0.689) | 0.736 (0.732, 0.740) | 0.614 (0.610, 0.618) | 0.5643 |
| Baseline models | LR | | 0.703 (0.699, 0.707) | 0.773 (0.770, 0.776) | 0.642 (0.637, 0.647) | < 0.001 |
| Baseline models | RF | | 0.523 (0.462, 0.584) | 0.695 (0.689, 0.701) | 0.586 (0.577, 0.595) | < 0.001 |
| Baseline models | RF | | 0.568 (0.479, 0.657) | 0.651 (0.642, 0.660) | 0.559 (0.553, 0.565) | < 0.001 |
| Baseline models | RF | | 0.537 (0.533, 0.541) | 0.718 (0.714, 0.722) | 0.597 (0.591, 0.603) | < 0.001 |
| Deep models | Fusion-CNN | | 0.674 (0.667, 0.681) | 0.748 (0.745, 0.751) | 0.640 (0.635, 0.645) | < 0.001 |
| Deep models | Fusion-CNN | | 0.695 (0.683, 0.707) | 0.742 (0.741, 0.743) | 0.635 (0.632, 0.638) | < 0.001 |
| Deep models | Fusion-CNN | | | | | < 0.001 |
| Deep models | Fusion-LSTM | | 0.690 (0.684, 0.696) | 0.757 (0.756, 0.758) | 0.644 (0.643, 0.645) | < 0.001 |
| Deep models | Fusion-LSTM | | 0.702 (0.697, 0.707) | 0.746 (0.745, 0.747) | 0.637 (0.634, 0.640) | < 0.001 |
| Deep models | Fusion-LSTM | | 0.716 (0.711, 0.721) | 0.778 (0.776, 0.780) | 0.660 (0.657, 0.663) | < 0.001 |
Bold values indicate the maximum for each evaluation metric
30-day readmission prediction on MIMIC-III. U, T, and S represent unstructured data, temporal signals, and static information, respectively
| Group | Model | Model inputs | F1 | AUROC | AUPRC | P value |
|---|---|---|---|---|---|---|
| Baseline models | LR | | 0.144 (0.136, 0.152) | 0.649 (0.646, 0.652) | 0.071 (0.062, 0.080) | 1 |
| Baseline models | LR | | 0.142 (0.133, 0.151) | 0.638 (0.634, 0.642) | 0.070 (0.056, 0.084) | < 0.001 |
| Baseline models | LR | | 0.144 (0.137, 0.151) | 0.660 (0.657, 0.663) | 0.072 (0.059, 0.085) | < 0.001 |
| Baseline models | RF | | 0.123 (0.113, 0.133) | 0.575 (0.559, 0.591) | 0.060 (0.054, 0.066) | < 0.001 |
| Baseline models | RF | | 0.117 (0.105, 0.129) | 0.557 (0.539, 0.575) | 0.059 (0.056, 0.062) | < 0.001 |
| Baseline models | RF | | 0.118 (0.111, 0.125) | 0.560 (0.543, 0.577) | 0.059 (0.056, 0.062) | < 0.001 |
| Deep models | Fusion-CNN | | 0.155 (0.146, 0.164) | 0.657 (0.650, 0.664) | 0.077 (0.073, 0.081) | 0.0208 |
| Deep models | Fusion-CNN | | 0.163 (0.160, 0.166) | 0.663 (0.660, 0.666) | 0.078 (0.077, 0.079) | < 0.001 |
| Deep models | Fusion-CNN | | | 0.671 (0.668, 0.674) | | < 0.001 |
| Deep models | Fusion-LSTM | | 0.149 (0.146, 0.152) | 0.653 (0.651, 0.655) | 0.074 (0.071, 0.077) | 0.0158 |
| Deep models | Fusion-LSTM | | 0.158 (0.154, 0.162) | 0.641 (0.635, 0.647) | 0.075 (0.072, 0.078) | 0.0076 |
| Deep models | Fusion-LSTM | | 0.160 (0.151, 0.169) | | 0.079 (0.076, 0.082) | < 0.001 |
Bold values indicate the maximum for each evaluation metric
P value matrix comparing model performance (AUROC) for in-hospital mortality prediction. U, T, and S represent unstructured data, temporal signals, and static information, respectively
| Model | LR | LR | LR | RF | RF | RF | Fusion-CNN | Fusion-CNN | Fusion-CNN | Fusion-LSTM | Fusion-LSTM | Fusion-LSTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| LR | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.5644 | 0.7432 | < 0.001 | 0.0047 | 0.3918 | < 0.001 |
| LR | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.0018 | < 0.001 | < 0.001 | < 0.001 |
| RF | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| RF | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| RF | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| Fusion-CNN | < 0.001 | 0.5644 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 1 | 0.5428 | < 0.001 | < 0.001 | 0.6591 | < 0.001 |
| Fusion-CNN | < 0.001 | 0.7432 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.5428 | 1 | < 0.001 | < 0.001 | 0.232 | < 0.001 |
| Fusion-CNN | < 0.001 | < 0.001 | 0.0018 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | 0.6062 |
| Fusion-LSTM | < 0.001 | 0.0047 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 1 | 0.0011 | < 0.001 |
| Fusion-LSTM | < 0.001 | 0.3918 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.6591 | 0.232 | < 0.001 | 0.0011 | 1 | < 0.001 |
| Fusion-LSTM | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.6062 | < 0.001 | < 0.001 | 1 |
P value matrix comparing model performance (AUROC) for long length of stay prediction. U, T, and S represent unstructured data, temporal signals, and static information, respectively
| Model | LR | LR | LR | RF | RF | RF | Fusion-CNN | Fusion-CNN | Fusion-CNN | Fusion-LSTM | Fusion-LSTM | Fusion-LSTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 1 | 0.5643 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| LR | 0.5643 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.0024 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| LR | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.0073 |
| RF | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| RF | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| RF | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| Fusion-CNN | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | 0.1134 | < 0.001 |
| Fusion-CNN | < 0.001 | 0.0024 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| Fusion-CNN | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | 0.002 |
| Fusion-LSTM | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 |
| Fusion-LSTM | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.1134 | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 |
| Fusion-LSTM | < 0.001 | < 0.001 | 0.0073 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.002 | < 0.001 | < 0.001 | 1 |
P value matrix comparing model performance (AUROC) for 30-day readmission prediction. U, T, and S represent unstructured data, temporal signals, and static information, respectively
| Model | LR | LR | LR | RF | RF | RF | Fusion-CNN | Fusion-CNN | Fusion-CNN | Fusion-LSTM | Fusion-LSTM | Fusion-LSTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.0208 | < 0.001 | < 0.001 | 0.0158 | 0.0076 | < 0.001 |
| LR | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.2449 | < 0.001 |
| LR | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 | < 0.001 | 0.3156 | 0.0954 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| RF | < 0.001 | < 0.001 | < 0.001 | 1 | 0.073 | 0.116 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| RF | < 0.001 | < 0.001 | < 0.001 | 0.073 | 1 | 0.7452 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| RF | < 0.001 | < 0.001 | < 0.001 | 0.116 | 0.7452 | 1 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 |
| Fusion-CNN | 0.0208 | < 0.001 | 0.3156 | < 0.001 | < 0.001 | < 0.001 | 1 | 0.0661 | 0.0011 | 0.1757 | 0.0012 | < 0.001 |
| Fusion-CNN | < 0.001 | < 0.001 | 0.0954 | < 0.001 | < 0.001 | < 0.001 | 0.0661 | 1 | 0.0011 | < 0.001 | < 0.001 | < 0.001 |
| Fusion-CNN | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.0011 | 0.0011 | 1 | < 0.001 | < 0.001 | 0.07 |
| Fusion-LSTM | 0.0158 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.1757 | < 0.001 | < 0.001 | 1 | < 0.001 | < 0.001 |
| Fusion-LSTM | 0.0076 | 0.2449 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.0012 | < 0.001 | < 0.001 | < 0.001 | 1 | < 0.001 |
| Fusion-LSTM | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | < 0.001 | 0.07 | < 0.001 | < 0.001 | 1 |
Fig. 3 Comparison of model running time with different inputs