Ni Wang1,2, Muyu Wang1,2, Yang Zhou3, Honglei Liu1,2, Lan Wei4, Xiaolu Fei4, Hui Chen1,2.
Abstract
BACKGROUND: Sequential information in electronic medical records is valuable for patient outcome prediction but is rarely used for patient similarity measurement because of its unevenness, irregularity, and heterogeneity.
Keywords: acute myocardial infarction; deep learning; electronic medical records; health data; informatics; machine learning; natural language processing; outcome prediction; patient similarity; time series
Year: 2022 PMID: 34989682 PMCID: PMC8778569 DOI: 10.2196/30720
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Study workflow: (a) sequential similarity calculation for timestamped event sequence and time series data, (b) similarity calculation for cross-sectional information, (c) patient similarity measurement based on the weighted sum of the similarities calculated in parts a and b, and (d) validation. AMI: acute myocardial infarction; kNN: k-nearest neighbors based on the proposed patient similarity measurement; kNNEucli: k-nearest neighbors based on the Euclidean distance; LR: logistic regression; RF: random forest; RNN: recurrent neural network; LSTM: long short-term memory network; DTW: dynamic time warping.
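Part (c) of the workflow combines component similarities as a weighted sum and feeds the result to a kNN predictor. A minimal sketch of that idea is below; the weights, k, and function names are illustrative placeholders, not values or identifiers from the paper.

```python
import numpy as np

def combined_similarity(seq_sim, trend_sim, cross_sim, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three similarity components.
    The weights here are illustrative placeholders, not the paper's values."""
    w_seq, w_trend, w_cross = weights
    return w_seq * seq_sim + w_trend * trend_sim + w_cross * cross_sim

def knn_predict(similarities, outcomes, k=5):
    """Predict an outcome probability for an index patient as the mean
    outcome of the k most similar patients in the cohort."""
    nearest = np.argsort(similarities)[::-1][:k]  # largest similarity first
    return float(np.mean(np.asarray(outcomes)[nearest]))
```

With similarities summing the three parts under weights that add to 1, a patient identical to the index patient in all components scores 1.0, and the kNN step simply averages the outcomes of the top-k neighbors.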
Figure 2. A case study of a patient's clinical trajectory. All clinical events, including laboratory tests, radiological examinations, and procedures, are listed sequentially along the patient timeline. The repeated values of each laboratory test form the time series data shown in the line chart on the right side of the figure.
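The laboratory-test time series illustrated in Figure 2 are the input to the DTW-based trend similarity. A minimal sketch follows; the 1/(1 + distance) mapping from distance to similarity is one common convention and may differ from the paper's exact transformation.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series.
    DTW tolerates unequal lengths and uneven sampling, which suits
    irregularly measured laboratory values."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def trend_similarity(a, b):
    """Map the DTW distance into a (0, 1] similarity score
    (an assumed normalization, for illustration only)."""
    return 1.0 / (1.0 + dtw_distance(a, b))
```

Identical series get distance 0 and similarity 1.0; increasingly divergent trends shrink the similarity toward 0.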
The construction of similarity-based predictive models based on different patient similarities.

| Similarity used | kNNEH | kNNED | kNNE | kNNH | kNND |
| --- | --- | --- | --- | --- | --- |
| Timestamped event sequence–based similarity | | | | | |
| Laboratory test sequence | Yes | Yes | Yes | No | No |
| Radiological examination sequence | Yes | Yes | Yes | No | No |
| Procedure sequence | Yes | Yes | Yes | No | No |
| Time series–based trend similarity | | | | | |
| Haar-based trend similarity | Yes | No | No | Yes | No |
| DTW-based trend similarity | No | Yes | No | No | Yes |
| Cross-sectional information–based similarity | Yes | Yes | Yes | Yes | Yes |
Basic characteristics of patients with acute myocardial infarction in the MIMIC-III data set and the private data set.

| Characteristic | MIMIC-III data set (n=3010) | Private data set (n=1846) |
| --- | --- | --- |
| Demographics, n (%) | | |
| Age ≥60 years | 2408 (80.0) | 1131 (61.3) |
| Male gender | 1855 (61.6) | 1343 (72.8) |
| Married | 1583 (52.6) | 1815 (98.3) |
| Medical insurance, n (%) | | |
| Urban Employee Basic Medical Insurance | N/Aa | 1422 (77.0) |
| Medicare | 2030 (67.4) | N/A |
| Sequential records, n (number of distinct types) | | |
| Laboratory test | 1,044,886 (347) | 349,563 (189) |
| Radiological examination | 19,171 (6) | 5827 (3) |
| Procedure | 19,630 (7) | 13,049 (7) |
| Outcomes | | |
| Acute myocardial infarction–cause readmission, n (%) | 554 (18.4) | 100 (5.4) |
| All-cause in-hospital mortality, n (%) | 245 (8.2) | 132 (7.2) |
| Length of hospital stay, days, mean (standard deviation) | 10.0 (6.24) | 11.4 (5.85) |

aN/A: not applicable.
Predictive performance over 100 independent rounds of outcome prediction on the MIMIC-III data seta.

| Model | Mortality AUROCb | Mortality F1-score | Readmission AUROC | Readmission F1-score |
| --- | --- | --- | --- | --- |
| Euclidean distance | 0.756 (0.022) | 0.280 (0.030) | 0.592 (0.019) | 0.332 (0.019) |
| Logistic regression | 0.796 (0.024) | 0.336 (0.037) | 0.608 (0.022) | 0.347 (0.019) |
| Random forest | 0.834 (0.015) | 0.362 (0.033) | 0.579 (0.015) | 0.327 (0.020) |
| Long short-term memory network | 0.809 (0.022) | 0.356 (0.043) | 0.595 (0.020) | 0.339 (0.017) |
| Recurrent neural network | 0.814 (0.018) | 0.338 (0.039) | 0.590 (0.018) | 0.337 (0.018) |
| | 0.816 (0.023) | 0.373 (0.047) | 0.566 (0.022) | 0.315 (0.027) |
| | 0.746 (0.026) | 0.295 (0.035) | 0.536 (0.026) | 0.295 (0.048) |
| | 0.878 (0.017) | 0.386 (0.041) | 0.623 (0.019) | 0.350 (0.018) |
| | 0.882 (0.016) | 0.401 (0.044) | 0.620 (0.018) | 0.350 (0.018) |
| | 0.883 (0.015) | 0.406 (0.050) | 0.620 (0.019) | 0.351 (0.019) |

aValues are presented as mean (standard deviation).
bAUROC: area under the receiver operating characteristic curve.
Figure 3. Heatmaps showing pairwise comparisons among models for predicting mortality (A and C) and readmission (B and D) on the public (A and B) and private (C and D) data sets. The number in each cell is the proportion of the 100 experiments in which the row model achieved higher performance than the column model. Performance is considered significantly higher when this proportion is at least 0.95, and the corresponding cell is highlighted in color. KNNEucli: Euclidean distance k-nearest neighbors; KNND: kNN built on the dynamic time warping (DTW)-based trend similarity; KNNH: kNN built on the Haar-based trend similarity; KNNE: kNN built on the sequence similarity; KNNEH: kNN built on the sequence similarity and Haar-based trend similarity; KNNED: kNN built on the sequence similarity and DTW-based trend similarity; LR: logistic regression; RF: random forest; RNN: recurrent neural network; LSTM: long short-term memory.
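The cell values in Figure 3 can be reproduced from per-round metric arrays as pairwise win proportions. A sketch under the caption's convention (proportion of the 100 rounds in which the row model beats the column model, with 0.95 as the significance threshold); the data and function name here are hypothetical:

```python
import numpy as np

def pairwise_win_proportions(scores):
    """scores: dict mapping model name -> per-round metric values
    (e.g., AUROC over 100 rounds). Returns the model names and a matrix
    whose entry (i, j) is the fraction of rounds in which model i scored
    strictly higher than model j. Per the figure caption, a value >= 0.95
    would be treated as a significantly better row model."""
    names = list(scores)
    k = len(names)
    wins = np.zeros((k, k))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i != j:
                wins[i, j] = np.mean(np.asarray(scores[a]) > np.asarray(scores[b]))
    return names, wins
```

Note that wins[i, j] and wins[j, i] need not sum to 1 when ties occur, which is why both triangles of the heatmap are shown.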
Figure 4. The predictive performance of all models for predicting inpatient mortality (A and C) and readmission (B and D) in terms of area under the receiver operating characteristic curve (A and B) and F1-score (C and D). Stars (☆) indicate the highest predictive performances. No prediction was made at admission for LSTM, RNN, KNNH, KNND, KNNEH, and KNNED because no temporal information is available at that time point. KNNEucli: Euclidean distance k-nearest neighbors; KNND: kNN built on the dynamic time warping (DTW)-based trend similarity; KNNH: kNN built on the Haar-based trend similarity; KNNE: kNN built on the sequence similarity; KNNEH: kNN built on the sequence similarity and Haar-based trend similarity; KNNED: kNN built on the sequence similarity and DTW-based trend similarity; LR: logistic regression; RF: random forest; RNN: recurrent neural network; LSTM: long short-term memory.