Joon Lee.
Abstract
BACKGROUND: With a large-scale electronic health record repository, it is feasible to build a customized patient outcome prediction model specifically for a given patient. This approach involves identifying past patients who are similar to the present patient and using their data to train a personalized predictive model. Our previous work investigated a cosine-similarity patient similarity metric (PSM) for such patient-specific predictive modeling.
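The approach described above (rank past patients by similarity to the present patient, then learn only from the closest ones) can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the toy feature vectors, and the use of the death rate among the selected patients as the prediction are all assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)


def most_similar(index_patient, past_patients, k):
    """Indices of the k past patients most similar to the index patient."""
    scored = [(cosine_similarity(index_patient, p), i)
              for i, p in enumerate(past_patients)]
    # Highest similarity first; ties broken by original order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


def death_rate_among_similar(index_patient, past_patients, died, k):
    """Fraction of deaths among the k most similar past patients.

    `died` holds 1 (died) or 0 (survived) for each past patient.
    Assumed here as the simplest possible patient-specific predictor.
    """
    selected = most_similar(index_patient, past_patients, k)
    if not selected:
        raise ValueError("k must be at least 1 and past_patients non-empty")
    return sum(died[i] for i in selected) / len(selected)


# Hypothetical toy data: four past patients with two features each.
past = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
died = [1, 1, 0, 0]
risk = death_rate_among_similar([1.0, 0.05], past, died, k=2)
```

In practice the selected patients would more often serve as the training set for a model such as logistic regression or a decision tree, with `k` tuned as in the results tables.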
Keywords: critical care; forecasting; patient similarity; predictive analytics; random forest
Year: 2017 PMID: 28096065 PMCID: PMC5285604 DOI: 10.2196/medinform.6690
Source DB: PubMed Journal: JMIR Med Inform
List of predictor variables.
| Category | Predictor variables |
| Demographics | Age, gender |
| Administrative information | Admission type (elective, urgent, emergency), ICUa service type (MICUb, SICUc, CCUd, CSRUe) |
| Vital signs (min. and max. every 6 hours during the first 24 hours in the ICU) | Heart rate, mean blood pressure, systolic blood pressure, SpO2, spontaneous respiratory rate, body temperature |
| Labs (min. and max. from the first 24 hours in the ICU) | Hematocrit, white blood cell count, serum glucose, serum HCO3, serum potassium, serum sodium, blood urea nitrogen, and serum creatinine |
| Intervention (yes/no during the first 24 hours in the ICU) | Vasopressor therapy, mechanical ventilation, or continuous positive airway pressure |
| Others (from the first 24 hours in the ICU) | Worst Glasgow Coma Scale score, total urinary output every 6 hours |
aICU: intensive care unit.
bMICU: medical intensive care unit.
cSICU: surgical intensive care unit.
dCCU: coronary care unit.
eCSRU: cardiac surgery recovery unit.
Figure 1. Random forest patient similarity metric formula.
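The formula in Figure 1 is not reproduced in this record. A common definition of random forest proximity, assumed here and possibly different from the figure, is the fraction of trees in which two patients fall into the same terminal node (leaf). The sketch below takes per-tree leaf identifiers as input; the leaf values and patient labels are hypothetical.

```python
def rf_proximity(leaves_a, leaves_b):
    """Fraction of trees in which two patients land in the same leaf.

    Each argument lists one leaf identifier per tree, in the same tree order.
    This is the standard random forest proximity, assumed for illustration.
    """
    if len(leaves_a) != len(leaves_b) or not leaves_a:
        raise ValueError("need one leaf id per tree, and at least one tree")
    shared = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return shared / len(leaves_a)


# Hypothetical leaf assignments from a 4-tree forest.
index_patient = [3, 7, 2, 5]
past_patient_1 = [3, 7, 2, 9]   # shares a leaf in 3 of 4 trees
past_patient_2 = [1, 7, 4, 9]   # shares a leaf in 1 of 4 trees

similarity_1 = rf_proximity(index_patient, past_patient_1)
similarity_2 = rf_proximity(index_patient, past_patient_2)
```

With scikit-learn, leaf identifiers of this shape can be obtained from a fitted forest via `RandomForestClassifier.apply`, which returns one leaf index per sample and tree.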
Figure 2. Mortality prediction performance measured in area under the receiver operating characteristic curve as a function of the number of similar patients. Mean and 95% confidence interval from 10-fold cross-validation are shown.
Figure 3. Mortality prediction performance measured in area under the precision-recall curve as a function of the number of similar patients. Mean and 95% confidence interval from 10-fold cross-validation are shown.
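The captions report a mean and 95% confidence interval over 10 cross-validation folds but do not say how the interval was computed. One conventional choice, assumed here, is a Student t interval on the 10 per-fold scores. The fold scores below are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom
# (10 folds minus 1). The t-interval itself is an assumption of this sketch.
T_CRIT_DF9 = 2.262


def mean_and_ci95(fold_scores):
    """Return (mean, lower, upper) for exactly 10 per-fold scores."""
    if len(fold_scores) != 10:
        raise ValueError("this sketch hard-codes the critical value for 10 folds")
    mean = statistics.mean(fold_scores)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(fold_scores) / math.sqrt(len(fold_scores))
    half_width = T_CRIT_DF9 * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical per-fold AUROC values.
auroc_by_fold = [0.78, 0.80, 0.82, 0.80, 0.80, 0.79, 0.81, 0.80, 0.80, 0.80]
mean, lower, upper = mean_and_ci95(auroc_by_fold)
```

The same function applies unchanged to per-fold AUPRC values.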
Best predictive performance from each random forest patient similarity metric (PSM) model in terms of mean area under the receiver operating characteristic curve and area under the precision-recall curve in comparison with cosine PSM and traditional models with no PSM. All cosine PSM results are from Lee et al [14].
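| Model | Number of similar patients at best AUROCa, RFc PSMd | Number of similar patients at best AUROC, cosine PSM | Number of similar patients at best AUPRCb, RF PSM | Number of similar patients at best AUPRC, cosine PSM | Best AUROC, RF PSM, mean (95% CI) | Best AUROC, cosine PSM, mean (95% CI) | Best AUROC, no PSM, mean (95% CI) | Best AUPRC, RF PSM, mean (95% CI) | Best AUPRC, cosine PSM, mean (95% CI) | Best AUPRC, no PSM, mean (95% CI) |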
| DCe | 260 | 100 | 230 | 60 | 0.801 (0.792-0.811) | 0.797 (0.791-0.803) | 0.693 (0.679-0.707) | 0.429 (0.409-0.449) | 0.393 (0.378-0.407) | 0.280 (0.263-0.297) |
| LRf | 5000 | 6000 | 9000 | 6000 | 0.824 (0.815-0.832) | 0.830 (0.825-0.836) | 0.811 (0.799-0.821) | 0.460 (0.437-0.484) | 0.474 (0.460-0.488) | 0.449 (0.430-0.468) |
| DTg | 5000 | 2000 | 7000 | 4000 | 0.779 (0.775-0.784) | 0.753 (0.742-0.764) | 0.721 (0.705-0.738) | 0.352 (0.337-0.367) | 0.347 (0.335-0.358) | 0.339 (0.324-0.353) |
| RF | 15000 | — | 4000 | — | 0.839 (0.835-0.844) | — | 0.839 (0.835-0.844) | 0.507 (0.486-0.527) | — | 0.505 (0.487-0.523) |
| CSRFh | — | — | — | — | 0.832 (0.821-0.843) | — | — | 0.496 (0.471-0.520) | — | — |
aAUROC: area under the receiver operating characteristic curve.
bAUPRC: area under the precision-recall curve.
cRF: random forest.
dPSM: patient similarity metric.
eDC: death counting.
fLR: logistic regression.
gDT: decision tree.
hCSRF: case-specific random forest.
Figure 4. Box plot comparing area under the receiver operating characteristic curves (AUROCs) from all 5 models. For death counting, logistic regression, decision tree, and random forest, the performance from the number of similar patients corresponding to the maximum mean AUROC is shown.
Figure 5. Box plot comparing area under the precision-recall curves (AUPRCs) from all 5 models. For death counting, logistic regression, decision tree, and random forest, the performance from the number of similar patients corresponding to the maximum mean AUPRC is shown.