Amie J Barda, Victor M Ruiz, Tony Gigliotti, Fuchiang Rich Tsui.
Abstract
OBJECTIVES: We aimed to better understand how standardizing laboratory data affects predictive model performance in multi-site datasets. We hypothesized that standardizing local laboratory codes to Logical Observation Identifiers Names and Codes (LOINC) would produce predictive models that significantly outperform those learned from local laboratory codes.
Keywords: heart failure; hospital readmission; logical observation identifiers names and codes; medical informatics/standards; predictive modeling
Year: 2019 PMID: 30944914 PMCID: PMC6435008 DOI: 10.1093/jamiaopen/ooy063
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Figure 1. Example of LOINC mapping for potassium laboratory tests. A manual mapping from 1 hospital (left) was extended to map local laboratory test codes from 13 hospitals to LOINC. After mapping, we had a “non-standardized” dataset, where laboratory tests were identified via the unmapped, local laboratory test codes, and a “standardized” dataset, where laboratory tests were identified via a LOINC code.
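The mapping step in Figure 1 can be sketched as a lookup from (hospital, local code) pairs to a shared code. This is only an illustrative sketch: the hospital names, local codes, and `LOINC:...` placeholders below are hypothetical and do not come from the study, and dropping unmapped tests from the standardized dataset is an assumption rather than the authors' stated procedure.

```python
# Hypothetical mapping table: hospital names, local codes, and the
# "LOINC:..." placeholders are illustrative, not real study data.
LOCAL_TO_LOINC = {
    ("HOSP_A", "K"): "LOINC:POTASSIUM_SERUM",
    ("HOSP_B", "POT"): "LOINC:POTASSIUM_SERUM",
    ("HOSP_B", "KWB"): "LOINC:POTASSIUM_BLOOD",
}


def build_datasets(results):
    """Return (non_standardized, standardized) views of the lab results.

    Each input record is a dict with 'hospital', 'local_code' and 'value'.
    """
    non_standardized, standardized = [], []
    for rec in results:
        # Non-standardized view: the test is identified by its local code.
        non_standardized.append(
            {"test_id": rec["local_code"], "value": rec["value"]}
        )
        # Standardized view: the test is identified by its mapped code.
        # Assumption: tests without a mapping are left out of this view.
        loinc = LOCAL_TO_LOINC.get((rec["hospital"], rec["local_code"]))
        if loinc is not None:
            standardized.append({"test_id": loinc, "value": rec["value"]})
    return non_standardized, standardized
```

In this sketch, two hospitals that report potassium under different local codes end up sharing one test identifier in the standardized view, which is the effect the figure illustrates.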
Features constructed to summarize laboratory test results for each patient visit
| Included features | Summary of results for | |||
|---|---|---|---|---|
| All lab tests | Each categorical lab test | Each continuous lab test | ||
| Average # of tests per day (# tests/length of stay) | X | |||
| % Abnormal tests for most recent tests | X | |||
| % Abnormal tests | X | X | ||
| Flag (normal/abnormal) for most recent test | X | |||
| Most recent test result | XH | XH | ||
| Second most recent test result (if median test count >1) | XH | XH | ||
| First test result (if median test count >2) | XH | X | ||
| Baseline result (mean/mode of values prior to most recent) (if median test count >1) | X | XH | ||
| Nadir (min) result (if median test count >2) | XH | |||
| Apex (max) result (if median test count >2) | XH | |||
| Difference between most recent test result and… | Second most recent test result | XH | ||
| | First test result | X | ||
| | Apex result | XH | ||
| | Nadir result | XH | ||
| | Baseline result | XH | ||
| % change between most recent test result and… | Second most recent test result | XH | ||
| | First test result | X | ||
| | Apex result | XH | ||
| | Nadir result | XH | ||
| | Baseline result | XH | ||
| Slope between most recent test result and… | Second most recent test result | XH | ||
| | First test result | X | ||
| | Apex result | XH | ||
| | Nadir result | XH | ||
| | Baseline result | XH | ||
X: feature was derived for dataset; H: feature was originally described in Hauskrecht et al.
Tests with “NA” flags were not included in these computations.
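As a rough illustration of the continuous-test features listed in the table, the sketch below derives the reference values (second most recent, first, baseline, nadir, apex) and the difference, % change, and slope of the most recent result against each. Several details are assumptions, not the authors' definitions: nadir and apex are taken over all results in the visit, % change is relative to the reference value, the baseline time used for the slope is the mean time of the earlier measurements, and the "median test count" conditions from the table are not applied. The function and key names are hypothetical.

```python
from statistics import mean


def _compare(last_t, last_v, ref_t, ref_v):
    """Difference, % change, and slope of the most recent result vs. a reference."""
    diff = last_v - ref_v
    # Assumption: % change is expressed relative to the reference value.
    pct = 100.0 * diff / ref_v if ref_v != 0 else None
    dt = last_t - ref_t
    # Slope is undefined when the reference is the most recent result itself.
    slope = diff / dt if dt != 0 else None
    return diff, pct, slope


def continuous_features(times, values):
    """Summarize one continuous lab test for one visit.

    times: measurement times (e.g. days), ascending; values: matching results.
    """
    if not values or len(times) != len(values):
        raise ValueError("times and values must be non-empty and equal length")
    last_t, last_v = times[-1], values[-1]
    feats = {"most_recent": last_v}
    if len(values) < 2:
        return feats  # nothing to compare against
    # Assumption: nadir and apex are taken over all results in the visit.
    i_min = min(range(len(values)), key=values.__getitem__)
    i_max = max(range(len(values)), key=values.__getitem__)
    refs = {
        "second_most_recent": (times[-2], values[-2]),
        "first": (times[0], values[0]),
        # Baseline: mean of the values prior to the most recent one.
        # Assumption: its "time" is the mean time of those measurements.
        "baseline": (mean(times[:-1]), mean(values[:-1])),
        "nadir": (times[i_min], values[i_min]),
        "apex": (times[i_max], values[i_max]),
    }
    for name, (ref_t, ref_v) in refs.items():
        diff, pct, slope = _compare(last_t, last_v, ref_t, ref_v)
        feats[name] = ref_v
        feats[f"diff_vs_{name}"] = diff
        feats[f"pct_change_vs_{name}"] = pct
        feats[f"slope_vs_{name}"] = slope
    return feats
```

For example, `continuous_features([0, 1, 2, 4], [4.0, 5.0, 3.0, 4.0])` yields a difference of `1.0` and a slope of `0.5` against the second most recent result.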
30-Day heart failure readmission model descriptions, evaluations, and comparisons. Prior to feature selection, there were 10,032 and 1,881 features from the non-standardized (local codes) and standardized (LOINC) datasets, respectively.
| # | Feature selection | Classifier | Dataset | Number of features | AUC (95% CI) | P-value |
|---|---|---|---|---|---|---|
| 1 | Information gain | Logistic regression | Non-standardized (Local codes) | 1154 | 0.538 (0.516–0.559) | |
| 2 | Information gain | Logistic regression | Standardized (LOINC codes) | 388 | 0.573 (0.551–0.594) | |
| 3 | Information gain | Naïve Bayes | Non-standardized (Local codes) | 1154 | 0.560 (0.539–0.582) | |
| 4 | Information gain | Naïve Bayes | Standardized (LOINC codes) | 388 | 0.603 (0.583–0.624) | |
| 5 | Information gain | Random forest | Non-standardized (Local codes) | 1154 | 0.590 (0.570–0.612) | 0.036 |
| 6 | Information gain | Random forest | Standardized (LOINC codes) | 388 | 0.605 (0.585–0.626) | |
| 7 | Correlation-based feature selection | Logistic regression | Non-standardized (Local codes) | 57 | 0.566 (0.545–0.587) | |
| 8 | Correlation-based feature selection | Logistic regression | Standardized (LOINC codes) | 46 | 0.601 (0.580–0.622) | |
| 9 | Correlation-based feature selection | Naïve Bayes | Non-standardized (Local codes) | 57 | 0.571 (0.550–0.592) | |
| 10 | Correlation-based feature selection | Naïve Bayes | Standardized (LOINC codes) | 46 | 0.607 (0.586–0.628) | |
| 11 | Correlation-based feature selection | Random forest | Non-standardized (Local codes) | 57 | 0.561 (0.539–0.582) | |
| 12 | Correlation-based feature selection | Random forest | Standardized (LOINC codes) | 46 | 0.602 (0.581–0.622) | |
Note: Bolded P-values indicate significant differences in model performance.
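For readers unfamiliar with the AUC values reported above, the sketch below computes AUC directly from its definition: the probability that a randomly chosen readmitted visit receives a higher predicted score than a randomly chosen non-readmitted visit, with ties counted as one half. This excerpt does not state how the study computed AUCs, confidence intervals, or P-values, so the code is only a definitional illustration; the function name and the 1/0 label encoding are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for readmitted, 0 for not readmitted (assumed encoding).
    scores: predicted scores, higher meaning more likely readmitted.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

With this definition, a value of 0.5 corresponds to chance-level ranking, which helps put the reported range of roughly 0.54 to 0.61 in context.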
Figure 2. LOINC mapping coverage and description of training and test datasets. “R” and “NR” stand for the classification as “Readmitted” or “Not Readmitted”, respectively.