Amber C Kiser¹, Karen Eilbeck¹, Jeffrey P Ferraro², David E Skarda³,⁴, Matthew H Samore²,⁵, Brian Bucher¹,⁴.
Abstract
BACKGROUND: With the widespread adoption of electronic health records (EHRs) by US hospitals, there is an opportunity to leverage these data to develop predictive algorithms that improve clinical care. A key barrier to model development and implementation is external validation of model discrimination, which is rarely performed and often shows worse performance than internal validation. One reason machine learning models fail to generalize externally is data heterogeneity. A potential solution to the substantial data heterogeneity between health care systems is to map EHR data elements to standard vocabularies. The advantage of these vocabularies is the hierarchical relationship between their elements, which allows specific clinical features to be aggregated into more general grouped concepts.
Keywords: data heterogeneity; electronic health records; machine learning; model transferability; standard vocabularies
Year: 2022; PMID: 36040784; PMCID: PMC9472055; DOI: 10.2196/39057
Source DB: PubMed; Journal: JMIR Med Inform
Figure 1. Example of the aggregation of baseline features to grouped concepts. Multiple ICD diagnosis codes describing “urinary tract infections” (10 used only in Hospital A, 5 used only in Hospital B, 11 used in both Hospital A and Hospital B, and 61 not used in either hospital) can be aggregated to a single CCS code. CCS: Clinical Classification Software; ICD: International Classification of Diseases.
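The effect shown in Figure 1 can be quantified with a short sketch. The code identifiers below are hypothetical placeholders, not real ICD or CCS codes; only the counts (10, 5, 11, and 61 codes, all mapping to 1 CCS code) come from the figure. Measuring overlap with the Jaccard index is our illustrative choice and is not a metric reported by the study.

```python
# Hypothetical code identifiers; only the counts come from Figure 1.
a_only = {f"icd_a_only_{i}" for i in range(10)}   # used only at Hospital A
b_only = {f"icd_b_only_{i}" for i in range(5)}    # used only at Hospital B
both = {f"icd_both_{i}" for i in range(11)}       # used at both hospitals
unused = {f"icd_unused_{i}" for i in range(61)}   # used at neither hospital

hospital_a_codes = a_only | both  # 21 distinct codes observed at Hospital A
hospital_b_codes = b_only | both  # 16 distinct codes observed at Hospital B

# All 87 specific codes map to one (hypothetical) grouped CCS concept.
icd_to_ccs = {code: "ccs_uti" for code in a_only | b_only | both | unused}


def jaccard(x: set, y: set) -> float:
    """Fraction of features seen at either site that are seen at both."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def aggregate(codes: set, mapping: dict) -> set:
    """Replace each specific code with its grouped concept."""
    return {mapping[code] for code in codes}


baseline_overlap = jaccard(hospital_a_codes, hospital_b_codes)
grouped_overlap = jaccard(aggregate(hospital_a_codes, icd_to_ccs),
                          aggregate(hospital_b_codes, icd_to_ccs))
print(f"baseline overlap: {baseline_overlap:.3f}")  # 0.423 (11 of 26 codes)
print(f"grouped overlap: {grouped_overlap:.3f}")    # 1.000 (1 of 1 concept)
```

With one-hot features, a model trained on Hospital A's specific codes typically has no learned weight for the 5 codes seen only at Hospital B, whereas the single grouped feature is shared by both hospitals.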
Figure 2. Example of data aggregation. ICD diagnosis codes were manually aggregated into single-level CCS codes. LOINC observations were aggregated into LOINC groups, which have a single level. Medi-Span has 5 possible levels of aggregation; Medi-Span drug names were grouped into the highest level, Medi-Span drug groups. CCS: Clinical Classification Software; ICD: International Classification of Diseases; LOINC: Logical Observation Identifiers Names and Codes.
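Figure 2 notes that Medi-Span offers 5 levels of aggregation and that drug names were rolled up to the highest one. Assuming a vocabulary hierarchy is available as a child-to-parent mapping, choosing a level amounts to walking up that mapping. The sketch below uses made-up node names (it does not reproduce the Medi-Span schema), and `roll_up` is a hypothetical helper; calling it without a level returns the top-level group, analogous to the study's use of Medi-Span drug groups.

```python
from typing import Dict, List, Optional

# Hypothetical child -> parent links; a node with no entry is a top-level group.
PARENT: Dict[str, str] = {
    "drug_name_1": "subclass_1",
    "drug_name_2": "subclass_1",
    "drug_name_3": "subclass_2",
    "subclass_1": "class_1",
    "subclass_2": "class_1",
    "class_1": "group_1",
    "drug_name_4": "class_2",
    "class_2": "group_2",
}


def ancestors(node: str) -> List[str]:
    """Return the path from node up to its top-level ancestor, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def roll_up(node: str, levels_up: Optional[int] = None) -> str:
    """Map node to the ancestor `levels_up` steps above it (capped at the top),
    or to its top-level group when levels_up is None."""
    path = ancestors(node)
    if levels_up is None:
        return path[-1]
    return path[min(levels_up, len(path) - 1)]


print(roll_up("drug_name_1"))               # group_1
print(roll_up("drug_name_4"))               # group_2
print(roll_up("drug_name_1", levels_up=1))  # subclass_1
```

Rolling everything to the top level gives the fewest, most widely shared features; an intermediate level would trade some of that overlap for clinical specificity.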
Figure 3. Flow of data through the study and derivation of the final difference-in-difference (DiD) metric. The final evaluation steps to calculate the DiD were (1) the performance difference between internal and external validation for the baseline model; (2) the performance difference between internal and external validation for the grouped model; and (3) the difference between these two performance differences (baseline minus grouped). AUC: area under the receiver operating characteristic curve.
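The three steps in Figure 3 reduce to one line of arithmetic. Using the mean AUCs for surgical site infection reported in the DiD table, the baseline model drops by 0.906 − 0.763 = 0.143 and the grouped model by 0.904 − 0.833 = 0.071, so DiD = 0.143 − 0.071 = 0.072. The function name below is our own; the sign convention follows the table captions, where a positive DiD means the grouped model lost less performance.

```python
def difference_in_difference(base_internal: float, base_external: float,
                             grouped_internal: float, grouped_external: float) -> float:
    """DiD = (baseline internal - baseline external) - (grouped internal - grouped external).

    Positive values mean the grouped model lost less performance on external
    validation than the baseline model did.
    """
    baseline_drop = base_internal - base_external        # step 1
    grouped_drop = grouped_internal - grouped_external   # step 2
    return baseline_drop - grouped_drop                  # step 3


# Mean AUCs for surgical site infection, taken from the DiD table.
ssi_auc_did = difference_in_difference(0.906, 0.763, 0.904, 0.833)
print(f"{ssi_auc_did:.3f}")  # 0.072
```

Because the reported DiDs are means over bootstrap iterations, recomputing them from the rounded mean AUCs can differ in the third decimal (for pneumonia this gives 0.249 against the reported 0.250).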
Study demographics for both internal and external data sets.
| Characteristic | Hospital A (internal; N=5775) | Hospital B (external; N=15,434) | P value |
|---|---|---|---|
| Age at time of surgery (years), mean (SD) | 52.6 (16.6) | 53.4 (18.1) | .01 |
| Gender, male, n (%) | 2765 (47.9) | 7576 (49.1) | .12 |
| Race, n (%) | | | |
| American Indian or Alaska Native | 86 (1.5) | 59 (0.4) | <.001 |
| Asian | 81 (1.4) | 192 (1.2) | .40 |
| Black or African American | 65 (1.1) | 127 (0.8) | .05 |
| Native Hawaiian or Pacific Islander | 34 (0.6) | 147 (1) | .05 |
| White | 5275 (91.3) | 14,216 (92.1) | .07 |
| Unknown or not reported | 234 (4.1) | 693 (4.5) | .18 |
| Ethnicity, Hispanic, n (%) | 575 (10) | 1384 (9) | .03 |
| Procedure code range, n (%) | | | |
| 0-29999 (skin/soft tissue) | 968 (16.8) | 2020 (13.1) | <.001 |
| 30000-39999 (cardiovascular) | 594 (10.3) | 2222 (14.4) | <.001 |
| 40000-49999 (gastrointestinal) | 4172 (72.2) | 10,796 (69.9) | .001 |
| 50000-59999 (genitourinary) | 27 (0.5) | 99 (0.6) | .17 |
| 60000-69999 (nervous system) | 14 (0.2) | 297 (1.9) | <.001 |
| Inpatient or outpatient status, inpatient, n (%) | 2831 (49) | 7837 (50.8) | .02 |
| Preoperative risk factors, n (%) | | | |
| Diabetes mellitus | 822 (14.2) | 2144 (13.9) | .54 |
| Current smoker within 1 year | 799 (13.8) | 2248 (14.6) | .18 |
| Dyspnea | 498 (8.6) | 373 (2.4) | <.001 |
| Functional health status | 71 (1.2) | 376 (2.4) | <.001 |
| Being ventilator-dependent | 20 (0.3) | 149 (1) | <.001 |
| History of severe chronic obstructive pulmonary disease | 128 (2.2) | 417 (2.7) | .05 |
| Ascites within 30 days prior to surgery | 8 (0.1) | 114 (0.7) | <.001 |
| Congestive heart failure within 30 days prior to surgery | 24 (0.4) | 123 (0.8) | .004 |
| Hypertension requiring medication | 1940 (33.6) | 5455 (35.3) | .02 |
| Acute renal failure | 9 (0.2) | 53 (0.3) | .03 |
| Currently requiring or on dialysis | 100 (1.7) | 283 (1.8) | .66 |
| Disseminated cancer | 187 (3.2) | 246 (1.6) | <.001 |
| Open wound with or without infection | 287 (5) | 512 (3.3) | <.001 |
| Steroid or immunosuppressant use for chronic condition | 351 (6.1) | 644 (4.2) | <.001 |
| >10% loss of body weight in the 6 months prior to surgery | 145 (2.5) | 372 (2.4) | .71 |
| Bleeding disorder | 151 (2.6) | 1013 (6.6) | <.001 |
Prevalence of selected outcomes in each hospital system.
| Outcome | Hospital A (N=5775), n (%) | Hospital B (N=15,434), n (%) | P value |
|---|---|---|---|
| Surgical site infection | 291 (5) | 761 (4.9) | .77 |
| Pneumonia | 44 (0.8) | 171 (1.1) | .03ᵃ |
| Sepsis | 175 (3) | 400 (2.6) | .09 |
| Urinary tract infection | 50 (0.9) | 125 (0.8) | .75 |

ᵃ Pneumonia was significantly more prevalent in Hospital B (P<.05).
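The source does not name the test behind these P values. As a hedged consistency check, a Pearson chi-square test with Yates continuity correction on each 2×2 table (outcome present or absent, by hospital) gives values that round to the reported ones for all four outcomes. The sketch uses only the standard library and computes the 1-degree-of-freedom chi-square upper tail as erfc(sqrt(x/2)); `chi2_yates_p` is a hypothetical helper name.

```python
import math


def chi2_yates_p(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """P value from a Pearson chi-square test with Yates continuity correction
    on a 2x2 table (event yes/no by hospital)."""
    a, b = events_a, n_a - events_a
    c, d = events_b, n_b - events_b
    n = n_a + n_b
    # Yates-corrected statistic: N * (|ad - bc| - N/2)^2 / product of margins,
    # with the corrected difference floored at zero.
    numerator = n * max(abs(a * d - b * c) - n / 2, 0.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Counts from the outcome table: (Hospital A events, Hospital B events).
outcomes = {
    "Surgical site infection": (291, 761),
    "Pneumonia": (44, 171),
    "Sepsis": (175, 400),
    "Urinary tract infection": (50, 125),
}
for name, (events_a, events_b) in outcomes.items():
    p = chi2_yates_p(events_a, 5775, events_b, 15434)
    print(f"{name}: P = {p:.2f}")
```

Agreement after rounding does not prove this was the authors' test; it only shows the reported values are consistent with it.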
Difference-in-difference (DiD) metrics for each outcome. Means and 95% CIs are based on 1000 bootstrap iterations. A positive DiD indicates that the grouped model showed a smaller drop in performance than the baseline model.

| Outcome, metric | Top baseline algorithm | Top grouped algorithm | Baseline internal validation, mean (95% CI) | Baseline external validation, mean (95% CI) | Grouped internal validation, mean (95% CI) | Grouped external validation, mean (95% CI) | DiD, mean (95% CI) | P value |
|---|---|---|---|---|---|---|---|---|
| SSIᵃ | SVMᵇ | LRᶜ | | | | | | |
| AUCᵈ | | | 0.906 (0.904-0.908) | 0.763 (0.762-0.764) | 0.904 (0.903-0.906) | 0.833 (0.833-0.834) | 0.072 (0.070-0.074) | <.001 |
| F1-score | | | 0.501 (0.499-0.503) | 0.300 (0.299-0.302) | 0.476 (0.474-0.478) | 0.376 (0.375-0.376) | 0.100 (0.097-0.103) | <.001 |
| Pneumonia | LR | SVM | | | | | | |
| AUC | | | 0.953 (0.949-0.957) | 0.683 (0.682-0.685) | 0.994 (0.994-0.995) | 0.973 (0.973-0.974) | 0.250 (0.247-0.252) | <.001 |
| F1-score | | | 0.504 (0.498-0.509) | 0.302 (0.299-0.305) | 0.456 (0.452-0.461) | 0.467 (0.465-0.468) | 0.212 (0.206-0.218) | <.001 |
| Sepsis | LR | RFᵉ | | | | | | |
| AUC | | | 0.964 (0.963-0.964) | 0.890 (0.889-0.891) | 0.948 (0.946-0.949) | 0.883 (0.883-0.884) | 0.008 (0.007-0.010) | <.001 |
| F1-score | | | 0.469 (0.467-0.472) | 0.050 (0.050-0.050) | 0.419 (0.416-0.422) | 0.092 (0.092-0.093) | 0.091 (0.089-0.093) | <.001 |
| UTIᶠ | SVM | LR | | | | | | |
| AUC | | | 0.898 (0.895-0.900) | 0.886 (0.885-0.887) | 0.936 (0.934-0.939) | 0.929 (0.928-0.930) | 0.006 (0.002-0.009) | .002 |
| F1-score | | | 0.153 (0.148-0.158) | 0.063 (0.061-0.064) | 0.244 (0.241-0.246) | 0.225 (0.224-0.226) | 0.073 (0.068-0.077) | <.001 |

ᵃ SSI: surgical site infection.
ᵇ SVM: support vector machine.
ᶜ LR: logistic regression.
ᵈ AUC: area under the receiver operating characteristic curve.
ᵉ RF: random forest.
ᶠ UTI: urinary tract infection.
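The table reports means and 95% CIs over 1000 bootstrap iterations, but the source does not specify the resampling unit or the CI method. The sketch below is one plausible reading, not the authors' procedure: it resamples the internal and external test sets with replacement, recomputes the AUC drop for both models in each iteration, and takes simple percentile bounds of the resulting DiD values. AUC is computed pairwise (Mann-Whitney formulation) to stay dependency-free, and the toy scores are hypothetical.

```python
import random
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher than a
    random negative case, counting ties as one half (pairwise, O(n_pos * n_neg))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def resample(columns, rng):
    """Bootstrap rows: draw indices with replacement and apply them to every column."""
    n = len(columns[0])
    idx = [rng.randrange(n) for _ in range(n)]
    return tuple([col[i] for i in idx] for col in columns)


def bootstrap_did(internal, external, n_iter=1000, seed=0):
    """internal and external are (labels, baseline_scores, grouped_scores) tuples.
    Returns the mean DiD and a simple percentile 95% interval."""
    rng = random.Random(seed)
    dids = []
    while len(dids) < n_iter:
        y_i, base_i, grp_i = resample(internal, rng)
        y_e, base_e, grp_e = resample(external, rng)
        if len(set(y_i)) < 2 or len(set(y_e)) < 2:
            continue  # AUC is undefined without both classes; draw again
        baseline_drop = auc(y_i, base_i) - auc(y_e, base_e)
        grouped_drop = auc(y_i, grp_i) - auc(y_e, grp_e)
        dids.append(baseline_drop - grouped_drop)
    dids.sort()
    lower = dids[int(0.025 * n_iter)]
    upper = dids[int(0.975 * n_iter) - 1]
    return mean(dids), (lower, upper)


# Hypothetical toy data: binary labels plus scores from two models.
toy_rng = random.Random(42)


def toy_set(n, baseline_noise, grouped_noise):
    labels = [int(toy_rng.random() < 0.3) for _ in range(n)]
    baseline = [y + toy_rng.gauss(0, baseline_noise) for y in labels]
    grouped = [y + toy_rng.gauss(0, grouped_noise) for y in labels]
    return labels, baseline, grouped


internal = toy_set(120, baseline_noise=0.5, grouped_noise=0.5)
external = toy_set(120, baseline_noise=1.5, grouped_noise=0.7)  # baseline degrades more
did_mean, (did_lo, did_hi) = bootstrap_did(internal, external, n_iter=200)
print(f"DiD {did_mean:.3f} (95% CI {did_lo:.3f} to {did_hi:.3f})")
```

In this toy setup the baseline scores are much noisier externally, so the DiD is expected to be positive. Resampling both test sets, rather than refitting models on resampled training data, keeps each iteration cheap; the study may have used a different scheme.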
Number of features in each category (diagnosis, medication, and laboratory) for Hospital A, Hospital B, and those shared between them.
| Features | Training set (Hospital A), n | External set (Hospital B), n | Shared, n |
|---|---|---|---|
| Baseline features | | | |
| Total | 9559 | 7926 | 5275 |
| ICDᵃ diagnosis codes | 7708 | 6859 | 4392 |
| Medi-Span drug names | 1311 | 531 | 531 |
| LOINCᵇ codes | 540 | 536 | 352 |
| Grouped features | | | |
| Total | 805 | 817 | 805 |
| CCSᶜ diagnosis codes | 287 | 287 | 287 |
| Medi-Span drug groups | 94 | 94 | 94 |
| LOINC groups | 424 | 436 | 424 |

ᵃ ICD: International Classification of Diseases.
ᵇ LOINC: Logical Observation Identifiers Names and Codes.
ᶜ CCS: Clinical Classification Software.
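The counts in this table can be turned into a simple coverage measure: the share of features observed at the external hospital that also exist in the training hospital's feature space (with one-hot encoding, features never seen in training typically contribute nothing to a fitted model). This measure is our illustration and is not reported in the source.

```python
# Feature counts from the table: (training set, external set, shared).
counts = {
    "baseline": {
        "ICD diagnosis codes": (7708, 6859, 4392),
        "Medi-Span drug names": (1311, 531, 531),
        "LOINC codes": (540, 536, 352),
    },
    "grouped": {
        "CCS diagnosis codes": (287, 287, 287),
        "Medi-Span drug groups": (94, 94, 94),
        "LOINC groups": (424, 436, 424),
    },
}


def external_coverage(rows: dict) -> float:
    """Fraction of external-set features that are also present in the training set."""
    external = sum(ext for _, ext, _ in rows.values())
    shared = sum(sh for _, _, sh in rows.values())
    return shared / external


for level, rows in counts.items():
    print(f"{level}: {external_coverage(rows):.1%} of external features shared")
# baseline: 66.6% of external features shared
# grouped: 98.5% of external features shared
```

Grouping raises coverage from 5275/7926 to 805/817 while shrinking the feature space from thousands of columns to hundreds.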
Difference-in-difference (DiD) metrics for the comparison between baseline and granular models and for the comparison between baseline and grouped models. A positive DiD indicates that the comparison model showed a smaller drop in performance than the baseline model.

| Metric, outcome | Granular comparison, DiD (95% CI) | Grouped comparison, DiD (95% CI) |
|---|---|---|
| AUCᵃ | | |
| SSIᵇ | 0.035 (0.033-0.037) | 0.072 (0.070-0.074) |
| Pneumonia | 0.226 (0.223-0.229) | 0.250 (0.247-0.252) |
| Sepsis | 0.015 (0.013-0.017) | 0.008 (0.007-0.010) |
| UTIᶜ | –0.049 (–0.052 to –0.045) | 0.006 (0.002-0.009) |
| F1-score | | |
| SSI | 0.017 (0.014-0.020) | 0.100 (0.097-0.103) |
| Pneumonia | 0.186 (0.179-0.193) | 0.212 (0.206-0.218) |
| Sepsis | 0.026 (0.023-0.028) | 0.091 (0.089-0.093) |
| UTI | 0.039 (0.035-0.043) | 0.073 (0.068-0.077) |

ᵃ AUC: area under the receiver operating characteristic curve.
ᵇ SSI: surgical site infection.
ᶜ UTI: urinary tract infection.
Comparison of models developed from baseline data with models developed from a combination of baseline and grouped data. DiD values are reported for the AUC and F1-score for surgical site infection. A positive difference-in-difference (DiD) indicates that the combination model showed a smaller drop in performance than the baseline model.

| Combination | Medications | Laboratory tests | Diagnosis codes | AUCᵃ, DiD (95% CI) | F1-score, DiD (95% CI) | P value |
|---|---|---|---|---|---|---|
| Combination 1 | Grouped | Baseline | Baseline | 0.054 (0.052-0.057) | 0.072 (0.069-0.074) | <.001 |
| Combination 2 | Baseline | Grouped | Baseline | 0.012 (0.010-0.014) | 0.046 (0.043-0.049) | <.001 |
| Combination 3 | Baseline | Baseline | Grouped | 0.049 (0.047-0.051) | 0.134 (0.131-0.137) | <.001 |

ᵃ AUC: area under the receiver operating characteristic curve.
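Each combination model in this table switches one feature category from the baseline to the grouped representation while keeping the other two at baseline. The sketch below shows one way such feature sets could be assembled, assuming each patient's features are stored per category in both representations; the data layout, feature names, and `build_features` helper are hypothetical.

```python
from typing import Dict, Set

CATEGORIES = ("medications", "laboratory tests", "diagnosis codes")

# The three combinations from the table: which category uses grouped features.
COMBINATIONS: Dict[str, Set[str]] = {
    "Combination 1": {"medications"},
    "Combination 2": {"laboratory tests"},
    "Combination 3": {"diagnosis codes"},
}


def build_features(patient: Dict[str, Dict[str, Set[str]]],
                   grouped_categories: Set[str]) -> Set[str]:
    """Union the patient's features, using the grouped representation for the
    categories in grouped_categories and the baseline one for the rest."""
    features: Set[str] = set()
    for category in CATEGORIES:
        level = "grouped" if category in grouped_categories else "baseline"
        # Prefix with category and level so identifiers from different sources cannot collide.
        features |= {f"{category}:{level}:{f}" for f in patient[level][category]}
    return features


# Hypothetical patient record in both representations.
patient = {
    "baseline": {
        "medications": {"drug_name_1"},
        "laboratory tests": {"loinc_code_1"},
        "diagnosis codes": {"icd_code_1", "icd_code_2"},
    },
    "grouped": {
        "medications": {"drug_group_1"},
        "laboratory tests": {"loinc_group_1"},
        "diagnosis codes": {"ccs_code_1"},
    },
}

for name, grouped in COMBINATIONS.items():
    print(name, sorted(build_features(patient, grouped)))
```

Passing an empty set reproduces the baseline model's features, and passing all three categories reproduces the fully grouped model, so one helper covers every configuration compared in the study.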