Jenna Marie Reps, Ross D Williams, Martijn J Schuemie, Patrick B Ryan, Peter R Rijnbeek.
Abstract
BACKGROUND: Accurate prognostic models could aid medical decision making. Large observational databases often contain temporal medical data for large and diverse populations of patients, and it may be possible to learn prognostic models from these data. However, the performance of a prognostic model often worsens when it is transported to a different database (or into a clinical setting). In this study we investigate different ensemble approaches that combine prognostic models independently developed using different databases (a simple federated learning approach) to determine whether such ensembles can improve model transportability, that is, perform better in new data than single-database models.
Keywords: Ensemble learning; Model transportability; Observational data; Patient-level prediction; Prognostic model
Year: 2022 PMID: 35614485 PMCID: PMC9134686 DOI: 10.1186/s12911-022-01879-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
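The abstract describes combining prognostic models that were developed independently on different databases. Below is a minimal sketch of the simplest such combination, a fusion ensemble whose predicted risk is the (optionally weighted) mean of the Level 1 predicted risks. The `fusion_ensemble` name, the callable model interface and the weights are illustrative assumptions; the study's exact fusion rules are not given in this excerpt.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interface: a Level 1 model maps a patient's feature vector
# to a predicted risk in [0, 1].
Model = Callable[[Sequence[float]], float]


def fusion_ensemble(models: Sequence[Model],
                    weights: Optional[Sequence[float]] = None) -> Model:
    """Return a Level 2 model whose risk is the weighted mean of Level 1 risks."""
    if not models:
        raise ValueError("at least one Level 1 model is required")
    if weights is None:
        weights = [1.0] * len(models)  # uniform weights: a plain average
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per Level 1 model")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")

    def predict(x: Sequence[float]) -> float:
        # Query every Level 1 model and combine their risks.
        return sum(w * m(x) for w, m in zip(weights, models)) / total

    return predict


# Toy stand-ins for models developed on two different databases.
def model_db1(x: Sequence[float]) -> float:
    return 0.25


def model_db2(x: Sequence[float]) -> float:
    return 0.75


ensemble = fusion_ensemble([model_db1, model_db2])
print(ensemble([1.0, 0.0]))  # 0.5
```

In this sketch only the trained models are shared, never patient-level data, which is consistent with the abstract's description of a simple federated learning approach.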
Summary of the five databases used in this study
| Name | Type | Description | Start | End | Size (million lives) |
|---|---|---|---|---|---|
| IBM Medicare Supplemental Beneficiaries (MDCR) | US Claims | Patients aged 65 or older with supplemental healthcare | 2000-01-01 | 2019-12-31 | 10 |
| IBM Medicaid (MDCD) | US Claims | Patients with government-subsidized healthcare | 2006-01-01 | 2018-12-31 | 28 |
| Optum® De-Identified Clinformatics® Data Mart Database (Optum Claims) | US Claims | Patients of all ages | 2000-05-01 | 2019-12-31 | 84 |
| IBM Commercial Claims and Encounters (CCAE) | US Claims | Patients aged 65 or younger: employees who receive health insurance through their employer, and their dependents | 2000-01-01 | 2019-12-31 | 152 |
| Optum® de-identified Electronic Health Record Dataset (Optum EHR) | US EHR | Patients of all ages | 2006-01-01 | 2019-03-31 | 96 |
Fig. 1 The leave-one-database-out design used to evaluate the transportability of the Level 1 models trained on a single database and of the Level 2 ensembles that combine multiple Level 1 models. Five combinations were used: in each, four of the five databases were used to develop the models and the remaining database was used to fairly evaluate their transportability. In addition, a model was trained on the left-out database to estimate its internal validation performance, which can be considered the ‘internal benchmark’ performance for that database given sufficient training data. We compared how close the external validation performance of each model was to this ‘internal benchmark’.
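The leave-one-database-out design in Fig. 1 amounts to holding out each database in turn. The sketch below only enumerates the five development/validation splits; training and evaluation are left as comments because this excerpt does not specify them. The database labels come from the tables in this record, and the function name is hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

DATABASES = ["CCAE", "MDCD", "MDCR", "Optum Claims", "Optum EHR"]


def leave_one_database_out(
    databases: Sequence[str],
) -> Iterator[Tuple[List[str], str]]:
    """Yield (development databases, held-out external database) pairs."""
    for held_out in databases:
        development = [db for db in databases if db != held_out]
        yield development, held_out


for development, external in leave_one_database_out(DATABASES):
    # Level 1 models would be trained on each development database and
    # combined into Level 2 ensembles, all validated on `external`.
    # A model trained on `external` itself supplies the internal benchmark.
    print(f"develop on {', '.join(development)}; validate on {external}")
```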
The outcome counts and percentage of the target population who develop the outcome during the time-at-risk
| Outcome | CCAE (N ~ 499,678) (%) | MDCR (N ~ 160,956) (%) | MDCD (N ~ 469,302) (%) | Optum EHR (N ~ 499,881) (%) | Optum Claims (N ~ 499,753) (%) |
|---|---|---|---|---|---|
| Acute liver injury | 14,875 (3.35) | 7226 (5.4) | 21,654 (5.47) | 18,535 (4.18) | 18,619 (4.31) |
| Acute myocardial infarction | 1494 (0.3) | 935 (0.59) | 3800 (0.83) | 816 (0.16) | 1298 (0.26) |
| Alopecia | 10,672 (2.32) | 7569 (5.64) | 20,597 (5.2) | 16,597 (3.69) | 16,571 (3.75) |
| Constipation | 4170 (0.85) | 6399 (4.39) | 9210 (2.05) | 10,192 (2.13) | 10,282 (2.16) |
| Decreased libido | 491 (0.1) | 1080 (0.69) | 905 (0.19) | 287 (0.06) | 708 (0.14) |
| Delirium | 174 (0.03) | 510 (0.32) | 86 (0.02) | 267 (0.05) | 91 (0.02) |
| Diarrhea | 1661 (0.34) | 130 (0.08) | 785 (0.17) | 1210 (0.24) | 1603 (0.32) |
| Fracture | 509 (0.1) | 963 (0.61) | 894 (0.19) | 381 (0.08) | 758 (0.15) |
| Gastrointestinal hemorrhage | 985 (0.2) | 1298 (0.81) | 1666 (0.36) | 356 (0.07) | 1021 (0.2) |
| Hyponatremia | 19,754 (4.65) | 7824 (5.95) | 33,518 (9.82) | 24,043 (5.65) | 23,304 (5.67) |
| Hypotension | 380 (0.08) | 1153 (0.74) | 636 (0.14) | 230 (0.05) | 683 (0.14) |
| Hypothyroidism | 297 (0.06) | 642 (0.4) | 1056 (0.23) | 162 (0.03) | 333 (0.07) |
| Insomnia | 3046 (0.62) | 2086 (1.38) | 2468 (0.53) | 3049 (0.62) | 4114 (0.85) |
| Ischemic stroke all inpatient | 3120 (0.64) | 1824 (1.19) | 2655 (0.57) | 2775 (0.56) | 4139 (0.85) |
| Nausea | 2722 (0.56) | 4071 (2.77) | 4033 (0.89) | 4368 (0.9) | 5846 (1.22) |
| Open angle glaucoma | 6117 (1.33) | 3853 (2.83) | 5374 (1.22) | 8786 (2.03) | 9943 (2.33) |
| Seizure | 184 (0.04) | 67 (0.04) | 307 (0.07) | 94 (0.02) | 199 (0.04) |
| Suicide and suicidal ideation | 10,221 (2.13) | 993 (0.62) | 21,518 (5.09) | 9957 (2.1) | 8063 (1.67) |
| Tinnitus | 2628 (0.53) | 4276 (2.87) | 5082 (1.12) | 6920 (1.44) | 7643 (1.62) |
| Ventricular arrhythmia and sudden cardiac death | 20,806 (4.91) | 6846 (5.12) | 27,233 (6.92) | 23,655 (5.6) | 23,772 (5.89) |
| Vertigo | 2577 (0.53) | 748 (0.47) | 2269 (0.49) | 2341 (0.48) | 2782 (0.57) |
CCAE, Optum EHR and Optum Claims each contained more than 500,000 patients with depression receiving pharmaceutical treatment, so we sampled 500,000 patients from each of these databases.
A small number of the 500,000 sampled patients were excluded because their index date was the last time they were observed in the data (so they had no follow-up).
Characteristics of the target population (patients with depression initiating treatment) per database
| Characteristic | CCAE | MDCD | MDCR | Optum Claims | Optum EHR |
|---|---|---|---|---|---|
| Mean age | 41 | 35 | 75 | 50 | 49 |
| Male (%) | 30.8 | 25.9 | 32.2 | 31.7 | 29.2 |
| Mean number of outpatient visits in prior year | 16.3 | 31.2 | 26.8 | 16.6 | 32.4 |
| Proportion of patients with the condition in the prior year | | | | | |
| Pain | 0.60 | 0.74 | 0.74 | 0.66 | 0.57 |
| Anxiety | 0.41 | 0.50 | 0.28 | 0.42 | 0.43 |
| Acute inflammatory disease | 0.32 | 0.36 | 0.24 | 0.31 | 0.18 |
| Neoplastic disease | 0.22 | 0.14 | 0.46 | 0.27 | 0.17 |
| Essential hypertension | 0.25 | 0.31 | 0.69 | 0.40 | 0.37 |
| Obesity | 0.11 | 0.19 | 0.11 | 0.13 | 0.17 |
| Heart disease | 0.09 | 0.14 | 0.46 | 0.20 | 0.18 |
| Diabetes mellitus | 0.09 | 0.14 | 0.27 | 0.16 | 0.16 |
| Urinary tract infectious disease | 0.09 | 0.14 | 0.16 | 0.12 | 0.07 |
| Anemia | 0.07 | 0.12 | 0.20 | 0.12 | 0.11 |
Fig. 2 Box plots of the external validation AUROC minus the internal validation AUROC for each non-ensemble (Level 1 model) and ensemble method (Level 2 model) across the five databases. The rows represent the external database (the database that was excluded from model/ensemble development) that was used to fairly evaluate the models/ensembles. The x-axis represents the model/ensemble technique. Box plots centered around 0 with a small range indicate highly transportable and consistent external discriminative performance. The dashed vertical lines separate the non-ensembles, the fusion ensembles, the mixture-of-experts ensembles and the stacking ensembles.
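Fig. 2 plots, for each model, the external validation AUROC minus the internal benchmark AUROC of the held-out database. A minimal sketch of that quantity follows, computing AUROC as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it (ties count as one half). The function names and toy labels/risks are hypothetical, and the pairwise loop is quadratic, so it suits illustration rather than databases with hundreds of thousands of patients.

```python
from typing import Sequence


def auroc(labels: Sequence[int], risks: Sequence[float]) -> float:
    """AUROC as P(risk of a random case > risk of a random non-case)."""
    pos = [r for y, r in zip(labels, risks) if y == 1]
    neg = [r for y, r in zip(labels, risks) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied risks count as half a win
    return wins / (len(pos) * len(neg))


def auroc_gap(external_auroc: float, internal_auroc: float) -> float:
    """External validation AUROC minus the internal benchmark AUROC."""
    return external_auroc - internal_auroc


internal = auroc([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9])   # 1.0
external = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
print(auroc_gap(external, internal))  # -0.25
```

A negative gap means the model discriminates less well in the external database than a model trained on that database; gaps near 0 correspond to the well-transported models described in the caption.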
Fig. 3 Box plots of calibration-in-the-large (observed risk minus mean predicted risk) for each non-ensemble (Level 1 model) and ensemble (Level 2 model) when externally validated. The rows represent the external database (the database that was excluded from model/ensemble development) that was used to fairly evaluate the models/ensembles. The x-axis represents the model/ensemble technique. Box plots centered around 0 with a small range indicate excellent external calibration performance. The dashed vertical lines separate the non-ensembles, the fusion ensembles, the mixture-of-experts ensembles and the stacking ensembles.
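Calibration-in-the-large, as defined in the Fig. 3 caption, is the observed risk minus the mean predicted risk in the validation data. The sketch below assumes binary outcome labels (1 = outcome during the time-at-risk) paired with predicted risks for the same patients; the function name and toy values are hypothetical.

```python
from typing import Sequence


def calibration_in_the_large(labels: Sequence[int], risks: Sequence[float]) -> float:
    """Observed outcome rate minus mean predicted risk.

    Positive values mean the model underestimates risk on average;
    negative values mean it overestimates risk.
    """
    if not labels or len(labels) != len(risks):
        raise ValueError("labels and risks must be non-empty and of equal length")
    observed = sum(labels) / len(labels)
    mean_predicted = sum(risks) / len(risks)
    return observed - mean_predicted


# Toy external validation set: 2 of 4 patients have the outcome (50%),
# but the mean predicted risk is only 25%.
labels = [0, 0, 1, 1]
risks = [0.125, 0.125, 0.25, 0.5]
print(calibration_in_the_large(labels, risks))  # 0.25
```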
Fig. 4 The distribution of calibration gradient (slope) values for each non-ensemble (Level 1 model) and ensemble (Level 2 model) when externally validated. The rows represent the external database (the database that was excluded from model/ensemble development) that was used to fairly evaluate the models/ensembles. The x-axis represents the model/ensemble technique. Box plots centered around 1 with a small range indicate excellent external calibration performance. The dashed vertical lines separate the non-ensembles, the fusion ensembles, the mixture-of-experts ensembles and the stacking ensembles.
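A calibration slope is commonly estimated by fitting a logistic regression of the observed outcome on the logit of the predicted risk and taking the coefficient; this excerpt does not state which estimator the study used, so treat that choice as an assumption. The sketch below fits the two-parameter model logit P(y = 1) = a + b · logit(risk) by Newton-Raphson in pure Python. A slope below 1 suggests predictions that are too extreme, and a slope above 1 suggests predictions that are too moderate. Risks must lie strictly between 0 and 1, and the function names and toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def recalibration_fit(labels: Sequence[int], risks: Sequence[float],
                      max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit logit P(y=1) = a + b * logit(risk) by Newton-Raphson; return (a, b).

    b is the calibration slope. The intercept a comes from this joint fit and
    is not the same quantity as calibration-in-the-large.
    """
    z = [logit(p) for p in risks]
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and Fisher information [[s00, s01], [s01, s11]].
        g0 = g1 = s00 = s01 = s11 = 0.0
        for y, zi in zip(labels, z):
            mu = 1.0 / (1.0 + math.exp(-(a + b * zi)))
            w = mu * (1.0 - mu)
            g0 += y - mu
            g1 += (y - mu) * zi
            s00 += w
            s01 += w * zi
            s11 += w * zi * zi
        det = s00 * s11 - s01 * s01
        if det <= 0.0:
            raise ValueError("singular information matrix; need >= 2 distinct risks")
        # Newton step = inverse(information) @ score, using the 2x2 closed form.
        da = (s11 * g0 - s01 * g1) / det
        db = (s00 * g1 - s01 * g0) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Overconfident toy model: predicts 10% and 90%, observed rates are 20% and 80%.
labels = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 0]
risks = [0.1] * 5 + [0.9] * 5
intercept, slope = recalibration_fit(labels, risks)
print(round(slope, 3))  # 0.631
```

With two distinct predicted risks the two-parameter model reproduces the observed rates exactly, so the slope here is logit(0.8) / logit(0.9) = ln 2 / ln 3, below 1 as expected for an overconfident model.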