Honghan Wu1,2, Huayu Zhang3, Andreas Karwath4,5, Zina Ibrahim2,6, Ting Shi7, Xin Zhang8, Kun Wang9, Jiaxing Sun9, Kevin Dhaliwal10, Daniel Bean6, Victor Roth Cardoso4,5, Kezhi Li1, James T Teo11, Amitava Banerjee1, Fang Gao-Smith12,13, Tony Whitehouse12,13, Tonny Veenith12,13, Georgios V Gkoutos4,5,14, Xiaodong Wu9,15, Richard Dobson1,2,6, Bruce Guthrie16.
Abstract
OBJECTIVE: Risk prediction models are widely used to inform evidence-based clinical decision making. However, few models developed from single cohorts can perform consistently well at population level where diverse prognoses exist (such as the SARS-CoV-2 [severe acute respiratory syndrome coronavirus 2] pandemic). This study aims to tackle this challenge by synergizing prediction models from the literature using ensemble learning.
Keywords: COVID-19; decision support; ensemble learning; model synergy; risk prediction
Year: 2021 PMID: 33185672 PMCID: PMC7717299 DOI: 10.1093/jamia/ocaa295
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Validation cohorts, prognosis models, and ensemble learning. (A) The 4 validation cohorts. A.1 shows cohort size and mortalities; A.2 shows follow-ups aligned with wave 1 periods of China and the United Kingdom (red indicates high new daily cases); A.3 shows age distributions; A.4-A.7 show distributions of bloods and vitals. (B) Timeline of follow-up periods of derivation cohorts of all individual prediction models. (C) Illustrative diagram of ensemble learning by combining 3 linear models for binary classification. KCH: King’s College Hospital; UHB: University Hospitals Birmingham.
Figure 2. Architecture of the proposed ensemble learning framework. At the center is the ensemble method taking 7 individual models as input (top left) and synergizing them based on their competence on target cohorts. Four international COVID-19 cohorts (top right) were included in this study for evaluation of ensemble learning (bottom). KCH: King’s College Hospital; UHB: University Hospitals Birmingham.
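The two captions above describe an ensemble that combines individual models according to their competence on the target cohort. As a rough illustration only, the sketch below fuses per-model risk probabilities with a normalized weighted average. The function name `weighted_ensemble`, the competence weights, and the example probabilities are all hypothetical; the study's actual competence measure and fusion rule are not given in this record.

```python
from typing import Sequence


def weighted_ensemble(probabilities: Sequence[float],
                      competences: Sequence[float]) -> float:
    """Combine per-model risk probabilities into one ensemble risk.

    Each model's probability is weighted by a non-negative competence
    score (an assumed stand-in for how well the model suits the cohort).
    """
    if len(probabilities) != len(competences):
        raise ValueError("one competence weight is required per model")
    if any(w < 0 for w in competences):
        raise ValueError("competence weights must be non-negative")
    total = sum(competences)
    if total == 0:
        raise ValueError("at least one competence weight must be positive")
    # Normalized weighted average of the individual predictions.
    return sum(p * w for p, w in zip(probabilities, competences)) / total


# Hypothetical predictions from three models for a single patient.
model_probs = [0.80, 0.40, 0.60]
model_competence = [0.5, 0.3, 0.2]
risk = weighted_ensemble(model_probs, model_competence)
```

A weighted average is only one possible fusion rule; it is used here because it keeps the output a valid probability whenever the inputs are.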
Baseline characteristics of the poor prognosis and death subgroups vs the not poor prognosis and survival subgroups in the 4 cohorts.
| | Wuhan01 Cohort (n = 2869): Not Poor Prognosis (n = 2738) | Wuhan01 Cohort (n = 2869): Poor Prognosis (n = 131) | Wuhan02 Cohort (n = 357): Did Not Die (n = 194) | Wuhan02 Cohort (n = 357): Died (n = 163) | KCH Cohort (n = 1475): Not Poor Prognosis (n = 949) | KCH Cohort (n = 1475): Poor Prognosis (n = 526) | UHB Cohort (n = 693): Not Poor Prognosis (n = 477) | UHB Cohort (n = 693): Poor Prognosis (n = 216) |
|---|---|---|---|---|---|---|---|---|
| Age, y | 60 (49-68) | 70 (63-78) | 51 (37-62) | 69 (62-77) | 69 (54-81) | 75 (60-86) | 72 (57-82) | 70 (56-80) |
| Male | 1389 (50.7) | 84 (64.1) | 91 (46.9) | 118 (72.4) | 514 (54.2) | 330 (62.7) | 254 (53.2) | 144 (66.7) |
| Red cell distribution width (%) | 12.9 (12.3-13.5) | 13.0 (12.5-14.0) | 12.0 (11.8-12.7) | 12.9 (12.3-13.9) | – | – | 13.7 (12.7-15.4) | 13.9 (13.2-15.1) |
| Albumin (g/L) | 38.3 (35.5-40.7) | 31.6 (28.7-35.0) | 37.5 (34.2-40.2) | 30.1 (27.6-33.0) | 38.0 (35.0-41.0) | 36.0 (33.0-39.0) | 31.0 (26.0-35.0) | 28.0 (22.0-32.0) |
| C-reactive protein (mg/L) | 2.1 (0.8-7.3) | 59.9 (14.2-120.0) | 19.5 (3.8-49.8) | 114.1 (61.9-178.8) | 72.5 (28.8-127.9) | 112.2 (56.8-216.5) | 83.0 (42.0-140.2) | 180.0 (102.5-267.0) |
| Serum blood urea nitrogen (mmol/L) | 4.3 (3.6-5.4) | 6.8 (5.0-11.0) | – | – | – | – | 6.3 (4.5-10.4) | 8.1 (5.4-13.1) |
| Lymphocyte count (10^9/L) | 1.5 (1.1-1.9) | 0.7 (0.5-1.1) | 1.1 (0.8-1.5) | 0.6 (0.4-0.8) | 1.0 (0.7-1.4) | 0.9 (0.6-1.4) | 0.9 (0.7-1.3) | 0.9 (0.6-1.2) |
| Direct bilirubin (µmol/L) | 3.3 (2.5-4.4) | 5.4 (3.5-7.2) | 3.5 (2.5-4.7) | 6.2 (4.4-9.2) | – | – | 10.0 (7.0-14.0) | 11.0 (8.0-20.0) |
| Lactate dehydrogenase (IU/L) | 174.6 (150.3-210.2) | 332.2 (244.9-461.0) | 250.0 (202.2-310.5) | 567.0 (427.5-762.0) | – | – | 316.5 (245.8-411.0) | 436.0 (340.0-623.0) |
| Serum sodium (mmol/L) | 141.6 (140.0-143.2) | 139.8 (137.4-143.4) | 139.2 (136.5-141.2) | 138.9 (135.8-143.6) | – | – | 137.0 (134.0-140.0) | 138.0 (135.0-143.0) |
| Neutrophil count (10^9/L) | 3.5 (2.7-4.5) | 6.7 (4.8-9.9) | – | – | 5.1 (3.7-7.4) | 6.6 (4.5-9.4) | 4.7 (3.4-6.7) | 6.7 (4.8-9.4) |
| Oxygen saturation (%) | 97.8 (97.0-98.2) | 96.6 (94.5-97.7) | – | – | 19 (18-20) | 23 (20-28) | 94.0 (93.0-96.0) | 92.0 (88.0-94.0) |
Values are median (interquartile range) or n (%). Poor prognosis is defined as either intensive care unit stay or death. Wuhan02 does not have intensive care unit stay data; therefore, its analysis compared death vs survival instead.
KCH: King’s College Hospital; UHB: University Hospitals Birmingham.
Seven prognosis prediction models.
| | Shi | Xie | Dong | Levy | Yan | Gong | Lu |
|---|---|---|---|---|---|---|---|
| Outcome | Poor prognosis | Death | Poor prognosis | Death | Death | Poor prognosis | Death |
| Model type | Scoring | Logistic regression | Nomogram | NOCOS^a | Decision tree | Nomogram | Scoring |
| Region | Zhejiang | Wuhan | Anhui, Beijing | New York | Wuhan | Wuhan, Guangzhou | Wuhan |
| Derivation cohort size | 487 | 299 | 208 | 11,095 | 375 | 189 | 577 |
| Age, y | 46 (27-65) | 65 (54-73) | 44 (28-60) | 65 (54-77) | 59 (42-75) | 49 (35-63) | 55 (39-66) |
| Follow-up period (in 2020) | Unknown to February 17 | January 01 to February 01 | January 20 to March 18 | March 01 to May 05 | January 10 to February 18 | January 20 to March 02 | January 21 to February 05 |
| Mortality rate | – | 51.84% | – | 23.40% | 41.33% | – | 6.76% |
| Poor prognosis rate | 10.06% | – | 19.23% | – | – | 14.81% | 17.33% |
Values are median (interquartile range) or n (%). For outcomes, poor prognosis is defined as severities including length of stay, intensive care unit stay, or categories of treatments. For model type, scoring refers to models that calculate a sum from scores predefined for individual predictor values; logistic regression and decision tree refer to models that use these computational methods; nomogram refers to models represented as a 2-dimensional graphical calculating diagram. ^a Customized model.
Figure 3. Validation results of discrimination, clinical usefulness, and calibration. (A) Discrimination performances: median (95% confidence interval [CI]). (B) Positive predictive value (PPV), sensitivity, and specificity of all models validated on cohort-specific prediction rate. Models that could not achieve expected prediction rates were excluded. (C) Calibration results on 4 validation cohorts: median (95% CI) where empty cells are for those models that were not validated because they were derived from the same hospital data. KCH: King’s College Hospital; UHB: University Hospitals Birmingham.
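Panel B reports PPV, sensitivity, and specificity at a cohort-specific prediction rate. A minimal sketch of that kind of evaluation follows, assuming "prediction rate" means the fraction of patients a model flags as high risk; that reading, the function name `metrics_at_prediction_rate`, and the toy scores and labels are assumptions made for illustration, not details taken from the study.

```python
from typing import Dict, Sequence


def metrics_at_prediction_rate(scores: Sequence[float],
                               labels: Sequence[int],
                               rate: float) -> Dict[str, float]:
    """Flag the top `rate` fraction of patients by score, then report
    PPV, sensitivity, and specificity against binary labels (1 = event)."""
    n = len(scores)
    k = round(rate * n)  # number of patients flagged as high risk
    # Rank patients from highest to lowest score; ties keep input order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = ranked[:k]

    tp = sum(labels[i] for i in flagged)  # flagged patients with the event
    fp = k - tp                           # flagged patients without the event
    fn = sum(labels) - tp                 # events that were not flagged
    tn = n - k - fn                       # non-events that were not flagged

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy data: 8 patients, 3 events, top 25% flagged as high risk.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
toy_metrics = metrics_at_prediction_rate(toy_scores, toy_labels, rate=0.25)
```

Under this reading, a scoring model with few distinct score values may be unable to flag exactly the requested fraction, which would be consistent with the caption's note that some models were excluded.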
Net reclassification improvements of the ensemble model compared with the best individual model on each validation cohort.
| | Wuhan01 (Ensemble vs Xie): Event | Wuhan01 (Ensemble vs Xie): No Event | Wuhan02 (Ensemble vs Dong): Event | Wuhan02 (Ensemble vs Dong): No Event | KCH (Ensemble vs Levy): Event | KCH (Ensemble vs Levy): No Event | UHB (Ensemble vs Levy): Event | UHB (Ensemble vs Levy): No Event |
|---|---|---|---|---|---|---|---|---|
| Higher | 13 | 132 | 26 | 10 | 51 | 77 | 15 | 42 |
| Lower | 7 | 124 | 16 | 17 | 48 | 74 | 11 | 37 |
| Total | 432 | 2,438 | 127 | 230 | 642 | 833 | 325 | 368 |
| Net reclassification improvements (per cohort) | 1.72% | | 4.83% | | 0.83% | | 2.59% | |
KCH: King’s College Hospital; UHB: University Hospitals Birmingham.
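The reported net reclassification improvements can be reproduced from the counts in the table by summing the net proportion (Higher − Lower) / Total over the Event and No Event columns of each cohort. The sketch below does that arithmetic. Reading "Higher" as the favorable direction in both columns is an inference from the numbers rather than something stated in this record, and the function name `net_reclassification` is hypothetical.

```python
def net_reclassification(higher_event: int, lower_event: int, total_event: int,
                         higher_nonevent: int, lower_nonevent: int,
                         total_nonevent: int) -> float:
    """Sum the net reclassified proportions of the Event and No Event groups.

    Assumes (Higher - Lower) / Total is the net improvement in each group,
    which is the reading that reproduces the table's reported percentages.
    """
    event_part = (higher_event - lower_event) / total_event
    nonevent_part = (higher_nonevent - lower_nonevent) / total_nonevent
    return event_part + nonevent_part


# Counts copied from the table, in the order:
# (Higher, Lower, Total) for Event, then (Higher, Lower, Total) for No Event.
table_counts = {
    "Wuhan01": (13, 7, 432, 132, 124, 2438),
    "Wuhan02": (26, 16, 127, 10, 17, 230),
    "KCH": (51, 48, 642, 77, 74, 833),
    "UHB": (15, 11, 325, 42, 37, 368),
}

nri_percent = {
    cohort: round(100 * net_reclassification(*counts), 2)
    for cohort, counts in table_counts.items()
}
# nri_percent -> {'Wuhan01': 1.72, 'Wuhan02': 4.83, 'KCH': 0.83, 'UHB': 2.59}
```

For Wuhan02, for example, the Event group contributes (26 − 16) / 127 ≈ 7.87% and the No Event group contributes (10 − 17) / 230 ≈ −3.04%, giving 4.83%.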