Laila Rasmy, Masayuki Nigo, Bijun Sai Kannadath, Ziqian Xie, Bingyu Mao, Khush Patel, Yujia Zhou, Wanheng Zhang, Angela Ross, Hua Xu, Degui Zhi.
Abstract
BACKGROUND: Predicting outcomes of patients with COVID-19 at an early stage is crucial for optimised clinical care and resource management, especially during a pandemic. Although multiple machine learning models have been proposed to address this issue, because of their requirements for extensive data preprocessing and feature engineering, they have not been validated or implemented outside of their original study site. Therefore, we aimed to develop accurate and transferrable predictive models of outcomes on hospital admission for patients with COVID-19.
Year: 2022 PMID: 35466079 PMCID: PMC9023005 DOI: 10.1016/S2589-7500(22)00049-8
Source DB: PubMed Journal: Lancet Digit Health ISSN: 2589-7500
Figure 1: CovRNN prediction tasks
Visit i represents the index visit. Visit i–1 represents the visit before the index visit.
Figure 2: Model development and external validation datasets
CRWD=Cerner Real-World COVID-19 Q3 Dataset. OPTUM=Optum deidentified COVID-19 electronic health record dataset.
Descriptive statistics for CRWD and OPTUM extracted cohorts

| Characteristic | CRWD | OPTUM |
|---|---|---|
| Median age at index visit, years | 57 (36–72) | 60 (44–72) |
| Sex* | | |
| Female | 130 540 (52·6%) | 18 237 (50·5%) |
| Male | 116 653 (47·0%) | 17 885 (49·5%) |
| Race and ethnicity* | | |
| White | 168 606 (68·0%) | 19 704 (54·5%) |
| African American | 36 762 (14·8%) | 7836 (21·7%) |
| Asian | 5494 (2·2%) | 930 (2·6%) |
| American Indian or Alaska Native | 4285 (1·7%) | NA† |
| Hispanic | 72 068 (29·1%) | 5782 (16·0%) |
| Comorbidities | | |
| Hypertension | 114 387 (46·1%) | 22 035 (61·0%) |
| Diabetes | 64 023 (25·8%) | 12 942 (35·8%) |
| Congestive heart failure | 36 040 (14·5%) | 6568 (18·2%) |
| Chronic kidney disease | 34 789 (14·0%) | 7517 (20·8%) |
| Cancer | 19 145 (7·7%) | 5094 (14·1%) |
| In-hospital mortality | 13 607 (5·5%) | 4831 (13·4%) |
| Median time to event (in-hospital mortality), days | 8 (4–16) | 5 (3–10) |
| Mechanical ventilation | 33 505 (13·5%) | 9582 (26·5%) |
| Intubated on first day (index date) | 17 811 (7·2%) | 4466 (12·4%) |
| Median time to event (mechanical ventilation), days | 2 (1–5) | 3 (2–7) |
| Prolonged hospital stay | 46 421 (18·7%) | 12 457 (34·5%) |
| Median length of stay, days | 3 (1–6) | 5 (3–10) |
| Total number of unique features | 123 642 | 67 128 |
| Number of health-care systems | 87 | 197 |

Data are median (IQR) or n (%). CRWD=Cerner Real-World COVID-19 Q3 Dataset. NA=not applicable. OPTUM=Optum deidentified COVID-19 electronic health record dataset.
*Data do not add up to N totals as some patients fell into the other or unknown category.
†This race did not appear in OPTUM.
Model performance on different CRWD test sets
| Test set | n | In-hospital mortality: logistic regression | In-hospital mortality: light gradient boosting machine | In-hospital mortality: CovRNN binary prediction | In-hospital mortality: CovRNN survival prediction (C-index) | Mechanical ventilation: logistic regression | Mechanical ventilation: light gradient boosting machine | Mechanical ventilation: CovRNN binary prediction | Mechanical ventilation: CovRNN survival prediction (C-index) | Prolonged hospital stay: logistic regression | Prolonged hospital stay: light gradient boosting machine | Prolonged hospital stay: CovRNN binary prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Multi-hospital test set | 48 781 | 90·3% (89·8–90·8) | 91·5% (91·1–92·0) | 93·0% (92·6–93·4) | 86·0% (85·1–86·9) | 89·5% (89·1–89·9) | 91·2% (90·8–91·5) | 92·9% (92·6–93·2) | 92·6% (92·2–93·0) | 80·0% (79·5–80·4) | 81·7% (81·3–82·2) | 86·5% (86·2–86·9) |
| Hospital 1 | 3469 | 88·8% (86·9–90·5) | 91·0% (89·5–92·4) | 91·8% (90·3–93·2) | 86·0% (83·2–88·5) | 86·7% (85·1–88·4) | 88·4% (87·0–89·9) | 91·5% (90·2–92·8) | 90·8% (89·4–92·2) | 77·3% (75·5–79·1) | 78·5% (76·7–80·2) | 87·2% (85·8–88·4) |
| Hospital 2 | 706 | 94·6% (91·9–96·9) | 95·1% (92·7–97·2) | 97·0% (95·2–98·6) | 91·6% (87·5–94·8) | 93·5% (90·7–95·8) | 95·6% (93·8–97·1) | 96·0% (94·2–97·7) | 93·8% (91·4–96·0) | 80·9% (76·9–84·7) | 84·3% (80·5–87·7) | 88·3% (85·6–90·9) |
Data are area under the receiver operating characteristic curve (95% CI), unless otherwise indicated.
Unlike the binary outcomes used in the other models, CovRNN survival prediction uses time-to-event outcomes and the concordance index (95% CI) is shown. CRWD=Cerner Real-World COVID-19 Q3 Dataset.
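The AUROC reported above can be read as the probability that a randomly chosen patient who had the outcome receives a higher predicted risk than a randomly chosen patient who did not. The sketch below computes it that way (ties count one half) and adds a percentile bootstrap 95% CI. The source does not state how its CIs were derived, so the bootstrap, the hypothetical helper names `auroc` and `bootstrap_ci`, and the default of 1000 resamples are all assumptions for illustration. The pairwise loop is quadratic; for real cohorts a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.

```python
import random


def auroc(y_true, y_score):
    """AUROC as P(score of a random positive > score of a random negative),
    counting tied scores as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUROC (an assumed method, not from the source)."""
    if len(set(y_true)) < 2:
        raise ValueError("need both classes to bootstrap an AUROC")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains only one class: AUROC undefined, draw again
        stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Usage: three of the four positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```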
Performance of CovRNN models on the OPTUM test set before and after fine-tuning

| Prediction task | | | |
|---|---|---|---|
| In-hospital mortality binary prediction | 88·6% | 87·0% | 91·3% |
| Mechanical ventilation binary prediction | 90·4% | 72·5% | 91·5% |
| Prolonged hospital stay (>7 days) binary prediction | 78·1% | 68·0% | 81·0% |
| In-hospital mortality survival prediction | 86·1% | 77·1% | 88·9% |
| Mechanical ventilation survival prediction | 90·2% | 69·2% | 93·7% |
Data are area under the receiver operating characteristic curve, unless otherwise indicated. All data are based on evaluation in the OPTUM test set. CRWD=Cerner Real-World COVID-19 Q3 Dataset. OPTUM=Optum deidentified COVID-19 electronic health record dataset.
Unlike the binary classifications used in other models, values for the survival models represent the concordance index.
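The concordance index used for the survival models generalises the AUROC to time-to-event data: among pairs of patients whose order of events is known, it is the fraction in which the patient who had the event first was assigned the higher risk. The sketch below implements Harrell's version under two stated assumptions: higher scores mean higher risk (the source does not give the orientation of its "survival score"), and pairs with tied event times are simply skipped. The function name is hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index, assuming a higher risk score means an earlier expected event.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before times[j]; patient j may be censored, because
    j was still event-free at times[i]. Tied risk scores count as one half.
    Pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Usage: event at day 2, event at day 5, censored at day 8.
print(concordance_index([2, 5, 8], [1, 1, 0], [0.9, 0.5, 0.1]))  # 1.0
```

As with the AUROC, 0.5 corresponds to random ordering and 1.0 to perfect ordering. Libraries differ on whether they expect risk-oriented or survival-oriented scores, so the orientation should be checked before comparing values.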
Figure 3: Kaplan–Meier curves in the stratified survival analysis
In-hospital mortality (A) and mechanical ventilation (B) in the multi-hospital test set of the Cerner Real-World COVID-19 Q3 Dataset. In-hospital mortality (C) and mechanical ventilation (D) in the test set of the Optum deidentified COVID-19 electronic health record dataset. Patients are stratified according to their predicted survival score, and curves are plotted over time in days since admission. Shaded areas indicate 95% CIs, calculated on the logarithmic scale from the SEs of the Kaplan–Meier estimator; the centre values correspond to the Kaplan–Meier estimate.
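A minimal sketch of the estimator behind each curve: at every distinct event time t the survival estimate is multiplied by (1 − d/n), where d is the number of events at t and n the number still at risk. Following the caption's wording, the CI is built on the log scale, using Greenwood's formula Var[log S(t)] ≈ Σ d / (n(n − d)); other transforms such as log(−log) are also common, so the exact variant is an assumption. The function name is hypothetical, and stratification by predicted score (how the strata were cut is not stated) would simply mean calling it once per group.

```python
import math
from collections import Counter


def kaplan_meier(times, events, z=1.96):
    """Kaplan-Meier estimate with pointwise log-scale confidence intervals.

    Returns (time, S(t), lower, upper) at each distinct event time.
    Events at time t are assumed to occur before censorings at the same t.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # subjects whose follow-up ends at each time
    at_risk = len(times)
    surv, var_log = 1.0, 0.0
    rows = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            if at_risk > d:
                var_log += d / (at_risk * (at_risk - d))  # Greenwood term for log S
                half_width = z * math.sqrt(var_log)
                lower = surv * math.exp(-half_width)
                upper = min(1.0, surv * math.exp(half_width))  # clip at 1
            else:
                lower = upper = float("nan")  # S(t) = 0, so a log-scale CI is undefined
            rows.append((t, surv, lower, upper))
        at_risk -= leaving[t]  # events and censorings both leave the risk set
    return rows


# Usage: four patients, one censored at day 2.
for row in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
    print(row)
```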
Figure 4: Subgroup analysis using the CRWD multi-hospital test set
(A) Age group. (B) Comorbidity. (C) US census region. (D) Race. AUROC=area under the receiver operating characteristic curve. CRWD=Cerner Real-World COVID-19 Q3 Dataset.
Figure 5: Calibration plots for the CRWD validation set, CRWD multi-hospital test set, and OPTUM test set
(A) In-hospital mortality. (B) Mechanical ventilation. (C) Prolonged hospital stay. CRWD=Cerner Real-World COVID-19 Q3 Dataset. OPTUM=Optum deidentified COVID-19 electronic health record dataset.
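A calibration plot compares predicted probabilities with observed event rates: patients are grouped by predicted risk, and each group's mean prediction is plotted against the fraction who actually had the outcome, so a well-calibrated model lies near the diagonal. The binning used for the figure is not described; the sketch below assumes ten equal-width bins, similar to scikit-learn's `calibration_curve` with `strategy="uniform"`. The function name is hypothetical.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Equal-width binning of predicted probabilities.

    Returns (mean predicted probability, observed event rate, count)
    for each non-empty bin, in order of increasing predicted risk.
    """
    sums = [0.0] * n_bins
    events = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the top bin
        sums[b] += p
        events[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], events[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b]
    ]


# Usage: two bins, each with two patients.
print(calibration_bins([0, 0, 1, 1], [0.05, 0.15, 0.85, 0.95], n_bins=2))
```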