Fang He1,2, John H Page3, Kerry R Weinberg2, Anirban Mishra2.
Abstract
BACKGROUND: The current COVID-19 pandemic is unprecedented. In resource-constrained settings, predictive algorithms can help stratify disease severity and alert physicians to high-risk patients; however, only a few risk scores have been derived from a substantially large electronic health record (EHR) data set using simplified predictors as input.
Keywords: COVID-19; machine learning; predictive algorithm; prognostic model
Year: 2022 PMID: 34951865 PMCID: PMC8785956 DOI: 10.2196/31549
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Patient attrition diagram. ∧With relevant COVID-19 diagnosis codes or a positive test for SARS-CoV-2. *Nonexclusive criteria: overlap was allowed.
Figure 2. Model development and validation framework, including data sampling and the corresponding sensitivity analyses.
Demographic and clinical characteristics of hospitalized patients with COVID-19 at baseline and admission.
| Characteristic | Training data set (n=14,336) | Validation data set (n=10,752) | Test data set (n=10,752) | Prospective test data set (n=14,863) |
| Mean (SD) age at baseline, years | 60.9 (17.2) | 60.9 (17.2) | 60.8 (17.1) | 63.8 (16.8) |
| Age group (years), n (%) | | | | |
| 18-34 | 1231 (8.59) | 920 (8.56) | 911 (8.47) | 1015 (6.83) |
| 35-49 | 2383 (16.62) | 1780 (16.56) | 1840 (17.11) | 1893 (12.74) |
| 50-64 | 4337 (30.25) | 3193 (29.70) | 3293 (30.63) | 4110 (27.65) |
| 65-74 | 2922 (20.38) | 2296 (21.35) | 2141 (19.91) | 3325 (22.37) |
| 75-84 | 2165 (15.10) | 1589 (14.78) | 1606 (14.94) | 2943 (19.80) |
| 85+ | 1298 (9.05) | 974 (9.06) | 961 (8.94) | 1577 (10.61) |
| Sex, n (%) | | | | |
| Male | 7473 (52.13) | 5619 (52.26) | 5629 (52.35) | 7645 (51.44) |
| Female | 6863 (47.87) | 5133 (47.74) | 5123 (47.65) | 7218 (48.56) |
| Race, n (%) | | | | |
| African American | 3466 (24.18) | 2669 (24.82) | 2668 (24.81) | 1867 (12.56) |
| Asian | 368 (2.57) | 268 (2.49) | 276 (2.57) | 216 (1.45) |
| White | 7779 (54.26) | 5734 (53.33) | 5795 (53.90) | 10,832 (72.88) |
| Other/Unknown | 2723 (18.99) | 2081 (19.35) | 2013 (18.72) | 1948 (13.11) |
| Region, n (%) | | | | |
| East North Central | 3778 (26.35) | 2942 (27.36) | 2908 (27.05) | 4174 (28.08) |
| East South Central | 1010 (7.05) | 708 (6.58) | 754 (7.01) | 1205 (8.11) |
| Middle Atlantic | 3221 (22.47) | 2488 (23.14) | 2423 (22.54) | 1290 (8.68) |
| Mountain | 496 (3.46) | 355 (3.30) | 363 (3.38) | 923 (6.21) |
| New England | 1042 (7.27) | 705 (6.56) | 763 (7.10) | 769 (5.17) |
| Pacific | 475 (3.31) | 331 (3.08) | 317 (2.95) | 345 (2.32) |
| South Atlantic/West South Central | 2454 (17.12) | 1802 (16.76) | 1810 (16.83) | 2120 (14.26) |
| West North Central | 1396 (9.74) | 1067 (9.92) | 1060 (9.86) | 3594 (24.18) |
| Other/Unknown | 464 (3.24) | 354 (3.29) | 354 (3.29) | 443 (2.98) |
| BMI at baseline (kg/m²), mean (SD) | 31.0 (8.5) | 30.9 (8.3) | 31.2 (8.6) | 31.6 (8.7) |
| BMI category, n (%) | | | | |
| Underweight | 352 (2.46) | 235 (2.19) | 221 (2.06) | 304 (2.05) |
| Healthy weight | 2526 (17.62) | 1873 (17.42) | 1833 (17.05) | 2283 (15.36) |
| Overweight | 3697 (25.79) | 2878 (26.77) | 2838 (26.40) | 3679 (24.75) |
| Obese | 3041 (21.21) | 2247 (20.90) | 2344 (21.80) | 3228 (21.72) |
| Morbidly obese | 3679 (25.66) | 2739 (25.47) | 2742 (25.50) | 4069 (27.38) |
| Unknown | 1041 (7.26) | 780 (7.25) | 774 (7.20) | 1300 (8.75) |
| Comorbidities, n (%) | | | | |
| Cerebrovascular disease | 676 (4.72) | 502 (4.67) | 501 (4.66) | 894 (6.01) |
| Chronic kidney disease | 2808 (19.59) | 2058 (19.14) | 2040 (18.97) | 3127 (21.04) |
| Congestive heart failure | 2137 (14.91) | 1534 (14.27) | 1553 (14.44) | 2369 (15.94) |
| Coronary artery disease | 2430 (16.95) | 1797 (16.71) | 1800 (16.74) | 2969 (19.98) |
| Diabetes mellitus | 4831 (33.70) | 3636 (33.82) | 3586 (33.35) | 5408 (36.39) |
| Hypertension | 8173 (57.01) | 6091 (56.65) | 6063 (56.39) | 8852 (59.56) |
| Solid tumor | 830 (5.79) | 606 (5.64) | 619 (5.76) | 1052 (7.08) |
| Transplant history | 28 (0.20) | 16 (0.15) | 20 (0.19) | 12 (0.08) |
| Outcomes, n (%) | | | | |
| All-cause mortality | 1769 (12.34) | 1326 (12.33) | 1327 (12.34) | 1782 (11.99) |
| Intensive care unit admission | 2813 (19.62) | 2181 (20.28) | 2148 (19.98) | 2422 (16.30) |
| Acute respiratory distress syndrome (respiratory failure) | 7276 (50.75) | 5500 (51.15) | 5384 (50.07) | 7009 (47.16) |
| Extracorporeal membrane oxygenation (mechanical ventilation) | 1962 (13.69) | 1535 (14.28) | 1498 (13.93) | 1483 (9.98) |
| Vital signs | | | | |
| Diastolic blood pressure (mmHg)^b | 73.0 (56.0-90.0) | 73.0 (56.0-90.0) | 73.0 (56.0-90.0) | 73.0 (56.0-90.0) |
| Systolic blood pressure (mmHg)^b | 125.0 (100.0-154.0) | 125.0 (101.0-155.0) | 125.0 (101.0-154.0) | 128.0 (103.0-159.0) |
| Pulse (bpm)^b | 85.0 (64.0-110.0) | 85.0 (64.0-110.0) | 85.0 (64.0-110.0) | 81.0 (61.0-107.6) |
| Respiratory rate (breaths/minute)^b | 19.0 (16.0-28.0) | 19.0 (16.0-28.0) | 19.0 (16.0-28.0) | 18.0 (16.0-25.0) |
| Temperature (°C)^b | 36.8 (36.3-37.9) | 36.8 (36.3-37.9) | 36.8 (36.3-37.8) | 36.7 (36.2-37.7) |
| Laboratory measurements | | | | |
| Alkaline phosphatase (IU/L) | 77.0 (49.0-137.0) | 76.0 (49.0-136.0) | 76.0 (48.0-135.0) | 78.0 (50.0-134.0) |
| Alanine aminotransferase (IU/L) | 28.0 (12.0-79.0) | 29.0 (12.0-80.0) | 28.0 (12.0-79.0) | 27.0 (12.0-68.0) |
| Aspartate aminotransferase (IU/L) | 37.0 (18.0-95.0) | 36.0 (18.0-97.0) | 36.0 (18.0-95.0) | 34.0 (18.0-80.0) |
| Albumin (g/dL) | 3.5 (2.7-4.2) | 3.6 (2.7-4.2) | 3.6 (2.7-4.2) | 3.6 (2.8-4.2) |
| Anion gap (mEq/L) | 12.0 (7.0-17.0) | 12.0 (7.0-17.0) | 12.0 (7.0-17.0) | 12.0 (7.0-16.0) |
| Blood urea nitrogen (mg/dL) | 16.0 (8.0-47.0) | 17.0 (8.0-46.0) | 16.0 (8.0-47.0) | 18.0 (9.0-44.0) |
| Bicarbonate (mmol/L) | 24.0 (19.0-29.0) | 24.0 (19.0-29.0) | 24.0 (19.0-29.0) | 24.0 (19.0-29.0) |
| Bilirubin total (mg/dL) | 0.6 (0.3-1.2) | 0.6 (0.3-1.2) | 0.6 (0.3-1.2) | 0.6 (0.3-1.1) |
| C-reactive protein (mg/dL) | 85.0 (10.3-229.0) | 82.2 (11.0-218.0) | 82.0 (10.2-220.0) | 73.0 (10.0-206.6) |
| Chloride (mmol/L) | 101.0 (94.0-108.0) | 101.0 (94.0-108.0) | 101.0 (94.0-108.0) | 101.0 (94.0-107.0) |
| Glucose (mg/dL) | 120.0 (91.0-242.0) | 121.0 (92.0-236.0) | 121.0 (92.0-240.6) | 122.0 (91.2-244.0) |
| Hemoglobin (g/dL) | 13.2 (10.0-15.5) | 13.2 (10.1-15.6) | 13.2 (10.2-15.7) | 13.2 (10.1-15.6) |
| Lymphocyte (%) | 14.1 (5.4-30.0) | 14.8 (5.6-30.7) | 14.6 (5.8-30.2) | 14.1 (5.3-30.0) |
| Monocyte (%) | 7.1 (3.1-12.9) | 7.0 (3.2-12.6) | 7.1 (3.2-12.7) | 7.8 (3.6-13.1) |
| Neutrophil (%) | 75.8 (57.0-88.0) | 75.0 (57.0-88.0) | 75.2 (57.0-88.0) | 75.0 (57.0-88.0) |
| Platelet count (×10⁹/L) | 210.0 (125.0-351.0) | 210.0 (127.0-348.0) | 211.0 (126.0-351.0) | 205.0 (124.0-335.0) |
| Potassium (mmol/L) | 3.9 (3.3-4.8) | 3.9 (3.3-4.8) | 3.9 (3.3-4.8) | 3.9 (3.3-4.7) |
| Protein total (g/dL) | 7.2 (6.2-8.2) | 7.2 (6.2-8.2) | 7.3 (6.2-8.2) | 7.1 (6.2-8.0) |
| Red cell distribution width coefficient of variation (%) | 13.9 (12.4-17.0) | 13.8 (12.4-16.9) | 13.8 (12.4-17.0) | 13.8 (12.4-16.7) |
| Sodium (mmol/L) | 136.0 (130.0-141.0) | 136.0 (131.0-142.0) | 136.0 (131.0-141.0) | 136.0 (131.0-141.0) |
| Oxygen saturation pulse oximeter (%) | 96.0 (91.0-99.0) | 96.0 (90.0-99.0) | 96.0 (91.0-99.0) | 95.0 (90.0-99.0) |
| Oxygen saturation pulse oximeter^b (%) | 95.0 (87.0-99.0) | 95.0 (87.0-99.0) | 95.0 (87.0-99.0) | 95.0 (87.0-99.0) |
| Oxygen saturation pulse oximeter^c (%) | 93.0 (84.0-97.0) | 93.0 (84.0-97.0) | 93.0 (84.0-97.0) | 92.0 (83.0-97.0) |
| White blood cell count (×10⁹/L) | 7.1 (4.0-14.1) | 7.1 (4.0-13.9) | 7.0 (4.0-13.8) | 6.9 (3.9-13.5) |
^a Non-exhaustive list.
^b First measurement on the day of hospital admission.
^c Minimum measurement on the day of hospital admission.
Summary of model performance (AUC^a and Brier score) on the test data set and the postdevelopment prospective test data set in the final analysis. The full model uses all 210 available covariates with less than 30% (15,211/50,703) missingness in the study cohort (n=50,703), excluding postadmission treatment; the parsimonious N10 model uses only the 10 predictors prefiltered by automatic predictor selection.
| Outcome and model | AUC^a (95% CI), test data set, % | AUC^a (95% CI), prospective test data set, % | Brier score (95% CI), test data set | Brier score (95% CI), prospective test data set |
| | | | | |
| Full model | 88.7 (88.4-89.0) | 85.4 (85.1-85.7) | 0.071 (0.070-0.072) | 0.079 (0.078-0.080) |
| N10 model | 87.6 (87.2-87.9) | 84.3 (84.0-84.6) | 0.074 (0.073-0.075) | 0.081 (0.080-0.081) |
| | | | | |
| Full model | 79.7 (79.4-80.1) | 77.7 (77.3-78.0) | 0.123 (0.122-0.124) | 0.115 (0.114-0.115) |
| N10 model | 73.6 (73.2-74.0) | 73.5 (73.2-73.9) | 0.138 (0.137-0.139) | 0.123 (0.122-0.124) |
| | | | | |
| Full model | 82.3 (82.0-82.5) | 80.7 (80.5-80.9) | 0.172 (0.171-0.173) | 0.180 (0.179-0.181) |
| N10 model | 79.5 (79.2-79.7) | 78.1 (77.9-78.3) | 0.185 (0.184-0.186) | 0.192 (0.191-0.193) |
| | | | | |
| Full model | 83.6 (83.3-84.0) | 81.1 (80.8-81.5) | 0.090 (0.089-0.091) | 0.074 (0.074-0.075) |
| N10 model | 78.1 (77.7-78.5) | 76.6 (76.2-77.1) | 0.101 (0.100-0.101) | 0.081 (0.081-0.082) |
^a AUC: area under the receiver operating characteristic curve.
^b Refers to the composite of respiratory failure and acute respiratory distress syndrome.
^c Refers to the composite of invasive mechanical ventilation and extracorporeal membrane oxygenation.
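The two metrics reported in this table can be computed directly from predicted probabilities and binary outcomes. The following is a minimal sketch using only the Python standard library. The source does not state how the 95% CIs were derived, so the percentile bootstrap used here is an assumption, and the toy data and the function names (`auc`, `brier_score`, `bootstrap_ci`) are hypothetical.

```python
import random

def auc(y_true, y_prob):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def bootstrap_ci(metric, y_true, y_prob, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI (an assumption; the source does not name its CI method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples with a single class, where AUC is undefined
        stats.append(metric(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: 1 = event occurred, 0 = no event.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

print("AUC:", auc(y, p))
print("Brier score:", brier_score(y, p))
print("95% CI for AUC:", bootstrap_ci(auc, y, p))
```

A higher AUC indicates better discrimination, whereas a lower Brier score indicates better overall accuracy of the predicted probabilities, so the two columns of the table should be read in opposite directions.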
Figure 3. Receiver operating characteristic (ROC) curves for four prediction outcomes in the final analysis: (a) all-cause mortality; (b) respiratory failure including ARDS; (c) ICU admission; (d) invasive mechanical ventilation including ECMO. The full model is shown in black and the parsimonious model with 10 input variables in orange. Solid lines represent model performance on the test data set (n=10,752); dashed lines represent performance on the postdevelopment prospective test data set (n=14,863). ARDS: acute respiratory distress syndrome. ECMO: extracorporeal membrane oxygenation.
Figure 4. Calibration curves (number of bins = 10) for four prediction outcomes in the final analysis: (a) all-cause mortality; (b) respiratory failure including ARDS; (c) ICU admission; (d) invasive mechanical ventilation including ECMO. The full model is shown in black and the parsimonious model with 10 input variables in orange. Solid lines represent calibration on the test data set (n=10,752); dashed lines represent calibration on the postdevelopment prospective test data set (n=14,863). ARDS: acute respiratory distress syndrome. ECMO: extracorporeal membrane oxygenation.
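A calibration curve with 10 bins can be sketched as follows. The caption gives only the number of bins, so the equal-width binning used here is an assumption (quantile bins are an equally common choice), and the function name and toy data are hypothetical.

```python
def calibration_curve(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed event rate, count) per non-empty bin.

    Bins are equal-width intervals on [0, 1]; this binning scheme is an assumption.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[index].append((y, p))
    curve = []
    for members in bins:
        if not members:
            continue  # empty bins contribute no point to the curve
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        curve.append((mean_pred, observed, len(members)))
    return curve

# Hypothetical toy data.
y = [0, 0, 1, 1, 0, 1, 1]
p = [0.05, 0.08, 0.15, 0.55, 0.58, 0.95, 1.0]

for mean_pred, observed, count in calibration_curve(y, p):
    print(f"predicted={mean_pred:.3f} observed={observed:.3f} n={count}")
```

Points close to the diagonal (predicted equal to observed) indicate good calibration; points below it indicate overestimated risk.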
Comparison with existing risk scores evaluated on test data sets to predict 28-day all-cause mortality. Sensitivity and specificity were evaluated at 2 different thresholds.
| Risk score | AUC^a (95% CI), % | Threshold 1^b sensitivity, % | Threshold 1^b specificity, % | Threshold 2^c sensitivity, % | Threshold 2^c specificity, % | n^d |
| Acute Physiology and Chronic Health Evaluation II | 72.3 (69.5-74.9) | 66.2 | 68.5 | 92.4 | 26.0 | 1769 |
| Respiratory Rate-Oxygenation Index | 68.5 (67.0-70.0) | 28.2 | 92.7 | 54.2 | 78.3 | 16,640 |
| CURB-65 | 78.7 (77.6-79.7) | 36.2 | 92.4 | 77.2 | 69.1 | 15,001 |
| E-CURB | 81.9 (80.3-83.3) | 63.4 | 83.4 | 87.3 | 61.3 | 5772 |
| National Early Warning Score 2 | 82.9 (81.7-84.2) | 51.6 | 91.2 | 75.0 | 77.0 | 14,112 |
| Coronavirus Clinical Characterization Consortium Mortality score | 82.2 (80.7-83.5) | 62.3 | 83.8 | 71.8 | 75.7 | 6979 |
| Baseline model | 73.8 (73.2-74.5) | 44.8 | 83.4 | 80.2 | 54.9 | 25,615 |
| Full model | 89.2 (88.1-90.3) | 63.1 | 92.2 | 85.2 | 76.4 | 8493 |
| N10 model | 88.9 (88.0-90.0) | 65.9 | 90.9 | 81.4 | 79.3 | 10,688 |
^a AUC: area under the receiver operating characteristic curve.
^b Threshold 1 is a clinically relevant threshold that identifies patients for dexamethasone treatment; the costs of false positives (FP) and false negatives (FN) are expressed in terms of mortality risk.
^c Threshold 2 is derived from a cost-agnostic approach and is the point on the receiver operating characteristic curve that maximizes the Youden index.
^d Number of hospitalized patients in the test data set and the postdevelopment test data set with complete data (complete-case analysis).
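The cost-agnostic threshold described in the footnotes maximizes the Youden index, J = sensitivity + specificity - 1. A minimal sketch of that search is shown below. The rule that a patient is flagged when the predicted probability is at or above the threshold, the function names, and the toy data are all assumptions made for illustration.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity and specificity when p >= threshold is flagged as high risk."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        flagged = p >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

def youden_threshold(y_true, y_prob):
    """Scan every observed probability as a candidate cutoff and keep the best J.

    Ties are resolved in favor of the lowest threshold.
    """
    best = None
    for t in sorted(set(y_prob)):
        sens, spec = sensitivity_specificity(y_true, y_prob, t)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best

# Hypothetical toy data.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

threshold, j, sens, spec = youden_threshold(y, p)
print(f"threshold={threshold} J={j} sensitivity={sens} specificity={spec}")
```

As a check against the table, the full model's operating point at Threshold 2 (sensitivity 85.2%, specificity 76.4%) corresponds to J = 0.852 + 0.764 - 1 = 0.616.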
Figure 5. Decision curve analysis of standardized net benefit across different risk thresholds. The dotted line represents the scenario in which everyone is treated; the dashed line represents the scenario in which no one is treated.
... (P>.24, observed mortality rate = 787/1517, 51.88%). Scenario-based thresholds can be substituted with appropriate clinical trial insights according to different treatment options.
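The standardized net benefit plotted in a decision curve can be sketched with the usual definitions: net benefit at threshold probability pt is TP/n - (FP/n) × pt/(1 - pt), and the standardized form divides this by the event prevalence. The source does not give its exact formulas, so these textbook definitions, the function names, and the toy data are assumptions.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of treating patients whose predicted risk is at or above pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)

def standardized_net_benefit(y_true, y_prob, pt):
    """Net benefit divided by prevalence, so a perfect model scores 1."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, y_prob, pt) / prevalence

def treat_all_standardized(y_true, pt):
    """Reference curve when everyone is treated; treating no one scores 0."""
    prevalence = sum(y_true) / len(y_true)
    return (prevalence - (1 - prevalence) * pt / (1 - pt)) / prevalence

# Hypothetical toy data, evaluated at the dexamethasone-based threshold P=.24.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]
pt = 0.24

model_snb = standardized_net_benefit(y, p, pt)
all_snb = treat_all_standardized(y, pt)
print(f"model={model_snb:.4f} treat-all={all_snb:.4f} treat-none=0.0000")
```

A model adds clinical value at a given threshold only when its curve lies above both reference lines, as it does for the toy data here.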
Mortality rate comparison across risk groups on the test and postdevelopment prospective test data sets. Three risk groups were defined: (1) low-to-intermediate risk (P≤.13), (2) high risk (.13<P≤.24), and (3) very high risk (P>.24). The threshold probabilities were obtained from receiver operating characteristic analysis and either (1) maximize the Youden index (P=.13) or (2) are defined by the clinical utility of dexamethasone (P=.24) from the RECOVERY^a trial.
| Risk group | Test data set: patients, n (%) | Test data set: deaths, n (%) | Prospective test data set: patients, n (%) | Prospective test data set: deaths, n (%) |
| Low–intermediate | 8065 (75.01) | 315 (3.91) | 11,049 (74.34) | 512 (4.63) |
| High | 1170 (10.88) | 225 (19.23) | 1743 (11.73) | 327 (18.76) |
| Very high | 1517 (14.11) | 787 (51.88) | 2071 (13.93) | 943 (45.53) |
^a RECOVERY: Randomised Evaluation of COVID-19 Therapy.
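The risk stratification in this table can be sketched as a simple cutoff rule. The sketch assumes that the groups are split at the two reported threshold probabilities (P=.13 and P=.24) and that boundary values fall in the lower group; the function and variable names are hypothetical. The counts are copied from the test data set columns of the table and are used to recompute the reported mortality rates.

```python
YOUDEN_CUTOFF = 0.13         # threshold that maximizes the Youden index
DEXAMETHASONE_CUTOFF = 0.24  # threshold based on clinical utility of dexamethasone

def risk_group(probability):
    """Map a predicted mortality probability to a risk group.

    Assumption: a probability exactly on a cutoff belongs to the lower group.
    """
    if probability <= YOUDEN_CUTOFF:
        return "Low-intermediate"
    if probability <= DEXAMETHASONE_CUTOFF:
        return "High"
    return "Very high"

# (patients, deaths) per group in the test data set, copied from the table.
test_counts = {
    "Low-intermediate": (8065, 315),
    "High": (1170, 225),
    "Very high": (1517, 787),
}

# Observed mortality rate per group, in percent.
mortality_pct = {
    group: round(100 * deaths / patients, 2)
    for group, (patients, deaths) in test_counts.items()
}

for group, rate in mortality_pct.items():
    print(f"{group}: {rate}%")
```

The recomputed rates match the percentages in the deaths column, and the three group sizes sum to the 10,752 patients in the test data set.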