Jahir M Gutierrez, Maksims Volkovs, Tomi Poutanen, Tristan Watson, Laura C Rosella.
Abstract
BACKGROUND: The COVID-19 pandemic has led to an increased demand for health care resources and, in some cases, shortage of medical equipment and staff. Our objective was to develop and validate a multivariable model to predict risk of hospitalization for patients infected with SARS-CoV-2.
Year: 2021 PMID: 34933880 PMCID: PMC8695533 DOI: 10.9778/cmajo.20210036
Source DB: PubMed Journal: CMAJ Open ISSN: 2291-0026
Figure 1: Electronic medical records used for model development. The date of diagnosis of SARS-CoV-2 infection is used as the index date. From this date, a look-ahead period of 30 days is used to look for the outcome of hospitalization related to COVID-19. Besides including demographic information, independent predictor variables were constructed by aggregating 2 years of medical records (e.g., past health care utilization, laboratory results and drug prescriptions) up to 30 days before the index date. The complete list of predictor variables calculated can be found in Appendix 1, Supplementary Table 1 (available at www.cmajopen.ca/content/9/4/E1223/suppl/DC1). The icons used in this figure are freely available at www.flaticon.com and were downloaded from this site on Jan. 17, 2021.
Figure 2: Flow diagram of study cohort (derivation and validation). The ICES COVID-19 cohort was last updated on Nov. 7, 2020, and it includes patients with index (diagnosis) dates between Feb. 2, 2020, and Nov. 5, 2020. Patients with an index date after Oct. 5, 2020, or currently living in a long-term care facility were excluded. Included patients were followed up for 30 days for the outcome of hospitalization for COVID-19.
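The time windows described in the two figure captions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and dictionary keys are hypothetical, the 2-year look-back is taken as 730 days ending 30 days before the index date (the captions do not say exactly how the window is anchored), and the cutoff of Oct. 5, 2020, is treated as inclusive.

```python
from datetime import date, timedelta

LOOK_AHEAD_DAYS = 30                 # outcome window after the index date
BUFFER_DAYS = 30                     # predictors stop 30 days before the index date
LOOK_BACK_DAYS = 2 * 365             # assumption: "2 years" taken as 730 days
LAST_ELIGIBLE_INDEX = date(2020, 10, 5)


def observation_windows(index_date):
    """Return the predictor and outcome windows for one patient (hypothetical helper)."""
    predictor_end = index_date - timedelta(days=BUFFER_DAYS)
    predictor_start = predictor_end - timedelta(days=LOOK_BACK_DAYS)
    outcome_end = index_date + timedelta(days=LOOK_AHEAD_DAYS)
    return {
        "predictor_start": predictor_start,
        "predictor_end": predictor_end,
        "outcome_start": index_date,
        "outcome_end": outcome_end,
    }


def is_eligible(index_date, lives_in_long_term_care):
    """Apply the two exclusions named in the Figure 2 caption."""
    return index_date <= LAST_ELIGIBLE_INDEX and not lives_in_long_term_care


windows = observation_windows(date(2020, 10, 5))
print(windows["predictor_end"], windows["outcome_end"])
```

Under these assumptions, a patient diagnosed on the last eligible index date has an outcome window that closes before the cohort's last update on Nov. 7, 2020, which is consistent with the exclusion of later index dates.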
Baseline characteristics of patients included in the study
| Characteristic | All patients, no. (%) | Hospitalized, no. (%) | Not hospitalized, no. (%) | Standardized difference (hospitalized – not hospitalized) |
|---|---|---|---|---|
| Age, yr, median (IQR) | 45 (31–58) | 64 (54–77) | 43 (30–56) | 1.175 |
| No. of comorbidities, median (IQR) | 1 (0–3) | 3 (2–6) | 1 (0–3) | 0.935 |
| Male | 17 428 (48.0) | 1453 (56.3) | 15 975 (47.3) | 0.179 |
| Female | 18 895 (52.0) | 1130 (43.7) | 17 765 (52.7) | −0.179 |
| Asthma | 5460 (15.0) | 480 (18.6) | 4980 (14.8) | 0.103 |
| Cancer | 1453 (4.0) | 297 (11.5) | 1156 (3.4) | 0.311 |
| Chronic heart failure | 831 (2.3) | 275 (10.6) | 556 (1.6) | 0.381 |
| COPD | 1959 (5.4) | 457 (17.7) | 1502 (4.5) | 0.432 |
| Diabetes | 5273 (14.5) | 940 (36.4) | 4333 (12.8) | 0.568 |
| Hypertension | 8994 (24.8) | 1477 (57.2) | 7517 (22.3) | 0.763 |
| Hospitalized for COVID-19 | 2583 (7.1) | 2583 (100) | 0 (0) | NA |
| Died from COVID-19 | 906 (2.5) | 543 (21.0) | 364 (1.1) | 0.67 |
Note: COPD = chronic obstructive pulmonary disease, IQR = interquartile range, NA = not applicable.
Values are no. (%) of patients unless otherwise stated.
The variable “no. of comorbidities” accounts for the following conditions (see Model development under Methods): acute myocardial infarction, arrhythmia, arthritis, asthma, cancer, chronic heart failure, colitis, COPD, coronary disease, diabetes, hypertension, osteoarthritis, osteoporosis and kidney disease.
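For the binary characteristics in the table above, the standardized difference can be approximated with the commonly used formula for two proportions, d = (p1 − p2) / sqrt((p1(1 − p1) + p2(1 − p2)) / 2). The record does not state which formula the authors used, so this is an assumption, and the function name below is hypothetical. With the counts from the "Male" row and the group sizes from the "Hospitalized for COVID-19" row (2583 hospitalized; 36 323 − 2583 = 33 740 not hospitalized), the sketch gives a value close to the reported 0.179.

```python
from math import sqrt


def standardized_difference(events_a, n_a, events_b, n_b):
    """Standardized difference between two proportions (assumed formula)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pool the two binomial variances with equal weight.
    pooled_sd = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return (p_a - p_b) / pooled_sd


# Male: 1453 of 2583 hospitalized vs. 15 975 of 33 740 not hospitalized.
male = standardized_difference(1453, 2583, 15975, 33740)
# Diabetes: 940 of 2583 hospitalized vs. 4333 of 33 740 not hospitalized.
diabetes = standardized_difference(940, 2583, 4333, 33740)
print(round(male, 3), round(diabetes, 3))
```

The rows for age and number of comorbidities are continuous variables and would need a different (mean and variance based) version of the formula, which cannot be reproduced from medians and IQRs alone.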
Baseline characteristics of patients in the development and validation sets
| Characteristic | Development set, no. (%) | Validation set, no. (%) | Standardized difference |
|---|---|---|---|
| Age, yr, median (IQR) | 44 (31–58) | 45 (31–58) | −0.015 |
| No. of comorbidities, median (IQR) | 1 (0–3) | 1 (0–3) | −0.009 |
| Male | 13 995 (48.2) | 3433 (47.3) | 0.018 |
| Female | 15 063 (51.8) | 3832 (52.7) | 0.003 |
| Asthma | 4376 (15.1) | 1084 (14.9) | 0.004 |
| Cancer | 1163 (4.0) | 290 (4.0) | 0.001 |
| Chronic heart failure | 668 (2.3) | 163 (2.2) | 0.004 |
| COPD | 1549 (5.3) | 410 (5.6) | −0.014 |
| Diabetes | 4202 (14.5) | 1071 (14.7) | −0.008 |
| Hypertension | 7181 (24.7) | 1813 (25.0) | −0.006 |
| Hospitalized for COVID-19 | 2043 (7.0) | 540 (7.4) | −0.016 |
| Died from COVID-19 | 719 (2.5) | 187 (2.6) | −0.006 |
Note: COPD = chronic obstructive pulmonary disease, IQR = interquartile range.
Values are no. (%) of patients unless otherwise stated.
The variable “no. of comorbidities” accounts for the following conditions (see Model development under Methods): acute myocardial infarction, arrhythmia, arthritis, asthma, cancer, chronic heart failure, colitis, COPD, coronary disease, diabetes, hypertension, osteoarthritis, osteoporosis and kidney disease.
Variables included in final XGBoost model ranked by SHAP values of importance
| Predictor variable | SHAP value |
|---|---|
| Age | 0.7567 |
| Days since last creatinine blood test | 0.1320 |
| Geographical latitude | 0.1299 |
| Days since last basophils test | 0.1196 |
| Male | 0.1196 |
| No. of family doctor visits in the last 2 yr | 0.1165 |
| No. of comorbidities | 0.1072 |
| No. of unique drug subclasses taken in the last 2 yr | 0.0845 |
| Highest recorded level of creatinine in the last 2 yr | 0.0773 |
| No. of diagnostic radiology studies in the last 2 yr | 0.0381 |
| Average measurement of neutrophils in blood in the last 2 yr | 0.0289 |
| No. of doctor visits in the last 2 yr | 0.0237 |
| Median level of neutrophils in the last 2 yr | 0.0165 |
| Average level of leukocytes in the last 2 yr | 0.0144 |
| No. of creatinine tests in the last 2 yr | 0.0144 |
| Highest recorded level of hemoglobin in blood in the last 2 yr | 0.0021 |
| History of chronic kidney disease | 0.0021 |
| Days since last mean corpuscular hemoglobin test in the last 2 yr | 0.0010 |
Note: SHAP = Shapley Additive Explanation, XGBoost = Extreme Gradient Boosting.
SHAP values represent the weighted average of marginal contributions for each predictive variable included in the XGBoost model.
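The description of SHAP values as a weighted average of marginal contributions can be made concrete with an exact Shapley computation on a toy example. The sketch below is not the authors' model: the three feature names are borrowed from the table only as labels, the payoffs in `toy_value` are invented, and the brute-force enumeration is feasible only for a handful of features (tree models are normally explained with dedicated approximations rather than this loop).

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a set function `value` over `features`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Weight of a coalition of size k that f joins.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of f to that coalition.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(subset):
    """Invented payoff: two main effects plus one interaction."""
    score = 0.0
    if "age" in subset:
        score += 0.6
    if "comorbidities" in subset:
        score += 0.2
    if "age" in subset and "comorbidities" in subset:
        score += 0.1  # interaction is shared equally between the two features
    return score


phi = shapley_values(["age", "comorbidities", "male"], toy_value)
print(phi)
```

In this toy case the contributions sum to the payoff of the full feature set, and a feature that never changes the payoff ("male" here) receives a value of zero.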
Figure 3: Extreme Gradient Boosting (XGBoost) model performance. The final model was trained with 18 features extracted from the ICES COVID-19 data source. (A) The blue line shows the receiver operating characteristic curve. (B) Calibration curve of the final XGBoost model on the validation data set, where each blue dot (bins in the histogram) corresponds to a decile of predicted risk.
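The two panels of Figure 3 correspond to two standard summaries of a risk model: the area under the receiver operating characteristic curve and a calibration curve built from quantile bins of predicted risk. A minimal pure-Python sketch of both follows; the function names are hypothetical, the four scores and labels are made-up values, and the binning scheme (equal-sized bins after sorting by predicted risk) is an assumption about how the deciles were formed.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def calibration_by_quantile(scores, labels, n_bins=10):
    """Return (mean predicted risk, observed event rate) per equal-sized bin."""
    ranked = sorted(zip(scores, labels))
    n = len(ranked)
    rows = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer patients than bins
        mean_pred = sum(s for s, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows


scores = [0.1, 0.4, 0.35, 0.8]   # made-up predicted risks
labels = [0, 0, 1, 1]            # made-up outcomes
print(roc_auc(scores, labels))
print(calibration_by_quantile(scores, labels, n_bins=2))
```

With `n_bins=10` on a full validation set, each returned pair would play the role of one dot in a decile-based calibration plot: points near the diagonal indicate that predicted and observed risk agree.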
Figure 4: Comparison of recall at top percentiles. The final Extreme Gradient Boosting (XGBoost) model recall (percentage of true hospitalizations recovered in validation data set) was compared against 4 empirical rules.
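Recall at a top percentile, as used in Figure 4, can be read as follows: rank patients by a risk score, keep the highest-scoring share, and report what fraction of all true hospitalizations fall in that share. The sketch below assumes this reading; the function name is hypothetical and the scores and labels are made-up values. Any ranking rule, whether a model output or a simple empirical rule, can be passed in as `scores`.

```python
from math import ceil


def recall_at_top_percentile(scores, labels, percentile):
    """Fraction of all positives found among the top `percentile` % of scores."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("recall is undefined without positive cases")
    # Number of patients flagged; at least one so the metric is defined.
    n_top = max(1, ceil(len(scores) * percentile / 100))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])
    return hits / total_positives


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]  # made-up risks
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]                        # made-up outcomes
print(recall_at_top_percentile(scores, labels, 20))
```

In this toy data, flagging the top 20% of patients captures two of the three hospitalizations, and flagging the top 40% captures all three.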