Brett Snider, Edward A McBean, John Yawney, S Andrew Gadsden, Bhumi Patel.
Abstract
The Severe Acute Respiratory Syndrome Coronavirus 2 pandemic has pushed medical systems around the globe to the brink of collapse. In this paper, logistic regression and three other artificial intelligence models (XGBoost, Artificial Neural Network and Random Forest) are described and used to predict the mortality risk of individual patients. The database is based on census data for the designated area and co-morbidities obtained using data from the Ontario Health Data Platform. The dataset consisted of more than 280,000 COVID-19 cases in Ontario across a wide range of age groups: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, and 90+. Logistic regression, XGBoost, Artificial Neural Network and Random Forest all demonstrate excellent discrimination (the area under the curve exceeded 0.948 for all models, with the best performance being 0.956 for an XGBoost model). Based on SHapley Additive exPlanations values, the importance of 24 variables is identified; the most important variables are, in order of importance, age, date of test, sex, and presence/absence of chronic dementia. The findings from this study allow the identification of out-patients who are likely to deteriorate into severe cases, allowing medical professionals to make decisions on timely treatments. Furthermore, the methodology and results may be extended to other public health regions.
Keywords: COVID-19; SHapley; XGBoost; artificial intelligence; mortality
Year: 2021 PMID: 34235131 PMCID: PMC8255789 DOI: 10.3389/fpubh.2021.675766
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Characteristics of 57,390 Ontario residents with COVID-19.
| Variable | Description | Count (values) |
|---|---|---|
| Age | Age in years, as of Jan 1, 2020 | 0–105 |
| Test date | Test date | Feb 22–Oct 20 |
| Sex | Indicator Variable for sex | 26,861 (M = 1, F = 0) |
| Hypertension | Chronic hypertension, as of Jan 1, 2020 | 15,778 (0, 1) |
| LTC resident | LTC resident, as of Jan 1, 2020 | 5,179 (0, 1) |
| Chronic_dementia | Chronic dementia diagnosed, as of Jan 1, 2020 | 4,746 (0, 1) |
| Chronic_odd | Chronic diabetes diagnosed as of Jan 1, 2020 | 9,002 (0, 1) |
| Ethnic concentration quint. | Calculated from the Ontario Marginalization Index, based on census designation. Refers to visible minorities and/or recent immigrants (0–5, ranging from least diverse to most diverse). | (0–5) |
| Commuter concentration quint. | % of people who commute within the census-designated area, converted to quintiles (5 being the highest, 0 referring to missing DA info). | (0–5) |
| Median income quint. | Median income within the census-designated area, converted to quintiles (0–5, ranging from lowest income to highest income, 0 referring to missing DA info). | (0–5) |
| Charl | Charlson co-morbidity index. Only 2,059 patients have a Charl value above 0. | (0–10) |
| Household size quint. | Avg. household size within the census-designated area, converted to quintiles (5 being the highest, 0 = missing DA info). | (0–5) |
| CKD | Chronic kidney disease. | 2,523 (0, 1) |
| Cancer | Cancer index | 2,995 (0–1) |
| Chronic_copd | Chronic obstructive pulmonary disease | 4,030 (0–1) |
| Chronic_asthma | Asthma | 9,100 (0–1) |
| Chronic_chf | Congestive heart failure | 2,257 (0–1) |
| Stroke | Whether the patient suffered a stroke before Jan 1, 2020 | 1,016 (0–1) |
| Cardiac ISCH | Cardiac ischemia | 1,916 (0–1) |
| Rural | Indicator if a patient lives in a rural residence | 1,746 (0–1) |
| Chronic_ra | Rheumatoid arthritis | 567 (0–1) |
| Tia | Transient Ischemic Attack | 722 (0–1) |
| Immuno_comp | Immuno-compromised | 237 (0–1) |
| Thala | History of Thalassemia | 36 (0–1) |
| Cases recovered | 54,568 | |
| Cases died | 2,822 | |
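Several variables in the table are described as "converted to quintiles", with 0 reserved for missing DA info. The record does not say how the quintiles were derived, so the following is only a minimal sketch of one plausible encoding, assuming cut points taken from the observed values. The function name `to_quintile_codes` and the income figures are hypothetical and not from the study.

```python
from bisect import bisect_left
from statistics import quantiles


def to_quintile_codes(values):
    """Map each value to a quintile code 1-5; None (missing) maps to 0."""
    known = [v for v in values if v is not None]
    # Four cut points split the observed values into five groups.
    cuts = quantiles(known, n=5, method="inclusive")
    return [0 if v is None else bisect_left(cuts, v) + 1 for v in values]


# Hypothetical median incomes per dissemination area (None = missing DA info).
incomes = [31000, 42000, None, 55000, 61000, 78000, 90000, 120000, None, 47000]
codes = to_quintile_codes(incomes)
print(codes)
```

Keeping 0 as a separate "missing" code, rather than dropping those rows, matches the coding described in the table and lets a tree-based model treat missingness as its own category.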
Comparison of models employed in the base case analyses.
| Model | AUC |
|---|---|
| Logit | 0.9518 |
| **XGBoost** | **0.956** |
| Random forest | 0.948 |
| Neural net | 0.9475 |
The bold values represent the accuracy for the model (i.e., XGBoost) which is used primarily in this paper to explore the importance of variables.
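The abstract reports these scores as area under the curve (AUC). As a rough illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who recovered, counting ties as half. The labels and scores are made up for illustration, and the function `roc_auc` is a hypothetical helper, not the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted mortality risks; 1 = died, 0 = recovered.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.02, 0.10, 0.05, 0.40, 0.35, 0.01, 0.80, 0.90]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of cases, which is fine for a toy example; for a dataset of this size a rank-based implementation from an established library would be the practical choice.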
Confusion matrix and statistics.
| | Alive | Dead |
|---|---|---|
| Alive | 10,710 | 353 |
| Dead | 203 | 211 |
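The counts in the confusion matrix can be turned into the usual summary statistics. The sketch below assumes that rows are predicted classes and columns are actual classes, and that death is the positive class; the record does not label the orientation, so this is an assumption. Accuracy does not depend on that choice, but if the table is transposed, sensitivity and precision swap.

```python
# Counts copied from the confusion matrix above.
# Assumed layout (not stated in the source): rows = predicted, columns = actual.
tn = 10_710  # predicted alive, actually alive
fn = 353     # predicted alive, actually dead
fp = 203     # predicted dead, actually alive
tp = 211     # predicted dead, actually dead

total = tn + fn + fp + tp
accuracy = (tn + tp) / total      # share of all cases classified correctly
sensitivity = tp / (tp + fn)      # share of actual deaths that were flagged
specificity = tn / (tn + fp)      # share of actual survivors that were cleared
precision = tp / (tp + fp)        # share of predicted deaths that occurred

for name, value in [
    ("accuracy", accuracy),
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("precision", precision),
]:
    print(f"{name}: {value:.4f}")
```

Because deaths are a small share of cases, accuracy alone can look high even when many deaths are missed, which is one reason to read it alongside sensitivity and the AUC values reported for the models.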
Figure 1. SHAP summary plot for XGBoost model.