Samira Ziyadidegan, Moein Razavi, Homa Pesarakli, Amir Hossein Javid, Madhav Erraguntla.
Abstract
COVID-19 spreads swiftly: nearly three months after the first positive case was confirmed in China, the coronavirus began to spread across the United States. Some states and counties reported high numbers of positive cases and deaths, while others reported fewer COVID-19-related cases and deaths. In this paper, the factors that could affect the risk of COVID-19 infection and death were analyzed at the county level. An innovative method combining K-means clustering with several classification models was used to determine the most critical factors. Results showed that the most significant attributes were longitudinal coordinate, population density, latitudinal coordinate, percentage of non-white people, percentage of uninsured people, percentage of people below poverty, percentage of elderly people, number of ICU beds per 10,000 people, and percentage of smokers.
© This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2021.
Keywords: COVID-19; K-means clustering; Meteorological variables; Multinomial logistic regression; SARS-CoV-2
Year: 2022 PMID: 35035282 PMCID: PMC8747889 DOI: 10.1007/s00477-021-02148-0
Source DB: PubMed Journal: Stoch Environ Res Risk Assess ISSN: 1436-3240 Impact factor: 3.821
Fig. 1 COVID-19 positive case and death count maps for all US states: (a) positive case counts, (b) death counts
Fig. 2 Methodology used in the study
List of all the parameters used in the study
| Category | Parameters |
|---|---|
| COVID-19 | Positive rate, death rate |
| Location-based | Longitudinal coordinates, latitudinal coordinates, percent of rural areas |
| Meteorological | BA Climate zone |
| Health | Number of ICU beds per 10,000, percent of smokers, percent of adults with obesity, percent of people uninsured, percent of adults with diabetes |
| Demographic | Percent of elderly population, percent of non-white population, percent of people below poverty, population density |
Fig. 3 Correlation matrix for all parameters
Fig. 4 Elbow method
Fig. 5 Clustering output plot
Fig. 6 Clustering results: (a) positive case rate, (b) death rate
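The elbow method named in Fig. 4 picks the number of clusters by looking at how the within-cluster sum of squares (inertia) falls as k grows. A minimal sketch in plain Python follows; the synthetic three-blob data, the farthest-point initialization, and the helper names `kmeans` and `make_blobs` are illustrative assumptions, not the study's data or implementation.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm; returns (centroids, inertia)."""
    rng = random.Random(seed)
    # Farthest-point initialization (an assumption made for determinism):
    # start from one random point, then keep adding the point that is
    # farthest from all centroids chosen so far.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia

def make_blobs(seed=0):
    """Synthetic 2-D data: three well-separated Gaussian blobs of 40 points each."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(40)]

points = make_blobs()
inertias = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On data like this the inertia drops sharply up to k = 3 and flattens afterwards; that bend is the "elbow" one would read off a plot such as Fig. 4.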
Cluster attributes
| Cluster number | Counts | Mean positive rate | SD positive rate | Mean death rate | SD death rate |
|---|---|---|---|---|---|
| 1 | 306 | 8.94 × 10⁻² | 2.43 × 10⁻² | 2.68 × 10⁻³ | 0.07 × 10⁻² |
| 2 | 1293 | 6.78 × 10⁻² | 1.60 × 10⁻² | 1.04 × 10⁻³ | 0.05 × 10⁻² |
| 3 | 1528 | 3.50 × 10⁻² | 1.91 × 10⁻² | 5.14 × 10⁻⁴ | 0.05 × 10⁻² |
Cross validation and test accuracy of the classification models used
| Method | Cross validation accuracy | Test accuracy |
|---|---|---|
| Multinomial logistic regression | 59.93% | 60.99% |
| LDA | 59.55% | 60.90% |
| QDA | 49.96% | 48.07% |
| KNN | 83.16% | 76.86% |
| SVM Linear | 60.36% | 62.24% |
| SVM Radial | 94.13% | 80.18% |
| SVM Polynomial | 94.10% | 79.10% |
The Random Forest model (indicated in boldface) performed best among the classification models
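The table reports both a cross-validation accuracy and a test accuracy for each model. As a rough illustration of how two such numbers are usually obtained, here is a minimal sketch with a small k-nearest-neighbours classifier in plain Python. The 5-fold scheme, the 80/20 split, k = 5, and the synthetic three-class data are all assumptions made for the example; the record does not state the study's settings.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def accuracy(train_X, train_y, eval_X, eval_y, k=5):
    """Fraction of evaluation points whose predicted label matches the true one."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(eval_X, eval_y))
    return hits / len(eval_y)

def cross_val_accuracy(X, y, n_folds=5, k=5):
    """Mean accuracy over n_folds; every fold is held out exactly once."""
    scores = []
    for fold in range(n_folds):
        held_out = set(range(fold, len(X), n_folds))
        fit_idx = [i for i in range(len(X)) if i not in held_out]
        val_idx = sorted(held_out)
        scores.append(
            accuracy(
                [X[i] for i in fit_idx], [y[i] for i in fit_idx],
                [X[i] for i in val_idx], [y[i] for i in val_idx], k,
            )
        )
    return sum(scores) / n_folds

# Synthetic data (hypothetical): three overlapping classes with two features.
rng = random.Random(0)
centers = {1: (0.0, 0.0), 2: (3.0, 0.0), 3: (0.0, 3.0)}
data = [
    ((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label)
    for label, (cx, cy) in centers.items()
    for _ in range(50)
]
rng.shuffle(data)
X = [features for features, _ in data]
y = [label for _, label in data]

# Hold out 20% as the test set; cross-validate on the remaining 80% only.
split = int(0.8 * len(X))
train_X, train_y = X[:split], y[:split]
test_X, test_y = X[split:], y[split:]

cv_acc = cross_val_accuracy(train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"cross-validation accuracy: {cv_acc:.2%}")
print(f"test accuracy: {test_acc:.2%}")
```

Keeping the test set out of the cross-validation loop is what makes the second number an independent check. A large gap between the two columns, as in some rows of the table, can indicate that the cross-validation estimate is optimistic for that model.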
Fig. 7 Feature importance plots for the Random Forest model by (a) mean decrease in accuracy, (b) mean decrease in Gini
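The "mean decrease in accuracy" in Fig. 7 is a permutation-style importance: shuffle one feature, re-score the model, and record how much accuracy is lost. The sketch below shows that idea under loud assumptions. It uses a nearest-centroid classifier rather than a Random Forest, scores on a held-out split rather than out-of-bag samples, and runs on synthetic data in which only the first feature carries signal. The names `fit_centroids` and `permutation_importance` are hypothetical helpers, not functions from any library.

```python
import random

def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, x):
    """Label of the centroid closest to x."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == label for x, label in zip(X, y)) / len(y)

def permutation_importance(centroids, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for d in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[d] for x in X]
            rng.shuffle(column)  # break the link between feature d and the label
            X_perm = [x[:d] + (v,) + x[d + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Synthetic data (hypothetical): feature 0 separates the classes, feature 1 is noise.
rng = random.Random(0)
labels = [rng.choice([1, 2, 3]) for _ in range(300)]
X = [(rng.gauss(3.0 * label, 0.7), rng.gauss(0.0, 1.0)) for label in labels]
train_X, train_y = X[:200], labels[:200]
test_X, test_y = X[200:], labels[200:]

model = fit_centroids(train_X, train_y)
importances = permutation_importance(model, test_X, test_y)
for d, value in enumerate(importances):
    print(f"feature {d}: mean decrease in accuracy = {value:.3f}")
```

Shuffling the informative feature should cost far more accuracy than shuffling the noise feature, which is the same ranking logic a plot like Fig. 7a expresses for the study's parameters.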
Fig. 8 SHAP values for (a) Class 1 (high-risk cluster), (b) Class 2 (medium-risk cluster), (c) Class 3 (low-risk cluster)
Fig. 9 SHAP values for different values of the parameters across the three classes