Abrar Almalki, Balakrishna Gokaraju, Yaa Acquaah, Anish Turlapaty.
Abstract
COVID-19, the disease caused by SARS-CoV-2, is considered one of the greatest pandemics of modern times. It has affected people's health, education, employment, the economy, tourism, and transportation systems, and it will take a long time to recover from these effects and return people's lives to normal. The main objective of this study is to investigate various factors in health and food access, and their spatial correlation and statistical association with the spread of COVID-19. A secondary aim is to explore regression models for examining COVID-19 spread with these variables. To address these objectives, we study the interrelation of various socio-economic factors, which could help people prepare better for the next pandemic. One of these critical factors is food access and food distribution, since food outlets may be high-density places where the virus spreads. Other variables, such as income and population density, may also influence the spread of the pandemic. In this study, we mapped the spatial extent of COVID-19 cases together with food outlets using spatial analysis methods from geographic information systems (GIS). The methodology consisted of clustering techniques and overlaying the spatial extent of the clusters of food outlets and of infected cases. After mapping, we analyzed the proximity of these clusters for spatial variability, correlations between them, and their causal relationships. The quantitative analyses of health issues and food access areas against COVID-19 infections and deaths were performed using machine learning regression techniques to understand the multivariate factors. The results indicate a correlation between the dependent and independent variables, with a Pearson correlation R2 score of 0.44 for COVID-19 cases and 0.60 for COVID-19 deaths. The regression model with an R2 score of 0.60 would be useful to show the goodness of fit between COVID-19 deaths and the health issues and food access factors.
Keywords: COVID-19; GIS; Guilford County; North Carolina; machine learning; regression
Year: 2022 PMID: 35206938 PMCID: PMC8871757 DOI: 10.3390/healthcare10020324
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Figure 1. Study area.
Figure 2. Methodology graph.
Figure 3. COVID-19 cases in Guilford County.
Figure 4. COVID-19 deaths distribution.
Figure 5. Scatterplot matrix graph using cases as dependent variable.
Figure 6. Scatterplot matrix graph using deaths as dependent variable.
Figure 7. Spatial autocorrelation for COVID-19 cases.
Figure 8. Spatial autocorrelation for COVID-19 deaths.
Global Moran's I spatial autocorrelation results for COVID-19 cases and deaths.
| Measures | COVID-19 Cases | COVID-19 Deaths |
|---|---|---|
| Moran’s Index | 0.118617 | 0.005965 |
| Expected Index | −0.009259 | −0.009259 |
| Variance | 0.000575 | 0.000551 |
| Z-score | 5.3314423 | 6.48788 |
| p-value | 0.000000 | 0.516475 |
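The rows of the table above are linked by the usual global Moran's I significance test: the z-score is the observed index minus the expected index, divided by the square root of the variance, and the p-value follows from the standard normal distribution. The following is a minimal sketch of that arithmetic, assuming a two-sided test; the function name `moran_z_and_p` is a hypothetical helper, not part of any GIS package.

```python
import math

def moran_z_and_p(observed, expected, variance):
    """Return the z-score and two-sided p-value for a global Moran's I."""
    z = (observed - expected) / math.sqrt(variance)
    # Two-sided p-value under the standard normal:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Index, expected index, and variance copied from the table above.
cases_z, cases_p = moran_z_and_p(0.118617, -0.009259, 0.000575)
deaths_z, deaths_p = moran_z_and_p(0.005965, -0.009259, 0.000551)

print(f"cases:  z = {cases_z:.3f}, p = {cases_p:.6f}")
print(f"deaths: z = {deaths_z:.3f}, p = {deaths_p:.6f}")
```

Under these assumptions the recomputed z-score for deaths comes out near 0.65, which agrees with the reported p-value of 0.516475 and suggests that the tabulated z-score for deaths may carry a misplaced decimal point. The expected index of −0.009259 is also consistent with −1/(n − 1) for n = 109 spatial units, although that count is inferred here rather than stated in the record.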
Figure 9. The local Moran’s on COVID-19 cases in Guilford County.
Figure 10. The local Moran’s on COVID-19 deaths in Guilford County.
Figure 11. OLS on COVID-19 cases in Guilford County.
Figure 12. OLS on COVID-19 deaths in Guilford County.
Figure 13. Geographically weighted regression on COVID-19 cases.
Figure 14. Geographically weighted regression on COVID-19 deaths.
Regression models’ parameters.
| Model | Parameters |
|---|---|
| Linear Regression Model | copy_X = True, fit_intercept = True, n_jobs = None, normalize = False |
| Random Forest Regression Model | bootstrap = True, ccp_alpha = 0.0, criterion = 'mse', max_depth = None, max_features = 'auto', max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = None, verbose = 0, warm_start = False |
| K-Nearest Neighbor Regression Model | algorithm = 'auto', leaf_size = 30, metric = 'minkowski', metric_params = None, n_jobs = None, n_neighbors = 5, p = 2, weights = 'uniform' |
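To make the K-nearest neighbor settings in the table concrete, the sketch below shows what `n_neighbors = 5`, `metric = 'minkowski'` with `p = 2`, and `weights = 'uniform'` mean in practice: the prediction is the plain average of the targets of the five training points closest in Euclidean distance. This is a from-scratch illustration, not the library implementation the authors used; the function names and the toy data are hypothetical.

```python
def minkowski(a, b, p=2):
    """Minkowski distance between two equal-length vectors (p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_X, train_y, query, n_neighbors=5, p=2):
    """Predict by averaging the targets of the n_neighbors closest training rows."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: minkowski(pair[0], query, p),
    )
    nearest = [target for _, target in ranked[:n_neighbors]]
    # weights='uniform': every selected neighbor contributes equally.
    return sum(nearest) / len(nearest)

# Hypothetical toy data (not from the study): one feature, one target.
train_X = [[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]]
train_y = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]

prediction = knn_predict(train_X, train_y, [2.0])
print(prediction)
```

With uniform weights the distant point at 10.0 is simply excluded once five closer neighbors exist; a distance-weighted variant would instead scale each neighbor's contribution by the inverse of its distance.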
Figure 15. Correlation matrix with heatmap.
Root mean square error (RMSE) values of regression models.
| Models | COVID-19 Cases | COVID-19 Deaths |
|---|---|---|
| Linear regression for multioutput regression | 0.146 | 0.141 |
| K-nearest neighbors for multioutput regression | 0.208 | 0.147 |
| Random forest for multioutput regression | 0.186 | 0.175 |
| Support Vector Regression | 0.168 | 0.127 |
R-square (correlation coefficient) values of regression models.
| Models | COVID-19 Cases | COVID-19 Deaths |
|---|---|---|
| Linear regression for multioutput regression | 0.446 | 0.508 |
| K-nearest neighbors for multioutput regression | −0.085 | 0.466 |
| Random forest for multioutput regression | 0.137 | 0.239 |
| Support Vector Regression | 0.290 | 0.601 |
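The two metrics reported for the regression models can be sketched in a few lines. The example assumes the R-square values are coefficients of determination, 1 − SS_res / SS_tot, which would explain how a model can score below zero (as K-nearest neighbors does for cases): it then fits worse than always predicting the mean. The function names and the data are hypothetical and only illustrate the arithmetic.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observations and two sets of predictions (not study data).
y_true = [1.0, 2.0, 3.0, 4.0]
good_pred = [1.1, 1.9, 3.2, 3.8]
poor_pred = [4.0, 1.0, 4.0, 1.0]

good_r2 = r2_score(y_true, good_pred)
poor_r2 = r2_score(y_true, poor_pred)
good_rmse = rmse(y_true, good_pred)
poor_rmse = rmse(y_true, poor_pred)

print(f"good fit: R2 = {good_r2:.3f}, RMSE = {good_rmse:.3f}")
print(f"poor fit: R2 = {poor_r2:.3f}, RMSE = {poor_rmse:.3f}")
```

Because RMSE is expressed in the units of the target while R-square is scale-free, the two measures need not rank models identically, which is why both tables are worth reading together.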
Figure 16. Support vector regression model.