Abolfazl Mollalo¹, Kiara M Rivera¹, Behzad Vahedi².
Abstract
Prediction of the COVID-19 incidence rate is a matter of global importance, particularly in the United States. As of 4 June 2020, more than 1.8 million confirmed cases and over 108 thousand deaths had been reported in this country. Few studies have modeled COVID-19 incidence nationwide in the United States, particularly using machine-learning algorithms. We therefore collected and prepared a database of 57 candidate explanatory variables to examine the performance of a multilayer perceptron (MLP) neural network in predicting cumulative COVID-19 incidence rates across the continental United States. Our results indicated that a single-hidden-layer MLP achieved a correlation of almost 0.65 (r = 0.645) with the ground truth for the holdout samples. Sensitivity analysis of this model showed that the age-adjusted mortality rates of ischemic heart disease, pancreatic cancer, and leukemia, together with two socioeconomic and environmental factors (median household income and total precipitation), are among the most substantial factors for predicting COVID-19 incidence rates. Moreover, the results of a logistic regression model indicated that these variables could explain the presence or absence of hotspots of disease incidence identified by Getis-Ord Gi* (p < 0.05) in a geographic information system (GIS) environment. The findings may provide useful insights for public health decision makers regarding the influence of potential risk factors associated with COVID-19 incidence at the county level.
Keywords: COVID-19 (Coronavirus); GIS; United States; artificial neural networks; multilayer perceptron
Year: 2020 PMID: 32545581 PMCID: PMC7344609 DOI: 10.3390/ijerph17124204
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. The topology of the MLP neural network.
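To make the topology in Figure 1 concrete, here is a minimal Python sketch of a forward pass through a single-hidden-layer perceptron with one output neuron. Only the 57-input count comes from the study; the tanh hidden activation, the linear output, the hidden-layer size of 8, the random initialization, and the helper names `mlp_forward` and `random_mlp` are illustrative assumptions, not the configuration reported by the authors. Training (e.g., backpropagation) is omitted.

```python
import math
import random
from typing import List, Tuple


def mlp_forward(
    x: List[float],
    w_hidden: List[List[float]],
    b_hidden: List[float],
    w_out: List[float],
    b_out: float,
) -> float:
    """Forward pass of a one-hidden-layer perceptron with a single output.

    Assumption: tanh activation in the hidden layer, identity (linear) output.
    """
    # Each hidden unit: weighted sum of all inputs plus bias, squashed by tanh.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    # Output neuron: weighted sum of hidden activations plus bias.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def random_mlp(
    n_inputs: int, n_hidden: int, seed: int = 0
) -> Tuple[List[List[float]], List[float], List[float], float]:
    """Hypothetical helper: small random weights for an untrained network."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = rng.uniform(-0.5, 0.5)
    return w_hidden, b_hidden, w_out, b_out


if __name__ == "__main__":
    params = random_mlp(n_inputs=57, n_hidden=8)  # 57 candidate variables
    county = [0.0] * 57  # hypothetical standardized county record
    print(mlp_forward(county, *params))
```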
Figure 2. Locations of hotspots of COVID-19 incidence identified by Getis-Ord Gi*, continental United States.
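The hotspots in Figure 2 come from the Getis-Ord Gi* statistic. Below is a minimal pure-Python sketch of the standard Gi* z-score, assuming a dense spatial weights matrix in which each location counts as its own neighbour (the "star" variant). The function names, the toy five-location data, and the binary adjacency weights are hypothetical; the record does not state which weighting scheme the study used. Under the usual normal approximation, a z-score above 1.96 marks a hotspot at p < 0.05 (two-sided).

```python
import math
from typing import List


def getis_ord_gi_star(values: List[float], weights: List[List[float]]) -> List[float]:
    """Gi* z-score for each location.

    values:  attribute per location (e.g. incidence rate per county)
    weights: n x n spatial weights; weights[i][i] should be nonzero for Gi*.
    """
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation (population form, as in the Gi* formula).
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        row = weights[i]
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        # Local weighted sum minus its expectation under spatial randomness.
        numerator = sum(w * v for w, v in zip(row, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


def line_weights(n: int) -> List[List[float]]:
    """Hypothetical binary weights: n locations on a line, self plus adjacent."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    rates = [1.0, 1.0, 1.0, 10.0, 12.0]  # hypothetical incidence rates
    z = getis_ord_gi_star(rates, line_weights(len(rates)))
    hotspots = [i for i, score in enumerate(z) if score > 1.96]  # p < 0.05
    print([round(score, 2) for score in z], hotspots)
```

In this toy example only the last location clears the 1.96 threshold; a real county analysis would use a much larger weights matrix built from actual adjacency or distance bands.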
Comparative performance of the models (single run) in predicting COVID-19 incidence rates across the continental United States.
| Model | RMSE | r | MAE |
|---|---|---|---|
| Linear regression | 0.992517 | 0.295885 | 0.577808 |
| MLP (1 hidden layer) | 0.722409 | 0.645481 | 0.355843 |
| MLP (2 hidden layers) | 0.839806 | 0.466981 | 0.39755 |
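The three accuracy measures in the table can be computed as below. This is a minimal sketch with hypothetical function names, assuming `actual` and `predicted` are equal-length lists of holdout values (Figure 3 indicates the dependent variable was expressed as z-scores).

```python
import math
from typing import List


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual: List[float], predicted: List[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]  # hypothetical holdout values
    model_out = [2.0, 2.0, 4.0, 4.0]  # hypothetical predictions
    print(rmse(observed, model_out), pearson_r(observed, model_out), mae(observed, model_out))
```

Lower RMSE and MAE and higher r are better, so the one-hidden-layer MLP leads on all three measures. Note that r measures linear association rather than explained variance: squaring r = 0.645481 gives r² ≈ 0.417.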
Figure 3. Comparison of actual and predicted values of the dependent variable (z-scores) for holdout samples using the one-hidden-layer MLP.
Figure 4. Relative importance of the top 10 variables for predicting the COVID-19 incidence rate, based on sensitivity analysis of the one-hidden-layer MLP, continental United States.
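The exact sensitivity-analysis procedure behind Figure 4 is not described in this record. As one common substitute (a perturbation-based approach, not necessarily the authors' method), the sketch below shifts each input by a fixed amount, measures the mean absolute change in the model's predictions, and normalizes the results so the most influential variable scores 1.0; ranking these scores gives a top-k list. The functions `relative_importance` and `top_k`, the default shift of one z-score unit, and the toy linear model standing in for a trained MLP are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def relative_importance(
    predict: Callable[[Sequence[float]], float],
    rows: List[List[float]],
    delta: float = 1.0,
) -> List[float]:
    """Perturbation-based sensitivity: mean |change in prediction| per input."""
    n_features = len(rows[0])
    baseline = [predict(row) for row in rows]
    raw = []
    for j in range(n_features):
        total = 0.0
        for row, base in zip(rows, baseline):
            shifted = list(row)
            shifted[j] += delta  # perturb only feature j
            total += abs(predict(shifted) - base)
        raw.append(total / len(rows))
    top = max(raw)
    # Normalize so the most influential feature scores 1.0 (skip for a constant model).
    return [value / top for value in raw] if top > 0 else raw


def top_k(names: List[str], importances: List[float], k: int = 10) -> List[Tuple[str, float]]:
    """Rank features by importance and keep the k largest."""
    return sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy linear model with hypothetical coefficients, standing in for a trained MLP.
    def toy_model(x: Sequence[float]) -> float:
        return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]

    rows = [[0.0, 0.0, 0.0], [1.0, -1.0, 0.5]]
    scores = relative_importance(toy_model, rows)
    print(top_k(["income", "precipitation", "slope"], scores, k=2))
```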
Results of the logistic regression model in explaining the presence/absence of the hotspots (p < 0.05) of COVID-19 incidence rate, continental United States.
| Variable | Coefficient (B) | Standard Error | Wald | df | Significance | Exp(B) |
|---|---|---|---|---|---|---|
| Constant | −2.763 | 0.086 | 1036.109 | 1 | 0.000 | 0.063 |
| Median household income | 0.403 | 0.079 | 26.139 | 1 | 0.000 | 1.497 |
| Max terrain slope | −0.270 | 0.093 | 8.432 | 1 | 0.004 | 0.763 |
| Precipitation | 0.337 | 0.080 | 17.817 | 1 | 0.000 | 1.400 |
| Pancreatic cancer | 0.636 | 0.095 | 44.672 | 1 | 0.000 | 1.889 |
| Hodgkin’s disease | 0.409 | 0.100 | 16.596 | 1 | 0.000 | 1.505 |
| Leukemia | −0.550 | 0.089 | 38.241 | 1 | 0.000 | 0.577 |
| Cardiovascular | −0.414 | 0.118 | 12.350 | 1 | 0.000 | 0.661 |
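As a consistency check on the logistic regression table: with one degree of freedom, the Wald statistic is (B / SE)², its p-value is the upper tail of a chi-square distribution with 1 df, and Exp(B) is the odds ratio. The sketch below recomputes these from the reported B and SE and evaluates the fitted model for a county. Because B and SE are rounded to three decimals, the recomputed values differ slightly from the reported ones. The dictionary keys and the function names `wald_summary` and `hotspot_probability` are hypothetical, and the record does not state how the predictors were scaled, so the covariate values you pass in must be on whatever scale the authors used.

```python
import math
from typing import Dict, Tuple

# (B, SE) pairs copied from the logistic regression table.
TERMS: Dict[str, Tuple[float, float]] = {
    "Constant": (-2.763, 0.086),
    "Median household income": (0.403, 0.079),
    "Max terrain slope": (-0.270, 0.093),
    "Precipitation": (0.337, 0.080),
    "Pancreatic cancer": (0.636, 0.095),
    "Hodgkin's disease": (0.409, 0.100),
    "Leukemia": (-0.550, 0.089),
    "Cardiovascular": (-0.414, 0.118),
}


def wald_summary(b: float, se: float) -> Tuple[float, float, float]:
    """Return (Wald statistic, p-value, odds ratio) for one coefficient."""
    wald = (b / se) ** 2
    # For 1 df: P(chi2 > W) = P(|Z| > sqrt(W)) = erfc(sqrt(W / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value, math.exp(b)


def hotspot_probability(covariates: Dict[str, float]) -> float:
    """Predicted probability of being a hotspot; omitted terms count as 0."""
    logit = TERMS["Constant"][0]
    for name, value in covariates.items():
        logit += TERMS[name][0] * value
    return 1.0 / (1.0 + math.exp(-logit))  # logistic (inverse-logit) link


if __name__ == "__main__":
    for name, (b, se) in TERMS.items():
        wald, p, odds = wald_summary(b, se)
        print(f"{name:26s} Wald={wald:9.3f} p={p:.4f} Exp(B)={odds:.3f}")
    print("probability with all predictors at 0:", round(hotspot_probability({}), 4))
```

With every predictor at zero, the model gives a hotspot probability of about 0.059, and each Exp(B) is the factor by which the odds of being a hotspot change per one-unit increase in that predictor (e.g., about 1.89 for pancreatic cancer mortality).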