Abolfazl Mollalo, Liang Mao, Parisa Rashidi, Gregory E Glass.
Abstract
Despite the usefulness of artificial neural networks (ANNs) in the study of various complex problems, ANNs have not been applied to modeling the geographic distribution of tuberculosis (TB) in the US. Likewise, ecological-level research on TB incidence rates at the national level is inadequate for epidemiologic inference. We collected 278 exploratory variables, including environmental and a broad range of socio-economic features, to model the disease across the continental US. The spatial pattern of the disease distribution was statistically evaluated using the global Moran's I, Getis–Ord General G, and local Gi* statistics. Next, we investigated the applicability of a multilayer perceptron (MLP) ANN for predicting the disease incidence. To avoid overfitting, L1 regularization was used before developing the models. The predictive performance of the MLP was compared with that of linear regression on the test dataset using root mean square error, mean absolute error, and the correlation between model output and ground truth. The clustering analysis showed significant spatial clustering of the smoothed TB incidence rate (p < 0.05), with hotspots located mainly in the southern and southeastern parts of the country. Among the developed models, the single hidden layer MLP had the best test accuracy. Sensitivity analysis of the MLP model showed that immigrant population (proportion), underserved segments of the population, and minimum temperature were among the factors with the strongest contributions. The findings of this study can provide useful insight to health authorities on prioritizing resource allocation to risk-prone areas.
Keywords: Artificial neural networks; Tuberculosis; geographic information system; hotspot detection; multilayer perceptron
Year: 2019 PMID: 30626123 PMCID: PMC6338935 DOI: 10.3390/ijerph16010157
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
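The global Moran's I named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `morans_i`, the toy rates, and the binary contiguity weights are assumptions chosen only to show the formula I = (n / S0) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where S0 is the sum of all weights.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0: sum of all spatial weights
    s0 = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all pairs (i, j)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    # Total squared deviation from the mean
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Toy data (assumed): four areas on a line, each adjacent to its neighbours.
toy_rates = [1.0, 2.0, 3.0, 4.0]
toy_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i(toy_rates, toy_weights))
```

For this toy gradient the statistic is positive (1/3), which is the direction expected when similar values sit next to each other. Significance testing, as reported in the abstract, would additionally need a permutation or normal-approximation step that this sketch omits.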
Figure 1. Topological architecture of multi-layer perceptron neural network (MLPNN) used in this study [63].
Figure 2. Spatial distribution of training, cross-validation, and test data used for modeling log (STIR).
Figure 3. The frequency of TB cases (left) and the cumulative TB incidence rate (right) across the continental US (2006–2010).
Figure 4. Hotspot map for the STIR in the continental US identified by hotspot analysis (Getis–Ord Gi*) technique, 2006–2010.
Top 10 states with the largest number of hotspot counties (p < 0.10) of smoothed tuberculosis (TB) incidence rate (STIR) in the continental US, 2006–2010.
| Rank | State | No. Hotspot Counties | Percentage (#hotspots/#counties) |
|---|---|---|---|
| 1 | Georgia | 57 | 35.8% |
| 2 | Texas | 30 | 11.8% |
| 3 | North Carolina | 23 | 23.0% |
| 4 | Louisiana | 22 | 34.3% |
| 5 | Florida | 20 | 29.9% |
| 6 | California | 17 | 22.7% |
| 7 | South Carolina | 17 | 37.0% |
| 8 | Arkansas | 12 | 16.0% |
| 9 | Mississippi | 12 | 14.6% |
| 10 | Alabama | 10 | 14.9% |
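Hotspot counties such as those counted above are typically flagged from Gi* z-scores. The following is a minimal sketch of the standard z-score form of the Getis–Ord Gi* statistic, assuming binary weights that include each location itself; the function name, toy rates, and neighbourhood structure are hypothetical and do not come from the study.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for every location.

    weights[i][j] is the spatial weight between i and j. For Gi* the
    diagonal is non-zero, so each location counts as its own neighbour.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of all values
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w_sum = sum(weights[i])
        w_sq_sum = sum(w * w for w in weights[i])
        local_sum = sum(weights[i][j] * values[j] for j in range(n))
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Toy data (assumed): six areas on a line with high rates at one end.
gi_rates = [10.0, 9.0, 1.0, 1.0, 1.0, 2.0]
n_areas = len(gi_rates)
gi_weights = [[1 if abs(i - j) <= 1 else 0 for j in range(n_areas)]
              for i in range(n_areas)]
gi_scores = getis_ord_gi_star(gi_rates, gi_weights)
print([round(z, 2) for z in gi_scores])
```

A location would then be called a hotspot when its z-score exceeds the critical value for the chosen significance level (the table above uses p < 0.10).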
Pearson correlation analysis between selected variables for modeling STIR, continental US.
| | POP730 | LFE330 | IPE110 | POP778 | Min Temp | SPR440 | HIS305 | RHI820 |
|---|---|---|---|---|---|---|---|---|
| POP730 | 1.000 | 0.051 | 0.041 | 0.064 | −0.124 | −0.078 | −0.138 | −0.024 |
| LFE330 | 0.051 | 1.000 | 0.018 | 0.057 | 0.136 | −0.499 | −0.186 | −0.040 |
| IPE110 | 0.041 | 0.018 | 1.000 | 0.266 | −0.231 | −0.108 | 0.066 | 0.384 |
| POP778 | 0.064 | 0.057 | 0.266 | 1.000 | −0.005 | 0.091 | −0.390 | 0.248 |
| Min Temp | −0.124 | 0.136 | −0.231 | −0.005 | 1.000 | 0.066 | −0.032 | 0.308 |
| SPR440 | −0.078 | −0.499 | −0.108 | 0.091 | 0.066 | 1.000 | −0.015 | 0.003 |
| HIS305 | −0.138 | −0.186 | 0.066 | −0.390 | −0.032 | −0.015 | 1.000 | 0.403 |
| RHI820 | −0.024 | −0.040 | 0.384 | 0.248 | 0.308 | 0.003 | 0.403 | 1.000 |
Results of linear regression (LR) model for modeling log (STIR), continental US.
| Model | R | R Square | Adjusted R Square | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin–Watson |
|---|---|---|---|---|---|---|---|---|---|
| LR | 0.666 a | 0.443 | 0.440 | 0.443 | 184.246 | 8 | 1854 | 0.000 | 2.041 |
a. Predictors: (Constant), POP730, LFE330, IPE110, POP778, Min Temp, SPR440, HIS305, RHI820. Dependent Variable: log (STIR).
Effects of environmental and socio-economic factors on log (STIR) using the LR model.
| Predictor | B (unstandardized) | Beta (standardized) | t | Sig. | 95.0% CI for B: Lower Bound | 95.0% CI for B: Upper Bound | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|
| (Constant) | 0.001 | | 0.009 | 0.993 | −0.198 | 0.200 | | |
| | −0.007 | −0.294 | −11.117 | 0.000 | −0.009 | −0.006 | 0.429 | 2.328 |
| | −0.023 | −0.166 | −7.929 | 0.000 | −0.029 | −0.017 | 0.683 | 1.463 |
| | 0.013 | 0.210 | 9.809 | 0.000 | 0.010 | 0.016 | 0.653 | 1.532 |
| | 0.083 | 0.282 | 12.621 | 0.000 | 0.070 | 0.095 | 0.602 | 1.661 |
| | 0.012 | 0.140 | 6.662 | 0.000 | 0.008 | 0.015 | 0.677 | 1.477 |
| | −0.009 | −0.097 | −4.703 | 0.000 | −0.013 | −0.005 | 0.701 | 1.426 |
| | −0.019 | −0.145 | −5.976 | 0.000 | −0.026 | −0.013 | 0.508 | 1.968 |
| | −0.015 | −0.080 | −4.489 | 0.000 | −0.021 | −0.008 | 0.950 | 1.053 |
VIF: variance inflation factor.
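The Tolerance and VIF columns are two views of the same quantity, since VIF = 1 / tolerance. As a quick illustrative check (the variable names are assumptions, the numbers are copied from the table above), the reported tolerances can be inverted and compared with the reported VIFs; small gaps are expected because the tolerances are rounded to three decimals.

```python
# Tolerance and VIF values as reported in the collinearity columns above.
reported_tolerance = [0.429, 0.683, 0.653, 0.602, 0.677, 0.701, 0.508, 0.950]
reported_vif = [2.328, 1.463, 1.532, 1.661, 1.477, 1.426, 1.968, 1.053]

# VIF is the reciprocal of tolerance.
computed_vif = [1.0 / tol for tol in reported_tolerance]

for tol, vif, calc in zip(reported_tolerance, reported_vif, computed_vif):
    print(f"tolerance={tol:.3f}  reported VIF={vif:.3f}  1/tolerance={calc:.3f}")

# Largest gap between reported and recomputed VIF (rounding only).
max_gap = max(abs(v - c) for v, c in zip(reported_vif, computed_vif))
print(f"max gap: {max_gap:.4f}")
```

All recomputed values agree with the table to within rounding, and every VIF is well below the commonly used warning thresholds of 5 or 10.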
Figure 5. The Normal P-P Plot of LR model.
Comparison of multi-layer perceptron (MLP; one and two hidden layers) and LR model performance for predicting log (STIR) in the continental US.
| Model | Training MAE | Training RMSE | Training R | Cross-Validation MAE | Cross-Validation RMSE | Cross-Validation R | Test MAE | Test RMSE | Test R |
|---|---|---|---|---|---|---|---|---|---|
| LR | 0.27 | 0.35 | 0.66 | 0.27 | 0.36 | 0.65 | 0.28 | 0.36 | 0.61 |
| MLP (1 hidden layer) | 0.25 | 0.33 | 0.70 | 0.26 | 0.35 | 0.67 | 0.27 | 0.35 | 0.63 |
| MLP (2 hidden layers) | 0.26 | 0.34 | 0.69 | 0.26 | 0.35 | 0.65 | 0.27 | 0.36 | 0.62 |
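The three measures in the comparison above (MAE, RMSE, and the Pearson correlation R between predictions and observations) can be sketched as follows. The helper names and the toy observed and predicted values are assumptions for illustration; they are not data from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy values (assumed) standing in for observed and predicted log (STIR).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(mae(observed, predicted), rmse(observed, predicted), pearson_r(observed, predicted))
```

Lower MAE and RMSE and higher R indicate a better fit, which is how the single hidden layer MLP is ranked ahead of LR on the test split.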
Figure 6. Scatter plot of observed and predicted log (STIR) (by single hidden layer MLP model) for test data in the continental US.
Figure 7. The contribution of input features on predicting log (STIR) according to sensitivity analysis of single hidden layer MLP. RMSE: Root mean square error.
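One common way to run an RMSE-based sensitivity analysis is to hold one input at its mean, re-predict, and record how much the RMSE grows. The sketch below follows that idea under stated assumptions: the source does not describe the exact procedure behind Figure 7, `toy_model` is a hypothetical stand-in for a trained MLP, and the data are invented.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def sensitivity(predict, rows, targets):
    """RMSE increase when each feature column is held at its mean."""
    baseline = rmse(targets, [predict(row) for row in rows])
    increases = []
    for k in range(len(rows[0])):
        col_mean = sum(row[k] for row in rows) / len(rows)
        # Copy of the data with feature k replaced by its mean value
        fixed = [row[:k] + [col_mean] + row[k + 1:] for row in rows]
        increases.append(rmse(targets, [predict(row) for row in fixed]) - baseline)
    return increases


def toy_model(row):
    """Hypothetical stand-in for a trained model: feature 0 matters most."""
    return 2.0 * row[0] + 0.1 * row[1]


# Toy inputs (assumed); targets are generated by the stand-in model itself.
toy_rows = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
toy_targets = [toy_model(row) for row in toy_rows]
scores = sensitivity(toy_model, toy_rows, toy_targets)
print(scores)
```

The feature whose neutralization raises the RMSE most is read as the strongest contributor, which matches how a contribution ranking such as the one in Figure 7 is usually interpreted.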