Chandrajit Chatterjee, Ram Rup Sarkar.
Abstract
Malaria remains one of the most severe problems faced by the world today. Understanding the causative factors, such as age, sex, social factors, and environmental variability, as well as the underlying transmission dynamics of the disease, is important for epidemiological research on malaria and its eradication. Thus, the development of a suitable modeling approach and methodology, based on the available data on the incidence of the disease and other related factors, is of utmost importance. In this study, we developed a simple non-linear regression methodology for modeling and forecasting malaria incidence in Chennai city, India, and predicted future disease incidence with a high confidence level. We considered three types of data to develop the regression methodology: a longer time series of Slide Positivity Rates (SPR) of malaria; a shorter, one-year time series of deaths due to Plasmodium vivax; and spatial data (the zonal distribution of P. vivax deaths) for the city, along with climatic factors, population, and previous incidence of the disease. We performed variable selection by a simple correlation study, identified the initial relationship between variables through non-linear curve fitting, and used multi-step methods for the induction of variables in the non-linear regression analysis. We also applied Gauss-Markov models and ANOVA to test the predictions and their validity and to construct the confidence intervals. The results demonstrate the applicability of our method to different types of data and the autoregressive nature of forecasting, and show high predictive power for both SPR and P. vivax deaths, where the one-lag SPR values play an influential role and prove useful for better prediction. Different climatic factors are identified as playing a crucial role in shaping the disease curve. Further, disease incidence at the zonal level and the effect of causative factors on different zonal clusters indicate the pattern of malaria prevalence in the city.
The study also demonstrates that, with excellent models of climatic forecasts readily available, this method can predict disease incidence at long forecasting horizons with a high degree of efficiency, and that a useful early warning system based on this technique can be developed region-wise or nation-wise for disease prevention and control activities.
Year: 2009 PMID: 19266093 PMCID: PMC2648889 DOI: 10.1371/journal.pone.0004726
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Flowchart for the multi-step regression model with step-wise induction of variables.
Xi is the starting variable, exhibiting the highest coefficient of determination (R²) with the dependent variable, and the initial pairwise relation is an r-th order polynomial; Xj is the subsequent variable (with a p-th order relation), exhibiting the second-highest R².
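The ranking-and-induction idea in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names (`solve_linear`, `r_squared`, `poly_terms`, `stepwise_induction`) are hypothetical, every variable is given the same polynomial order (the paper chooses an order r, p, ... per variable), and ordinary least squares via the normal equations stands in for whatever fitting routine was actually used.

```python
from typing import Dict, List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns: List[List[float]], y: Sequence[float]) -> float:
    """R^2 of a least-squares fit of y on an intercept plus `columns`."""
    design = [[1.0] * len(y)] + columns
    k = len(design)
    xtx = [[sum(p * q for p, q in zip(design[i], design[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(design[i], y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * col[t] for b, col in zip(beta, design))
              for t in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


def poly_terms(x: Sequence[float], order: int) -> List[List[float]]:
    """Columns x, x^2, ..., x^order for one explanatory variable."""
    return [[float(v) ** p for v in x] for p in range(1, order + 1)]


def stepwise_induction(candidates: Dict[str, Sequence[float]],
                       y: Sequence[float],
                       order: int = 2) -> List[Tuple[str, float]]:
    """Rank variables by pairwise polynomial R^2, then induct them in that order.

    Returns (variable name, cumulative R^2) after each induction step.
    The fixed `order` is an assumption made for brevity.
    """
    ranked = sorted(
        candidates,
        key=lambda name: r_squared(poly_terms(candidates[name], order), y),
        reverse=True,
    )
    columns: List[List[float]] = []
    steps: List[Tuple[str, float]] = []
    for name in ranked:
        columns = columns + poly_terms(candidates[name], order)
        steps.append((name, r_squared(columns, y)))
    return steps
```

Calling `stepwise_induction({"rain": rain, "tmin": tmin}, spr)` with equal-length sequences returns the induction order together with the cumulative R² after each step, which is the quantity the flowchart uses to decide which variable enters next.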
Figure 2. Dendrogram for the normalized data set of zonal P. vivax deaths.
Figure 3. Fit of the predicted values to the observed SPR values.
Straight line with white squares: observed values of SPR. Dashed line with black circles: predicted values of SPR. Error bars show 95% confidence intervals of the predicted response. Black ellipses: forecasted values of SPR (autoregressive forecasts). Black square: observed SPR value for January 2005.
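The autoregressive forecasts mentioned in this caption rely on feeding each predicted SPR value back in as the one-lag input for the next step. A minimal sketch of that recursion follows; the function name `recursive_forecast` and the toy model in the usage note are hypothetical and are not the fitted model from the paper.

```python
from typing import Callable, List, Sequence


def recursive_forecast(model: Callable[[float, float], float],
                       last_observed: float,
                       climate_path: Sequence[float]) -> List[float]:
    """Iterate a one-lag model forward over a path of climate inputs.

    `model(previous_value, climate_value)` returns the next value. Each
    forecast becomes the lagged input of the following step, so forecast
    errors can accumulate as the horizon grows.
    """
    forecasts: List[float] = []
    previous = last_observed
    for climate in climate_path:
        previous = model(previous, climate)
        forecasts.append(previous)
    return forecasts
```

For instance, with an illustrative model `lambda prev, rain: 0.5 * prev + 0.01 * rain`, a last observed value, and a list of forecast rainfall values, the function returns one forecast per rainfall entry. Only the first step uses an observed lag; every later step depends on an earlier forecast.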
Figure 4. Residual plots for the fitted model.
(a) For SPR values and (b) for total P. vivax deaths.
ANOVA table of regression.
| Source | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Squares (MS) | F-value (Calculated) | F-value (Tabulated) at 5% significance level (two-tailed distribution) |
|---|---|---|---|---|---|
| Regression | 8 | 0.1224 | 0.0153 | 6.56 | 2.32 |
| Residual | 26 | 0.0693 | 0.00267 | | |
| Total | 34 | 0.1917 | | | |
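The way the sums of squares and degrees of freedom in this table combine can be checked with a short calculation. The sketch below assumes the standard ANOVA layout, taking the 8-DF row as the regression row and the 26-DF row as the residual row; the function name `anova_summary` is hypothetical.

```python
from typing import Dict


def anova_summary(ss_reg: float, df_reg: int,
                  ss_res: float, df_res: int) -> Dict[str, float]:
    """Derive totals, mean squares and R^2 from the two ANOVA rows."""
    ss_total = ss_reg + ss_res
    return {
        "ss_total": ss_total,              # regression SS + residual SS
        "df_total": df_reg + df_res,       # degrees of freedom add up too
        "ms_regression": ss_reg / df_reg,  # mean square = SS / DF
        "ms_residual": ss_res / df_res,
        "r_squared": ss_reg / ss_total,    # share of variation explained
    }


# Values copied from the table above.
summary = anova_summary(ss_reg=0.1224, df_reg=8, ss_res=0.0693, df_res=26)
for key, value in summary.items():
    print(f"{key}: {value:.5f}")
```

The totals and mean squares obtained this way agree with the tabulated entries to the precision shown, and the ratio of the regression to the total sum of squares gives the coefficient of determination implied by the table.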
Figure 5. Fit of the predicted values to the observed data of P. vivax deaths.
Straight line with white squares: observed values of P. vivax deaths. Dashed line with black circles: predicted values of P. vivax deaths from the model. Error bars show 95% confidence intervals of the predicted response. Black ellipses: forecasted values of P. vivax deaths. Black squares: observed values of P. vivax deaths for November and December 2006.
The final model fits for individual clusters.
| Cluster | Variables selected | Model equation proposed for the cluster (Y = scaled P. vivax deaths) | Coefficient of determination (R²) and ANOVA test |
|---|---|---|---|
| I | Minimum Temperature (x1), Maximum Humidity (x2), Total Rainfall (x3) | Y = 8.19E-04 + 2.73E-05 x1 − 1.55E-08 x1^3 + 8.638E-06 x2 − 5.43E-12 x2^4 − 3.9E-07 x3 + 2.859E-09 x3^2 − 1.39E-14 x3^4 | R² = 75.03%; F-value: 1.71; d.f.: (7, 4); Sig.-F: 0.31 |
| II | Minimum Temperature (x1), Total Rainfall (x3), Maximum Temperature (x5) | Y = 4.899E-03 − 5.34E-04 x1 + 1.174E-05 x1^2 + 3.992E-06 x3 − 2.53E-08 x3^2 + 4.637E-11 x3^3 + 5.177E-06 x5^2 − 1.16E-07 x5^3 | R² = 77.2%; F-value: 1.93; d.f.: (7, 4); Sig.-F: 0.27 |
| III | Minimum Humidity (x4), Maximum Temperature (x5) | Y = −3.08E-04 + 8.628E-06 x4 − 8.658E-08 x4^2 + 7.613E-06 x5 − 1.90E-12 x5^5 | R² = 34.2%; F-value: 0.91; d.f.: (4, 7); Sig.-F: 0.51 |
| IV | Maximum Humidity (x2), Total Rainfall (x3), Minimum Humidity (x4) | Y = 5.958E-04 − 3.64E-06 x2 − 4E-07 x3 + 1.533E-10 x3^2 + 3.913E-18 x3^5 − 1.07E-07 x4^2 + 2.318E-13 x4^5 | R² = 69.59%; F-value: 1.91; d.f.: (6, 5); Sig.-F: 0.25 |
| V | Total Rainfall (x3), Minimum Humidity (x4) | Y = −1.14E-03 + 1.861E-06 x3 − 2.08E-08 x3^2 + 6.552E-11 x3^3 − 1.48E-16 x3^5 + 4.413E-05 x4 − 3.49E-07 x4^2 | R² = 50.34%; F-value: 0.84; d.f.: (6, 5); Sig.-F: 0.58 |
The ANOVA of regression yields F-values that are significant at the 5% level of significance for the corresponding degrees of freedom of the F-distribution.
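To show how a cluster equation from the table is applied, the sketch below evaluates the model in the third data row, which uses Minimum Humidity (x4) and Maximum Temperature (x5). It assumes that a digit following a variable name in the table is an exponent, which matches the number of terms implied by the reported degrees of freedom. The function name `predict_scaled_deaths` is hypothetical, and the units of the inputs are not stated in this record.

```python
def predict_scaled_deaths(min_humidity: float, max_temperature: float) -> float:
    """Evaluate the third-row cluster polynomial for scaled P. vivax deaths.

    Coefficients are copied from the table; the units of x4 and x5 are an
    assumption left to the caller, since the record does not state them.
    """
    x4, x5 = min_humidity, max_temperature
    return (
        -3.08e-04
        + 8.628e-06 * x4
        - 8.658e-08 * x4 ** 2
        + 7.613e-06 * x5
        - 1.90e-12 * x5 ** 5
    )
```

With both inputs at zero the function returns the intercept of the fitted equation. The fifth-order temperature term has a very small coefficient, so it only begins to matter for large values of x5.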
Figure 6. Fits of the predictions based on the proposed model against the observed values for the 5 clusters.
(a) Cluster I, (b) cluster II, (c) cluster III, (d) cluster IV, (e) cluster V. Straight line with white squares: observed P. vivax deaths. Dashed line with black circles: predicted values of P. vivax deaths from the model. Error bars show 95% confidence intervals of the predicted response.