Marlies H Craig, Brian L Sharp, Musawenkosi L H Mabaso, Immo Kleinschmidt.
Abstract
BACKGROUND: Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa) project, and using various environmental data sets as predictors. Variable selection is a major obstacle due to analytical problems caused by over-fitting, confounding and non-independence in the data. Testing and comparing every combination of explanatory variables in a Bayesian spatial framework remains unfeasible for most researchers. The aim of this study was to develop a malaria risk map using a systematic and practicable variable selection process for spatial analysis and mapping of historical malaria risk in Botswana.
Year: 2007 PMID: 17892584 PMCID: PMC2082025 DOI: 10.1186/1476-072X-6-44
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Figure 1. Malaria prevalence data. Malaria prevalence of infection in 1 to 14 year old children, in Botswana, during the 1961/62 national survey.
Figure 2. Month of survey during the 1961/62 national malaria survey.
Results of univariate analysis from Stage 1. Odds ratios (AIC in parentheses) from univariate logistic regression analysis of 50 different environmental variables from 7 themes, against malaria prevalence. P-values were non-significant (n.s.), <0.05 (*), <0.01 (**) or <0.0005 (***); n = 122. The equation was logit(prevalence) = coefficient × co-variate + constant. NDVI = normalized difference vegetation index.
| Annual mean (total for rainfall) | 1.0085 (27.6)** | 4.22 (13.6)*** | 1.094 (12.8)*** | 1.091 (28.9)* | 1.07 (31.3) n.s. | |
| Annual maximum (highest monthly value) | 1.045 (20.8)*** | 3.034 (23.3)*** | 1.067 (11.7)*** | 1.090 (25.7)*** | 10.4 (32.2) n.s. | |
| Annual minimum (lowest monthly value) | 3.29 (13.9)*** | 1.11 (17.1)*** | 1.1048 (29.8)* | 1.06 (32.7) n.s. | ||
| Annual range (highest minus lowest month) | 0.52 (27.1)** | 1.12 (15.8)*** | 1.14 (30.8)* | 1.03 (32.7) n.s. | ||
| Standard deviation (Appendix) | 1.03 (21.9)*** | 0.54 (25.0)*** | 0.54 (14.7)** | 1.073 (26.6)*** | 1.03 (32.8) n.s. | |
| Proportional standard deviation (Appendix)‡ | 61.8 (13.0)*** | -214 (17.3)*** | 0.004 (33.4) n.s. | 0.1 (26.8)*** | 43.3 (32.9) n.s. | |
| Summer mean (total for rainfall) Dec–Mar | 1.012 (22.9)*** | 2.59 (27.1)*** | 1.065 (11.6)*** | 1.078 (28.9)* | ||
| Winter mean (total for rainfall) Apr–Oct | 0.88 (14.8)*** | 3.22 (12.0)*** | 1.11 (16.0)*** | 1.097 (28.6)** | ||
| Concentration (see Appendix) | 1.39 (13.3)*** | |||||
| Number of months >80 mm (>60 & >40 mm n.s.) | 1.81 (26.6)** | |||||
| Number of months >16°C | 2.72 (18.9)*** | |||||
| Number of months >165 (other cut-offs were n.s.) | 1.13 (31.5) n.s. | |||||
| Total in months with more than 80 mm | 1.0059 (24.0)*** | |||||
| Total degree months above 16°C | 1.050 (15.7)*** | |||||
| Effective temperature (Appendix) | 21.8 (12.6)*** | |||||
| Mean daily minimum of coldest month | 2.29 (21.4)*** | |||||
| Elevation | 0.997 (29.7)** | |||||
| Log distance to perennial water (m) | 0.56 (21.6)*** | |||||
| Log distance to perennial/non-perennial water (m) | 0.72 (30.5)** | |||||
| Land cover (binary; moist) | 4.76 (25.5)*** | |||||
| Month of survey (binary; peak season April/May) | 8.67 (29.4)*** | |||||
‡ The co-efficients, not the Odds Ratios, are shown, as the unit is a fraction, and the Odds Ratio near zero (= exp(co-efficient)).
§ Radiance units for NDVI (fractions from 0 to 1) are translated to a byte-compatible scale from 1 to 256.
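The univariate screening described in the caption (logit(prevalence) = coefficient × co-variate + constant, with the odds ratio equal to exp(coefficient)) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the survey numbers are hypothetical, the fit uses plain Newton-Raphson on grouped binomial data, and the AIC is computed as -2 log-likelihood + 2k without the binomial constant, which may be scaled differently from the values tabulated above.

```python
import math


def fit_univariate_logit(x, positives, examined, iterations=25):
    """Fit logit(p) = constant + coefficient * x to grouped binomial data.

    Newton-Raphson on the binomial log-likelihood with two parameters.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, positives, examined):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p          # score contribution
            w = ni * p * (1.0 - p)       # information weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def log_likelihood(b0, b1, x, positives, examined):
    """Binomial log-likelihood, omitting the combinatorial constant."""
    total = 0.0
    for xi, yi, ni in zip(x, positives, examined):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += yi * math.log(p) + (ni - yi) * math.log(1.0 - p)
    return total


# Hypothetical survey sites: covariate value, children positive, children examined.
covariate = [2.0, 3.0, 4.0, 5.0, 6.0]
positives = [2, 5, 12, 20, 35]
examined = [100, 100, 100, 100, 100]

constant, coefficient = fit_univariate_logit(covariate, positives, examined)
odds_ratio = math.exp(coefficient)  # odds ratio per unit of the covariate
aic = -2.0 * log_likelihood(constant, coefficient, covariate, positives, examined) + 2 * 2
print(f"odds ratio = {odds_ratio:.3f}, AIC = {aic:.1f}")
```

When the unit of a covariate is a fraction, as in footnote ‡, exp(coefficient) can be extremely large or close to zero, which is why the coefficient itself is the more readable quantity there.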
Figure 3. Flow diagram of staged variable selection procedure.
Figure 4. Plots of malaria prevalence against fourteen potential explanatory variables. Scatter and box plots of candidate environmental explanatory variables used in stepwise procedures. Malaria prevalence in 1 to 14 year old children, Botswana, 1961/62, is shown on the Y axis on a logit scale. (A) annual maximum rainfall (mm); (B) winter (April – October) total rainfall (mm); (C) rainfall concentration (%); (D) winter (April – October) mean temperature (°C); (E) annual maximum temperature (°C); (F) temperature proportional standard deviation (°C); (G) elevation (m); (H) annual maximum NDVI; (I) NDVI standard deviation; (J) summer (December–March) mean vapour pressure (hPa); (K) vapour pressure standard deviation (hPa); (L) log distance to permanent water (m); (M) land cover: dry/low risk, moist/high risk areas; (N) start month of survey.
Results of bootstrap stepwise procedures. Variables included in the candidate lists of Stage 3 and Stage 5, and their selection frequency (fq), in four separate automated stepwise backward variable exclusion procedures, each time against 1000 bootstrap samples of the malaria prevalence data.
| annual maximum * | 904 | annual maximum | 560 | annual maximum | 533 | annual maximum | 914 | |
| summer total † | 821 | |||||||
| number of months >80 mm | 760 | |||||||
| SD | 726 | |||||||
| total in months >80 mm | 716 | |||||||
| annual total | 612 | |||||||
| winter total | 749 | |||||||
| proportional SD | 642 | |||||||
| winter mean * | 885 | winter mean | 993 | winter mean | 878 | winter mean | 665 | |
| annual mean † | 914 | |||||||
| summer mean | 885 | |||||||
| number of months >16°C | 681 | |||||||
| mean in months >16°C | 670 | |||||||
| annual maximum | 665 | |||||||
| winter minimum | 627 | |||||||
| effective | 615 | |||||||
| annual minimum | 558 | |||||||
| proportional SD * | 754 | proportional SD | 897 | proportional SD | 544 | proportional SD | 624 | |
| SD | 786 | |||||||
| annual range | 537 | |||||||
| annual maximum | 660 | |||||||
| SD | 495 | |||||||
| summer mean | 441 | |||||||
| annual maximum | 567 | |||||||
| SD | 469 | |||||||
| elevation | 874 | elevation | 988 | elevation | 819 | elevation | 994 | |
| 616 | ||||||||
| land cover | 988 | land cover | 996 | land cover | 997 | land cover | 996 | |
| 527 | ||||||||
NDVI – normalized difference vegetation index; SD – standard deviation
* Variables selected into Stage 4 model
† Variables selected into Stage 5 model
‡ Example: Five alternative rainfall indicators, listed in candidate list 1 under Stage 5, were strongly correlated with – and had been excluded in favour of – the annual maximum in Stage 2. In Stage 5, all six competing rainfall indicators were included in the candidate list, along with the other variables of the Stage 4 model. Of the six competitors the most frequently selected was summer total. In Stage 5 summer total therefore replaced annual maximum rainfall.
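The selection frequencies in the table count how often each candidate variable survives a selection procedure across 1000 bootstrap samples. The sketch below shows only that counting logic. It is an illustration under stated assumptions: `selection_frequency`, `toy_select` and the data are hypothetical, and `toy_select` is a simple correlation threshold standing in for the automated backward stepwise exclusion used in the study, which would take its place in practice.

```python
import math
import random
from collections import Counter


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)


def toy_select(sample, outcome="outcome", threshold=0.5):
    """Stand-in selector (NOT stepwise regression): keep variables whose
    absolute correlation with the outcome reaches the threshold."""
    y = [row[outcome] for row in sample]
    chosen = set()
    for name in sample[0]:
        if name == outcome:
            continue
        xs = [row[name] for row in sample]
        if abs(pearson(xs, y)) >= threshold:
            chosen.add(name)
    return chosen


def selection_frequency(rows, select, n_boot=1000, seed=0):
    """Count how often each variable is selected across bootstrap samples."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        # Resample survey sites with replacement, keeping the sample size.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(select(sample))
    return counts


# Hypothetical data: one informative variable and one pure-noise variable.
data_rng = random.Random(1)
rows = [
    {"outcome": float(i), "signal": 2.0 * i, "noise": data_rng.random()}
    for i in range(20)
]

freq = selection_frequency(rows, toy_select, n_boot=1000)
for name in ("signal", "noise"):
    print(name, freq[name])
```

Passing the selector in as a function keeps the resampling loop independent of the selection rule, so competing indicators (for example the six rainfall indicators in the footnote) can be compared by frequency under the same set of bootstrap samples.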
Figure 5. Distribution of coefficients of fourteen candidate variables in 1000 stepwise bootstrap models. Frequency histograms of coefficients obtained in automated backward stepwise exclusion regression analysis against 1000 bootstrap samples of the malaria prevalence data in Stage 3. In each case the vertical black line indicates coefficient = 0. (A) annual maximum rainfall (mm); (B) winter (April – October) total rainfall (mm); (C) rainfall concentration (%); (D) winter (April – October) mean temperature (°C); (E) annual maximum temperature (°C); (F) temperature proportional standard deviation (°C); (G) elevation (m); (H) annual maximum NDVI; (I) NDVI standard deviation; (J) summer (December–March) mean vapour pressure (hPa); (K) vapour pressure standard deviation (hPa); (L) log distance to permanent water (m); (M) land cover: dry/low risk, moist/high risk areas; (N) start month of survey: main season (April–May).
Results of the Stage 5 non-spatial model. Odds ratios, z-scores, and confidence interval estimated from non-spatial regression against four variables, fitted on derivation data only (n = 81, AIC = 8.06).
| Variable | Odds ratio | z | P | Lower CI | Upper CI |
| rainfall summer total (per 100 mm) | 2.33 | 6.9 | <0.0005 | 1.84 | 2.99 |
| temperature annual mean (per °C) | 8.85 | 9.05 | <0.0005 | 5.53 | 14.15 |
| elevation (per 100 m) | 1.68 | 3.8 | <0.0005 | 1.28 | 2.20 |
| high risk land cover | 0.188 | -5 | <0.0005 | 0.098 | 0.361 |
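The odds ratios, z-scores and confidence limits in this table are linked through the regression coefficient: coefficient = ln(odds ratio), standard error = coefficient / z, and the limits are exp(coefficient ± 1.96 × standard error). The sketch below assumes a Wald-type 95% interval (the 1.96 multiplier is an assumption, not stated in the caption) and uses a hypothetical helper name. Applied to the tabulated values it approximately recovers the reported limits, with small differences attributable to rounding of the z-scores.

```python
import math


def wald_interval(odds_ratio, z, crit=1.96):
    """Recover a Wald confidence interval for an odds ratio from its z-score.

    crit=1.96 assumes a 95% interval based on the normal distribution.
    """
    coef = math.log(odds_ratio)   # coefficient on the logit scale
    se = coef / z                 # standard error implied by the z-score
    return math.exp(coef - crit * se), math.exp(coef + crit * se)


# Odds ratios and z-scores as tabulated for the Stage 5 non-spatial model.
stage5 = {
    "rainfall summer total (per 100 mm)": (2.33, 6.9),
    "temperature annual mean (per deg C)": (8.85, 9.05),
    "elevation (per 100 m)": (1.68, 3.8),
    "high risk land cover": (0.188, -5.0),
}

for name, (odds_ratio, z) in stage5.items():
    lower, upper = wald_interval(odds_ratio, z)
    print(f"{name}: {lower:.3f} to {upper:.3f}")
```

The same logit-scale reasoning rescales an odds ratio to another unit: an odds ratio of 2.33 per 100 mm corresponds to 2.33 ** 0.1 per 10 mm.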
Figure 6. Predicted versus observed prevalence, on a logit scale, for the derivation (crosses) and validation (squares) data of the Stage 5 non-spatial model, and for the median (closed circles) and upper/lower confidence interval (spikes) of the Stage 6 spatial model.
Results of the Stage 6 spatial model. Odds ratios and confidence interval estimated from Stage 6 spatial model, fitted on all prevalence data (n = 122).
| Variable | Odds ratio | Lower CI | Upper CI |
| rainfall summer total (per 100 mm) | 2.01 | 1.49 | 2.70 |
| temperature annual mean (per °C) | 5.75 | 4.14 | 8.08 |
| elevation (per 100 m) | 1.82 | 1.49 | 2.22 |
Φ = 0.003, 95% CI (0, 0.0174); σ² = 0.77, 95% credible interval (0.53, 1.14)
Figure 7. Maps of predicted malaria prevalence and covariates. Predicted pre-control childhood malaria prevalence maps for Botswana, resulting from (A) the Stage 5 non-spatial model and (B) the Stage 6 spatial model; 118 survey sites are shown; (C) the upper and lower 95% CI of the spatial model. Co-variates used in the models: (D) annual mean temperature, °C; (E) summer total rainfall, mm; (F) elevation, m; (G) land cover categories, high-risk/low-risk. Lines represent district boundaries.