Literature DB >> 35627854

Accounting for Sampling Weights in the Analysis of Spatial Distributions of Disease Using Health Survey Data, with an Application to Mapping Child Health in Malawi and Mozambique.

Sheyla Rodrigues Cassy^1,2, Samuel Manda^3,4, Filipe Marques^2,5, Maria do Rosário Oliveira Martins⁶.

Abstract

Most analyses of spatial patterns of disease risk using health survey data fail to adequately account for the complex survey designs. Particularly, the survey sampling weights are often ignored in the analyses. Thus, the estimated spatial distribution of disease risk could be biased and may lead to erroneous policy decisions. This paper aimed to present recent statistical advances in disease-mapping methods that incorporate survey sampling in the estimation of the spatial distribution of disease risk. The methods were then applied to the estimation of the geographical distribution of child malnutrition in Malawi, and child fever and diarrhoea in Mozambique. The estimation of the spatial distributions of the child disease risk was done by Bayesian methods. Accounting for sampling weights resulted in smaller standard errors for the estimated spatial disease risk, which increased the confidence in the conclusions from the findings. The estimated geographical distributions of the child disease risk were similar between the methods. However, the fits of the models to the data, as measured by the deviance information criteria (DIC), were different.

Entities: Chemical

Keywords: Bayesian spatial smoothing; child malnutrition, fever and diarrhea; disease mapping; sub-Saharan Africa; survey sampling weights

Mesh：

Year: 2022 PMID： 35627854 PMCID： PMC9140664 DOI： 10.3390/ijerph19106319

Source DB: PubMed Journal: Int J Environ Res Public Health ISSN： 1660-4601 Impact factor: 4.614

1. Introduction

In epidemiology and public health, the methods for mapping disease have long been used to estimate spatial patterns of disease risk. Statistical advances in the methods have included spatial smoothing of disease risk to produce interpretable maps, and extensions to include temporal components as well as individual and geographical-level data. Estimation of geographical patterns of diseases in low-resource settings is increasingly important in guiding decision-making on where to allocate resources [1,2]. In sub-Saharan Africa, many disease mapping analyses that use data from complex health surveys fail to account for the survey designs such as disproportionate sampling [2]. In standard survey analyses, disproportionate sampling is corrected in the analysis by using the survey sampling weights that adjust for the disproportionate contribution of each ultimate sampling unit to the whole sample data. Ignoring the sampling weight in disease mapping analyses could lead to biased estimates of the spatial distributions of the disease risk, which could adversely affect policy decisions based on them. Thus, appropriate statistical analysis methods that incorporate sampling weights in the estimation of spatial patterns of diseases are critical [3,4,5,6,7]. In this paper, rather than detailing the epidemiology of child diseases in sub-Saharan Africa, a research topic that has extensively been analysed in several disease-mapping analyses in Africa, we present recent statistical analysis methods for incorporating sampling weights in the estimation of the geographical distributions of disease risk using complex health survey data. The two datasets: 2015-16 Malawi Demographic and Health Survey (2015-16 MDHS) [8], and the 2015 Mozambique Immunization Malaria and HIV/AIDS Key Indicator Survey (IMASIDA 2015) [9], are used for illustrative purposes using the mapping of child malnutrition, fever, and diarrhoea.

2. Methods

2.1. General Notation

Suppose that a finite population is distributed into areas, and a random probability sample survey with size is taken according to a given design. Let and be the population and sample of sizes for area , respectively, such that . Let be the binary indicator for the presence of the disease, taking a value of 1 or 0 on whether or not the individual has the disease in area . It is assumed that is known for each area . Further, we assume that individual has a known probability of being included in the sample. Our interest is to estimate the true area-specific population prevalence , which is defined as: using the accrued sample from area . The area-specific unweighted estimator of the true area prevalence is given by which is calculated as: and its variance is obtained as: In the case of a simple random sampling design without replacement, the estimator (2) is unbiased. However, in complex sampling, it would be inadequate as it does not account for the sample survey design, for example, sampling weights [10].

2.2. The Horvitz–Thompson Estimator

The well-known sample-design based unbiased estimator of the population prevalence is the Horvitz–Thompson (HT) estimator [11], which is given by: where is the set of individuals who are sampled from area , with being the observed value for with , the design weight (i.e., the sampling weights are the inverse probability of inclusion in the sample adjusted for non-response [12]), and given by: is the normalized sampling weight. According to study [4], an estimator of the variance of can be expressed as follows: This HT estimator falls into the group of considered direct estimators, as they are based only on the area sample data [13,14].

2.3. Bayesian Hierarchical Spatial Smoothing Models

Bayesian hierarchical spatial smoothing models have recently gained attention regarding their use in small area estimation instead of direct estimators [4,5]. These methods rely on the assumption that area-specific estimates borrow information from other areas, which makes it possible to find more accurate estimates. Furthermore, this creates the advantage that estimates can be obtained in areas with no samples. The models involve three stages: (i) the likelihood of the response, which is defined conditionally on latent variables (random effects); (ii) the latent variables themselves are given a distribution, and (iii) the specification of prior distributions of all unknown parameters. A three-stage Bayesian hierarchical spatial rmoothing model for the total number of individuals with the disease in area given by uses a binomial distribution for stage one as: In the second stage, we model the between-area variation in using the area random-effects model. In recent times, this has involved incorporating both non-spatial and spatial random effects using the convolution model of Besag–York–Mollié (BYM) [15]. Besides, the standard binomial spatial model, we will describe a series of models based on the BYM model that have been used to perform spatial analyses on the prevalence data from health surveys. Using the Binomial distribution in (7), our first model is a standard spatial modelling approach for count data using health survey data to estimate the spatial distribution of disease risk. It simply links the estimated prevalence of the disease with the two types of area random terms via a logit function as: where is the intercept; is the unstructured random component; is the structured spatial random component; indicates the set of neighbours, and is the number of neighbours for a given area . Here, we adopt the common convention of neighbouring, which considers two areas as neighbours if they share a common boundary. The specification of the structured random effects is based on an intrinsic conditional autoregressive (ICAR) prior [15,16]. We call this, the Binomial Spatial Model, as Model 1. In the third stage, we require priors for and the variances of the random effects. The model in Equation (8) results in the smoothing of extreme area estimates in areas with small sample sizes. However, without the incorporation of sampling weights, the estimated of the spatial patterns of disease risk could be biased.

2.4. Incorprating Survey Sampling Weights in Hierarchical Spatial Model Analysis

Following studies [4,5], various approaches have been proposed to account for the survey sampling weight. Let us consider now Models 2 and 3 which are based on the HT estimator. As the HT estimator could be skewed, the estimates are often transformed to approximately conform to normality, Some of the most common transformations are bases the logit and arcsine functions. We first consider the logit transformation. Thus, Model 2 is given by: where has variance . We will call Model 2, the Logit Normal (LN) spatial model. For the arcsine square-root transformation, given as [5,17], the Arcsine (AS) spatial model leads to the following model specification: and the variance of the arcsine transformation is , where is the effective sample size in area Thus, our Model 3 is the Arcsine square root (AS) spatial model. For Model 4, we consider pseudo-likelihood (PL), which uses a weighted likelihood [5], where the response values are weighted using the normalised design weights. Thus, rather than using the binomial outcomes used in (7) (Model 1), here, we use as: A drawback of the general approach is that the appropriate standard error is not recovered in the case of clustering. Rabe-Hesketh and Skrondal [18] used a pseudo-likelihood method with scaled weights and used sandwich estimation to provide valid standard error estimates within a multilevel framework but did not consider spatial smoothing. We denoted the pseudo-likelihood (PL) spatial model as Model 4. Our Models 5 and 6 are also variations of Model 1, but they now depend on effective sample size and the number of cases. For Model 5, the effective sample size is computed as previously shown and depends on the weighted estimator of prevalence [4]. The effective sample size is the sample size that is required to make the variance under the complex survey design equivalent to that of a simple random sample [4]. Then, the effective number of cases is easily found as . As for Model 6, the effective sample size is obtained by using the design effect in area and is estimated as: where: where is the unbiased direct estimate of the variance of the sample proportion based on the complex sampling design and is the unbiased direct estimate of the variance of the proportion based on the simple random sampling design [19]. As before, this resulted in the effective number of cases in area as .

2.5. Bayesian Inference, Computation, and Model Evaluation

For the Bayesian estimation of the model parameters, we assumed an improper uniform prior for and priors for both the spatial and non-spatial precision parameters and as in [5]. The estimations were done using Integrated Nested Laplace Approximation (INLA), which is implemented in the INLA package within the statistical computer software R [20,21,22]. Detailed description of INLA are provided in Appendix A. Model comparison and selection were carried out using the deviance information criterion (DIC) [23]. DIC value is computed as where is the posterior mean of the deviance which measures the goodness of fit and is the effective number of parameters which penalises for the complexity of the model. Models with the smallest DIC indicated a better model fit.

3. Application

Although Malawi and Mozambique have experienced substantial improvements in child health, preventable child deaths continue to be unacceptably high to achieve the Sustainable Development Goals [24,25,26,27,28,29]. Understanding the local epidemiology of diseases in these two countries is critical for defining and prioritising interventions that can contribute to accelerating the reduction of morbidity and mortality in children under 5 years old in these countries. We used Models 1–6, presented above, to estimate the geographical distribution of stunting, wasting, and underweight among children under 5 years old at the district level in Malawi, and childhood fever and diarrhoea at the province level in Mozambique.

3.1. Data Sources: Malawi and Mozambique

The 2015-16 MDHS was a national, population-based, cross-sectional survey that was conducted between December 2015 and February 2016. Briefly, the 2015-16 MDHS employed a two-stage sampling designed to produce a nationally representative sample at the national level, residence level (urban and rural), and district level. Stratification was made at two levels: the district level (32 districts), and the urban and rural areas. In the first stage, based on the Malawi Population and Housing Census conducted in Malawi in 2008, and updated based on the General Agriculture Census 2009, 850 primary sample units (PSUs) were selected, which were the enumeration areas (EAs), with a probability proportional to their size (size given by the number of households in each enumeration area). Of these PSUs, 173 were in urban areas and 677 were in rural areas. The second stage of sampling involved a systematic selection of 30 households from each urban cluster and 33 households from each rural cluster, yielding a sample size of 27,516 households from the clusters. The response rate was 99%. The methodology used in the 2015-16 MDHS has been reported in detail in [8]. Figure 1a depicts the geospatial arrangement of the districts of Malawi.

Figure 1

Map of Malawi showing the 32 districts (a) and map of Mozambique showing the 11 provinces (b).

The second dataset used was the IMASIDA 2015, which includes information from 7169 households, interviewing 7749 women aged 15 to 59 years and 5283 men aged 15 to 59 years, over 307 EAs, with data collected between June and September 2015 through a two-stage sampling process designed to produce representative estimates at the national, provincial (11 geographic areas: Maputo Province, Maputo City, Inhambane, Gaza, Sofala, Manica, Zambezia, Nampula, Tete, Niassa, and Cabo Delgado), regional (north, centre and south), and the residence of areas (urban and rural), and for women and men aged 15–59 years. The methodology used has been reported in detail elsewhere [9]. Figure 1b depicts the geospatial arrangement of the provinces of Mozambique. All of these datasets are publicly available and can be downloaded at https://dhsprogram.com/ (accessed on 21 July 2021).

3.2. Outcomes

The outcomes considered in this study for childhood in Malawi are three nutritional statuses of children, namely stunting, wasting, and underweight. Anthropometric measurements were used to define the nutritional status of children. Children with a z-score of two standard deviations (−2 SD) below the median of the WHO reference population on height-for-age are categorised as stunted; on weight-for-height as wasted, and on weight-for-age as underweight [24]. Thus, all outcome variables were binary, taking a value of “1” if a child is malnourished (i.e., stunted, wasted, or underweight), and a value of “0” otherwise. Due to missing data on these measurements, only 5149 children were considered for stunting analyses, 5178 for wasting, and 5223 for underweight, respectively. For the Mozambique data, the outcomes considered were the fever and diarrhoea statuses. Children under 5 years old who had their mother answer whether they had diarrhoea or fever within the past 2 weeks were included in the analysis. The remaining children with missing values for the outcomes were excluded from our research. Thus, our analyses included a total of 4972 children under 5 years old for fever and 4980 children for diarrhoea.

3.3. Malawi: District Variation in the Prevalence of Child Malnutrition

The observed prevalence (weighted) of stunting, wasting, and underweight in Malawi among children under 5 years old was 36.82% (95% CI 35.18–38.46), 2.79% (95% CI 2.24–3.33), and 11.58% (95% CI 10.49–12.67), respectively, with the variation across districts that ranged from 15.44% in Mzuzu City to 45.88% in Mchinji for stunting; 1% in Balaka to 9.92% in Nsanje for wasting, and 1.93% in Zomba City to 18.85% in Nsanje for underweight (Table A1 in Appendix B).

Table A1

Summary of the observed (weighted) prevalence rates of stunting, wasting, and underweight per district in Malawi.

	Stunting		Wasting		Underweight
District	N. Respondents	(Stunted, %)	N. Respondents	(Wasted, %)	N. Respondents	(Underweighted,%)
Chitipa	139	43 (33.08)	144	2 (1.39)	141	18 (13.89)
Karonga	142	38 (28.00)	143	2 (1.56)	144	14 (9.18)
Nkhata Bay	149	47 (31.33)	150	1 (0.17)	152	10 (5.89)
Rumphi	147	44 (31.87)	147	3 (1.74)	147	20 (13.75)
Mzimba	158	70 (44.79)	159	5 (3.19)	158	22 (13.45)
Likoma	128	33 (26.99)	128	5 (4.26)	129	11 (9.13)
Mzuzu City	39	7 (15.44)	39	1 (2.72)	39	1 (2.72)
Kasungu	211	73 (35.90)	215	5 (2.73)	216	14 (6.56)
Nkhotakota	202	69 (32.58)	204	7 (1.82)	202	32 (13.05)
Ntchisi	181	68 (40.55)	182	4 (1.80)	186	20 (11.53)
Dowa	199	74 (39.42)	199	2 (1.05)	200	17 (9.31)
Salima	212	78 (36.75)	214	4 (1.60)	213	28 (13.81)
Lilongwe Rural	147	63 (43.28)	149	1 (0.64)	150	14 (9.42)
Mchinji	210	95 (45.88)	214	8 (3.35)	213	26 (12.20)
Dedza	177	72 (41.18)	177	5 (2.85)	179	28 (15.41)
Ntcheu	200	78 (40.77)	199	8 (3.74)	201	26 (12.99)
Lilongwe City	74	15 (19.55)	74	3 (4.14)	74	6 (8.07)
Mangochi	244	107 (44.33)	246	2 (0.90)	255	32 (12.08)
Machinga	247	95 (38.50)	247	9 (3.71)	253	40 (15.58)
Zomba Rural	182	67 (36.90)	179	8 (4.50)	183	22 (11.93)
Chiradzulu	140	48 (33.62)	144	9 (6.53)	145	19 (12.81)
Blantyre Rural	86	28 (32.84)	87	5 (5.53)	86	7 (8.00)
Mwanza	143	46 (31.24)	143	12 (7.03)	150	23 (14.55)
Thyolo	151	54 (34.43)	149	6 (3.78)	150	22 (13.29)
Mulanje	172	66 (36.91)	172	6 (3.58)	174	29 (16.30)
Phalombe	211	68 (33.09)	216	4 (2.03)	212	21 (10.31)
Chikwawa	180	55 (30.18)	181	8 (4.80)	184	21 (11.27)
Nsanje	159	48 (31.70)	161	17 (9.92)	162	27 (18.85)
Balaka	213	69 (32.73)	212	0 (0.00)	214	26 (13.19)
Neno	165	72 (45.28)	164	7 (4.70)	168	30 (18.44)
Zomba City	49	8 (17.75)	48	3 (5.56)	50	1 (1.93)
Blantyre City	92	28 (30.47)	92	2 (2.10)	93	7 (7.57)
Total	5149	1826 (36.82)	5178	164 (2.79)	5223	634 (11.58)

We applied Models 1–6 to estimate the district-level pattern of child growth measures in Malawi. The fit and parameter estimates are presented in Table 1, Table 2 and Table 3. For each child’s growth measurement, the results showed similar estimates for the intercept parameters, except for Model 3 (AS), which was on a different scale. The credible intervals for the intercept parameters were generally narrower when the sampling weights were accounted for. The spatial Model 3 (AS) performed better (DIC = −86.69 for stunting; DIC = −94.76 for wasting; DIC = −89.7 for underweight).

Table 1

A comparison of spatial models for mapping child stunting in Malawi using 2015-16 MDHS.

Parameters	Model 1	Model 2	Model 3	Model 4	Model 5	Model 6
β0 (CI)	−0.605	−0.585	0.634	−0.607	−0.594	−0.609
	(−0.706; −0.547)	(−0.67; −0.505)	(0.608; 0.659)	(−0.691; −0.526)	(−0.68; −0.512)	(−0.694; −0.528)
Sd	0.04	0.042	0.013	0.042	0.043	0.042
σu	0.210	0.170	0.069	0.219	0.188	0.218
σv	0.114	0.115	0.053	0.125	0.125	0.126
−2LL	−135.88	−24.45	12.88	−137.78	−133.44	−137.21
p(D)	18.54	16.47	23.56	19.55	17.64	19.32
DIC	226.85	6.24	−86.69	229.23	222.80	228.22

Table 2

A comparison of spatial models for mapping child wasting in Malawi using 2015-16 MDHS.

Parameters	Model 1	Model 2	Model 3	Model 4	Model 5	Model 6
β0 (CI)	−3.514	−3.458	0.166	−3.577	−3.658	−3.609
	(−3.715; −3.325)	(−3.661; −3.257)	(0.143; 0.19)	(−3.78; −3.386)	(−3.87; −3.457)	(−3.814; −3.416)
Sd	0.099	0.103	0.012	0.1	0.105	0.101
σu2	0.493	0.353	0.061	0.486	0.519	0.467
σv2	0.148	0.121	0.0481	0.144	0.152	0.143
−2LL	−95.22	−48.39	18.11	−94.04	−93.43	−92.31
p(D)	13.94	9.68	22.59	13.13	13.22	12.44
DIC	150.84	60.80	−94.76	148.73	147.06	145.98

Table 3

A comparison of spatial models for mapping child underweight in Malawi using 2015-16 MDHS.

Parameters	Model 1	Model 2	Model 3	Model 4	Model 5	Model 6
β0 (CI)	−2	−1.981	0.344	−2.003	−2.022	−2.008
	(−2.109; –1.897)	(−2.09; −1.877)	(0.32; 0.368)	(−2.112; −1.901)	(−2.133;−1.916)	(−2.116; −1.905)
Sd	0.054	0.054	0.012	0.053	0.055
σu	0.1390	0.1197	0.0634	0.1523	0.1400	0.1420
σv	0.1371	0.1139	0.0495	0.1286	0.1261	0.1321
−2LL	−117.62	−29.98	15.99	−117.38	−113.23	−115.35
p(D)	13.38	10.42	22.91	13.41	12.23	12.91
DIC	197.75	24.96	−89.77	197.28	190.55	193.68

Figure 2 presents maps of the district-level observed and spatially estimated prevalence of stunting. The spatial pattern in the stunting prevalence was smoother compared to the spatial pattern based on the observed prevalence. Generally higher stunting rates were found in the main central districts of the country. For wasting (Figure 3), districts in the southern part of the country bore the most burden. The spatial trend for wasting was similar to that of underweight prevalence (Figure 4).

Figure 2

Maps of the observed (UW: Unweighted, HT: Horvitz–Thompson) and spatial estimated prevalences of stunting (UB: Unadjusted Binomial estimator (Model 1), LN: Logit-normal estimator (Model 2), AN: Arcsine-square root transformation estimator (Model 3), PL: Pseudo-likelihood estimator (Model 4), ES: Effective Sample size estimator (Model 5), and ES-deff: Effective Sample size estimator using design effect (Model 6)) by the district in Malawi using 2015-16 MDHS.

Figure 3

Maps of the observed (UW: unweighted and HT: Horvitz–Thompson) and spatial estimated prevalences of wasting (UB: Unadjusted Binomial estimator (Model 1), LN: Logit-normal estimator (Model 2), AN: Arcsine-square root transformation estimator (Model 3), PL: Pseudo-likelihood estimator (Model 4), ES: Effective Sample size estimator (Model 5), and ES-deff: Effective Sample size estimator using design effect (Model 6)) by the district in Malawi using 2015-16 MDHS.

Figure 4

Maps of the observed (UW: unweighted and HT: Horvitz Thompson) and spatial estimated prevalences of underweight (UB: Unadjusted Binomial estimator (Model 1), LN: Logit-normal estimator (Model 2), AN: Arcsine-square root transformation estimator (Model 3), PL: Pseudo-likelihood estimator (Model 4), ES: Effective Sample size estimator (Model 5), and ES-deff: Effective Sample size estimator using design effect (Model 6)) by the district in Malawi using 2015-16 MDHS.

3.4. Mozambique: Pronvicial Variations in the Prevalence Child Fver and Diarrhoea

A summary of the province’s prevalence of fever and diarrhoea is provided in Table A2 in Appendix B. Overall, about 29.37% (95% CI 26.99–31.87) of children under 5 years old had a fever and 11.11% (95% CI 9.93–12.41) had diarrhoea, with variation across provinces in Mozambique, ranging from 14.37% in Tete to 51.67% in Zambezia for fever, and 6.8% in Tete to 17.19% in Niassa. The model-fit criteria values and parameter estimates are presented in Table 4 and Table 5. The estimates of the intercepts from the models for each condition were similar, except for Model 3 (AS) which was an indifferent scale. Moreover, the estimates of the intercepts were slightly more precise by having narrower credible intervals when sampling weights were accounted for. Furthermore, the spatial Model 3 (AS) was the best fitting model.

Table A2

Summary of the observed (weighted) prevalence rates of fever and diarrhoea per province in Mozambique.

	Fever		Diarrhea
District	N. Respondents	(N, %)	N. Respondents	(N, %)
Niassa	546	146 (30.16)	546	94 (17.19)
Cabo Delgado	380	86 (21.94)	383	37 (9.90)
Nampula	595	228 (39.49)	597	64 (11.28)
Zambezia	555	264 (51.67)	556	90 (16.95)
Tete	448	65 (14.37)	449	39 (6.80)
Manica	479	81 (16.59)	479	43 (8.90)
Sofala	505	109 (21.58)	506	46 (8.43)
Inhambane	323	68 (18.23)	323	27 (7.24)
Gaza	530	141 (27.00)	530	61 (11.79)
Maputo Provincia	340	53 (15.86)	340	23 (8.25)
Maputo Cidade	271	70 (24.99)	271	28 (9.99)
Total	4972	1311 (29.37)	4980	549 (11.11)

Table 4

A comparison of spatial models for mapping child fever in Mozambique using IMASIDA 2015.

Parameters	Model 1	Model 2	Model 3	Model 4	Model 5	Model 6
β0 (CI)	−1.137	−1.123	0.523	−1.129	−1.13	−1.13
	(−1.433; −0.844)	(−1.453; −1.123)	(−0.45; 0.597)	(−1.496; −0.767)	(−1.458; −0.805)	(−1.458; −0.805)
Sd	0.146	0.162	0.036	0.183	0.161
σu	0.1734	0.1863	0.1046	0.1854	0.1848	0.1848
σv	0.4660	0.5123	0.1094	0.5241	0.5172	0.5172
−2LL	−62.86	−16.20	−0.834	−63.98	−61.28	−61.28
p(D)	10.52	10.28	10.66	10.60	10.50	10.50
DIC	89.60	0.135	−37.64	89.54	86.86	86.86

Table 5

A comparison of spatial models for mapping child diarrhoea in Mozambique using IMASIDA 2015.

Parameters	Model 1	Model 2	Model 3	Model 4	Model 5
β0 (CI)	−2.145	−2.14	0.329	−1.945	−2.152
	(−2.319;−1.979)	(−2.335; −1.956)	(0.283; 0.374)	(−2.189; −1.708)	(−2.34; −1.975)
Sd	0.085	0.095	0.023	0.118	0.091
σu2	0.1651	0.1799	0.0748	0.2303	0.1724
σv2	0.2257	0.2231	0.0640	0.3438	0.2442
−2LL	−50.27	−10.40	4.37	−53.70	−48.96
p(D)	8.75	7.90	10.15	9.83	8.64
DIC	81.04	4.30	−39.23	81.71	78.83

Figure 5 and Figure 6 present prevalence maps of fever and diarrhoea, respectively, under different spatial model specifications. Child fever was more concentrated in provinces around the northeastern parts of Mozambique and less in the southern provinces (e.g., Maputo Cidade and Maputo Province). On the other hand, child diarrhoea was higher in most northern provinces, Zambezia, and Niassa provinces, and much lower in the southern parts.

Figure 5

Maps of the observed (UW: unweighted and HT: Horvitz–Thompson) and spatial estimated prevalences of fever (UB: Unadjusted Binomial estimator (Model 1), LN: Logit-normal estimator (Model 2), AN: Arcsine-square root transformation estimator (Model 3), PL: Pseudo-likelihood estimator (Model 4), ES: Effective Sample size estimator (Model 5), and ES-deff: Effective Sample size estimator using design effect (Model 6)) by the province in Mozambique using IMASIDA 2015.

Figure 6

Maps of the observed (UW: unweighted and HT: Horvitz–Thompson) and spatial estimated prevalences of diarrhoea (UB: Unadjusted Binomial estimator (Model 1), LN: Logit-normal estimator (Model 2), AN: Arcsine-square root transformation estimator (Model 3), PL: Pseudo-likelihood estimator (Model 4), and ES: Effective Sample size estimator (Model 5)) by the province in Mozambique using IMASIDA 2015.

4. Discussion and Conclusions

In this paper, we compared several statistical methods and their resulting estimates of spatial distributions of child malnutrition, fever and diarrhoea in Malawi and Mozambique using health survey data, accounting for health survey sampling weights. The results of the study showed that the sampling weight-adjusted methods were the best fitting, a finding similar to previous studies [4,5,6,7]. Even though the estimated spatial pattern was similar, the models that adjusted for the sampling weight produced estimates that had lower variability (narrowed confidence intervals). Thus, using accounting for sampling weights produced estimates of disease risk that had an increased level of confidence. There are some concerning issues arising from our study. Firstly, although the arcsine transformation of the weighted prevalence was preferred for our application, it does not have an intuitive interpretation of the association between the binary outcomes and predictors. Secondly, for the Mozambique case, the study used a province which has a much coarser level of geographical aggregation. This may have concealed variations at some higher spatial resolution needed for local policy decisions. We suggest, for future studies, performing spatial analyses at higher spatial resolutions, for example, at the district level. Thirdly, our analyses used univariate spatial methods for the conditions that could be correlated at ecological levels [30,31]. We are now extending these statistical methods to the estimation of joint spatial patterns of diseases. Finally, we did not perform any simulation study to compare the performance of the studied statistical methods for the estimation of spatial disease patterns using complex health survey data. However, we thought that this was not necessary for this study as we aimed to describe the sampling weights adjusting methods and illustrate their use on typical examples. Other previous research work considered their performances using simulations [4,5,6,7]. In conclusion, we recommend spatial epidemiology researchers consider incorporating survey sampling weights in disease-mapping analyses for estimating the spatial distribution of disease risks based on complex health survey data. The estimates are more precise, thus providing reliable supporting evidence to drive public health policy on targeting resources in areas of most need.

9 in total

1. A comparison of spatial smoothing methods for small area estimation with sampling weights.

Authors: Laina Mercer; Jon Wakefield; Cici Chen; Thomas Lumley
Journal: Spat Stat Date: 2014-05-01

2. The use of sampling weights in Bayesian hierarchical models for small area estimation.

Authors: Cici Chen; Jon Wakefield; Thomas Lumely
Journal: Spat Spatiotemporal Epidemiol Date: 2014-08-05

3. Spatial small area smoothing models for handling survey data with nonresponse.

Authors: K Watjou; C Faes; A Lawson; R S Kirby; M Aregay; R Carroll; Y Vandendijck
Journal: Stat Med Date: 2017-07-02 Impact factor: 2.373

4. Model-based inference for small area estimation with sampling weights.

Authors: Y Vandendijck; C Faes; R S Kirby; A Lawson; N Hens
Journal: Spat Stat Date: 2016-10-14

5. Geostatistical Methods for Disease Mapping and Visualisation Using Data from Spatio-temporally Referenced Prevalence Surveys.

Authors: Emanuele Giorgi; Peter J Diggle; Robert W Snow; Abdisalan M Noor
Journal: Int Stat Rev Date: 2018-04-25 Impact factor: 2.217

6. Spatial modeling of HIV and HSV-2 among women in Kenya with spatially varying coefficients.

Authors: Elphas Okango; Henry Mwambi; Oscar Ngesa
Journal: BMC Public Health Date: 2016-04-22 Impact factor: 3.295

7. Assessing comorbidity and correlates of wasting and stunting among children in Somalia using cross-sectional household surveys: 2007 to 2010.

Authors: Damaris K Kinyoki; Ngianga-Bakwin Kandala; Samuel O Manda; Elias T Krainski; Geir-Arne Fuglstad; Grainne M Moloney; James A Berkley; Abdisalan M Noor
Journal: BMJ Open Date: 2016-03-09 Impact factor: 2.692

Review 8. A Scoping Review of Spatial Analysis Approaches Using Health Survey Data in Sub-Saharan Africa.

Authors: Samuel Manda; Ndamonaonghenda Haushona; Robert Bergquist
Journal: Int J Environ Res Public Health Date: 2020-04-28 Impact factor: 3.390

9. Spatial Co-Morbidity of Childhood Acute Respiratory Infection, Diarrhoea and Stunting in Nigeria.

Authors: Olamide Seyi Orunmoluyi; Ezra Gayawan; Samuel Manda
Journal: Int J Environ Res Public Health Date: 2022-02-06 Impact factor: 3.390

9 in total