Jannah Baker, Nicole White, Kerrie Mengersen.
Abstract
BACKGROUND: Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, with missing data affecting anywhere from a few variables to many. For spatial analyses of health outcomes, selecting an appropriate imputation method is critical for producing the most accurate inferences.
Year: 2014 PMID: 25410053 PMCID: PMC4287494 DOI: 10.1186/1476-072X-13-47
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Prior distributions used for parameters in the sensitivity analysis
| Parameter | Model 1 | Parameter | Model 2 | Parameter | Model 3 | Parameter | Model 4 | Parameter | Model 5 |
|---|---|---|---|---|---|---|---|---|---|
| α | N(0,0.01) | α | N(0,0.01) | α | N(0,0.01) | α | N(0,0.01) | α | N(0,0.01) |
| βj;j = 1,…,7 | CAR(1/Ƭβj,R) | βj;j = 1,…,7 | N(0,1/ Ƭβj) | βj;j = 1,…,7 | N(0,σ2 βj) | βj;j = 1,…,7 | N(0,1/ Ƭβj) | βj;j = 1,…,7 | N(0,1/ Ƭβj) |
| Ui;i = 1,…,N | N(α,1/ƬU) | Ui;i = 1,…,N | N(α,1/ ƬU) | Ui;i = 1,…,N | N(α,σ2 U) | Ui;i = 1,…,N | N(α,1/ ƬU) | Ui;i = 1,…,N | N(α,1/ ƬU) |
| Si;i = 1,…,N | CAR(1/ƬS,R) | Si;i = 1,…,N | CAR(1/ƬS,R) | Si;i = 1,…,N | CAR(σ2 S,R) | Si;i = 1,…,N | CAR(1/ƬS,R) | Si;i = 1,…,N | CAR(1/ƬS,R) |
| Ƭβj | Ga(1,0.01) | Ƭβj | Ga(1,0.01) | σβj | U(0.01,5) | Ƭβj | Ga(1,0.01) | Ƭβj | Ga(1,0.01) |
| ƬU | Ga(1,0.01) | ƬU | Ga(1,0.01) | σU | U(0.01,5) | σU | N(0,0.0625)I(0,) | log(σU) | N(0,4) |
| ƬS | Ga(1,0.01) | ƬS | Ga(1,0.01) | σS | U(0.01,5) | ƬS | Ga(1,0.01) | ƬS | Ga(1,0.01) |
α = intercept, j = covariates 1 to 7, βj = vector of coefficients for covariates 1 to 7, i = Local Government Areas (LGAs) 1 to 71, Ui = uncorrelated residual error for LGAs 1 to 71, Si = correlated residual error for LGAs 1 to 71, Ƭβj = vector of precisions for covariate coefficients, ƬU = precision for uncorrelated residual error, ƬS = precision for correlated residual error, σβj = vector of standard deviations for covariate coefficients, σU = standard deviation for uncorrelated residual error, σS = standard deviation for correlated residual error, Ga = Gamma distribution, U = Uniform distribution, CAR = CAR normal prior centred around zero, denoted CAR(variance, adjacency neighbourhood weight matrix), R = adjacency neighbourhood weight matrix with diagonal entries equal to the number of neighbours, i.e. Rii = mi, where mi is the number of neighbours of LGA i.
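The legend above describes R only through its diagonal (the number of neighbours of each area). The following is a minimal sketch of how such a matrix could be assembled from a neighbour list, and of the precision-to-standard-deviation conversion that links the Ƭ and σ parameterisations in the table. The function names, the toy neighbour structure, and the choice of -1 for off-diagonal entries between neighbours (the usual intrinsic CAR convention) are assumptions for illustration and are not taken from the source.

```python
import math


def car_structure_matrix(neighbours):
    """Build a CAR structure matrix R from a neighbour list.

    neighbours: dict mapping each area index (0..N-1) to the set of
    indices of its neighbours (hypothetical toy input format).
    """
    n = len(neighbours)
    R = [[0] * n for _ in range(n)]
    for i, nbrs in neighbours.items():
        # Diagonal entry: number of neighbours of area i, as in the legend.
        R[i][i] = len(nbrs)
        for j in nbrs:
            # ASSUMPTION: -1 between neighbours (common intrinsic CAR
            # convention); the source does not state the off-diagonals.
            R[i][j] = -1
    return R


def precision_to_sd(tau):
    """Convert a precision (1 / variance) to a standard deviation."""
    return 1.0 / math.sqrt(tau)


# Toy example: four areas arranged in a line, 0 - 1 - 2 - 3.
toy_neighbours = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
R = car_structure_matrix(toy_neighbours)
```

Under this convention every row of R sums to zero, which is why an intrinsic CAR prior is improper and is normally paired with a separate intercept such as α.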
Comparison of imputation methods by root mean squared error (RMSE) and bias from cross-validation
| Covariate | N missing | RMSE: mean imputation | RMSE, mean (sd): MVN, Poisson | RMSE, mean (sd): MVN, Binomial | RMSE, mean (sd): CAR prior, Poisson | RMSE, mean (sd): CAR prior, Binomial | Average bias: MVN | Average bias: CAR prior | Average width of CI: MVN | Average width of CI: CAR prior | % of CIs including zero bias: MVN | % of CIs including zero bias: CAR prior |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| % over 45yrs of age | 3 | 49.7 | 108.7 (21.2) | 109.6 (19.5) | 46.3 (17.8) | 46.2 (17.9) | 0.012 | 0.041 | 0.897 | 0.345 | 100% | 100% |
| % Overweight/obese | 28 | 26.3 | 73.7 (22.3) | 73.1 (20.6) | 45.4 (18.5) | 45.4 (18.4) | 0.016 | 0.081 | 0.413 | 0.200 | 100% | 75% |
| % Daily smokers | 28 | 25.8 | 53.2 (18.1) | 54.4 (19.3) | 49.8 (21.1) | 49.8 (21.0) | 0.153 | 0.271 | 1.144 | 0.640 | 100% | 68% |
| % Insufficient physical activity | 28 | 36.7 | 90.7 (37.5) | 91.4 (37.1) | 67.0 (41.3) | 67.1 (41.3) | 0.048 | 0.047 | 0.535 | 0.246 | 100% | 93% |
| % Adequate fruit intake | 28 | 34.4 | 67.4 (14.3) | 68.6 (14.7) | 37.5 (22.7) | 37.6 (22.7) | 0.069 | 0.052 | 0.382 | 0.221 | 100% | 91% |
| % Adequate vegetable intake | 32 | 21.9 | 39.1 (18.5) | 39.2 (18.4) | 30.4 (19.6) | 30.6 (19.9) | 0.157 | 0.185 | 1.144 | 0.973 | 100% | 94% |
| Overall | - | 32.5 | 71.1 (11.4) | 72.7 (11.1) | 46.1 (11.7) | 46.1 (11.7) | 0.076 | 0.113 | 0.752 | 0.438 | 100% | 87% |
RMSE = root mean squared error, sd = standard deviation, MVN = Multivariate normal imputation, CAR prior = conditional autoregressive prior imputation; CI = 95% credible interval.
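As a rough illustration of the quantities reported in this table, the sketch below computes RMSE, average bias, average credible-interval width, and the share of intervals whose bias interval includes zero for a set of held-out values. It assumes that bias is defined as the imputed estimate minus the held-out true value, and that an interval "includes zero bias" when the true value lies inside the 95% credible interval; the source does not spell out either definition, and the function name and toy numbers are hypothetical.

```python
import math


def cv_summary(true_values, estimates, ci_lower, ci_upper):
    """Summarise imputation accuracy on held-out (cross-validation) values."""
    n = len(true_values)
    # ASSUMPTION: bias = imputed estimate minus held-out true value.
    errors = [est - true for est, true in zip(estimates, true_values)]
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    average_bias = sum(errors) / n
    average_width = sum(hi - lo for lo, hi in zip(ci_lower, ci_upper)) / n
    # The interval for the bias, (lower - true, upper - true), contains
    # zero exactly when the true value lies inside the credible interval.
    covered = sum(
        1 for true, lo, hi in zip(true_values, ci_lower, ci_upper)
        if lo <= true <= hi
    )
    return {
        "rmse": rmse,
        "average_bias": average_bias,
        "average_ci_width": average_width,
        "share_including_zero_bias": covered / n,
    }


# Hypothetical toy data, not taken from the study.
summary = cv_summary(
    true_values=[10.0, 20.0, 30.0],
    estimates=[12.0, 19.0, 33.0],
    ci_lower=[9.0, 18.5, 31.0],
    ci_upper=[13.0, 21.0, 35.0],
)
```

A narrower average interval with lower coverage, as in the CAR prior columns, is the pattern this kind of summary is meant to expose: precision of the imputed values and calibration of their uncertainty are reported separately.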
Figure 1. Bias for estimated % over 45 years for Local Government Areas (LGAs) with missing data, by (1) multivariate normal imputation and (2) conditional autoregressive (CAR) priors for covariates; e.g. LGA 23–1 indicates multivariate normal imputation for LGA number 23, and LGA 23–2 indicates imputation with CAR priors for covariates for LGA number 23. LGA = Local Government Area.
Estimates for selected parameters from models included in sensitivity analysis: mean (95% credible intervals)
| Binomial | α | β1 | β2 | β3 | β4 | β5 | β6 | β7 | σS 2 | σU 2 | DIC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -2.158 | -0.194 | 0.009 | -0.004 | 0.008 | -0.008 | -0.013 | -0.005 | 0.013 | 0.073 | 667 |
| | (-2.368,-1.963) | (-0.240,-0.143) | (-0.001,0.020) | (-0.019,0.012) | (-0.010,0.027) | (-0.022,0.006) | (-0.036,0.011) | (-0.033,0.022) | | | |
| 2 | -2.147 | -0.197 | 0.009 | -0.005 | 0.008 | -0.007 | -0.015 | -0.004 | 0.013 | 0.074 | 667 |
| | (-2.415,-1.911) | (-0.253,-0.129) | (-0.002,0.021) | (-0.024,0.013) | (-0.013,0.027) | (-0.023,0.008) | (-0.040,0.008) | (-0.031,0.025) | | | |
| 3 | -2.158 | -0.194 | 0.01 | -0.005 | 0.008 | -0.007 | -0.014 | -0.005 | 0.012 | 0.079 | 666 |
| | (-2.374,-1.939) | (-0.248,-0.1416) | (-0.002,0.021) | (-0.022,0.011) | (-0.010,0.026) | (-0.022,0.006) | (-0.038,0.007) | (-0.032,0.022) | | | |
| 4 | -2.155 | -0.194 | 0.009 | -0.003 | 0.008 | -0.007 | -0.014 | -0.004 | 0.012 | 0.076 | 668 |
| | (-2.384,-1.951) | (-0.242,-0.138) | (-0.002,0.0196) | (-0.020,0.016) | (-0.010,0.026) | (-0.022,0.008) | (-0.038,0.009) | (-0.030,0.023) | | | |
| 5 | -2.203 | -0.183 | 0.008 | -0.004 | 0.008 | -0.006 | -0.013 | -0.003 | 0.012 | 0.080 | 666 |
| | (-2.451,-1.953) | (-0.242,-0.122) | (-0.004,0.020) | (-0.026,0.015) | (-0.013,0.029) | (-0.024,0.011) | (-0.037,0.009) | (-0.032,0.027) | | | |
| | | | | | | | | | | | |
| 1 | 0.641 | -0.181 | 0.009 | -0.005 | 0.007 | -0.006 | -0.014 | -0.005 | 0.012 | 0.062 | 671 |
| | (0.440,0.854) | (-0.232,-0.134) | (-0.001,0.020) | (-0.022,0.011) | (-0.009,0.024) | (-0.020,0.008) | (-0.035,0.008) | (-0.030,0.022) | | | |
| 2 | 0.615 | -0.174 | 0.008 | -0.004 | 0.008 | -0.005 | -0.011 | -0.004 | 0.012 | 0.061 | 671 |
| | (0.434,0.816) | (-0.223,-0.133) | (-0.002,0.018) | (-0.020,0.012) | (-0.008,0.026) | (-0.018,0.007) | (-0.031,0.009) | (-0.028,0.022) | | | |
| 3 | 0.649 | -0.183 | 0.009 | -0.004 | 0.008 | -0.006 | -0.013 | -0.003 | 0.012 | 0.067 | 670 |
| | (0.413,0.864) | (-0.236,-0.125) | (-0.002,0.020) | (-0.025,0.014) | (-0.010,0.025) | (-0.021,0.008) | (-0.036,0.011) | (-0.029,0.025) | | | |
| 4 | 0.651 | -0.184 | 0.009 | -0.004 | 0.007 | -0.007 | -0.013 | -0.005 | 0.012 | 0.065 | 672 |
| | (0.422,0.883) | (-0.240,-0.129) | (-0.002,0.020) | (-0.023,0.014) | (-0.012,0.025) | (-0.021,0.007) | (-0.036,0.010) | (-0.031,0.022) | | | |
| 5 | 0.646 | -0.182 | 0.009 | -0.003 | 0.008 | -0.007 | -0.012 | -0.006 | 0.011 | 0.066 | 670 |
| | (0.441,0.888) | (-0.244,-0.134) | (-0.002,0.020) | (-0.021,0.016) | (-0.012,0.025) | (-0.022,0.008) | (-0.034,0.009) | (-0.030,0.022) | | | |
α = intercept, β1 = coefficient for socio-economic status, β2 = coefficient for % over 45 years of age, β3 = coefficient for % overweight/obese, β4 = coefficient for % daily smokers, β5 = coefficient for % insufficient physical activity, β6 = coefficient for % adequate fruit intake, β7 = coefficient for % adequate vegetable intake, σS 2 = variance of correlated residual error, σU 2 = variance of uncorrelated residual error, DIC = Deviance Information Criterion.
Prior distributions used in models 1–5 are summarised in Table 1.
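The estimates above are reported as variances (σS 2, σU 2) summarised by a mean and 95% credible interval, whereas several of the models place their priors on precisions. A minimal sketch of how posterior draws of a precision might be converted to variance draws and then summarised is shown below. The function names, the simple order-statistic rule for the interval, and the toy draws are assumptions for illustration; they are not the authors' code, and real MCMC software typically uses its own quantile interpolation.

```python
import math


def precision_draws_to_variance(tau_draws):
    """Transform posterior draws of a precision into draws of a variance."""
    return [1.0 / tau for tau in tau_draws]


def posterior_summary(draws, level=0.95):
    """Return (mean, lower, upper) using empirical order statistics.

    ASSUMPTION: an equal-tailed interval taken from sorted draws, with the
    lower index rounded down and the upper index rounded up.
    """
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower_index = math.floor(tail * (n - 1))
    upper_index = math.ceil((1.0 - tail) * (n - 1))
    mean = sum(ordered) / n
    return mean, ordered[lower_index], ordered[upper_index]


# Hypothetical toy draws, not output from the models in the table.
variance_draws = precision_draws_to_variance([4.0, 1.0, 0.25])
toy_mean, toy_lower, toy_upper = posterior_summary(list(range(1, 102)))
```

Transforming each draw before summarising matters because the mean of 1/Ƭ is not the reciprocal of the mean of Ƭ.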
Figure 2. Estimated Relative Risk (RR) and Relative Excess Risk (RER) of type 2 diabetes for Queensland Local Government Areas. RR = relative risk, sd = standard deviation, RER = relative excess risk.
Top 5 LGAs for Relative Risk (RR), Relative Excess Risk (RER) and uncertainty for Relative Risk and Relative Excess Risk
| Smallest estimated RR | | Smallest sd(RR) | | Smallest estimated RER | | Smallest sd(RER) | |
|---|---|---|---|---|---|---|---|
| LGA | Estimated RR | LGA | sd(RR) | LGA | Estimated RER | LGA | sd(RER) |
| 34 | 0.480 | 8 | 0.003 | 47 | 0.962 | 67 | 0.125 |
| 16 | 0.573 | 28 | 0.005 | 32 | 1.021 | 10 | 0.150 |
| 8 | 0.580 | 44 | 0.006 | 67 | 1.255 | 30 | 0.152 |
| 24 | 0.611 | 60 | 0.006 | 34 | 1.442 | 32 | 0.161 |
| 28 | 0.624 | 38 | 0.007 | 24 | 1.452 | 57 | 0.168 |
| Largest estimated RR | | Largest sd(RR) | | Largest estimated RER | | Largest sd(RER) | |
| LGA | Estimated RR | LGA | sd(RR) | LGA | Estimated RER | LGA | sd(RER) |
| 63 | 1.857 | 12 | 0.269 | 51 | 2.535 | 68 | 0.566 |
| 35 | 1.933 | 41 | 0.445 | 35 | 2.627 | 70 | 0.575 |
| 51 | 1.966 | 65 | 0.452 | 68 | 2.645 | 41 | 0.580 |
| 12 | 2.450 | 70 | 0.465 | 18 | 3.738 | 23 | 0.587 |
| 18 | 3.073 | 23 | 0.474 | 12 | 4.442 | 12 | 0.647 |
RR = Relative Risk, RER = Relative Excess Risk, sd = standard deviation.
Figure 3. Ranked Relative Risk (A) and Relative Excess Risk (B) for Local Government Areas with 95% credible intervals. RR = Relative Risk, RER = Relative Excess Risk.