Douglas R Leasure, Warren C Jochem, Eric M Weber, Vincent Seaman, Andrew J Tatem.
Abstract
Population estimates are critical for government services, development projects, and public health campaigns. Such data are typically obtained through a national population and housing census. However, population estimates can quickly become inaccurate in localized areas, particularly where migration or displacement has occurred. Some conflict-affected and resource-poor countries have not conducted a census in over 10 y. We developed a hierarchical Bayesian model to estimate population numbers in small areas based on enumeration data from sample areas and nationwide information about administrative boundaries, building locations, settlement types, and other factors related to population density. We demonstrated this model by estimating population sizes in every 100-m grid cell in Nigeria with national coverage. These gridded population estimates and areal population totals derived from them are accompanied by estimates of uncertainty based on Bayesian posterior probabilities. The model had an overall error rate of 67 people per hectare (mean of absolute residuals) or 43% (using scaled residuals) for predictions in out-of-sample survey areas (approximately 3 ha each), with increased precision expected for aggregated population totals in larger areas. This statistical approach represents a significant step toward estimating populations at high resolution with national coverage in the absence of a complete and recent census, while also providing reliable estimates of uncertainty to support informed decision making.
Keywords: Bayesian statistics; demography; geographic information systems; international development; remote sensing
Year: 2020 PMID: 32929009 PMCID: PMC7533662 DOI: 10.1073/pnas.1913050117
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Map of Nigeria showing locations of microcensus surveys as the number of survey locations within each 20-km grid cell. Labeling (R1 to R11) and shading of states indicate regions used for modeling.
Fig. 2. (A–F) Posterior probability distribution for random intercepts and estimates of population density (dashed line) for six microcensus clusters. Gray represents densities for a settlement type nationally (Eq. ). Red represents densities for a settlement type within a region (Eq. ). Green represents densities for a settlement type within a state (Eq. ). Blue represents densities for a settlement type within a local government area (Eq. ). Dots represent observed population densities from microcensus surveys. Settlement types shown include rural (M) and several urban types (A, B, D, and F).
Example state-level population totals
| State | Population (mean) | Lower 95% | Upper 95% |
|---|---|---|---|
| Abuja | 3,838,085 | 3,311,346 | 4,457,200 |
| Borno | 5,599,020 | 2,687,960 | 10,993,263 |
| Kano | 13,704,940 | 11,836,661 | 16,085,987 |
| Kaduna | 8,623,416 | 7,420,205 | 10,138,022 |
| Lagos | 9,381,532 | 7,221,440 | 13,144,086 |
| Ogun | 9,417,916 | 6,285,736 | 14,275,008 |
| Sokoto | 5,186,534 | 3,325,988 | 7,911,709 |
Population estimates are mean and 95% credible intervals from derived posterior distributions. No microcensus data were available from Borno, Ogun, or Sokoto states.
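As a rough illustration of how an areal total and its credible interval can be derived from gridded posterior samples, the sketch below sums simulated grid-cell values within each posterior draw and then summarizes the resulting totals. The function name `summarise_total`, the lognormal cell draws, and every numeric value are hypothetical and are not taken from the study. It also assumes that per-draw summation is how totals are derived, which the text above does not spell out.

```python
import random
import statistics


def summarise_total(cell_draws):
    """Return (mean, lower, upper) for the summed population of a set of cells.

    cell_draws: one list per grid cell, each holding one value per posterior draw.
    """
    n_draws = len(cell_draws[0])
    # Sum across cells separately for every posterior draw.
    totals = [sum(cell[d] for cell in cell_draws) for d in range(n_draws)]
    # With n=40 the first and last cut points are the 2.5% and 97.5% quantiles.
    cuts = statistics.quantiles(totals, n=40, method="inclusive")
    return statistics.fmean(totals), cuts[0], cuts[-1]


# Hypothetical posterior draws: 50 grid cells x 2000 draws (not study data).
rng = random.Random(42)
cell_draws = [[rng.lognormvariate(3.0, 0.5) for _ in range(2000)] for _ in range(50)]

mean, lower, upper = summarise_total(cell_draws)
print(f"total: {mean:.0f} (95% interval {lower:.0f} to {upper:.0f})")
```

Summing within each draw before summarizing matters because interval bounds, unlike means, are not additive across cells.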
Estimated covariate effects (untransformed ) on population densities
| Covariate | Mean | Lower 95% | Lower 80% | Upper 80% | Upper 95% |
|---|---|---|---|---|---|
| x1 | 0.011 | 0.004 | 0.007 | 0.015 | 0.017 |
| x2 | 0.027 | 0.013 | 0.018 | 0.036 | 0.041 |
| x3 | 0.147 | 0.057 | 0.089 | 0.206 | 0.236 |
| x4 | −0.007 | −0.027 | −0.020 | 0.007 | 0.014 |
| x5 | −0.011 | −0.027 | −0.021 | −0.0003 | 0.005 |
| x6 | −0.006 | −0.016 | −0.013 | 0.001 | 0.005 |
x1 is WorldPop Global population estimates; x2 is school density; x3 is household size; x4 is settled area within 1 km; x5 is residential area within 1 km; and x6 is nonresidential area within 1 km.
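One way to read the covariate table is to check which credible intervals exclude zero. The sketch below assumes that the six table rows correspond to x1 through x6 in the order listed above; the helper name `excludes_zero` is hypothetical.

```python
# (mean, lower 95%, lower 80%, upper 80%, upper 95%) copied from the table.
# Mapping rows to x1..x6 by row order is an assumption.
effects = {
    "x1": (0.011, 0.004, 0.007, 0.015, 0.017),
    "x2": (0.027, 0.013, 0.018, 0.036, 0.041),
    "x3": (0.147, 0.057, 0.089, 0.206, 0.236),
    "x4": (-0.007, -0.027, -0.020, 0.007, 0.014),
    "x5": (-0.011, -0.027, -0.021, -0.0003, 0.005),
    "x6": (-0.006, -0.016, -0.013, 0.001, 0.005),
}


def excludes_zero(lower, upper):
    """True when an interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0


clear_95 = [k for k, (_, lo95, _, _, hi95) in effects.items() if excludes_zero(lo95, hi95)]
clear_80 = [k for k, (_, _, lo80, hi80, _) in effects.items() if excludes_zero(lo80, hi80)]

print("95% interval excludes zero:", clear_95)
print("80% interval excludes zero:", clear_80)
```

Under that row-order assumption, x1 to x3 have 95% intervals entirely above zero, x5 is negative at the 80% level only, and x4 and x6 have intervals that include zero at both levels.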
Fig. 3. Observed population totals (N) and population densities (D) in surveyed microcensus clusters versus model predictions. Top row shows predictions from the full model, Middle row shows random cross-validation, and Bottom row shows state-by-state cross-validation results. Diagonal lines are 1:1 lines where predictions equal observations.
Analysis of residuals for in-sample posterior predictions and out-of-sample cross-validations (X-val)
| Parameter | Prediction | Bias | Imprecision | Inaccuracy | r² |
|---|---|---|---|---|---|
| N | In-sample | 34 (0.06) | 252 (0.50) | 179 (0.38) | 0.38 |
| N | X-val random | 36 (0.04) | 284 (0.57) | 199 (0.43) | 0.26 |
| N | X-val state | 121 (0.11) | 313 (0.54) | 257 (0.43) | 0.08 |
| D | In-sample | 7 (0.06) | 86 (0.50) | 61 (0.38) | 0.57 |
| D | X-val random | 8 (0.04) | 96 (0.57) | 67 (0.43) | 0.46 |
| D | X-val state | 24 (0.11) | 121 (0.54) | 92 (0.43) | 0.40 |
Residuals (predicted minus observed) were calculated based on the mean of the posterior predicted distribution. Bias is the mean of residuals; imprecision is the SD of residuals; inaccuracy is the mean of absolute residuals; r² is the squared Pearson correlation coefficient for observed versus predicted values. Values in parentheses are based on scaled residuals (residual/predicted).
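The residual statistics defined in the note above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the four observed and predicted values are hypothetical, and using the sample SD (n − 1 denominator) for imprecision is an assumption.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residual_summary(observed, predicted):
    """Bias, imprecision and inaccuracy for raw and scaled residuals, plus r squared."""
    residuals = [p - o for o, p in zip(observed, predicted)]  # predicted minus observed
    scaled = [r / p for r, p in zip(residuals, predicted)]    # residual / predicted

    def summarise(values):
        return {
            "bias": statistics.fmean(values),                           # mean of residuals
            "imprecision": statistics.stdev(values),                    # sample SD (assumption)
            "inaccuracy": statistics.fmean([abs(v) for v in values]),   # mean absolute residual
        }

    return {
        "raw": summarise(residuals),
        "scaled": summarise(scaled),
        "r2": pearson_r(observed, predicted) ** 2,
    }


# Hypothetical cluster values, for illustration only (not study data).
observed = [100, 200, 300, 400]
predicted = [110, 190, 330, 380]
summary = residual_summary(observed, predicted)
print(summary)
```

Reporting raw and scaled versions side by side mirrors the table layout, where the scaled values appear in parentheses.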