Jeremiah J Nieves, Forrest R Stevens, Andrea E Gaughan, Catherine Linard, Alessandro Sorichetta, Graeme Hornby, Nirav N Patel, Andrew J Tatem
Abstract
Geographical factors have influenced the distribution and density of human populations worldwide for centuries. Climatic regimes have made some regions more habitable than others, harsh topography has discouraged human settlement, and transport links have encouraged population growth. A better understanding of these relationships enables both improved mapping of present-day population distributions and modelling of future scenarios. However, few comprehensive studies of the relationships between population spatial distributions and the full range of existing drivers and correlates have been undertaken, much less at high spatial resolution, and particularly across low- and middle-income countries. Here, we use machine-learning approaches to quantify the relative importance of multiple types of drivers and covariates in explaining observed population densities across 32 low- and middle-income countries on four continents. We find that, while the relationships between population densities and geographical factors vary somewhat between regions, they are generally remarkably consistent, pointing to universal drivers of human population distribution. A set of geographical features relating to the built environment, ecology and topography consistently explains the majority of the variability in population distributions at fine spatial scales across the low- and middle-income regions of the world.
Keywords: census; dasymetric; disaggregation; mapping; population; random forests
Year: 2017 PMID: 29237823 PMCID: PMC5746564 DOI: 10.1098/rsif.2017.0401
Source DB: PubMed Journal: J R Soc Interface ISSN: 1742-5662 Impact factor: 4.118
Figure 1. General process of using a random forest to create gridded population maps following Stevens et al. [26], where 'out-of-bag' (OOB) data are the roughly one-third of observations left out of the bootstrap sample used to train any given tree.
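The "roughly one-third" in the caption follows from bootstrap sampling: when a tree's training set is drawn with replacement as n draws from n observations, each observation is missed with probability (1 − 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. The short Python sketch below checks this by simulation; it illustrates standard random-forest bagging in general and is not taken from the authors' workflow.

```python
import random


def oob_fraction(n: int, trials: int = 200, seed: int = 0) -> float:
    """Average share of n observations NOT drawn in a bootstrap sample of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        drawn = {rng.randrange(n) for _ in range(n)}  # n draws with replacement
        total += (n - len(drawn)) / n                 # share left out-of-bag
    return total / trials


def expected_oob_fraction(n: int) -> float:
    """Exact expectation: each observation is missed with probability (1 - 1/n)**n."""
    return (1 - 1 / n) ** n


if __name__ == "__main__":
    n = 1000
    print(f"simulated OOB share: {oob_fraction(n):.3f}")
    print(f"expected OOB share:  {expected_oob_fraction(n):.3f}")
```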
Figure 2. Countries for which boundary-matched census data were used in this study, from Africa, Central America and the Caribbean, South America and Southeast Asia.
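In the random-forest approach of Stevens et al. summarized in Figure 1, boundary-matched census counts are the totals that get disaggregated: the model's predicted density serves as a weighting layer, and each administrative unit's count is shared among its grid cells in proportion to those weights (dasymetric redistribution). A minimal sketch of that proportional step, assuming the weights have already been predicted; the function and argument names are hypothetical, and the even-split fallback for units whose weights are all zero is an illustrative choice not described in the source.

```python
from collections import defaultdict


def redistribute(census_counts, pixel_unit, pixel_weight):
    """Split each admin unit's census count across its grid cells in proportion
    to a non-negative predicted density weight.

    census_counts: {unit_id: population}
    pixel_unit:    {pixel_id: unit_id}   which admin unit each grid cell falls in
    pixel_weight:  {pixel_id: weight}    e.g. a random-forest density prediction
    """
    weight_sum = defaultdict(float)
    pixel_count = defaultdict(int)
    for pix, unit in pixel_unit.items():
        weight_sum[unit] += pixel_weight[pix]
        pixel_count[unit] += 1

    population = {}
    for pix, unit in pixel_unit.items():
        if weight_sum[unit] > 0:
            share = pixel_weight[pix] / weight_sum[unit]
        else:
            # hypothetical fallback: spread evenly when every weight in the unit is zero
            share = 1 / pixel_count[unit]
        population[pix] = census_counts[unit] * share
    return population
```

Because each unit's shares sum to one, the gridded totals reproduce the census count of every unit.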
Table 1. Sampled countries and selected characteristics, including the variance explained by each country-specific random forest model. admin., administrative; avg., average.
| country | ISO | region | census year (admin. level) | admin. units | avg. spatial resolution (km²) | people per unit (thousands) | variance explained |
|---|---|---|---|---|---|---|---|
| Kenya | KEN | Africa | 1999 (5) | 6606 | 9 | 4.3 | 83% |
| Morocco | MAR | Africa | 2004 (4) | 1497 | 16 | 21 | 80% |
| Mali | MLI | Africa | 2009 (4) | 687 | 43 | 22 | 85% |
| Malawi | MWI | Africa | 2008 (2) | 12 557 | 22 | 59 | 79% |
| Namibia | NAM | Africa | 2011 (2) | 5475 | 12.28 | 21 | 96% |
| Nigeria | NGA | Africa | 2006 (2) | 774 | 34 | 205 | 88% |
| Rwanda | RWA | Africa | 2002 (4) | 9183 | 1.68 | 1.2 | 69% |
| Senegal | SEN | Africa | 2009 (4) | 331 | 24 | 37 | 91% |
| Uganda | UGA | Africa | 2002 (4) | 5018 | 7 | 6 | 85% |
| Bolivia | BOL | C. America and Caribbean | 2012 (2) | 112 | 97.7 | 91 | 65% |
| Costa Rica | CRI | C. America and Caribbean | 2011 (3) | 469 | 10.4 | 9.8 | 92% |
| Cuba | CUB | C. America and Caribbean | 2012 (2) | 168 | 25.6 | 68 | 82% |
| Dominican Republic | DOM | C. America and Caribbean | 2010 (3) | 155 | 17.6 | 64 | 86% |
| Guatemala | GTM | C. America and Caribbean | 2012 (2) | 333 | 18.0 | 46 | 80% |
| Haiti | HTI | C. America and Caribbean | 2009 (4) | 570 | 6.9 | 17 | 84% |
| Mexico | MEX | C. America and Caribbean | 2010 (2) | 2456 | 28.0 | 48 | 92% |
| Nicaragua | NIC | C. America and Caribbean | 2012 (3) | 137 | 29.4 | 43 | 79% |
| Panama | PAN | C. America and Caribbean | 2010 (2) | 74 | 31.04 | 49 | 74% |
| Puerto Rico | PRI | C. America and Caribbean | 2010 (1) | 78 | 13.3 | 48 | 74% |
| Argentina | ARG | S. America | 2010 (2) | 526 | 73.0 | 78 | 88% |
| Brazil | BRA | S. America | 2010 (4) | 5565 | 5.1 | 36 | 84% |
| Colombia | COL | S. America | 2013 (4) | 1115 | 32.0 | 42 | 84% |
| Ecuador | ECU | S. America | 2010 (4) | 978 | 16.2 | 15 | 82% |
| Peru | PER | S. America | 2012 (2) | 194 | 81.7 | 155 | 63% |
| Venezuela | VEN | S. America | 2011 (2) | 339 | 51.6 | 87 | 71% |
| Cambodia | KHM | S.E. Asia | 2008 (3) | 1621 | 10.51 | 8.6 | 92% |
| China | CHN | S.E. Asia | 2010 (4) | 2922 | 57.28 | 458 | 95% |
| Indonesia | IDN | S.E. Asia | 2010 (4) | 79 277 | 4.91 | 3.0 | 81% |
| Myanmar | MMR | S.E. Asia | 2014 (3) | 326 | 45.29 | 164 | 94% |
| Nepal | NPL | S.E. Asia | 2011 (4) | 3973 | 6.08 | 6.8 | 92% |
| Thailand | THA | S.E. Asia | 2010 (3) | 7416 | 23.67 | 9.0 | 88% |
| Vietnam | VNM | S.E. Asia | 2010 (3) | 688 | 21.85 | 123 | 93% |
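The variance-explained column is not defined further here. One common definition for random-forest regression, and the one R's randomForest package reports as "% Var explained", is an out-of-bag pseudo-R²: 100 × (1 − MSE_OOB / Var(y)), with the population variance of the observed response. The helper below computes that quantity from paired observed and OOB-predicted values; reading the table's figures as exactly this statistic is an assumption.

```python
from statistics import fmean, pvariance


def percent_variance_explained(observed, oob_predicted):
    """Out-of-bag pseudo-R^2 in percent: 100 * (1 - MSE_oob / Var(y))."""
    if len(observed) != len(oob_predicted) or not observed:
        raise ValueError("need two equal-length, non-empty sequences")
    mse = fmean((y - p) ** 2 for y, p in zip(observed, oob_predicted))
    return 100.0 * (1.0 - mse / pvariance(observed))  # population variance of y
```

A model that predicts every observation perfectly scores 100%, while one that always predicts the mean of y scores 0%.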
Table 2. Reclassification scheme used to standardize covariates into variable classes representing spatial drivers and determinants of population. LC, thematically classified land cover; LU, classified land use; nat., natural; OSM, OpenStreetMap; POI, point of interest; semi-nat., semi-natural; veg., vegetation. Note: the references are not exhaustive but are characteristic of most models. Any of these covariates could be replaced by a country-specific dataset obtained from a one-off source or a country partner. Refer to the country-specific metadata files provided with the source download from www.worldpop.org.
| aggregated variable class | drivers, correlates and covariates |
|---|---|
| natural/semi-natural vegetation land cover | LC nat. and semi-nat. veg. (woody) |
| | LC nat. and semi-nat. veg. (shrubs) |
| | LC nat. and semi-nat. veg. (herbaceous) |
| | LC nat. and semi-nat. veg. (other mix) |
| | LC nat. and semi-nat. veg. (aquatic veg.) |
| cultivated/managed land cover | LC cultivated terrestrial and managed lands |
| natural bare surfaces land cover | LC natural bare surface |
| artificial surface land cover | LC urban areas |
| | LC rural settlement |
| no data | LC no data |
| residential land use | LU residential |
| non-residential land use | LU industrial |
| | LU farms |
| protected land use | e.g. protected natural areas |
| general classified land use | e.g. multiple classified land uses provided to the model as a single covariate |
| urban/suburban extents | Global Human Settlement Layer |
| | Schneider MODIS |
| built environment and urban/suburban proxies | LC urban areas + LC rural settlement |
| | lights-at-night imagery |
| | building footprints |
| classified populated place (hierarchical) | e.g. city, town, village |
| transportation networks | roads |
| | railways |
| climatic/environmental | elevation and slope |
| | net primary productivity |
| | temperature |
| | precipitation |
| facilities and services | schools |
| | police |
| | nutrition |
| | health facilities |
| places and POIs | OSM places |
| | OSM POIs |
| rivers/waterbodies/waterways | LC water |
| | rivers |
| | waterbodies/waterways |
| populated place | e.g. gazetteer-type data |
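Applying the reclassification scheme amounts to a lookup from each covariate to its aggregated variable class. The sketch below encodes a subset of the rows above; the covariate identifiers (e.g. `lc_natural_woody`, `lights_at_night`) are hypothetical stand-ins, since actual covariate names are not given here.

```python
from collections import defaultdict

# Hypothetical covariate identifiers mapped to aggregated variable classes
# from the reclassification table; only a subset of rows is shown.
COVARIATE_CLASS = {
    "lc_natural_woody": "natural/semi-natural vegetation land cover",
    "lc_natural_shrubs": "natural/semi-natural vegetation land cover",
    "lc_cultivated": "cultivated/managed land cover",
    "lc_urban": "artificial surface land cover",
    "lights_at_night": "built environment and urban/suburban proxies",
    "roads": "transportation networks",
    "railways": "transportation networks",
    "elevation": "climatic/environmental",
    "precipitation": "climatic/environmental",
    "osm_pois": "places and POIs",
    "rivers": "rivers/waterbodies/waterways",
}


def group_by_class(covariates):
    """Collect a country model's covariates under their aggregated variable class."""
    grouped = defaultdict(list)
    for cov in covariates:
        # covariates missing from the lookup are kept visible rather than dropped
        grouped[COVARIATE_CLASS.get(cov, "unclassified")].append(cov)
    return dict(grouped)
```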
Figure 3. Global weighted importance rank (WIR) of each variable class, based upon the covariates included in each country's final model, where zero represents the highest rank. The mean is represented by a white diamond, the median by the black bar, and the whiskers represent the minimum and maximum values within 1.5× the inter-quartile range. See table 2 for descriptions and references for the variable classes. LC, land cover; LU, land use; WDPA, World Database on Protected Areas.
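The caption does not spell out how the weighted rank is computed. Purely as an illustration, one plausible class-level score ranks a country's covariates by importance, rescales the ranks to [0, 1] so that 0 is the most important covariate regardless of how many covariates the model kept, and averages the rescaled ranks within each variable class. This scheme and the name `class_wir` are assumptions, not the paper's formula.

```python
from collections import defaultdict
from statistics import fmean


def class_wir(importance, covariate_class):
    """Hypothetical class-level importance rank for one country model.

    importance:      {covariate: importance score}, larger = more important
    covariate_class: {covariate: aggregated variable class}
    Returns {class: score in [0, 1]}, where 0 is the highest possible rank.
    """
    ordered = sorted(importance, key=importance.get, reverse=True)
    k = len(ordered)
    # rescale 0-based ranks so models with different covariate counts are comparable
    norm_rank = {cov: (i / (k - 1) if k > 1 else 0.0) for i, cov in enumerate(ordered)}
    per_class = defaultdict(list)
    for cov, rank in norm_rank.items():
        per_class[covariate_class[cov]].append(rank)
    return {cls: fmean(ranks) for cls, ranks in per_class.items()}
```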
Selected results of the pairwise post hoc Dunn test with Holm's correction for multiple comparisons of global WIR between variable classes. Corrected Z-scores and corrected p-values, in parentheses, are given. See table 2 for descriptions and references for the variable classes. LC, land cover; LU, land use. Results across all classes, and the full precision of the values, are provided in the electronic supplementary material. Global Kruskal–Wallis results: d.f. = 15, chi-squared = 96.147, p < 0.01.
| variable class | built env. and urban/suburb. proxies | climatic/environmental | populated place | transportation networks | urban/suburb. extents |
|---|---|---|---|---|---|
| class of pop. place | 5.04 (<0.01) | 5.53 (<0.01) | 2.41 (1.00) | 2.41 (1.00) | 3.43 (0.06) |
| climatic/environmental | 0.30 (1.00) | — | 1.49 (1.00) | 3.20 (0.14) | 0.72 (1.00) |
| facilities and services | 2.06 (1.00) | 2.36 (1.00) | 0.48 (1.00) | 0.16 (1.00) | 1.27 (1.00) |
| cultivated/managed LC | 3.43 (0.37) | 3.20 (0.14) | 1.18 (1.00) | 0.74 (1.00) | 1.98 (1.00) |
| natural/semi-natural vegetation LC | 4.82 (<0.01) | 5.44 (<0.01) | 1.90 (1.00) | 1.76 (1.00) | 2.98 (0.28) |
| nat. bare surfaces LC | 3.19 (0.14) | 3.46 (0.06) | 1.60 (1.00) | 1.27 (1.00) | 2.35 (1.00) |
| general classified LU | 3.58 (0.04) | 3.81 (0.02) | 2.15 (1.00) | 1.93 (1.00) | 2.84 (0.42) |
| non-residential LU | 1.55 (1.00) | 1.71 (1.00) | 0.64 (1.00) | 0.25 (1.00) | 1.16 (1.00) |
| protected LU | 5.52 (<0.01) | 5.91 (<0.01) | 3.19 (0.14) | 3.31 (0.10) | 4.13 (<0.01) |
| residential LU | 3.37 (0.08) | 3.56 (0.04) | 2.16 (1.00) | 1.93 (1.00) | 2.77 (0.52) |
| places and POIs | 2.08 (1.00) | 2.38 (1.00) | 0.51 (1.00) | 0.11 (1.00) | 1.29 (1.00) |
| populated place | 1.26 (1.00) | 1.49 (1.00) | — | 0.68 (1.00) | 0.69 (1.00) |
| rivers/waterbodies/waterways | 4.80 (<0.01) | 5.28 (<0.01) | 2.27 (1.00) | 2.20 (1.00) | 3.27 (0.11) |
| transportation networks | 2.76 (0.52) | 3.20 (0.14) | 0.68 (1.00) | — | 1.61 (1.00) |
| urban/suburban extents | 0.48 (1.00) | 0.72 (1.00) | 0.69 (1.00) | 1.61 (1.00) | — |
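Before the pairwise Dunn comparisons, a Kruskal–Wallis test checks whether WIR differs across groups at all; with k groups it has k − 1 degrees of freedom, so d.f. = 15 corresponds to 16 groups being compared. A self-contained Python sketch of the H statistic with the standard tie correction follows; it is a textbook implementation for illustration, not the authors' code, and the p-value (from a chi-squared distribution with k − 1 d.f.) is left to a statistics library.

```python
from itertools import chain


def midranks(values):
    """1-based ranks; tied values receive the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Return (H, degrees of freedom), with the usual correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = midranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # tie correction: 1 - sum(t^3 - t) / (n^3 - n) over runs of tied values
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    c = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if c == 0:
        raise ValueError("all observations are tied")
    return h / c, len(groups) - 1
```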
Results of the pairwise Dunn test with Holm's correction for regional differences in the WIR of the rivers/waterbodies/waterways class. Corrected Z-scores and corrected p-values, in parentheses, are given. Full results for all variable classes between regions, including non-significant findings, are provided in the electronic supplementary material. Kruskal–Wallis results: d.f. = 3, chi-squared = 20.281, p < 0.01.
| region | Africa | C. America and Caribbean | S. America |
|---|---|---|---|
| C. America and Caribbean | 3.78 (<0.01) | — | — |
| S. America | 1.21 (0.45) | 2.32 (0.08) | — |
| S.E. Asia | 0.77 (0.45) | 4.08 (<0.01) | 1.79 (0.22) |
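Dunn's test compares each pair of groups through the difference in their mean ranks in the pooled sample, scaled by a tie-corrected standard error; Holm's step-down procedure then multiplies the i-th smallest of m p-values by (m − i + 1), enforces monotonicity and caps the result at 1. The sketch below implements both from the usual formulas for illustration only; it is not the authors' implementation, and its output need not match the rounded, corrected values in the tables.

```python
import math
from itertools import chain, combinations


def midranks(values):
    """1-based ranks; tied values receive the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def holm(pvalues):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - step) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


def dunn_holm(groups):
    """Pairwise Dunn z-tests on pooled ranks; two-sided p-values Holm-adjusted.

    groups: {label: list of observations}
    Returns {(label_a, label_b): (z, adjusted_p)}.
    """
    labels = list(groups)
    pooled = list(chain.from_iterable(groups[g] for g in labels))
    n = len(pooled)
    ranks = midranks(pooled)
    mean_rank, start = {}, 0
    for g in labels:
        size = len(groups[g])
        mean_rank[g] = sum(ranks[start:start + size]) / size
        start += size
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (12 * (n - 1))
    pairs, zs, ps = [], [], []
    for a, b in combinations(labels, 2):
        var = (n * (n + 1) / 12 - tie_term) * (1 / len(groups[a]) + 1 / len(groups[b]))
        z = (mean_rank[a] - mean_rank[b]) / math.sqrt(var)
        pairs.append((a, b))
        zs.append(z)
        ps.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided normal p-value
    return {pair: (z, p) for pair, z, p in zip(pairs, zs, holm(ps))}
```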
Figure 4. Regional line-and-dot plot of variable class WIR, with the median marked by the dot and the inter-quartile range demarcated by brackets. Note that not all regions have all variable classes. See table 2 for descriptions and references for the variable classes.
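The figure summarizes WIR per region and variable class by its median and inter-quartile range. A minimal sketch of that summary with Python's `statistics` module; the input layout (tuples of region, class and WIR) is hypothetical, and `method="inclusive"` is an assumed quartile convention, since different software defines quartiles slightly differently.

```python
from collections import defaultdict
from statistics import median, quantiles


def regional_summary(records):
    """Median and inter-quartile range of WIR per (region, variable class).

    records: iterable of (region, variable_class, wir) tuples.
    Returns {(region, variable_class): (q1, median, q3)}.
    """
    buckets = defaultdict(list)
    for region, cls, wir in records:
        buckets[(region, cls)].append(wir)
    summary = {}
    for key, values in buckets.items():
        if len(values) < 2:
            # quantiles() needs two or more points; a lone value is its own summary
            summary[key] = (values[0], values[0], values[0])
            continue
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[key] = (q1, median(values), q3)
    return summary
```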
Selected results of the pairwise Dunn test with Holm's correction for differences in WIR between variable classes within each region. Corrected Z-scores and corrected p-values, in parentheses, are given. Full results between all variable classes within regions, including non-significant findings, and the full precision of the values are provided in the electronic supplementary material.
| region | variable class | built env. and urban/suburban proxies | climatic/environmental | urban/suburban extents | transportation networks | populated place |
|---|---|---|---|---|---|---|
| S. America | classified populated place | 4.54 (<0.01) | 5.09 (<0.01) | 2.73 (0.63) | 1.69 (1.00) | 2.76 (0.57) |
| | natural/semi-natural vegetation LC | 3.73 (0.10) | 3.73 (0.02) | 1.94 (1.00) | 0.46 (1.00) | 2.06 (1.00) |
| | general classified LU | 3.34 (0.09) | 3.56 (0.04) | 2.48 (1.00) | 1.50 (1.00) | 2.57 (0.95) |
| | protected LU | 4.29 (<0.01) | 4.52 (<0.01) | 3.32 (0.10) | 2.55 (1.00) | 3.36 (0.08) |
| | rivers/waterbodies/waterways | 3.82 (0.01) | 4.14 (<0.01) | 2.65 (0.77) | 1.63 (1.00) | 2.72 (0.63) |
| C. America and Caribbean | classified populated place | 3.85 (0.01) | 1.76 (1.00) | 2.63 (0.88) | 1.84 (1.00) | 0.39 (1.00) |
| | natural/semi-natural vegetation LC | 4.62 (<0.01) | 2.30 (1.00) | 3.03 (0.26) | 2.36 (1.00) | 0.61 (1.00) |
| | protected LU | 3.52 (<0.05) | 1.88 (1.00) | 2.66 (0.81) | 1.95 (1.00) | 0.75 (1.00) |
| | rivers/waterbodies/waterways | 5.66 (<0.01) | 3.66 (0.03) | 4.07 (<0.01) | 3.69 (0.03) | 1.75 (1.00) |
Figure 5. Regional variable class weighted rank of importance based upon covariates included in a given country's final model, where zero represents the highest rank. The mean is represented by a white diamond, the median by the black bar, and the whiskers represent the minimum and maximum values within 1.5× the inter-quartile range. See table 2 for descriptions and references for the variable classes.
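The whisker rule described for Figures 3 and 5 is the conventional Tukey box-plot rule: each whisker extends to the most extreme observation lying within 1.5 × IQR of the nearer quartile. The sketch below computes the whisker ends; the quartile convention (`method="inclusive"`) is an assumption, so whisker positions may differ slightly from those drawn by other plotting software.

```python
from statistics import quantiles


def tukey_whiskers(values):
    """Whisker ends for a box plot: the most extreme observations lying within
    1.5 x IQR below the lower quartile and above the upper quartile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)
```

Points beyond the fences (such as 2.0 in `[0.1, 0.2, 0.3, 0.4, 2.0]`) are left for the plot to draw as individual outliers.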