Jeremiah J Nieves, Alessandro Sorichetta, Catherine Linard, Maksym Bondarenko, Jessica E Steele, Forrest R Stevens, Andrea E Gaughan, Alessandra Carioli, Donna J Clarke, Thomas Esch, Andrew J Tatem.
Abstract
Mapping urban features/human built-settlement extents at the annual time step has a wide variety of applications in demography, public health, sustainable development, and many other fields. Although more multitemporal urban features/human built-settlement datasets have recently become available, issues still exist in remotely-sensed imagery due to spatial and temporal coverage, adverse atmospheric conditions, and the expense involved in producing such datasets. Remotely-sensed annual time-series of urban/built-settlement extents that cover more than specific local areas or city-based regions therefore do not yet exist. Moreover, while a few high-resolution global datasets of urban/built-settlement extents exist for key years, the observed date often deviates many years from the assigned one. These challenges make it difficult to increase temporal coverage while maintaining high fidelity in the spatial resolution. Here we describe an interpolative and flexible modelling framework for producing annual built-settlement extents. We use a combined technique of random forest and spatio-temporal dasymetric modelling with open source subnational data to produce annual 100 m × 100 m resolution binary built-settlement datasets in four test countries located in varying environmental and developmental contexts for test periods of five-year gaps. We find that in the majority of years, across all study areas, the model correctly identified between 85 and 99% of pixels that transition to built-settlement. Additionally, with few exceptions, the model substantially outperformed a model that gave every pixel an equal chance of transitioning to built-settlement in each year.
This modelling framework shows strong promise for filling gaps in cross-sectional urban features/built-settlement datasets derived from remotely-sensed imagery, provides a base upon which to create urban future/built-settlement extent projections, and enables further exploration of the relationships between urban/built-settlement area and population dynamics.
Keywords: Built-settlements; Dasymetric modelling; Population; Random forest; Spatial growth; Urban features
Year: 2020 PMID: 32139952 PMCID: PMC7043396 DOI: 10.1016/j.compenvurbsys.2019.101444
Source DB: PubMed Journal: Comput Environ Urban Syst ISSN: 0198-9715
Fig. 1 Generalized concept of “urban” (Part A), the conceptual relations and definition of “built-settlement” (Part B) as related to urban, and the broad, non-exhaustive contributing factors that make up these concepts.
Summary of built-settlement transition data by country and period. Areal units here are pixels (~100 m) because that is the unit handled by the model, which considers relative rather than absolute areal changes.
| Country | Average Spatial Resolution | Period | Initial Non-Built Area (pixels) | Period Transition Prevalence |
|---|---|---|---|---|
| Panama | 10.9 km | 2000–2005 | 8,901,004 | 0.03% |
| Panama | 10.9 km | 2005–2010 | 8,898,679 | 0.09% |
| Panama | 10.9 km | 2010–2015 | 8,890,339 | 0.75% |
| Switzerland | 3.9 km | 2000–2005 | 6,816,510 | 1.56% |
| Switzerland | 3.9 km | 2005–2010 | 6,710,069 | 0.08% |
| Switzerland | 3.9 km | 2010–2015 | 6,704,973 | 0.01% |
| Uganda | 12.2 km | 2000–2005 | 28,231,555 | 0.07% |
| Uganda | 12.2 km | 2005–2010 | 28,210,425 | 0.04% |
| Uganda | 12.2 km | 2010–2015 | 28,200,084 | 0.04% |
| Vietnam | 21.7 km | 2000–2005 | 40,108,425 | 0.11% |
| Vietnam | 21.7 km | 2005–2010 | 40,063,545 | 0.18% |
| Vietnam | 21.7 km | 2010–2015 | 39,990,858 | 0.38% |
Average spatial resolution is the square root of the average subnational area, in km, and can be thought of as analogous to pixel resolution with smaller values indicating finer areal data and vice versa (Tobler et al., 1997).
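As a rough illustration of the two derived columns in the table, the sketch below computes average spatial resolution as the square root of the mean subnational unit area, and a period transition prevalence as the share of initially non-built pixels that leave the non-built pool by the start of the next period. The second definition is an assumption inferred from the table values rather than something the source states, and the function names and unit areas are hypothetical.

```python
import math


def average_spatial_resolution(unit_areas_km2):
    """Square root of the mean subnational unit area, in km."""
    return math.sqrt(sum(unit_areas_km2) / len(unit_areas_km2))


def period_transition_prevalence(initial_non_built, next_initial_non_built):
    """Assumed definition: percentage of initially non-built pixels
    that are no longer non-built at the start of the next period."""
    transitions = initial_non_built - next_initial_non_built
    return 100.0 * transitions / initial_non_built


# Hypothetical unit areas (km^2); their mean is 25 km^2.
print(average_spatial_resolution([16.0, 9.0, 50.0]))

# Panama, 2000-2005, using the initial non-built areas for 2000 and 2005.
print(round(period_transition_prevalence(8_901_004, 8_898_679), 2))
```

Under this assumed definition, the Panama 2000–2005 figure rounds to the 0.03% reported in the table.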
Data used for estimating the annual number of non-BS to BS transitions at the unit level (i.e. demand quantification), predicting the pixel level probability surface of those transitions, and performing the spatial allocation procedures of the model.
| Covariate | Variable Name(s) in Random Forest | Description | Use | Time Point(s) | Original Spatial Resolution(s) | Data Source(s) |
|---|---|---|---|---|---|---|
| Built-settlement | esa_cls190 | Binary BS extents | Demand Quantification and Spatial Allocation | 2000 | 10 arc sec | ( |
| DTE Built-settlement | esa_cls190_dst_ | Distance to the nearest BS edge | Spatial Allocation | 2000 | 10 arc sec | ( |
| Proportion Built-settlement 1,5,10,15 | esa_cls190_prp_< | Proportion of pixels that are BS within 1,5,10, or 15 pixel radius | Spatial Allocation | 2000 | 10 arc sec | ( |
| Elevation | Topo | Elevation of terrain | Spatial Allocation | 2000 – Time Invariant | 3 arc sec | ( |
| Slope | Slope | Slope of terrain | Spatial Allocation | 2000 – Time Invariant | 3 arc sec | ( |
| DTE Protected Areas Category 1 | wdpa_cat1_dst_2015 | Distance to the nearest level 1 protected area edge | Spatial Allocation | 2015 | Vector | ( |
| Water | – | Areas of water to restrict areas of model prediction | Restrictive Mask | | 5 arc sec | ( |
| Subnational Population | – | Annual population by sub-national units | Demand Quantification | 2000–2020, annually | Vector | ( |
| Weighted Lights-at-Night (LAN) | – | Annual lagged and sub-national unit normalized LAN | Spatial Allocation | 2000–2016, annually | 30 arc sec (2000−2011) | DMSP ( |
| Travel Time 50 k | tt50k | Travel time to the nearest city centre containing at least 50,000 people | Spatial Allocation | 2000 | 30 arc sec | ( |
| Urban Accessibility 2015 | urbanaccessibility_2015 | Travel time to the nearest city edge | Spatial Allocation | 2015 | 30 arc sec | ( |
| ESA CCI Land Cover (LC) Class | ccilc_dst< | Distance to nearest edge of individual land cover classes | Spatial Allocation | 2000 | 10 arc sec | ( |
| Distance to OpenStreetMap (OSM) Rivers | osmriv_dst | Distance to nearest OSM river feature | Spatial Allocation | 2017 | Vector | (OpenStreetMap |
| Distance to OpenStreetMap (OSM) Roads | osmroa_dst | Distance to nearest OSM road feature | Spatial Allocation | 2017 | Vector | (OpenStreetMap |
| Average Precipitation | wclim_prec | Mean precipitation | Spatial Allocation | 1950–2000 | 30 arc sec | ( |
| Average Temperature | wclim_temp | Mean temperature | Spatial Allocation | 1950–2000 | 30 arc sec | ( |
Some classes were collapsed: 10–30 → 11; 40–120 → 40; 150–153 → 150; 160–180 → 160 (Sorichetta et al., 2015).
Covariates involved in Demand Quantification were used to determine the demand for non-BS to BS transitions at the subnational unit level for every given year. Covariates involved in Spatial Allocation were used either as predictive covariates in the random-forest-calculated probabilities of transition (see d) or as a post-random-forest, year-specific weight on those probabilities and on the spatial allocation of transitions within each given unit area. Covariates used as restrictive masks prevented transitions from being allocated to the masked areas.
In the dasymetric modelling process, the 2000, 2005, 2010, and 2015 binary BS data were utilized as observed points, but only derived covariates for 2000 were utilized in the random forest as predictive covariates.
Used as predictive covariates in the random forest calculated probabilities of transition.
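The allocation step described in these notes can be sketched as follows. This is a minimal illustration, not the BSGM implementation: it assumes that, within one subnational unit, the demanded number of transitions goes to the eligible pixels with the highest transition probability, and that masked pixels (for example water) and already-built pixels are never eligible. The function name and inputs are hypothetical.

```python
def allocate_transitions(probabilities, built, masked, demand):
    """Convert `demand` non-built, unmasked pixels to built-settlement,
    choosing the highest-probability pixels first (assumed rule)."""
    # Pixels that are already built or fall under a restrictive mask are skipped.
    eligible = [
        i for i, (is_built, is_masked) in enumerate(zip(built, masked))
        if not is_built and not is_masked
    ]
    # Rank the eligible pixels from most to least likely to transition.
    ranked = sorted(eligible, key=lambda i: probabilities[i], reverse=True)
    result = list(built)
    for i in ranked[:demand]:
        result[i] = True
    return result


# One hypothetical unit with five pixels: pixel 0 is already built,
# pixel 4 is water, and the unit-level demand is two transitions.
probabilities = [0.90, 0.80, 0.10, 0.70, 0.95]
built = [True, False, False, False, False]
masked = [False, False, False, False, True]
print(allocate_transitions(probabilities, built, masked, demand=2))
```

In this toy unit the masked pixel is skipped despite having the highest probability, so the two transitions fall on pixels 1 and 3.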
Fig. 2 High-level example overview of the BSGM modelling framework process for interpolation using four RS-based observed years (2000, 2005, 2010, 2015) and predicting for all unobserved years in between. Note that example maps and numbers are not to scale.
Classification agreement metrics. The F1-score is interpreted as the harmonic mean of precision and recall. TP is “True Positive”, FP is “False Positive”, FN is “False Negative”, and TN is “True Negative.”
| Metric | Equation | Range and Interpretation |
|---|---|---|
| Recall (Sensitivity) ( | $\frac{TP}{TP + FN}$ | 0 (no recall) – 1 (perfect recall) |
| Specificity ( | $\frac{TN}{TN + FP}$ | 0 (no specificity) – 1 (perfect specificity) |
| Quantity Disagreement (R.G. | | 0 (no disagreement) – |
| Allocation Disagreement (R.G. | | 0 (no disagreement) – |
| F1 score | $\frac{2\,TP}{2\,TP + FP + FN}$ | 0 (worst) – 1 (best) |
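The agreement metrics named in the table can be computed from the four contingency counts, as in the sketch below. Recall, specificity, and the F1 score follow their standard definitions. The quantity and allocation disagreement formulas are an assumption: the sketch uses the common two-category form expressed as a proportion of all pixels, which may differ from the normalisation used in the paper. The function name is hypothetical.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Binary classification agreement metrics from contingency counts."""
    n = tp + fp + fn + tn
    return {
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Harmonic mean of precision and recall, written in terms of counts.
        "f1": 2 * tp / (2 * tp + fp + fn),
        # Assumed two-category forms, as proportions of all pixels:
        # mismatch in how many pixels transition versus where they transition.
        "quantity_disagreement": abs(fn - fp) / n,
        "allocation_disagreement": 2 * min(fp, fn) / n,
    }


# Hypothetical counts for 100 pixels.
for name, value in agreement_metrics(tp=8, fp=2, fn=4, tn=86).items():
    print(f"{name}: {value:.4f}")
```

With these forms, quantity and allocation disagreement sum to the total proportion of misclassified pixels, (FP + FN) / N.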
Fig. 3 Receiver operating characteristic (ROC) curves (left plots) and precision-recall curves (right plots) showing the RF model performance (blue lines) against a random model (red lines) and a perfect model (green lines) for each modelled country. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 4 Random forest covariate importance as measured by the average log decrease in the Gini impurity when the covariate is used as the splitting criterion at nodes, for Switzerland (CHE) ESA, Panama (PAN) ESA, Uganda (UGA) ESA, and Vietnam (VNM). Higher values indicate better predictive performance of the covariate. Refer to Table 2 for covariate names.
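For readers unfamiliar with the importance measure, the sketch below shows the Gini impurity decrease produced by a single split, which is the quantity a random forest accumulates per covariate across its nodes. It is a hand-rolled illustration with hypothetical labels, not the paper's code, and it reports the raw decrease rather than the averaged, log-scaled value plotted in the figure.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node: 1 = transitions to built-settlement, 0 = does not.
parent = [1, 1, 0, 0]
print(gini_decrease(parent, [1, 1], [0, 0]))     # split separates the classes
print(gini_decrease(parent, [1, 1, 0], [0]))     # split leaves one child mixed
```

A covariate whose splits separate transition from non-transition pixels more cleanly yields larger decreases and therefore ranks higher.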
Proportion of transition pixels predicted correctly by the BSGM modelling framework by year for Switzerland (CHE), Panama (PAN), Uganda (UGA), and Vietnam (VNM). Modelled extents with proportions greater than or equal to 0.80 are highlighted in green.
Fig. 5 Pixel-level quantity and allocation disagreement of the BSGM for Switzerland (CHE), Panama (PAN), Uganda (UGA), and Vietnam (VNM), compared with a naive model, given in yellow and red. Full annual contingency data and metrics are in the supplemental materials. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 6 Contour density plot of unit-level F1 scores by country across all predicted years. Created using a two-dimensional kernel density estimation (Venables & Ripley, 2002).
Fig. 7 Selected BSGM-based BS extent and ESA RS-based BS extent used for validation across the four countries for the approximate mid-point years of each period (2003, 2008, 2013). “ESA Only” represents BS pixels in the validation dataset not classified as BS pixels in the corresponding BSGM-based BS extent. “BSGMi Only” represents BS pixels in the BSGM-based BS extent not classified as BS pixels in the validation dataset.