Warren C Jochem, Tomas J Bird, Andrew J Tatem.
Abstract
Remote sensing techniques are now commonly applied to map and monitor urban land uses, to measure growth, and to assist with development and planning. Recent work in this area has highlighted the use of textures and other spatial features that can be measured in very high spatial resolution imagery. Far less attention has been given to using geospatial vector data (i.e. points, lines, polygons) to map land uses. This paper presents an approach to distinguish residential settlement types (regular vs. irregular) using an existing database of settlement points that locate structures. Nine data features describing the density, distance, angles, and spacing of the settlement points are calculated at multiple spatial scales. These features are analysed alone and in combination with five common remote sensing measures of elevation, slope, vegetation, and nighttime lights in a supervised machine learning approach to classify land use areas. The method was tested in seven provinces of Afghanistan (Balkh, Helmand, Herat, Kabul, Kandahar, Kunduz, Nangarhar). Overall accuracy ranged from 78% in Kandahar to 90% in Nangarhar. This research demonstrates the potential to accurately map land uses from even the simplest representation of structures.
Keywords: Big data; Land use; Machine learning; Point pattern analysis; Texture; Urban morphology
Year: 2018 PMID: 29725149 PMCID: PMC5863080 DOI: 10.1016/j.compenvurbsys.2018.01.004
Source DB: PubMed Journal: Comput Environ Urban Syst ISSN: 0198-9715
Fig. 1. Seven study provinces within Afghanistan. Training locations are selected from within land use maps of the provincial capitals.
Table 1. Reclassification of land use subtypes into regular and irregular settlement types. Subtype categories come from the State of Afghan Cities report (GoIRA & UN Habitat, 2015) and are used to select locations for training the classification algorithm. Non-residential and uninhabited areas are excluded from the classification.
| Regular settlement | Irregular settlement | Non-residential/uninhabited |
|---|---|---|
| Regular | Irregular | Commercial |
| Apartments | Hillside houses | Industrial |
| Mixed use | IDP camps | Institutional |
| | Nomadic camps | Transport/roads |
| | | Agriculture |
| | | Under construction |
| | | Vacant/barren |
| | | Water/green space |
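The Table 1 grouping is a plain lookup from subtype to class, and the minimal sketch below encodes it that way. It assumes the column grouping shown in the table. Regular covers Regular, Apartments, and Mixed use. Irregular covers Irregular, Hillside houses, IDP camps, and Nomadic camps. Every other subtype is excluded. `RECLASS` and `reclassify` are hypothetical names, and the sketch does not model the attribute layout of the State of Afghan Cities parcels.

```python
# Hypothetical lookup mirroring Table 1; None marks non-residential/uninhabited
# subtypes, which are dropped before training.
RECLASS = {
    "Regular": "regular",
    "Apartments": "regular",
    "Mixed use": "regular",
    "Irregular": "irregular",
    "Hillside houses": "irregular",
    "IDP camps": "irregular",
    "Nomadic camps": "irregular",
    "Commercial": None,
    "Industrial": None,
    "Institutional": None,
    "Transport/roads": None,
    "Agriculture": None,
    "Under construction": None,
    "Vacant/barren": None,
    "Water/green space": None,
}


def reclassify(subtypes):
    """Return (subtype, settlement class) pairs for residential parcels only."""
    pairs = []
    for subtype in subtypes:
        settlement_class = RECLASS[subtype]  # KeyError flags an unknown label
        if settlement_class is not None:
            pairs.append((subtype, settlement_class))
    return pairs


print(reclassify(["Apartments", "Commercial", "Nomadic camps"]))
```

Raising on unknown labels, rather than silently dropping them, makes spelling mismatches between parcel files visible.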
Table 2. Summary of all data layers considered in the study. Geometry-derived features (Panel A) are created from the Alcis Settlement Points dataset (www.alcis.org). Abbreviations are used when summarising the model results.
| Data layers | Abbreviation |
|---|---|
| A. Geometry-derived features | |
| Number of points | npts |
| Unconstrained nearest neighbour mean distance | nd_m_f |
| Unconstrained nearest neighbour distance variance | nd_v_f |
| Constrained nearest neighbour mean distance | nd_m_c |
| Constrained nearest neighbour distance variance | nd_v_c |
| Linearity | l |
| Nearest neighbour index | ni |
| Nearest neighbour angle (Shannon's entropy) | na_s |
| Nearest neighbour angle (Metric entropy) | na_m |
| B. Remote sensing features | |
| Elevation | elev |
| Slope | slope |
| Enhanced vegetation index (EVI) | evi |
| Nighttime lights (VIIRS) | viirs |
| Nighttime lights (DMSP) | dmsp |
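Several Panel A features are standard point-pattern statistics. The sketch below computes an illustrative subset for the settlement points that fall in one window: the point count, the mean and variance of nearest neighbour distance, a Clark-Evans nearest neighbour index, and the Shannon entropy of nearest neighbour orientations. Several choices here are assumptions, not the paper's definitions. The neighbour search is brute force. Orientations are folded into [0, 180) degrees and grouped into 18 bins. The index uses the textbook Clark-Evans ratio. The constrained/unconstrained distinction, linearity, and metric entropy are not reproduced. The output keys (`nd_m`, `nd_v`, `na_s`, ...) are hypothetical.

```python
import math
from collections import Counter


def nearest_neighbours(points):
    """Brute-force nearest neighbour (distance, orientation in degrees) per point."""
    result = []
    for i, (xi, yi) in enumerate(points):
        best = None
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d = math.hypot(xj - xi, yj - yi)
            if best is None or d < best[0]:
                # Orientation of the line to the neighbour, folded into [0, 180)
                ang = math.degrees(math.atan2(yj - yi, xj - xi)) % 180.0
                best = (d, ang)
        result.append(best)
    return result


def geometry_features(points, area, n_bins=18):
    """Illustrative subset of Table 2 features for the points in one window."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    nn = nearest_neighbours(points)
    dists = [d for d, _ in nn]
    mean_d = sum(dists) / n
    var_d = sum((d - mean_d) ** 2 for d in dists) / n  # population variance
    # Clark-Evans ratio: observed mean NN distance over its expectation under
    # complete spatial randomness, 1 / (2 * sqrt(density))
    ni = mean_d / (0.5 / math.sqrt(n / area))
    # Shannon entropy of the binned nearest neighbour orientations
    width = 180.0 / n_bins
    counts = Counter(min(int(a // width), n_bins - 1) for _, a in nn)
    na_s = -sum((c / n) * math.log(c / n) for c in counts.values())
    return {"npts": n, "nd_m": mean_d, "nd_v": var_d, "ni": ni, "na_s": na_s}


# A 3 x 3 grid with 10 m spacing in a 30 m x 30 m window: a perfectly regular layout
grid = [(x, y) for y in (0, 10, 20) for x in (0, 10, 20)]
print(geometry_features(grid, area=900.0))
```

To mimic the multi-scale design, the same function can be called on the points inside windows of several sizes around each location. Each scale then contributes its own set of features. An index above 1 indicates dispersion, as in the regular grid above, and values below 1 indicate clustering.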
Fig. 2. Geometry-derived features across spatial scales. Average values extracted from 14,000 training point locations for regular (dashed, gold coloured) and irregular (solid, green coloured) settlement areas. Residential types defined by the State of Afghan Cities (GoIRA & UN Habitat, 2015) land use parcels. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Remote sensing measures. Boxplots of values extracted at 14,000 training locations for regular (gold coloured) and irregular (green coloured) residential types. Residential types defined by the State of Afghan Cities (GoIRA & UN Habitat, 2015) land use parcels. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 4. Variable importance and feature selection among geometry-derived features. The variable importance score observed in the baseline random forest model is compared to results from 500 random permutations. Variables that are not significant at the p = .01 level are shown in grey. p-values are approximated by comparing the baseline variable importance to the distribution of permutations. Feature abbreviations are given in Table 2. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
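The permutation comparison in Fig. 4 can be sketched as follows, under two assumptions. First, the null distribution comes from shuffling the class labels and recomputing importance, in the spirit of the PIMP approach. Second, the p-value is the empirical estimator (k + 1) / (n + 1), where k counts the permutation scores that reach the observed score. `importance_fn` stands in for refitting a random forest and reading off one variable's importance. It and `mean_gap`, a toy score used only so the example runs, are hypothetical.

```python
import random


def null_importances(importance_fn, X, y, n_perm=500, seed=0):
    """Importance scores recomputed after shuffling the class labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # breaks any link between the feature and the labels
        scores.append(importance_fn(X, y_perm))
    return scores


def permutation_p_value(observed, permuted):
    """Empirical one-sided p-value with the (k + 1) / (n + 1) correction."""
    exceed = sum(1 for s in permuted if s >= observed)
    return (exceed + 1) / (len(permuted) + 1)


def mean_gap(X, y):
    """Toy stand-in for an importance score: |mean(x | 1) - mean(x | 0)|."""
    ones = [x for x, label in zip(X, y) if label == 1]
    zeros = [x for x, label in zip(X, y) if label == 0]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


X = [0.0] * 10 + [1.0] * 10  # feature perfectly separates the two classes
y = [0] * 10 + [1] * 10
observed = mean_gap(X, y)
p = permutation_p_value(observed, null_importances(mean_gap, X, y))
print(f"p = {p:.4f}, significant at 0.01: {p < 0.01}")
```

With 500 permutations, the smallest attainable p-value under this estimator is 1/501 ≈ 0.002. The p = .01 threshold used in Fig. 4 is therefore reachable.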
Table 3. Classification accuracy. The confusion matrix compares, at the pixel level, residential types not used for training with the predicted classification from the final random forest model, which includes both geometry-derived and remote sensing predictors. Validation data come from the reclassified State of Afghan Cities report (GoIRA & UN Habitat, 2015) for the provincial capital of each of the seven provinces.
| Province | A. Settlement type prevalence | | B. Predicting regular settlement (pixels) | | | | C. Assessment | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | Regular | Irregular | True positive | False positive | True negative | False negative | PPV | NPV | Sensitivity | Specificity | Overall accuracy |
| Balkh | 60.1% | 39.9% | 38,849 | 3767 | 24,356 | 3600 | 91.2% | 87.1% | 91.5% | 86.6% | 89.6% |
| Helmand | 33.5% | 66.5% | 18,345 | 7007 | 32,673 | 1618 | 72.4% | 95.3% | 91.9% | 82.3% | 85.5% |
| Herat | 38.7% | 61.3% | 22,706 | 7643 | 43,033 | 9225 | 74.8% | 82.3% | 71.1% | 84.9% | 79.6% |
| Kabul | 28.3% | 71.7% | 87,581 | 43,287 | 263,375 | 33,553 | 66.9% | 88.7% | 72.3% | 85.9% | 82.0% |
| Kandahar | 45.7% | 54.3% | 31,810 | 10,322 | 39,812 | 10,304 | 75.5% | 79.4% | 75.5% | 79.4% | 77.6% |
| Kunduz | 14.1% | 85.9% | 4170 | 2795 | 27,053 | 722 | 59.9% | 97.4% | 85.2% | 90.6% | 89.9% |
| Nangarhar | 38.6% | 61.4% | 17,862 | 2211 | 31,062 | 3027 | 89.0% | 91.1% | 85.5% | 93.4% | 90.3% |
PPV (positive predictive value) = true positive / (true positive + false positive).
NPV (negative predictive value) = true negative / (true negative + false negative).
Sensitivity = true positive / (true positive + false negative).
Specificity = true negative / (true negative + false positive).
Overall accuracy = (true positive + true negative) / (true positive + false positive + true negative + false negative).
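The assessment measures follow directly from the four confusion-matrix counts, with regular settlement as the positive class. The short sketch below recomputes them, including the regular-settlement prevalence (true positive + false negative) / total. Feeding in the Balkh counts reproduces that row of the table. The function name `assessment` is illustrative only.

```python
def assessment(tp, fp, tn, fn):
    """Accuracy measures with regular settlement as the positive class."""
    total = tp + fp + tn + fn
    return {
        "prevalence_regular": (tp + fn) / total,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "overall_accuracy": (tp + tn) / total,
    }


balkh = assessment(tp=38_849, fp=3_767, tn=24_356, fn=3_600)
# Rounded percentages match the Balkh row: 60.1, 91.2, 87.1, 91.5, 86.6, 89.6
print({name: round(100 * value, 1) for name, value in balkh.items()})
```

Because prevalence differs widely between provinces (14.1% regular in Kunduz against 60.1% in Balkh), PPV and NPV shift with it even when sensitivity and specificity are similar.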
Fig. 5. Example classification results for the city of Kabul, using a 7 × 7 smoothing function on the prediction results. Insets show two areas of the city with the validation data and settlement points used to calculate the geometry-derived features (left column) and the prediction map of regular and irregular residential types (right column). White space in the maps indicates unsettled or non-residential areas. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
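The caption does not specify the kind of 7 × 7 smoothing. For a two-class map, one plausible reading is a modal (majority) filter, and the sketch below assumes that. Each residential cell takes the most common class among the residential cells in its 7 × 7 window. Ties keep the original label. Cells marked `None` (unsettled or non-residential) stay blank and do not vote. `majority_filter` is a hypothetical name, and the paper may have smoothed class probabilities instead.

```python
from collections import Counter


def majority_filter(grid, size=7):
    """Modal filter over a size x size window; None marks unsettled cells."""
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # leave the input grid unchanged
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None:
                continue  # keep unsettled/non-residential cells blank
            votes = Counter()
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    value = grid[rr][cc]
                    if value is not None:
                        votes[value] += 1
            ranked = votes.most_common()
            if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
                continue  # tie: keep the original label
            out[r][c] = ranked[0][0]
    return out


# 1 = regular, 0 = irregular; a lone irregular pixel inside a regular block
grid = [[1] * 9 for _ in range(9)]
grid[4][4] = 0
grid[0][0] = None
smoothed = majority_filter(grid)
print(smoothed[4][4], smoothed[0][0])
```

A modal filter removes isolated misclassified pixels without creating new class values. This is why it is a common choice for categorical rasters, where a mean filter would not be.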