Hugh J W Sturrock1, Katelyn Woolheater2, Adam F Bennett1, Ricardo Andrade-Pacheco1, Alemayehu Midekisa1.
Abstract
Accurate maps of the locations of residential buildings across a region benefit a range of sectors. This is particularly true for public health programs focused on delivering services at the household level, such as indoor residual spraying with insecticide to help prevent malaria. While open source data from OpenStreetMap (OSM) depicting the locations and shapes of buildings are rapidly improving in quality and completeness globally, even in settings where all buildings have been mapped, information on whether these buildings are residential, commercial or another type is often only available for a small subset. Using OSM building data from Botswana and Swaziland, we identified buildings for which 'type' was indicated, generated via on-the-ground observations, and classified these into two classes, "sprayable" and "not-sprayable". Ensemble machine learning, using building characteristics such as size, shape and proximity to neighbouring features, was then used to form a model to predict which of these two classes every building in these two countries fell into. Results show that an ensemble machine learning approach performed marginally, but statistically significantly, better than the best individual model, and that using this ensemble model we were able to correctly classify >86% of structures (using independent test data) as sprayable or not-sprayable across both countries.
Year: 2018 PMID: 30240429 PMCID: PMC6150517 DOI: 10.1371/journal.pone.0204399
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Structure types of OSM buildings for Swaziland and Botswana combined and their respective binary classification for modeling.
| Classification | Type |
|---|---|
| Not-sprayable | Abandoned hut, Apartments, Bridge, Building, Cathedral, Church, Civic, Collapsed, College, Commercial, Compound, Construction, Constructions, Damaged, Farm, Farm auxiliary, Garage, Government building, Greenhouse, Hangar, Hospital, Hotel, Industrial, Kindergarten, Land Board, Mosque, Office, Public, Quay, Retail, Roof, Roofless, Ruins, School, Service, Shed, Sports centre, Stable, Stadium, Storage tank, Supermarket, Terrace, Toilets, Tree, Tribal Government Building, University, Warehouse, Water utility, Youth ministry condo |
| Sprayable | Detached, House, Hut, Residential, Teacher housing |
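The binary recoding of OSM 'type' values can be sketched as a simple lookup. This is a minimal illustration rather than the authors' code: it assumes, as the abstract's focus on household-level spraying implies, that the five residential types form the "sprayable" class, and the function name `classify_structure` is hypothetical.

```python
# Residential structure types, ASSUMED here to form the "sprayable" class.
RESIDENTIAL_TYPES = {"detached", "house", "hut", "residential", "teacher housing"}


def classify_structure(osm_type):
    """Return the binary label for an OSM 'type' value, or None if untyped."""
    if osm_type is None or not osm_type.strip():
        # No 'type' data: these structures are the ones the model must predict.
        return None
    normalized = osm_type.strip().lower()
    # Any typed structure that is not residential is treated as not-sprayable.
    return "sprayable" if normalized in RESIDENTIAL_TYPES else "not-sprayable"
```

For example, `classify_structure("House")` gives `"sprayable"`, while `classify_structure("Warehouse")` gives `"not-sprayable"`.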
Breakdown of details of the OSM structure data available for Swaziland and Botswana.
| Country | Class | Total with ‘type’ data | Total without ‘type’ data |
|---|---|---|---|
| Botswana | All | 39,174 | 571,104 |
| | Sprayable | 33,955 | |
| | Not-sprayable | 5,219 | |
| Swaziland | All | 9,177 | 584,188 |
| | Sprayable | 7,317 | |
| | Not-sprayable | 1,860 | |
Fig 1. Distribution of OSM structure data for which type is indicated in A–Botswana and B–Swaziland. Blue and red points represent sprayable and not-sprayable respectively. Note that data are plotted with opaque colors to allow density to be represented. © OpenStreetMap contributors.
Fig 2. CV-AUC values obtained from each of the base learners and super learner plotted in decreasing value of AUC for A–Botswana and B–Swaziland. SL–super learner, RF–random forest, XGB–extreme gradient boosting, BRT–boosted regression tree, MARS–multivariate adaptive regression spline, CF–conditional inference forest, GLM NET–elastic net regression, PMARS–multivariate adaptive polynomial spline, GLM–logistic regression, GAM–generalized additive model.
Coefficients estimated for each base learner by the super learner algorithm.
| Base learner | Botswana | Swaziland |
|---|---|---|
| Mean | 0.12 | 0.14 |
| GLM | 0 | 0 |
| PMARS | 0.04 | 0 |
| GLM NET | 0 | 0 |
| BRT | 0 | 0 |
| XGB | 0.2 | 0.26 |
| CF | 0.02 | 0 |
| RF | 0.60 | 0.70 |
| MARS | 0 | 0 |
| GAM | 0.01 | 0 |
GLM–logistic regression, PMARS–multivariate adaptive polynomial spline, GLM NET–elastic net regression, BRT–boosted regression tree, XGB–extreme gradient boosting, CF–conditional inference forest, RF–random forest, MARS–multivariate adaptive regression spline, GAM–generalized additive model
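To illustrate how such coefficients are typically used, the sketch below combines base-learner probabilities into one ensemble prediction. It assumes the coefficients act as weights in a weighted average, which is the usual super learner formulation but is not spelled out here. The function name is hypothetical, and dividing by the total weight is an added choice to absorb rounding in the published values.

```python
# Botswana coefficients copied from the table above (rounded values).
BOTSWANA_WEIGHTS = {
    "Mean": 0.12, "GLM": 0.0, "PMARS": 0.04, "GLM NET": 0.0, "BRT": 0.0,
    "XGB": 0.2, "CF": 0.02, "RF": 0.60, "MARS": 0.0, "GAM": 0.01,
}


def super_learner_predict(base_predictions, weights):
    """Weighted average of base-learner probabilities for one structure.

    base_predictions maps learner name -> predicted probability of 'sprayable'.
    The result is divided by the total weight so that rounded coefficients
    still yield a value between 0 and 1 (an assumption of this sketch).
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(w * base_predictions[name] for name, w in weights.items())
    return weighted_sum / total_weight
```

With these weights the random forest and extreme gradient boosting predictions dominate the ensemble, while learners with a coefficient of 0 have no influence on the result.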
Fig 3. AUC values of the final super learner model by local impervious (urbanicity) quantile for Botswana and Swaziland.
Fig 4. Classification performance of the super learner model under different cutoff values for A–Botswana and B–Swaziland.
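One way to pick a cutoff at which accuracy is equal for sprayable and not-sprayable structures is a simple grid search, sketched below. This is an illustrative approach only, not the authors' procedure: the function names, the grid resolution, and the 1/0 label coding (1 for sprayable) are all assumptions.

```python
def class_accuracies(probs, labels, cutoff):
    """Return (accuracy on sprayable, accuracy on not-sprayable) at a cutoff.

    labels use 1 for sprayable and 0 for not-sprayable; both classes must be
    present. A structure is predicted sprayable when its probability >= cutoff.
    """
    sprayable = [p for p, y in zip(probs, labels) if y == 1]
    not_sprayable = [p for p, y in zip(probs, labels) if y == 0]
    acc_sprayable = sum(p >= cutoff for p in sprayable) / len(sprayable)
    acc_not_sprayable = sum(p < cutoff for p in not_sprayable) / len(not_sprayable)
    return acc_sprayable, acc_not_sprayable


def balanced_cutoff(probs, labels, steps=100):
    """Grid-search the cutoff that minimizes the gap between class accuracies."""
    def gap(cutoff):
        acc_pos, acc_neg = class_accuracies(probs, labels, cutoff)
        return abs(acc_pos - acc_neg)

    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=gap)
```

On real cross-validated predictions the two accuracies rarely match exactly, so the search returns the grid point where they are closest.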
Performance of the super learner models when applied to the test data.
Classifications were made using the country specific cutoffs at which CV-classification accuracy was equal for sprayable and not-sprayable structures.
| Country | Predicted | Observed: Not-sprayable | Observed: Sprayable |
|---|---|---|---|
| Botswana | Not-sprayable | 468 | 458 |
| | Sprayable | 53 | 2,937 |
| | % correctly classified | 89.8% | 86.5% |
| Swaziland | Not-sprayable | 167 | 100 |
| | Sprayable | 19 | 631 |
| | % correctly classified | 89.8% | 86.3% |
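The "% correctly classified" rows follow from the counts: each percentage is the correctly predicted count divided by the total for that observed class (the column sum). A short sketch that recomputes them from the first set of counts (468, 458, 53, 2937); the function name and the nested-dictionary layout are illustrative choices.

```python
def percent_correct_by_observed_class(matrix):
    """matrix[predicted][observed] holds counts.

    Returns, for each observed class, the percentage of its structures
    that were predicted correctly, rounded to one decimal place.
    """
    classes = list(matrix)
    result = {}
    for observed in classes:
        column_total = sum(matrix[predicted][observed] for predicted in classes)
        result[observed] = round(100 * matrix[observed][observed] / column_total, 1)
    return result


# First set of counts from the table, keyed as [predicted][observed].
first_block = {
    "not-sprayable": {"not-sprayable": 468, "sprayable": 458},
    "sprayable": {"not-sprayable": 53, "sprayable": 2937},
}
print(percent_correct_by_observed_class(first_block))
# {'not-sprayable': 89.8, 'sprayable': 86.5}
```

The same calculation on the second set of counts (167, 100, 19, 631) gives 89.8% and 86.3%, matching the table.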
Breakdown of predictions for all OSM structures in the country, including structures for which type is available.
| Country | Sprayable | Not-sprayable |
|---|---|---|
| Botswana | 385,336 (63%) | 224,942 (37%) |
| Swaziland | 378,659 (64%) | 214,706 (36%) |