Oluwaseyi Olalekan Arowosegbe1,2, Martin Röösli1,2, Nino Künzli1,2, Apolline Saucy1,2, Temitope Christina Adebayo-Ojo1,2, Mohamed F Jeebhay3, Mohammed Aqiel Dalvie3, Kees de Hoogh1,2.
Abstract
Good-quality and complete ambient air quality monitoring data are central to supporting actions that mitigate the impact of ambient air pollution. In South Africa, however, continuous ground-level air pollution monitoring data are scarce and incomplete. To address this issue, we developed and compared different modeling approaches to impute missing daily average particulate matter (PM10) data between 2010 and 2017 using spatiotemporal predictor variables. The random forest (RF) machine learning method was used to explore the relationship between average daily PM10 concentrations and spatiotemporal predictors such as meteorological, land use and source-related variables. National (8 models), provincial (32) and site-specific (44) RF models were developed to impute missing daily PM10 data. The annual national, provincial and site-specific RF cross-validation (CV) models explained on average 78%, 70% and 55% of the variance in ground-level PM10 concentrations, respectively. The spatial components of the national and provincial CV RF models explained on average 22% and 48%, while the temporal components of the national, provincial and site-specific CV RF models explained on average 78%, 68% and 57% of the variance in ground-level PM10 concentrations, respectively. This study demonstrates a feasible RF-based approach to imputing missing measurement data in areas where data collection is sparse and incomplete.
Keywords: Random Forest; South Africa; air pollution; environmental exposure; imputation; particulate matter
Year: 2021 PMID: 33805155 PMCID: PMC8037804 DOI: 10.3390/ijerph18073374
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
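Once a model has been fitted, the imputation step described in the abstract reduces to keeping the measured days and substituting the RF prediction only where a daily value is missing. The following is a minimal sketch, assuming a fitted model has already produced one prediction per site-day; `impute_series` is a hypothetical helper, and fitting the RF itself is not shown.

```python
from typing import Optional, Sequence


def impute_series(
    observed: Sequence[Optional[float]],
    predicted: Sequence[float],
) -> tuple[list[float], list[bool]]:
    """Fill missing daily values (None) with model predictions.

    Measured days are kept unchanged. Returns the completed series and a
    per-day flag that is True where the value was imputed.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must cover the same days")
    completed: list[float] = []
    imputed: list[bool] = []
    for obs, pred in zip(observed, predicted):
        if obs is None:
            completed.append(pred)  # gap: use the model estimate
            imputed.append(True)
        else:
            completed.append(obs)  # measured day: keep the observation
            imputed.append(False)
    return completed, imputed


# Placeholder values in µg/m³ (not study data)
series, flags = impute_series([41.2, None, 38.5, None], [40.0, 44.1, 37.9, 35.2])
```

Returning the flags alongside the values lets downstream analyses distinguish measured days from modelled ones.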
Figure 1. Spatial distribution of particulate matter (PM10) monitoring stations across the four provinces of South Africa that operated at some point during 2010–2017.
Figure 2. PM10 data availability by year and by province; the size and colour of the circles indicate the percentage of data capture per year.
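Figure 2 expresses data capture as a yearly percentage, and the footnote to the performance table restricts site-specific models to stations with at least 70% annual data. A minimal sketch of that calculation follows, assuming data capture is the share of calendar days with a valid daily mean; the study's exact definition (for example any hourly-completeness rule for a valid day) is not given here, and `annual_data_capture` is a hypothetical name.

```python
import calendar
from datetime import date, timedelta
from typing import Iterable


def annual_data_capture(valid_days: Iterable[date], year: int) -> float:
    """Percentage of calendar days in `year` that have a valid daily PM10 mean.

    Duplicate dates are counted once and dates from other years are ignored.
    """
    n_days = 366 if calendar.isleap(year) else 365
    days_in_year = {d for d in valid_days if d.year == year}
    return 100.0 * len(days_in_year) / n_days


# Placeholder record (not study data): a site reporting only the first 256 days of 2017
example_days = [date(2017, 1, 1) + timedelta(days=i) for i in range(256)]
capture = annual_data_capture(example_days, 2017)
meets_site_criterion = capture >= 70.0  # 70% threshold from the performance-table footnote
```

In a non-leap year the 70% threshold falls between 255 and 256 valid days, so a single extra missing day can move a station across it.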
Spatial and temporal predictors used for random forest models
| Variable | Description | Source | Resolution |
|---|---|---|---|
| Population density | Mean population within 1 × 1 km² grid cell | SEDAC | ~1 km |
| Landcover | South Africa National Land Cover 2018 densities (summary of meters within the grid cells by land cover categories of Natural, Built-up, Residential, Agricultural, Industrial) | South Africa Department of Environmental Affairs. | 20 m |
| Light at night | 1 × 1 km² intersected aggregate | VIIRS-DNB | 750 m |
| Impervious surface | 1 × 1 km² intersected aggregate after removing no-data, cloud and shadow pixels | NOAA | 30 m |
| Elevation | 1 × 1 km² intersected aggregate of mean elevation | SRTM Digital Elevation Database | 90 m |
| Roads | Summed road length and distance to the nearest road, by road type (major roads and other roads) | OpenStreetMap | Lines |
| Climate zones | Cold interior, Temperate interior, Hot interior, Temperate coastal, Sub-tropical coastal, Arid interior | South Africa Bureau of Standards 2005 | 6 Zones |
| Meteorological variables (daily modelled planetary boundary layer height, temperature, precipitation, wind speed, wind direction, relative humidity, vertical velocity) | Daily global ECMWF reanalysis estimates | ERA5 reanalysis | 10 × 10 km |
| Modelled tropospheric estimates of NO2, PM10, O3 | Daily chemical transport model estimates | Chemical transport model | 10 × 10 km |
Abbreviations: SEDAC (Socioeconomic Data and Applications Center), VIIRS-DNB (Visible Infrared Imaging Radiometer Suite Day/Night Band), NOAA (National Oceanic and Atmospheric Administration), SRTM (Shuttle Radar Topography Mission), ECMWF (European Centre for Medium-Range Weather Forecasts), ERA5 (ECMWF Reanalysis 5th Generation).
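The predictor table mixes time-invariant spatial variables (land use, roads, elevation) with daily gridded ones (ERA5 meteorology, chemical transport model output). A random forest typically expects both combined into one row per site-day with a consistent column order. The sketch below assumes the predictors have already been extracted for each site; the predictor names and `build_feature_row` are hypothetical, and only a subset of the table's variables is shown.

```python
from typing import Mapping

# Hypothetical subset of the predictors in the table; names are illustrative.
SPATIAL_PREDICTORS = ("population_density", "elevation", "major_road_length")
TEMPORAL_PREDICTORS = ("pblh", "temperature", "precipitation", "wind_speed")


def build_feature_row(
    site_static: Mapping[str, float],
    daily_gridded: Mapping[str, float],
) -> list[float]:
    """Return one site-day predictor vector in a fixed column order.

    Spatial predictors are constant per site; temporal predictors are assumed
    here to be the values of the gridded product at the site on that day.
    """
    missing = [n for n in SPATIAL_PREDICTORS if n not in site_static]
    missing += [n for n in TEMPORAL_PREDICTORS if n not in daily_gridded]
    if missing:
        raise KeyError(f"missing predictors: {missing}")
    return [site_static[n] for n in SPATIAL_PREDICTORS] + [
        daily_gridded[n] for n in TEMPORAL_PREDICTORS
    ]


# Placeholder values for one site-day (not study data)
site = {"population_density": 1250.0, "elevation": 1600.0, "major_road_length": 3.2}
day = {"pblh": 850.0, "temperature": 18.4, "precipitation": 0.0, "wind_speed": 2.7}
x = build_feature_row(site, day)
```

Failing loudly on a missing predictor avoids silently shifting columns, which would otherwise feed the model values under the wrong variable.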
Summary of model performance statistics over the period 2010–2017 for the national, provincial and site-specific models, showing the range of R², root mean squared error (RMSE) and mean absolute error (MAE) across the years included.
| Model | Model building R² | Model building RMSE (µg/m³) | Model building MAE (µg/m³) | Spatial LOLO CV R² | Spatial LOLO CV RMSE (µg/m³) | Spatial LOLO CV MAE (µg/m³) | Temporal LTO CV R² | Temporal LTO CV RMSE (µg/m³) | Temporal LTO CV MAE (µg/m³) | No. of unique sites | Years |
|---|---|---|---|---|---|---|---|---|---|---|---|
| National | 0.77–0.79 | 12.1–16.76 | 8.69–11.38 | 0.11–0.35 | 17.72–29.47 | 13.62–23.65 | 0.77–0.79 | 12.31–16.43 | 8.85–11.39 | 20–44 | 2010–2017 |
| Mpumalanga | 0.73–0.81 | 14.03–19.35 | 9.63–12.13 | 0.39–0.69 | 22.06–36.21 | 13.5–29.59 | 0.73–0.78 | 13.55–19.21 | 9.85–12.01 | 5–17 * | 2010–2017 |
| Gauteng | 0.49–0.79 | 10.34–23.36 | 9.24–16.75 | 0.26–0.52 | 19.72–34.25 | 15.69–29.42 | 0.52–0.79 | 15.11–23.43 | 9.94–16.87 | 6–18 * | 2010–2017 |
| Western Cape | 0.29–0.71 | 6.74–8.73 | 5.11–6.72 | 0.35–0.54 | 7.38–11.22 | 5.76–8.86 | 0.44–0.66 | 6.66–23.29 | 5.18–17.92 | 1–11 * | 2010–2017 |
| KwaZulu-Natal | 0.55–0.79 | 7.36–9.53 | 5.29–8.11 | 0.29–0.57 | 8.54–19.95 | 6.95–16.82 | 0.47–0.78 | 7.37–10.71 | 5.46–8 | 3–6 * | 2010–2017 |
| Bellville | 0.42–0.47 | 5.81–9.16 | 4.51–7.26 | NA | NA | NA | 0.45–0.49 | 5.67–9.02 | 4.45–7.03 | NA | 2012, 2013, 2015–2017 |
| Bodibeng | 0.54–0.63 | 16.89–19.42 | 13.61–15.07 | NA | NA | NA | 0.57–0.67 | 16.36–18.91 | 13.32–14.87 | NA | 2012–2013 |
| Brackenham | 0.41–0.49 | 8.06–8.95 | 6.31–7.10 | NA | NA | NA | 0.46–0.49 | 7.81–8.95 | 6.25–7.15 | NA | 2011, 2015–2017 |
| Booysens | 0.45–0.67 | 22.13–22.82 | 17.99–20.77 | NA | NA | NA | 0.5–0.71 | 22.10–25.74 | 17.87–20.53 | NA | 2012, 2014 |
| Camden | 0.38–0.62 | 10.64–23.27 | 8.69–17.85 | NA | NA | NA | 0.39–0.65 | 10.29–22.43 | 9.61–17.15 | NA | 2013, 2015, 2017 |
| CBD | 0.38–0.59 | 6.35–9.55 | 4.93–7.45 | NA | NA | NA | 0.41–0.64 | 6.28–9.23 | 4.98–7.21 | NA | 2011–2013, 2015–2017 |
| City Hall | 0.45 | 10.29 | 7.69 | NA | NA | NA | 0.48 | 9.78 | 7.43 | NA | 2010 |
| Elandsfontein | 0.39–0.52 | 11.72–12.49 | 9.38–9.68 | NA | NA | NA | 0.45–0.57 | 11.17–11.79 | 8.99–9.38 | NA | 2016–2017 |
| Ermelo | 0.48–0.76 | 9.20–18.96 | 7.69–15.31 | NA | NA | NA | 0.51–0.77 | 9.12–19.98 | 7.54–13.89 | NA | 2010–2016 |
| Etwatwa | 0.63 | 24.03 | 18.74 | NA | NA | NA | 0.69 | 23.78 | 18.56 | NA | 2012 |
| Ferndale | 0.68–0.74 | 3.63–5.42 | 2.84–3.92 | NA | NA | NA | 0.65–0.77 | 3.49–5.38 | 2.76–3.88 | NA | 2010–2012 |
| Foreshore | 0.32–0.49 | 5.29–9.76 | 4.1–7.22 | NA | NA | NA | 0.33–0.49 | 5.27–9.58 | 4.13–7.08 | NA | 2011–2013, 2015–2017 |
| Gangles | 0.48–0.74 | 11.86–13.4 | 9.22–10.11 | NA | NA | NA | 0.51–0.75 | 11.23–11.88 | 8.96–9.71 | NA | 2010, 2011, 2013, 2014 |
| Germiston | 0.42 | 19.65 | 14.96 | NA | NA | NA | 0.44 | 19.07 | 14.79 | NA | 2011 |
| George | 0.55–0.56 | 7.09–8.41 | 5.49–6.56 | NA | NA | NA | 0.58 | 6.95–8.12 | 5.39–6.34 | NA | 2010, 2013 |
| Goodwood | 0.46–0.57 | 6.77–8.78 | 5.26–8.24 | NA | NA | NA | 0.49–0.59 | 6.60–8.49 | 5.29–7.80 | NA | 2011–2012, 2014–2016 |
| Grootvlei | 0.41–0.44 | 10.76–11.32 | 8.70–8.87 | NA | NA | NA | 0.42–0.49 | 10.65–11.12 | 8.63–8.82 | NA | 2011, 2013 |
| Hendrina | 0.39–0.71 | 11.12–17.02 | 8.32–13.62 | NA | NA | NA | 0.43–0.74 | 11.18–16.56 | 8.36–12.96 | NA | 2010–2012, 2015–2016 |
| Middelburg | 0.67–0.81 | 7.81–19.25 | 6.08–14.73 | NA | NA | NA | 0.70–0.82 | 7.49–18.63 | 5.92–14.25 | NA | 2010–2016 |
| Olievenhoutbosch | 0.57 | 34.23 | 27.01 | NA | NA | NA | 0.59 | 34.16 | 26.98 | NA | 2012 |
| Orange Farm | 0.45–0.69 | 10.78–19.81 | 8.57–15.56 | NA | NA | NA | 0.49–0.71 | 10.23–19.49 | 8.28–15.62 | NA | 2010, 2017 |
| Rosslyn | 0.55–0.61 | 5.91–11.49 | 4.77–9.30 | NA | NA | NA | 0.52–0.67 | 5.86–11.05 | 4.47–8.93 | NA | 2012–2014 |
| Secunda | 0.63–0.77 | 7.73–25.21 | 5.86–19.96 | NA | NA | NA | 0.67–0.77 | 7.47–24.64 | 5.75–19.7 | NA | 2010–2013 |
| Witbank | 0.72–0.83 | 9.21–22.33 | 7.63–17.27 | NA | NA | NA | 0.73–0.83 | 8.79–21.87 | 7.34–16.75 | NA | 2010, 2013–2016 |
| Komati | 0.45–0.83 | 8.52–28.02 | 6.61–21.51 | NA | NA | NA | 0.46–0.84 | 8.29–27.11 | 6.5–20.91 | NA | 2011–2012, 2014–2017 |
| Leandra | 0.29–0.36 | 6.63–14 | 4.86–10.38 | NA | NA | NA | 0.35–0.4 | 6.35–13.64 | 4.81–10.31 | NA | 2011–2012 |
| Newtown | 0.43 | 22.07 | 17.52 | NA | NA | NA | 0.47 | 21.68 | 17.27 | NA | 2012 |
| Phola | 0.54–0.65 | 22.44–28.89 | 17.83–22.55 | NA | NA | NA | 0.57–0.65 | 22.02–28.88 | 17.48–22.72 | NA | 2013–2014, 2016–2017 |
| Stellenbosch | 0.35–0.56 | 6.34–7.31 | 4.85–5.67 | NA | NA | NA | 0.37–0.61 | 6.26–7.14 | 4.83–5.62 | NA | 2012–2013 |
| Tableview | 0.36–0.4 | 5.63–7.04 | 4.43–5.81 | NA | NA | NA | 0.38–0.43 | 5.54–7 | 4.31–5.6 | NA | 2011–2013 |
| Tembisa | 0.71 | 17.78 | 14.09 | NA | NA | NA | 0.73 | 17.35 | 13.89 | NA | 2011 |
| Thokoza | 0.56 | 41.30 | 29.22 | NA | NA | NA | 0.57 | 40.25 | 28.76 | NA | 2011 |
| Wallacedene | 0.47–0.51 | 5.53–11.26 | 4.28–8.9 | NA | NA | NA | 0.47–0.54 | 5.52–10.82 | 4.29–8.69 | NA | 2012, 2015–2017 |
| Wattville | 0.52 | 39.10 | 29.09 | NA | NA | NA | 0.57 | 37.16 | 28.57 | NA | 2012 |
| Club | 0.59–0.67 | 11.01–14.87 | 8.76–11.86 | NA | NA | NA | 0.62–0.69 | 10.7–14.88 | 8.55–11.99 | NA | 2012–2014, 2016–2017 |
| Ekandustria | 0.46–0.59 | 11.14–16.83 | 8.88–13.09 | NA | NA | NA | 0.50–0.64 | 10.58–16.43 | 8.5–12.83 | NA | 2013–2014 |
| Embalenhle | 0.56–0.73 | 16.48–22.18 | 11.34–14.69 | NA | NA | NA | 0.59–0.73 | 13.31–22.18 | 11.03–17.86 | NA | 2012, 2014, 2016–2017 |
| Verkykkop | 0.44–0.49 | 6.63–9.71 | 5.53–7.88 | NA | NA | NA | 0.47–0.48 | 6.56–9.49 | 5.33–7.72 | NA | 2013, 2016–2017 |
| Randwater | 0.32–0.73 | 12.99–15.99 | 9.82–15.83 | NA | NA | NA | 0.36–0.75 | 12.08–15.63 | 9.57–12.19 | NA | 2013–2017 |
| Esikhaweni | 0.43–0.58 | 9.07–9.45 | 7.36–7.4 | NA | NA | NA | 0.44–0.60 | 8.95–9.35 | 7.17 | NA | 2016–2017 |
| Chicken Farm | 0.44 | 13.14 | 10.44 | NA | NA | NA | 0.48 | 12.71 | 10.21 | NA | 2017 |
| Kwazamokuhle | 0.65 | 18.10 | 14.44 | NA | NA | NA | 0.67 | 17.10 | 13.84 | NA | 2017 |
| Kriel Village | 0.62 | 17.27 | 13.55 | NA | NA | NA | 0.66 | 16.89 | 13.41 | NA | 2017 |
| Bosjesspruit | 0.51 | 13.05 | 10.44 | NA | NA | NA | 0.55 | 12.58 | 10.27 | NA | 2017 |
* The provincial models included all possible sites with PM10 observations; ** the site-specific models included monitoring stations with at least 70% annual PM10 data capture. NA: not applicable; site-specific models are fitted to individual sites, and spatial cross-validation (CV) cannot be performed for models with fewer than two sites. LOLO: leave-one-location-out spatial cross-validation; LTO: leave-time-out temporal cross-validation. Range: the minimum and maximum values of each performance metric across the models for the years included (2010–2017).
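The footnote names two cross-validation schemes, and each column reports R², RMSE and MAE. As a minimal sketch, assuming each record is a (site, date) pair, both schemes can be written as leave-one-group-out splitting that differs only in the grouping key: the site for LOLO and a time block for LTO. The time block the authors used is not stated here, so the calendar-month key below is a hypothetical choice, as are the helper names `leave_one_group_out` and `r2_rmse_mae`. R² is computed as 1 - SS_res/SS_tot; the paper may define it differently (for example as a squared correlation).

```python
import math
from collections import defaultdict
from typing import Callable, Hashable, Iterator, Sequence, TypeVar

R = TypeVar("R")


def leave_one_group_out(
    records: Sequence[R],
    group_of: Callable[[R], Hashable],
) -> Iterator[tuple[Hashable, list[int], list[int]]]:
    """Yield (held-out group, train indices, test indices), one fold per group."""
    groups: dict[Hashable, list[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[group_of(rec)].append(i)
    if len(groups) < 2:
        # Mirrors the footnote: with a single site nothing is left to train on.
        raise ValueError("need at least two groups to hold one out")
    for group, test_idx in groups.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(records)) if i not in held_out]
        yield group, train_idx, test_idx


def r2_rmse_mae(observed: Sequence[float], predicted: Sequence[float]) -> tuple[float, float, float]:
    """R² (1 - SS_res/SS_tot), RMSE and MAE for paired observations."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need equal-length, non-empty inputs")
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n), sum(abs(r) for r in residuals) / n


# Placeholder site-days (not study data): (site, ISO date)
records = [
    ("Witbank", "2016-01-03"),
    ("Secunda", "2016-01-03"),
    ("Witbank", "2016-02-10"),
    ("Ermelo", "2016-02-10"),
]
lolo_folds = list(leave_one_group_out(records, lambda r: r[0]))     # one fold per site
lto_folds = list(leave_one_group_out(records, lambda r: r[1][:7]))  # hypothetical monthly blocks
```

In a full evaluation, an RF model would be fitted on each fold's training indices and scored on the held-out indices with `r2_rmse_mae`; how the authors pooled fold-level predictions is not stated here.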
Figure 3. National model variable importance.
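The caption of Figure 3 does not say which importance measure was used. One common measure for RF models is permutation importance: shuffle one predictor column and record how much the prediction error grows. The sketch below assumes any fitted model exposed as a `predict` function on a single row; `permutation_importance` and the toy model are hypothetical, and this is not necessarily the measure shown in the figure.

```python
import random
from typing import Callable, Sequence


def permutation_importance(
    predict: Callable[[Sequence[float]], float],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> list[float]:
    """Mean increase in mean squared error when each predictor column is shuffled."""
    rng = random.Random(seed)

    def mse(rows: Sequence[Sequence[float]]) -> float:
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances: list[float] = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and the target
            permuted = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(mse(permuted) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy check (not study data): the model uses only the first predictor.
X_demo = [[float(i), float(i % 3)] for i in range(20)]
y_demo = [2.0 * row[0] for row in X_demo]
scores = permutation_importance(lambda row: 2.0 * row[0], X_demo, y_demo)
```

Here the error is measured on whatever data is passed in; RF implementations often compute it on out-of-bag samples instead.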
Range of observed versus predicted PM10 concentrations (µg/m³) for the three models (national, provincial and site-specific), averaged over all sites and years (2010–2017) by province: mean, standard deviation (SD) and 5th, 25th, 50th, 75th and 95th percentiles.
| Province | Series | Mean (µg/m³) | SD (µg/m³) | 5th percentile (µg/m³) | 25th percentile (µg/m³) | 50th percentile (µg/m³) | 75th percentile (µg/m³) | 95th percentile (µg/m³) |
|---|---|---|---|---|---|---|---|---|
| Mpumalanga | Observed | 35.70–50.90 | 17.70–29.10 | 9.30–15.30 | 21.40–30.30 | 32.90–46.20 | 47.70–71.20 | 68.20–102.80 |
| | National | 34.60–48.60 | 6.30–11.10 | 23.70–34.20 | 29.20–41.10 | 34.30–47.80 | 39.50–56.80 | 45.70–66.50 |
| | Provincial | 34.20–46.30 | 10.40–17.40 | 17.10–24.70 | 24.90–33.60 | 32.20–44.30 | 42.30–60.40 | 53.00–75.80 |
| | Site-specific | 35.70–52.00 | 11.40–19.50 | 18.60–26.10 | 26.80–37.10 | 34.30–49.80 | 43.30–66.90 | 55.50–85.40 |
| Gauteng | Observed | 53.40–58.30 | 28.40–31.30 | 16.20–20.30 | 31.10–35.20 | 47.50–52.10 | 71.10–77.10 | 107.60–115.00 |
| | National | 36.30–41.60 | 10.20–12.90 | 21.30–24.40 | 27.00–31.00 | 34.80–40.70 | 44.60–52.00 | 54.00–62.40 |
| | Provincial | 52.90–59.40 | 16.90–17.90 | 30.80–35.50 | 40.30–45.40 | 50.20–56.50 | 66.10–73.30 | 81.20–90.00 |
| | Site-specific | 53.00–58.40 | 17.40–19.70 | 29.30–33.50 | 37.90–43.10 | 49.70–54.80 | 65.60–72.30 | 84.70–93.20 |
| Western Cape | Observed | 19.50–26.70 | 8.10–11.60 | 8.50–12.70 | 13.40–18.70 | 18.50–25.20 | 24.30–33.30 | 35.00–48.10 |
| | National | 31.90–49.10 | 7.10–11.20 | 22.00–35.90 | 26.00–41.00 | 29.90–46.80 | 36.60–55.40 | 45.20–71.60 |
| | Provincial | 20.00–28.00 | 3.90–5.50 | 13.50–20.40 | 16.70–24.10 | 20.00–28.00 | 22.70–31.80 | 26.90–37.10 |
| | Site-specific | 19.50–26.70 | 4.80–6.60 | 11.80–17.90 | 15.90–21.80 | 18.80–26.20 | 22.40–30.70 | 28.00–38.40 |
| KwaZulu-Natal | Observed | 24.20–29.80 | 11.01–14.01 | 9.50–13.50 | 15.90–20.01 | 22.10–26.60 | 30.70–37.10 | 45.70–56.60 |
| | National | 31.60–43.80 | 8.20–12.90 | 21.10–28.40 | 24.50–33.40 | 29.00–40.40 | 37.60–53.00 | 47.60–66.00 |
| | Provincial | 23.90–32.90 | 5.20–9.50 | 15.60–21.60 | 19.20–25.90 | 22.50–31.60 | 27.10–39.40 | 35.40–49.50 |
| | Site-specific | 24.20–30.50 | 6.01–10.02 | 15.30–19.70 | 19.10–23.30 | 23.00–28.30 | 28.00–36.00 | 36.00–50.80 |
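The table summarizes each series by its mean, SD and five percentiles. A sketch of those statistics using Python's standard `statistics` module follows; the percentile interpolation (`method="inclusive"`, linear interpolation between order statistics) and the use of the sample SD are assumptions, since the paper's definitions are not given here, and `distribution_summary` is a hypothetical name.

```python
import statistics
from typing import Sequence

PERCENTILES = (5, 25, 50, 75, 95)


def distribution_summary(values: Sequence[float]) -> dict[str, float]:
    """Mean, sample SD and selected percentiles of a concentration series."""
    # quantiles(..., n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    summary = {"mean": statistics.mean(values), "sd": statistics.stdev(values)}
    for p in PERCENTILES:
        summary[f"p{p}"] = cuts[p - 1]
    return summary
```

Computing the same summary for observed and predicted series makes the contrast in the table visible: in Mpumalanga, for example, the national model's predicted SD (6.30–11.10 µg/m³) is well below the observed SD (17.70–29.10 µg/m³), so its predictions span a much narrower range than the measurements.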