Jie Chen1, Kees de Hoogh2,3, John Gulliver4, Barbara Hoffmann5, Ole Hertel6, Matthias Ketzel6,7, Gudrun Weinmayr8, Mariska Bauwelinck9, Aaron van Donkelaar10,11, Ulla A Hvidtfeldt12, Richard Atkinson13, Nicole A H Janssen14, Randall V Martin10,12,15, Evangelia Samoli16, Zorana J Andersen17, Bente M Oftedal18, Massimo Stafoggia19,20, Tom Bellander20, Maciej Strak1,15, Kathrin Wolf21, Danielle Vienneau2,3, Bert Brunekreef1,22, Gerard Hoek1.
Abstract
We developed Europe-wide models of long-term exposure to eight elements (copper, iron, potassium, nickel, sulfur, silicon, vanadium, and zinc) in particulate matter with diameter <2.5 μm (PM2.5) using standardized measurements for one-year periods between October 2008 and April 2011 in 19 study areas across Europe, with supervised linear regression (SLR) and random forest (RF) algorithms. Potential predictor variables were obtained from satellites, chemical transport models, land-use, traffic, and industrial point source databases to represent different sources. Overall model performance across Europe was moderate to good for all elements with hold-out-validation R-squared ranging from 0.41 to 0.90. RF consistently outperformed SLR. Models explained within-area variation much less than the overall variation, with similar performance for RF and SLR. Maps proved a useful additional model evaluation tool. Models differed substantially between elements regarding major predictor variables, broadly reflecting known sources. Agreement between the two algorithm predictions was generally high at the overall European level and varied substantially at the national level. Applying the two models in epidemiological studies could lead to different associations with health. If both between- and within-area exposure variability are exploited, RF may be preferred. If only within-area variability is used, both methods should be interpreted equally.
Year: 2020 PMID: 33237771 PMCID: PMC7745532 DOI: 10.1021/acs.est.0c06595
Source DB: PubMed Journal: Environ Sci Technol ISSN: 0013-936X Impact factor: 9.028
Performance of PM2.5 Composition Models over Europe
| algorithm | inclusion of | metric | Cu | Fe | K | Ni | S | Si | V | Zn |
|---|---|---|---|---|---|---|---|---|---|---|
| | | no. of sites | 414 | 413 | 414 | 402 | 404 | 400 | 402 | 413 |
| Model Building | | | | | | | | | | |
| SLR | one-step | model r2 | 0.56 | 0.55 | 0.61 | 0.62 | 0.79 | 0.52 | 0.70 | 0.48 |
| | | model RMSE | 3.3 | 65.5 | 64.6 | 0.9 | 146.5 | 59.7 | 1.7 | 11.8 |
| | two-step, step1 | model r2 | 0.52 | 0.53 | 0.52 | 0.56 | 0.80 | 0.48 | 0.66 | 0.47 |
| | | model RMSE | 3.4 | 67.4 | 71.1 | 1.0 | 142.2 | 61.9 | 1.8 | 11.9 |
| | two-step, step2 | model r2 | 0.56 | 0.53 | 0.60 | 0.60 | 0.82 | 0.50 | 0.69 | 0.48 |
| | | model RMSE | 3.3 | 67.4 | 65.0 | 0.9 | 135.4 | 61.1 | 1.7 | 11.8 |
| RF | one-step | model r2 | 0.95 | 0.95 | 0.97 | 0.95 | 0.98 | 0.95 | 0.97 | 0.95 |
| | | model RMSE | 1.1 | 20.8 | 16.8 | 0.3 | 40.2 | 19.9 | 0.5 | 3.5 |
| | two-step, step1 | model r2 | 0.95 | 0.95 | 0.97 | 0.95 | 0.98 | 0.94 | 0.97 | 0.96 |
| | | model RMSE | 1.1 | 20.9 | 17.4 | 0.3 | 41.8 | 20.3 | 0.5 | 3.4 |
| | two-step, step2 | model r2 | 0.98 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 | 0.99 |
| | | model RMSE | 0.6 | 12.4 | 9.5 | 0.2 | 27.0 | 12.2 | 0.3 | 1.8 |
| HOV | | | | | | | | | | |
| SLR | one-step | HOV r2 | 0.47 | 0.48 | 0.58 | 0.57 | 0.76 | 0.50 | 0.63 | 0.41 |
| | | HOV RMSE | 3.6 | 70.5 | 66.4 | 1.0 | 156.4 | 60.8 | 1.8 | 12.5 |
| | two-step, step1 | HOV r2 | 0.44 | 0.46 | 0.50 | 0.51 | 0.76 | 0.46 | 0.60 | 0.42 |
| | | HOV RMSE | 3.7 | 71.7 | 72.6 | 1.0 | 154.9 | 63.4 | 1.9 | 12.4 |
| | two-step, step2 | HOV r2 | 0.48 | 0.48 | 0.59 | 0.56 | 0.79 | 0.46 | 0.63 | 0.41 |
| | | HOV RMSE | 3.6 | 70.5 | 66.1 | 1.0 | 147.0 | 62.9 | 1.8 | 12.5 |
| RF | one-step | HOV r2 | 0.60 | 0.60 | 0.82 | 0.74 | 0.91 | 0.62 | 0.85 | 0.68 |
| | | HOV RMSE | 3.2 | 61.7 | 44.1 | 0.7 | 97.0 | 52.9 | 1.2 | 9.3 |
| | two-step, step1 | HOV r2 | 0.59 | 0.59 | 0.79 | 0.74 | 0.90 | 0.60 | 0.84 | 0.68 |
| | | HOV RMSE | 3.2 | 62.4 | 47.4 | 0.7 | 102.1 | 54.2 | 1.2 | 9.2 |
| | two-step, step2 | HOV r2 | 0.59 | 0.61 | 0.80 | 0.76 | 0.90 | 0.62 | 0.86 | 0.71 |
| | | HOV RMSE | 3.2 | 61.3 | 45.8 | 0.7 | 99.5 | 53.1 | 1.1 | 8.7 |
SLR = supervised linear regression; RF = random forest; r2 = squared Pearson correlation; RMSE = root-mean-square error; HOV = fivefold hold-out validation.
Unit of RMSE: ng/m3.
Performance of RF on training set cannot be interpreted.
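The footnote definitions above (squared Pearson correlation, RMSE, fivefold hold-out validation) can be illustrated with a minimal Python sketch. It is not the authors' code: the one-predictor least-squares fit is a stand-in for SLR or RF, the data are synthetic, and pooling the out-of-fold predictions before scoring is an assumption, as are all function names.

```python
import math
import random


def pearson_r2(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def rmse(obs, pred):
    """Root-mean-square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (hypothetical stand-in model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def holdout_validation(x, y, k=5, seed=0):
    """Assumption: pool out-of-fold predictions over k folds, then score once."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal random folds
    pred = [0.0] * len(x)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            pred[i] = a + b * x[i]  # predicted by a model that never saw site i
    return pearson_r2(y, pred), rmse(y, pred)


# Synthetic example: 100 "sites", one predictor, Gaussian noise.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [3.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
hov_r2, hov_rmse = holdout_validation(x, y)
print(f"HOV r2 = {hov_r2:.2f}, HOV RMSE = {hov_rmse:.2f}")
```

Because every prediction comes from a model fitted without that site, the HOV scores avoid the optimism seen in training-set scores, which is why the RF model-building r2 values are flagged as not interpretable.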
Performance of PM2.5 Composition Models to Assess within-Area Variation: Average within-Area r2
| avg. WA r2 | inclusion of | evaluation method | Cu | Fe | K | Ni | S | Si | V | Zn |
|---|---|---|---|---|---|---|---|---|---|---|
| SLR | one-step | five-fold HOV | 0.34 | 0.35 | 0.09 | 0.18 | 0.14 | 0.18 | 0.21 | 0.20 |
| | | LOAOCV | 0.37 | 0.38 | 0.09 | 0.15 | 0.22 | 0.21 | 0.23 | 0.18 |
| | two-step, step1 | five-fold HOV | 0.34 | 0.34 | 0.08 | 0.17 | 0.14 | 0.20 | 0.18 | 0.21 |
| | | LOAOCV | 0.35 | 0.35 | 0.09 | 0.15 | 0.22 | 0.20 | 0.20 | 0.18 |
| | two-step, step2 | five-fold HOV | 0.35 | 0.36 | 0.07 | 0.17 | 0.14 | 0.20 | 0.19 | 0.19 |
| | | LOAOCV | 0.36 | 0.36 | 0.09 | 0.15 | 0.22 | 0.20 | 0.21 | 0.18 |
| RF | one-step | five-fold HOV | 0.31 | 0.31 | 0.05 | 0.21 | 0.21 | 0.19 | 0.27 | 0.24 |
| | | LOAOCV | 0.35 | 0.35 | 0.12 | 0.18 | 0.21 | 0.17 | 0.27 | 0.18 |
| | two-step, step1 | five-fold HOV | 0.31 | 0.30 | 0.06 | 0.21 | 0.22 | 0.17 | 0.27 | 0.24 |
| | | LOAOCV | 0.34 | 0.34 | 0.07 | 0.16 | 0.21 | 0.16 | 0.23 | 0.19 |
| | two-step, step2 | five-fold HOV | 0.29 | 0.29 | 0.07 | 0.21 | 0.23 | 0.17 | 0.29 | 0.25 |
| | | LOAOCV | 0.34 | 0.34 | 0.07 | 0.16 | 0.21 | 0.16 | 0.23 | 0.20 |
SLR = supervised linear regression; RF = random forest; r2 = squared Pearson correlation; avg. WA r2 is the average of 19 study area-specific r2s (area-specific r2s evaluated by five-fold HOV are shown in Figure S4); HOV = hold-out validation; LOAOCV = leave-one-area-out cross-validation.
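As a rough illustration of the two quantities defined in this footnote, the Python sketch below computes an average within-area r2 (the mean of area-specific squared Pearson correlations) and generates leave-one-area-out splits. The function names and the toy numbers are hypothetical and not taken from the study; the toy data are chosen so that overall agreement is high while within-area agreement is low.

```python
from collections import defaultdict


def pearson_r2(obs, pred):
    """Squared Pearson correlation (assumes both inputs have nonzero variance)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def average_within_area_r2(areas, obs, pred):
    """Mean of the area-specific r2 values, plus the per-area values."""
    grouped = defaultdict(lambda: ([], []))
    for area, o, p in zip(areas, obs, pred):
        grouped[area][0].append(o)
        grouped[area][1].append(p)
    per_area = {a: pearson_r2(o, p) for a, (o, p) in grouped.items()}
    return sum(per_area.values()) / len(per_area), per_area


def leave_one_area_out(areas):
    """Yield (held_out_area, train_indices, test_indices), one split per area."""
    for area in sorted(set(areas)):
        test = [i for i, a in enumerate(areas) if a == area]
        train = [i for i, a in enumerate(areas) if a != area]
        yield area, train, test


# Toy data: two areas with very different concentration levels.
areas = ["A", "A", "A", "B", "B", "B"]
obs = [10.0, 11.0, 12.0, 50.0, 51.0, 52.0]
pred = [11.0, 10.0, 12.0, 51.0, 52.0, 50.0]

overall_r2 = pearson_r2(obs, pred)
avg_wa_r2, per_area = average_within_area_r2(areas, obs, pred)
print(f"overall r2 = {overall_r2:.3f}, avg. WA r2 = {avg_wa_r2:.2f}")
```

In this toy case the between-area contrast dominates the overall r2, which mirrors the pattern in the tables: overall HOV r2 is much higher than the average within-area r2.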
Figure 1. Regression slopes (shown in red) of predictors selected in SLR and relative variable importance (shown in blue) of the 15 most important predictors in RF.
Figure 2. Maps of PM2.5 components developed by our main SLR (two-step, step2) and RF (two-step, step1) models.
Correlations between Predictions by Our Main SLR (Two-Step, step2) and RF (Two-Step, step1) Models at 41,936 Random Locations
| region | PM2.5 Cu r | PM2.5 Cu RMSE | PM2.5 Fe r | PM2.5 Fe RMSE | PM2.5 K r | PM2.5 K RMSE | PM2.5 Ni r | PM2.5 Ni RMSE | PM2.5 S r | PM2.5 S RMSE | PM2.5 Si r | PM2.5 Si RMSE | PM2.5 V r | PM2.5 V RMSE | PM2.5 Zn r | PM2.5 Zn RMSE | no. of locations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| all European countries | 0.72 | 1.1 | 0.66 | 20.5 | 0.75 | 51.3 | 0.56 | 0.5 | 0.88 | 162.6 | 0.56 | 22.8 | 0.64 | 1.0 | 0.73 | 7.1 | 41,936 |
| ELAPSE Countries | | | | | | | | | | | | | | | | | |
| combined | 0.79 | 1.0 | 0.77 | 19.1 | 0.70 | 56.1 | 0.53 | 0.5 | 0.89 | 142.0 | 0.75 | 12.5 | 0.66 | 0.8 | 0.73 | 7.0 | 27,411 |
| Austria | 0.77 | 1.1 | 0.80 | 16.2 | 0.80 | 35.0 | 0.28 | 0.1 | 0.95 | 69.5 | 0.08 | 9.5 | 0.82 | 0.2 | 0.89 | 4.4 | 1051 |
| Belgium | 0.82 | 0.9 | 0.89 | 11.9 | 0.05 | 21.0 | 0.83 | 0.4 | 0.79 | 47.4 | 0.71 | 8.2 | 0.76 | 0.8 | 0.70 | 11.4 | 355 |
| Switzerland | 0.80 | 1.1 | 0.89 | 13.1 | 0.76 | 24.1 | 0.36 | 0.1 | 0.48 | 95.0 | 0.38 | 9.2 | 0.49 | 0.2 | 0.65 | 5.8 | 500 |
| Germany | 0.74 | 1.0 | 0.80 | 14.9 | 0.43 | 30.6 | 0.66 | 0.3 | 0.67 | 60.1 | 0.59 | 8.2 | 0.80 | 0.4 | 0.58 | 6.9 | 4233 |
| Denmark | 0.77 | 0.3 | 0.77 | 9.3 | 0.68 | 15.0 | 0.21 | 0.2 | 0.73 | 36.0 | 0.45 | 7.2 | 0.24 | 0.3 | 0.49 | 2.0 | 522 |
| France | 0.62 | 1.1 | 0.75 | 14.2 | 0.31 | 26.9 | 0.39 | 0.4 | 0.72 | 69.0 | 0.48 | 9.0 | 0.77 | 0.6 | 0.59 | 6.8 | 6476 |
| Italy | 0.72 | 1.3 | 0.53 | 23.2 | 0.69 | 33.6 | 0.51 | 0.6 | 0.83 | 183.0 | 0.70 | 19.5 | 0.63 | 1.3 | 0.67 | 9.0 | 3550 |
| Netherlands | 0.74 | 0.9 | 0.83 | 14.2 | 0.55 | 17.2 | 0.64 | 0.4 | 0.60 | 45.0 | 0.68 | 8.9 | 0.48 | 0.7 | 0.66 | 14.2 | 451 |
| Norway | 0.37 | 0.0 | 0.61 | 5.6 | –0.68 | 8.6 | 0.43 | 0.2 | 0.25 | 104.9 | 0.03 | 4.3 | 0.80 | 0.3 | 0.46 | 4.0 | 2649 |
| Sweden | 0.56 | 0.2 | 0.77 | 5.8 | –0.68 | 22.6 | 0.68 | 0.1 | 0.88 | 86.7 | 0.31 | 4.9 | 0.73 | 0.4 | 0.38 | 3.6 | 4786 |
| United Kingdom | 0.85 | 0.6 | 0.87 | 12.0 | –0.74 | 25.9 | 0.59 | 0.3 | 0.86 | 72.1 | 0.12 | 9.6 | 0.57 | 0.5 | 0.68 | 4.1 | 2838 |
| Non-ELAPSE Countries | | | | | | | | | | | | | | | | | |
| Greece | 0.55 | 1.0 | 0.57 | 16.8 | 0.38 | 28.7 | 0.44 | 0.8 | 0.18 | 123.5 | 0.34 | 25.6 | 0.46 | 1.9 | 0.54 | 7.0 | 1541 |
| Finland | 0.25 | 0.3 | 0.58 | 7.4 | 0.49 | 28.7 | 0.48 | 0.1 | 0.85 | 67.3 | 0.28 | 4.9 | 0.33 | 0.3 | 0.55 | 4.2 | 3208 |
| Hungary | 0.68 | 0.9 | 0.61 | 11.7 | 0.74 | 29.0 | 0.34 | 0.2 | 0.60 | 87.4 | 0.57 | 8.2 | 0.67 | 0.3 | 0.79 | 5.0 | 1123 |
| Ireland | 0.52 | 0.3 | 0.65 | 5.8 | –0.70 | 16.9 | 0.46 | 0.2 | 0.45 | 48.5 | –0.08 | 4.8 | 0.41 | 0.4 | 0.26 | 1.8 | 844 |
| Lithuania | 0.63 | 0.8 | 0.68 | 7.0 | 0.65 | 25.3 | 0.26 | 0.1 | 0.16 | 52.5 | 0.59 | 4.7 | 0.29 | 0.2 | 0.89 | 1.8 | 783 |
| Luxembourg | 0.76 | 0.8 | 0.83 | 8.4 | –0.08 | 16.2 | 0.82 | 0.1 | 0.82 | 26.5 | 0.77 | 4.9 | 0.80 | 0.2 | 0.83 | 2.1 | 33 |
| Portugal | 0.41 | 1.6 | 0.01 | 21.0 | –0.03 | 23.5 | 0.71 | 0.4 | 0.72 | 72.9 | 0.22 | 7.3 | 0.80 | 0.7 | 0.60 | 6.6 | 1021 |
| Spain | 0.59 | 1.2 | 0.34 | 17.2 | 0.06 | 32.7 | 0.41 | 0.6 | 0.59 | 102.0 | 0.49 | 10.7 | 0.59 | 1.1 | 0.56 | 8.0 | 5972 |
r = Pearson correlation coefficient; RMSE = root-mean-square error.
Unit of RMSE: ng/m3.
We have no clear explanation for the strongly negative correlations (K in Norway, Sweden, the United Kingdom, and Ireland). These values possibly reflect the poor performance of both models at low concentrations. Scatter plots confirmed the poor agreement between the predictions of the two models, with considerable scatter.
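The per-region agreement statistics in the table above (Pearson r and RMSE between two sets of predictions at the same locations) can be sketched in Python as follows. The function name, the region labels, and the numbers are hypothetical; this is only an illustration of the calculation, not the authors' procedure.

```python
import math
from collections import defaultdict


def agreement_by_region(regions, pred_a, pred_b):
    """Per-region Pearson r, RMSE, and location count for two prediction sets.

    Assumes each region has at least two locations and nonzero variance in
    both prediction sets; otherwise r is undefined.
    """
    grouped = defaultdict(lambda: ([], []))
    for region, a, b in zip(regions, pred_a, pred_b):
        grouped[region][0].append(a)
        grouped[region][1].append(b)

    result = {}
    for region, (a, b) in grouped.items():
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        var_a = sum((x - ma) ** 2 for x in a)
        var_b = sum((y - mb) ** 2 for y in b)
        r = cov / math.sqrt(var_a * var_b)
        rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / n)
        result[region] = (r, rmse, n)
    return result


# Toy predictions from two hypothetical models at six locations.
regions = ["X", "X", "X", "Y", "Y", "Y"]
pred_slr = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
pred_rf = [3.0, 2.0, 1.0, 2.0, 3.0, 4.0]

stats = agreement_by_region(regions, pred_slr, pred_rf)
for region, (r, rmse, n) in sorted(stats.items()):
    print(f"{region}: r = {r:+.2f}, RMSE = {rmse:.2f}, n = {n}")
```

The toy region X shows that r can be strongly negative when the two models rank locations in opposite order over a narrow range of values, while region Y shows that a perfect correlation can coexist with a nonzero RMSE when one model is offset from the other.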