Mariam S Girguis, Lianfa Li, Fred Lurmann, Jun Wu, Robert Urman, Edward Rappaport, Carrie Breton, Frank Gilliland, Daniel Stram, Rima Habre.
Abstract
BACKGROUND: Ensemble learning-based spatiotemporal models are increasingly used to estimate residential air pollution exposures in epidemiological studies. While these machine learning models typically perform better, they remain subject to the exposure measurement error that is inherent in all models. Our objective is to develop a framework to formally assess shared, multiplicative measurement error (SMME) in our previously published three-stage, ensemble learning-based nitrogen oxides (NOx) model and to identify its spatial and temporal patterns and predictors.
Year: 2019 PMID: 30711654 PMCID: PMC6499078 DOI: 10.1016/j.envint.2018.12.025
Source DB: PubMed Journal: Environ Int ISSN: 0160-4120 Impact factor: 9.621
Fig. 1. Average NOx (ppb) for southern California Children's Health Study (CHS) residential locations, 1992–2012. Averages are taken from stage 3 of the Li et al. (2017) model, which uses the averaged stage 2 NOx estimates and constrained optimization to re-predict exposure based on physical constraints meant to mimic known or observed real-life behavior of NOx. Average NOx for each unique CHS location is displayed using six quantiles.
Fig. 2. Scatter plot of covariance by product of means to visualize shared exposure measurement error. The covariance and product of means of each pair of predictions are used to demonstrate shared error. The ordinary least squares regression line fit to the data has an intercept of −0.2516 and a slope of 0.00029. The negative intercept indicates there is no evidence of additive shared error, and the significant slope (p < 0.0001) indicates significant multiplicative shared error.
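The regression described in the Fig. 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the ensemble output is available as a list of rows, one row per prediction and one column per ensemble member, and the names `shared_error_fit`, `sample_cov`, and `ols_line` are hypothetical. The intercept is read as the shared additive component and the slope as the shared multiplicative component, following the caption.

```python
from itertools import combinations
from statistics import fmean


def ols_line(x, y):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def sample_cov(u, v):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def shared_error_fit(preds):
    """preds[i][k] is prediction i under ensemble member k (assumed layout).

    For every pair of predictions, regress their covariance across ensemble
    members on the product of their ensemble means.
    """
    means = [fmean(row) for row in preds]
    x, y = [], []
    for i, j in combinations(range(len(preds)), 2):
        x.append(means[i] * means[j])
        y.append(sample_cov(preds[i], preds[j]))
    return ols_line(x, y)
```

With 2500 sampled predictions this loop visits roughly 3.1 million pairs, so a vectorized implementation would likely be preferable in practice; the plain-Python form is kept here only for readability.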
Comparison of the distribution of estimated NOx exposures[a] and their main predictors in the full southern California Children’s Health Study Cohort E Residential (Biweekly) Timelines[b] and in the subset of 2500 randomly sampled[c] predictions used in the assessment of Shared Unshared Multiplicative Additive (SUMA) exposure measurement error.
| | Full CHS cohort E | Random sample of 2500 |
|---|---|---|
| N | 1,850,415 | 2500 |
| | n (%) | n (%) |
| Prediction year | | |
| 1992–2000 | 615,454 (33.2) | 826 (33.0) |
| 2001–2004 | 568,177 (30.7) | 749 (30.0) |
| 2005–2012 | 666,784 (36.0) | 925 (37.0) |
| Traffic density within a 300 m buffer | | |
| 0–13.54 | 462,287 (25.0) | 651 (26.0) |
| 13.55–33.61 | 462,427 (25.0) | 611 (24.5) |
| 33.62–75.64 | 462,518 (25.0) | 579 (23.1) |
| 75.65–1235 | 462,591 (25.0) | 659 (26.3) |
| Population density | | |
| 0–2700 | 462,606 (25.0) | 657 (26.2) |
| 2701–5234 | 461,887 (25.0) | 571 (22.8) |
| 5235–9049 | 463,340 (25.0) | 642 (25.6) |
| 9050–78,668 | 462,582 (25.0) | 630 (25.2) |
| Mean elevation within a 300 m buffer | | |
| −36.6–56.5 | 462,790 (25.0) | 648 (25.9) |
| 56.6–253.3 | 462,038 (25.0) | 598 (23.9) |
| 253.4–365.4 | 462,892 (25.0) | 633 (25.3) |
| 365.5–2231.8 | 462,695 (25.0) | 621 (24.8) |
| Distance to major roadways | | |
| 0–150 | 113,133 (6.1) | 163 (6.5) |
| 151–300 | 147,851 (7.9) | 210 (8.4) |
| > 300 | 1,589,431 (85.8) | 2127 (85.0) |
| CALINE4 | | |
| 0–3.30 | 462,837 (25.0) | 629 (25.1) |
| 3.31–8.87 | 462,175 (25.0) | 590 (23.6) |
| 8.88–18.55 | 462,453 (25.0) | 626 (25.0) |
| 18.56–455 | 462,950 (25.0) | 655 (26.2) |
| CALINE4 | | |
| 0–2.43 | 461,744 (25.0) | 656 (26.2) |
| 2.44–4.75 | 462,269 (25.0) | 587 (23.4) |
| 4.76–8.10 | 463,607 (25.0) | 624 (25.0) |
| 8.11–92.39 | 462,795 (25.0) | 633 (25.3) |
| Spatiotemporal NOx predictions | | |
| 2.10–20.62 | 462,406 (25.0) | 635 (25.4) |
| 20.63–31.60 | 462,523 (25.0) | 632 (25.3) |
| 31.61–48.40 | 462,800 (25.0) | 589 (23.6) |
| 48.41–277.00 | 462,689 (25.0) | 644 (25.8) |
Each prediction is for a biweekly period at a residential location from the reconstructed CHS lifetime residential history.
Exposure prediction characteristics for all 5106 southern California Children's Health Study (CHS) cohort E participants.
Geographic characteristics summarized for sample 1 of 10.
Traffic Density calculated using distance decayed annual average daily traffic (AADT) volume from major roads (freeways/highways and major surface streets) within a 300 m circular buffer.
Population density calculated within 300 m buffers based on US Census block group populations from 1990, 2000, and 2010, linearly interpolated or extrapolated for 1992–2012.
Distance to freeways/highways (FCC1 road classification).
CALINE4 is a line source dispersion model using quarterly average daily traffic volumes (Benson, 1984).
Spatiotemporal Stage 2 NOx predictions (Li et al., 2017).
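The population density footnote above describes linear interpolation between census years and linear extrapolation beyond them. A minimal sketch of that idea follows; the function name `interpolate_census`, the dictionary layout, and the numbers in the usage note are illustrative assumptions, not values or code from the study.

```python
def interpolate_census(year, census):
    """Linearly interpolate or extrapolate a census value for a given year.

    census maps census year -> value, e.g. {1990: ..., 2000: ..., 2010: ...}.
    Years outside the census range reuse the slope of the nearest segment.
    """
    years = sorted(census)
    if year <= years[0]:
        y0, y1 = years[0], years[1]      # extrapolate backwards
    elif year >= years[-1]:
        y0, y1 = years[-2], years[-1]    # extrapolate forwards (e.g. 2011, 2012)
    else:
        y0 = max(y for y in years if y <= year)
        y1 = min(y for y in years if y > year)
    slope = (census[y1] - census[y0]) / (y1 - y0)
    return census[y0] + slope * (year - y0)
```

For a hypothetical block group with populations 1000, 2000, and 2500 in the three census years, this gives 1500 for 1995 and 2600 for 2012, the latter extending the 2000–2010 trend by two years.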
Distribution of between and within prediction variance parameters used to determine Shared Unshared Multiplicative Additive (SUMA) measurement error components.
| Parameters | Min | Max | Mean (standard deviation) | Median |
|---|---|---|---|---|
| Covariance | −69 | 513 | 0.20 (3) | 0 |
| Product of means | 9 | 43,192 | 1453 (1560) | 967 |
| Variance | 0.02 | 649 | 11 (34) | 4 |
| Square of mean | 3449 | 43,510 | 2133 (3450) | 975 |
Shared and Unshared, Multiplicative and Additive (SUMA) exposure measurement error components in the spatiotemporal NOx predictions for the southern California Children's Health Study Cohort lifetime residential histories.
| Error type | Value | Standard error | p-Value |
|---|---|---|---|
| Shared additive | −0.25166 | 0.00209 | < 0.0001 |
| Shared multiplicative | 0.00029 | 0.0000009 | < 0.0001 |
| Unshared additive | −5.39406 | 0.48467 | < 0.0001 |
| Unshared multiplicative | 0.00751 | | |
Fig. 3. Scatter plot of prediction variance by square of mean to visualize unshared exposure measurement error. The variance and square of mean for each prediction across 120 ensembles are used to demonstrate unshared error. The intercept of the ordinary least squares regression line to fit the data is −5.39 with a slope of 0.0078. The negative intercept indicates there is no evidence of additive unshared error and the significant slope (p < 0.0001) indicates significant multiplicative unshared error.
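The regression in the Fig. 3 caption uses one point per prediction rather than one per pair. A minimal sketch, under the same assumed data layout (one row per prediction, one column per ensemble member) and with hypothetical function names, might look like this; the reading of the intercept as additive and the slope as multiplicative follows the caption.

```python
from statistics import fmean, variance


def ols_line(x, y):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def unshared_error_fit(preds):
    """preds[i][k] is prediction i under ensemble member k (assumed layout).

    Regress each prediction's variance across ensemble members on the
    square of its ensemble mean.
    """
    x = [fmean(row) ** 2 for row in preds]
    y = [variance(row) for row in preds]  # sample variance, n - 1 denominator
    return ols_line(x, y)
```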
Time-stratified (calendar year tertiles) analysis of Shared Multiplicative Measurement Error (SMME) in the spatiotemporal NOx predictions for the southern California Children's Health Study Cohort lifetime residential histories.
| Time period | 1992–2000 | 2001–2004 | 2005–2012 |
|---|---|---|---|
| Shared multiplicative error | 0.0003627 | 0.0001549 | 0.0001496 |
| Min covariance | −89 | −31 | −20 |
| Max covariance | 757 | 240 | 182 |
| Median covariance | 0.7 | 0.5 | 0.4 |
| Min product mean | 40 | 19 | 10 |
| Max product mean | 52,691 | 24,540 | 15,215 |
| Median product mean | 1563 | 1100 | 589 |
A random subset of 2500 predictions was sampled for each time period stratum.
Shared multiplicative error component determined by the slope of the regression of the covariance on product means between predictions using 120 ensembles.
p-Value < 0.0001 for the shared multiplicative error estimate in each time period.
Fig. 4. Time-stratified visualization of shared error: scatter plot of covariance by product means within random samples from a) 1992–2000, b) 2001–2004, and c) 2005–2012 NOx exposure predictions. Figures include a random subset of 2500 predictions sampled for each time period stratum.
Fig. 5. Spatial pattern of the odds of high Shared Multiplicative Exposure Measurement Error (SMME) in spatiotemporal NOx predictions for the full southern California Children's Health Study (CHS) Cohort E residential histories in the a) unadjusted (crude) and b) fully adjusted model. High SMME risk is determined based on the cut-off of the top 80th percentile of average covariance distribution at each unique prediction location. In the fully adjusted model, the odds of SMME are adjusted for population density, traffic density, CALINE4 non-freeway NOx, distance to airport, and prediction year. Statistically significant geographic areas of increased or decreased risk of SMME are indicated using black contour lines.
Spatial and temporal predictors of the odds of high Shared Multiplicative Exposure Measurement Error (SMME)[a] in spatiotemporal NOx predictions using a random subset[b] of the southern California Children's Health Study.
| | Odds ratio | 95% confidence interval | p-Value |
|---|---|---|---|
| CALINE4 | 1.06 | (1.04, 1.08) | < 0.0001 |
| Population density | 1.03 | (1.01, 1.04) | < 0.0001 |
| Traffic density within a 300 m buffer | 1.11 | (1.09, 1.14) | < 0.0001 |
| Distance to major airport (km) | | | |
| 0–15 | 1.16 | (1.10, 1.23) | 0.0001 |
| > 15 | 1.00 | – | – |
| Time period | | | |
| 1992–2000 | 1.00 | – | – |
| 2001–2004 | 0.97 | (0.93, 1.00) | 0.1777 |
| 2005–2012 | 0.90 | (0.87, 0.94) | < 0.0001 |
Shared multiplicative error determined as the top 80th percentile of average covariance distribution at each unique location.
Random subset of 2500 predictions sampled.
CALINE4 is a line source dispersion model using quarterly average daily traffic volumes (Benson, 1984). Odds ratios given for an interquartile range increase (5.89 ppb).
Population density calculated within 300 m buffers based on US Census block group populations from 1990, 2000, and 2010, linearly interpolated or extrapolated for 1992–2012. Odds ratios given for an interquartile range increase (664.4 people per 300 m buffer).
Traffic density calculated using distance decayed annual average daily traffic (AADT) volume from major roads (freeways/highways and major surface streets) within a 300 m buffer. Odds ratios given for an interquartile range increase (60.3 AADT per 300 m buffer).
Distance to major (largest 5 in study area) class 1 airports in meters.
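Two steps in the footnotes above lend themselves to a short illustration: flagging high SMME at the 80th percentile of average covariance, and expressing an odds ratio per interquartile range (IQR) increase. The sketch below is not the authors' implementation. It assumes "high" means at or above the 80th percentile, uses a linear-interpolation percentile (one of several common definitions), and assumes the odds ratio comes from a logistic regression coefficient `beta` per unit of the predictor; all function names are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile by linear interpolation between order statistics (q in 0-100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def flag_high_smme(avg_cov, q=80):
    """Return True for locations whose average covariance is at or above the cut-off."""
    cut = percentile(avg_cov, q)
    return [c >= cut for c in avg_cov]


def odds_ratio_per_iqr(beta, iqr):
    """Odds ratio for an IQR increase, given a per-unit log-odds coefficient."""
    return math.exp(beta * iqr)
```

Under these assumptions, a per-ppb coefficient of `math.log(1.06) / 5.89` would reproduce the reported CALINE4 odds ratio of 1.06 for a 5.89 ppb increase; the coefficient itself is back-calculated here for illustration and is not reported in the source.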
Geographic characteristics of spatiotemporal NOx predictions with high and low Shared Multiplicative Exposure Measurement Error (SMME) from a random sample of 2500 predictions from the city of Long Beach, California.
| | Low SMME | High SMME (Covariance ≥ 80th percentile) | p-Value | 95% CI of difference |
|---|---|---|---|---|
| | Mean (sd) | Mean (sd) | | (95% CI) |
| NOx measures (ppb) | | | | |
| Exposure model stage 2 NOx output | 55.22 (33.54) | 79.36 (39.38) | < 0.001 | −24.86, −24.61 |
| Ambient NOx | 54.81 (28.14) | 77.15 (35.35) | < 0.001 | −22.03, −22.65 |
| CALINE4 | 29.46 (18.00) | 26.61 (17.01) | < 0.001 | 2.79, 2.90 |
| CALINE4 | 16.90 (13.9) | 25.50 (12.7) | < 0.001 | −3.63, −3.58 |
| Traffic measures | | | | |
| Traffic density | 117.84 (74.01) | 126.97 (62.72) | < 0.001 | −9.34, −8.92 |
| Distance | 1318.81 (850.88) | 1589.6 (833.4) | < 0.001 | −274.17, −268.72 |
| Distance | 3139.12 (1991.44) | 2593 (2036.70) | < 0.001 | 539.07, 552.29 |
| Distance | 205.76 (133.77) | 181.48 (124.62) | < 0.001 | 23.20, 24.03 |
| Distance | 26.77 (13.9) | 27.38 (14.62) | < 0.001 | −0.656, −0.561 |
| Heavy duty vehicle fraction FCC1 | 0.120 (0.05) | 0.125 (0.05) | < 0.001 | −0.0055, −0.0056 |
| Heavy duty vehicle fraction FCC2 | 0.030 (0.05) | 0.050 (0.06) | < 0.001 | −0.0114, −0.01110 |
| Average annual daily traffic FCC1 | 192,745.0 (61,375.6) | 185,859 (59,631.8) | < 0.001 | 6690.67, 7081.67 |
| Average annual daily traffic FCC2 | 37,635.2 (6221.5) | 37,133 (5187.9) | < 0.001 | 483.58, 518.86 |
| Average annual daily traffic FCC3 | 26,127 (7253.1) | 24,773 (8221.5) | < 0.001 | 1327.55, 1379.93 |
| Average annual daily traffic FCC3 | 4974 (353.9) | 4866 (376.8) | < 0.001 | 106.69, 109.09 |
| Meteorology | | | | |
| Minimum temperature | 13.57 (3.40) | 11.20 (3.4) | < 0.001 | 1.28, 1.30 |
| Wind speed | 2.19 (0.39) | 2.20 (0.41) | < 0.001 | −0.017, −0.015 |
| Other | | | | |
| Elevation | 15.1 (3.7) | 15.2 (4.2) | 0.321 | −0.019, 0.007 |
| Distance | 6880.69 (3282.36) | 5793.5 (3282.36) | < 0.001 | 1076.54, 1097.69 |
| Population density | 14,899 (4990) | 18,338 (4191) | < 0.001 | −3453.23, −3424.78 |
Average of 120 ensembles from Stage 2 of the spatiotemporal NOx exposure model.
Ambient NOx measured at the EPA air quality monitoring stations.
CALINE4 is a line source dispersion model using quarterly average daily traffic volumes (Benson, 1984).
Traffic density calculated using distance decayed annual average daily traffic (AADT) volume from major roads (freeways/highways and major surface streets) within 300 m and 500 m circular buffers.
Distances calculated in meters.
Fraction of heavy duty vehicles by road class within 300 m buffer.
Average annual average daily traffic at location (point estimate).
Mean elevation in a 300 m buffer.
Population density calculated within 300 m buffers based on US Census block group populations from 1990, 2000, and 2010, linearly interpolated or extrapolated for 1992–2012.
Welch non-parametric two sided t-test.
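For readers unfamiliar with the Welch test named in the footnote, the statistic and its approximate degrees of freedom (Welch–Satterthwaite) can be computed from group summaries alone. The sketch below is a generic illustration with a hypothetical function name; the group sizes it would need are not given per row in the table above, so no attempt is made to reproduce the reported p-values.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unequal-variance t statistic and approximate degrees of freedom."""
    a = sd1 ** 2 / n1  # squared standard error contribution of group 1
    b = sd2 ** 2 / n2  # squared standard error contribution of group 2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df
```

A two-sided p-value would then come from the t distribution with `df` degrees of freedom, which requires a statistics library and is left out of this stdlib-only sketch.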
Distribution of NOx predictions with low or high Shared Multiplicative Exposure Measurement Error (SMME) across season or time period drawn from a random sample of 2500 predictions from the city of Long Beach, California.
| | Low SMME | High SMME | p-Value |
|---|---|---|---|
| Season | | | |
| Spring | 542 (27.5) | 76 (15.4) | – |
| Winter | 403 (20.4) | 193 (39.2) | < 0.001 |
| Summer | 568 (28.9) | 47 (9.5) | < 0.001 |
| Fall | 454 (23.1) | 176 (35.8) | < 0.001 |
| Time period | | | |
| 1992–2000 | 576 (29.3) | 214 (43.4) | – |
| 2001–2004 | 579 (29.5) | 139 (28.2) | < 0.001 |
| 2005–2012 | 812 (41.3) | 139 (28.2) | < 0.001 |
Total sample n = 2459 after accounting for repeat predictions within sample.
Welch non-parametric two sided t-test.
Seasons defined as winter (December through February), spring (March through May), summer (June through August), fall (September through November).
Fig. 6. Spatial pattern of the odds of high Shared Multiplicative Exposure Measurement Error (SMME) in spatiotemporal NOx predictions for a random sample of 2500 predictions from the city of Long Beach, CA: (a) unadjusted, (b) after spatial adjustment, and (c) after spatial and temporal adjustment. High SMME is defined with a cut-off based on the top 80th percentile of average covariance distribution in Long Beach at each unique location. Confounders of shared multiplicative exposure measurement error risk adjusted for in the model included population density, CALINE4 non-freeway NOx, and traffic density on FCC2 roads. Statistically significant geographic areas of increased or decreased risk of SMME are indicated using black contour lines.