Zhichao Sun, Yebin Tao, Shi Li, Kelly K Ferguson, John D Meeker, Sung Kyun Park, Stuart A Batterman, Bhramar Mukherjee.
Abstract
BACKGROUND: As public awareness of the consequences of environmental exposures has grown, estimating the adverse health effects of simultaneous exposure to multiple pollutants has become an important topic. The challenges of evaluating the health impacts of environmental factors in a multipollutant model include, but are not limited to: identification of the most critical components of the pollutant mixture, examination of potential interaction effects, and attribution of health effects to individual pollutants in the presence of multicollinearity.
Year: 2013 PMID: 24093917 PMCID: PMC3857674 DOI: 10.1186/1476-069X-12-85
Source DB: PubMed Journal: Environ Health ISSN: 1476-069X Impact factor: 5.984
Simulation results comparing five statistical methods under Scenario 1
| True coefficient | Statistic | BMA | DSA | LASSO | PLSR | SPCA |
| 0.50 | Estimate (ESE) | 0.22 (0.40) | 0.76 (0.62) | 0.40 (0.39) | 0.04 (0.02) | 0.08 (0.18) |
| | Percent included | 51.8% | N/A | 70.8% | N/A | 90.6% |
| 0.50 | Estimate (ESE) | 0.25 (0.44) | 0.86 (0.59) | 0.43 (0.43) | 0.05 (0.03) | 0.08 (0.18) |
| | Percent included | 53.0% | N/A | 67.9% | N/A | 90.6% |
| 0.20 | Estimate (ESE) | 0.29 (0.11) | 0.02 (0.11) | 0.19 (0.14) | 0.23 (0.11) | 0.16 (0.11) |
| | Percent included | 96.0% | 4.4% | 83.2% | N/A | 82.5% |
| | Average model size | 3.2 | 4.5 | 3.7 | 10 | 7.1 |
Average estimated effects, empirical standard errors, percentage of correct identification of non-zero coefficients, and average model size corresponding to 5 statistical methods in a cross-sectional study with continuous responses and 4 candidate air pollutants. Sample size for each replicate was N=250. The true model size was 3 without accounting for the intercept, and the possible maximum model size was 10. ESE, empirical standard error of the estimate. Results are based on 1000 replicates.
The estimate for a non-zero predictor is the mean, over replicates, of the product of its estimated regression coefficient and the indicator that the predictor was correctly identified in that replicate. The percentage is the proportion of the 1000 replicates in which each method correctly identified the predictor. (1) In BMA, predictors with posterior probability greater than 10% are regarded as identified. (2) In DSA, no variable selection is applied to main effects, because individual exposures are forced into the model when their interactions are of interest; identification of an interaction refers to inclusion of the interaction term in the cross-validated best predictive model. (3) Predictors with non-zero estimated LASSO regression coefficients are considered identified. (4) No variable selection is applied in PLSR because it uses all predictors. (5) In SPCA, predictors are identified if their Wald statistics from univariate models exceed a threshold value.
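The summary statistics described in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize_predictor` and the toy replicate values are hypothetical, and the ESE is assumed here to be the sample standard deviation of the per-replicate products, which the note does not spell out.

```python
from statistics import mean, stdev


def summarize_predictor(coefs, identified):
    """Summarize one non-zero predictor across simulation replicates.

    coefs[r]      -- estimated regression coefficient in replicate r
    identified[r] -- True if the method correctly identified the predictor
    """
    if len(coefs) != len(identified):
        raise ValueError("coefs and identified must have the same length")
    # Product of the estimated coefficient and the identification indicator.
    products = [b * (1.0 if s else 0.0) for b, s in zip(coefs, identified)]
    estimate = mean(products)
    # Assumption: ESE taken as the sample standard deviation of the products.
    ese = stdev(products)
    percent_included = 100.0 * sum(identified) / len(identified)
    return estimate, ese, percent_included


# Hypothetical toy data: four replicates, predictor missed in the second one.
coefs = [0.48, 0.00, 0.55, 0.51]
identified = [True, False, True, True]
estimate, ese, percent_included = summarize_predictor(coefs, identified)
print(round(estimate, 3), round(percent_included, 1))
```

Because replicates in which the predictor is missed contribute zero to the mean, the averaged estimate is pulled toward zero whenever the inclusion percentage is low, which is worth keeping in mind when reading the Estimate (ESE) rows.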
Simulation results under Scenario 2: single step versus two-step strategy
| True coefficient | Statistic | A: DSA | A: LASSO | A: PLSR | A: SPCA | B: BMA | B: DSA | B: LASSO | B: PLSR | B: SPCA |
| 0.50 | Estimate (ESE) | 0.93 (0.29) | 0.08 (0.19) | 0.03 (0.01) | 0.03 (0.04) | 0.32 (0.38) | 0.93 (0.29) | 0.35 (0.36) | 0.11 (0.04) | 2.3×10⁻⁴ (0.01) |
| | Percent included | N/A | 28.2% | N/A | 98.5% | 65.2% | N/A | 68.3% | N/A | 58.8% |
| 0.50 | Estimate (ESE) | 0.75 (0.27) | 0.07 (0.22) | 0.02 (0.01) | 0.02 (0.03) | 0.25 (0.32) | 0.74 (0.25) | 0.33 (0.38) | 0.09 (0.04) | −2.7×10⁻⁴ (0.01) |
| | Percent included | N/A | 22.6% | N/A | 94.0% | 63.5% | N/A | 63.8% | N/A | 58.9% |
| 0.50 | Estimate (ESE) | 0.88 (0.29) | 0.07 (0.19) | 0.03 (0.02) | 0.02 (0.03) | 0.29 (0.36) | 0.88 (0.25) | 0.36 (0.36) | 0.10 (0.05) | −1.2×10⁻⁴ (0.01) |
| | Percent included | N/A | 25.8% | N/A | 96.2% | 63.6% | N/A | 67.4% | N/A | 57.9% |
| 0.50 | Estimate (ESE) | 0.71 (0.26) | 0.04 (0.22) | 0.02 (0.01) | 0.01 (0.02) | 0.24 (0.30) | 0.67 (0.26) | 0.32 (0.34) | 0.08 (0.04) | 9.1×10⁻⁴ (0.01) |
| | Percent included | N/A | 18.1% | N/A | 82.4% | 65.6% | N/A | 64.3% | N/A | 57.8% |
| 0.20 | Estimate (ESE) | 0.002 (0.03) | 0.17 (0.14) | 0.07 (0.04) | 0.07 (0.07) | 0.24 (0.22) | 0.006 (0.06) | 0.21 (0.18) | 0.27 (0.08) | 0.28 (0.13) |
| | Percent included | 0.3% | 79.2% | N/A | 96.3% | 78.4% | 1.1% | 84.0% | N/A | 98.6% |
| 0.20 | Estimate (ESE) | 0.003 (0.05) | 0.20 (0.18) | 0.06 (0.03) | 0.05 (0.06) | 0.21 (0.26) | 0.006 (0.08) | 0.22 (0.22) | 0.23 (0.08) | 0.19 (0.11) |
| | Percent included | 0.3% | 77.3% | N/A | 99.0% | 66.7% | 0.9% | 78.1% | N/A | 99.8% |
| 0.20 | Estimate (ESE) | 0.002 (0.04) | 0.17 (0.16) | 0.06 (0.03) | 0.03 (0.05) | 0.25 (0.27) | 0.004 (0.05) | 0.21 (0.21) | 0.19 (0.08) | 0.15 (0.11) |
| | Percent included | 0.3% | 74.4% | N/A | 94.1% | 73.3% | 0.5% | 76.9% | N/A | 98.6% |
| | Average model size | 20.1 | 22.8 | 210 | 79.3 | 6.0 | 4.2 | 6.7 | 10.0 | 8.2 |
Panel A reports average estimated effects, empirical standard errors, percentages of correct identification of non-zero coefficients, and average model size for the four available statistical methods in a cross-sectional study with continuous responses and 20 air pollutants. Panel B summarizes the corresponding results for five statistical methods after an initial CART variable selection step (the two-step modeling strategy). Sample size for each replicate was N=250. The true model size was 7, not counting the intercept, and the maximum possible model size was 210. ESE, empirical standard error of the estimate. Results are based on 1000 replicates.
The estimate for a non-zero predictor is the mean, over replicates, of the product of its estimated regression coefficient and the indicator that the predictor was correctly identified in that replicate. The percentage is the proportion of the 1000 replicates in which each method correctly identified the predictor. (1) In BMA, predictors with posterior probability greater than 10% are regarded as identified. (2) In DSA, no variable selection is applied to main effects, because individual exposures are forced into the model when their interactions are of interest; identification of an interaction refers to inclusion of the interaction term in the cross-validated best predictive model. (3) Predictors with non-zero estimated LASSO regression coefficients are considered identified. (4) No variable selection is applied in PLSR because it uses all predictors. (5) In SPCA, predictors are identified if their Wald statistics from univariate models exceed a threshold value.
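The maximum model sizes quoted in the table notes (10 for 4 pollutants, 210 for 20, and 55 for 10) are consistent with a candidate set made up of every main effect plus every pairwise interaction. The short check below works through that arithmetic. The function name `max_model_size` is hypothetical, and the main-effects-plus-pairwise-interactions structure is an assumption inferred from the counts rather than something stated in these notes.

```python
from math import comb


def max_model_size(n_pollutants):
    """Candidate terms: all main effects plus all pairwise interactions.

    Assumption: no higher-order or squared terms; the intercept is not counted.
    """
    return n_pollutants + comb(n_pollutants, 2)


# Pollutant counts taken from the simulation scenarios in the table notes.
sizes = {p: max_model_size(p) for p in (4, 10, 20)}
print(sizes)  # {4: 10, 10: 55, 20: 210}
```

With 20 pollutants the candidate set already exceeds 200 terms for N=250 observations, which helps explain why an initial screening step is attractive before the more detailed model selection.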
Simulation results for four statistical methods under Scenario 3
| True coefficient | Statistic | BMA | LASSO | PLSR | SPCA |
| 0.30 | Estimate (ESE) | 0.26 (0.18) | 0.27 (0.05) | 0.24 (0.07) | 0.0013 (0.0070) |
| | Percent included | 88.5% | 100% | N/A | 5.6% |
| 0.30 | Estimate (ESE) | 0.28 (0.15) | 0.27 (0.04) | 0.23 (0.06) | 0.0005 (0.0036) |
| | Percent included | 95.8% | 100% | N/A | 3.8% |
| 0.10 | Estimate (ESE) | 0.11 (0.06) | 0.11 (0.01) | 0.10 (0.02) | 0.19 (0.04) |
| | Percent included | 97.7% | 100% | N/A | 100% |
| | Average model size | 4.5 | 5.4 | 10 | 1.3 |
Average estimated effects, empirical standard errors, percentages of correct identification of non-zero coefficients, and average model size corresponding to four statistical methods in a time-series study with count response and 4 air pollutants. Sample size for each replicate was N=400. The true model size was 3 with intercept not counted, and the possible maximum model size was 10. ESE, empirical standard error of the estimate. Results are based on 1000 replicates.
The estimate for a non-zero predictor is the mean, over replicates, of the product of its estimated regression coefficient and the indicator that the predictor was correctly identified in that replicate. The percentage is the proportion of the 1000 replicates in which each method correctly identified the predictor. (1) In BMA, predictors with posterior probability greater than 10% are regarded as identified. (2) Predictors with non-zero estimated LASSO regression coefficients are considered identified. (3) No variable selection is applied in PLSR because it uses all predictors. (4) In SPCA, predictors are identified if their Wald statistics from univariate models exceed a threshold value.
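The identification rules listed in this note differ by method, so it may help to see them side by side. The sketch below encodes each rule as a small predicate. All function names and the toy inputs are hypothetical; the SPCA threshold is left as a required argument because the note does not give its value, and PLSR has no rule because it performs no selection.

```python
def identified_bma(posterior_prob, cutoff=0.10):
    """BMA: identified if the posterior inclusion probability exceeds 10%."""
    return posterior_prob > cutoff


def identified_lasso(coef):
    """LASSO: identified if the estimated coefficient is not zero."""
    return coef != 0.0


def identified_spca(wald_stat, threshold):
    """SPCA: identified if the univariate Wald statistic exceeds a threshold.

    The threshold value is not given in the note, so the caller supplies it.
    """
    return wald_stat > threshold


def percent_included(flags):
    """Percentage of replicates in which the predictor was identified."""
    return 100.0 * sum(flags) / len(flags)


# Hypothetical posterior probabilities for one predictor over four replicates.
post_probs = [0.05, 0.40, 0.12, 0.09]
bma_flags = [identified_bma(p) for p in post_probs]
print(bma_flags, percent_included(bma_flags))  # [False, True, True, False] 50.0
```

Since the rules are not equally strict, the Percent included rows are best compared within a method across predictors rather than taken as a like-for-like ranking of methods.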
Simulation results of four statistical methods under Scenario 4
| True coefficient | Statistic | BMA | LASSO | PLSR | SPCA |
| 0.20 | Estimate (ESE) | 0.19 (0.12) | 0.15 (0.04) | 0.15 (0.03) | 0.006 (0.009) |
| | Percent included | 89.3% | 99.9% | N/A | 44.1% |
| 0.20 | Estimate (ESE) | 0.19 (0.09) | 0.15 (0.03) | 0.12 (0.03) | 0.004 (0.006) |
| | Percent included | 94.6% | 99.8% | N/A | 32.8% |
| 0.20 | Estimate (ESE) | 0.19 (0.09) | 0.14 (0.03) | 0.12 (0.04) | 0.0006 (0.0014) |
| | Percent included | 95.8% | 99.9% | N/A | 18.2% |
| 0.20 | Estimate (ESE) | 0.20 (0.09) | 0.14 (0.03) | 0.08 (0.03) | 0.0001 (0.0006) |
| | Percent included | 94.5% | 99.9% | N/A | 6.2% |
| 0.10 | Estimate (ESE) | 0.10 (0.03) | 0.11 (0.01) | 0.10 (0.01) | 0.10 (0.07) |
| | Percent included | 99.2% | 100% | N/A | 97.1% |
| 0.10 | Estimate (ESE) | 0.10 (0.03) | 0.11 (0.01) | 0.10 (0.01) | 0.06 (0.05) |
| | Percent included | 99.5% | 100% | N/A | 87.0% |
| | Average model size | 13.1 | 21.1 | 55 | 9.8 |
Average estimated effects, empirical standard errors, percentages of correct identification of non-zero coefficients, and average model size corresponding to four statistical approaches in a time-series study with count response and 10 air pollutants. Sample size for each replicate was N=800. The true model size was 6 with intercept not counted, and the possible maximum model size was 55. ESE, empirical standard error of the estimate. Results are based on 1000 replicates.
The estimate for a non-zero predictor is the mean, over replicates, of the product of its estimated regression coefficient and the indicator that the predictor was correctly identified in that replicate. The percentage is the proportion of the 1000 replicates in which each method correctly identified the predictor. (1) In BMA, predictors with posterior probability greater than 10% are regarded as identified. (2) Predictors with non-zero estimated LASSO regression coefficients are considered identified. (3) No variable selection is applied in PLSR because it uses all predictors. (4) In SPCA, predictors are identified if their Wald statistics from univariate models exceed a threshold value.
Results of model selection for the NHANES data (2005–2008)
| BMA | OP, EPAR, P8, MEOHP, MiBP | OP*EPAR, OP*MEOHP, P8*MEOHP, MEOHP*MiBP | EPAR, PPAR, MCPP, MEOHP, MiBP, 2,5-DCP | EPAR*PPAR, EPAR*MCPP, EPAR*MiBP, MEOHP*2,5-DCP, |
| LASSO | OP, TCS, EPAR, P8, MCPP, MEOHP, MiBP | OP*EPAR, OP*MEOHP, TCS*EPAR, TCS*MCPP, EPAR*MCPP, P8*MCPP | EPAR, PPAR, P8, MCPP, MEOHP, MiBP, 2,5-DCP, 2,4,5-TCP | EPAR*PPAR, EPAR*P8, EPAR*MCPP, EPAR*MiBP, EPAR*2,5-DCP, PPAR*2,4,5-TCP, P8*MCPP, MCPP*2,4,5-TCP, MiBP*2,4,5-TCP |
| SPCA | OP, EPAR, P8, MCPP, MEOHP, MiBP | OP*P8, OP*MEOHP, P8*MCPP, P8*MEOHP, P8*MiBP | EPAR, PPAR, P8, MCPP, MEOHP, MiBP, 2,5-DCP, 2,4,5-TCP | EPAR*PPAR, MCPP*2,5-DCP, EPAR*MEOHP, EPAR*MiBP, EPAR*2,5-DCP, P8*MEOHP, MEOHP*2,5-DCP, P8*2,5-DCP, EPAR*P8 |
| DSA | OP, TCS, EPAR, P8, MCPP, MEOHP, MiBP | N/A | EPAR, PPAR, P8, MCPP, MEOHP, MiBP, 2,5-DCP, 2,4,5-TCP | N/A |
Phthalates: MEHP, mono(2-ethylhexyl) phthalate; MEHHP, mono(2-ethyl-5-hydroxyhexyl) phthalate; MEOHP, mono(2-ethyl-5-oxohexyl) phthalate; MECPP, mono(2-ethyl-5-carboxypentyl) phthalate; MnBP, mono-n-butyl phthalate; MiBP, mono-isobutyl phthalate; MBzP, mono-benzyl phthalate; MEP, mono-ethyl phthalate; MCPP, mono(3-carboxypropyl) phthalate. Phenols: BPA, bisphenol-A; TCS, triclosan; BPAR, butyl paraben; EPAR, ethyl paraben; MPAR, methyl paraben; PPAR, propyl paraben; BP3, benzophenone-3; OP, 4-tert octylphenol. Pesticides: 2,5-DCP, 2,5-dichlorophenol; 2,4-DCP, 2,4-dichlorophenol; OPP, o-phenyl phenol; 2,4,5-TCP, 2,4,5-trichlorophenol; 2,4,6-TCP, 2,4,6-trichlorophenol. Perchlorate and related anions: P8, perchlorate; NO3, nitrate; SCN, thiocyanate.
Results of model selection for the DAMAT data (2004–2006)
| BMA | PM2.5 | PM2.5*SO2 |
| LASSO | CO, PM2.5, SO2 | NO2*PM2.5, PM2.5*SO2 |
| SPCA | PM2.5, SO2 | PM2.5*SO2, NO2*SO2 |
Glossary of methods with implementation software
| Method | Type | Reference | Software |
| Bayesian model averaging (BMA) | Theory | Madigan and Raftery, 1994 | |
| | Application | Koop and Tole, 2004 | |
| Deletion/Substitution/Addition (DSA) | Theory | Sinisi and van der Laan, 2004 | |
| | Application | Mortimer et al., 2008 | |
| Least absolute shrinkage and selection operator (LASSO) | Theory | Tibshirani, 1996 | |
| | | Efron et al., 2004 | |
| | Application | Roberts and Martin, 2005 | |
| Partial least squares regression (PLSR) | Theory | Hoeskuldsson, 1988 | |
| | Application | N/A | |
| Supervised principal component analysis (SPCA) | Theory | Bair et al., 2006 | N/A |
| | Application | Roberts and Martin, 2006 | |
| Classification and regression tree (CART) | Theory | Breiman et al., 1984 | |
| | Application | Hu et al., 2008 | |