Jose Barrera-Gómez, Lydiane Agier, Lützen Portengen, Marc Chadeau-Hyam, Lise Giorgis-Allemand, Valérie Siroux, Oliver Robinson, Jelle Vlaanderen, Juan R González, Mark Nieuwenhuijsen, Paolo Vineis, Martine Vrijheid, Roel Vermeulen, Rémy Slama, Xavier Basagaña.
Abstract
BACKGROUND: There is growing interest in examining the simultaneous effects of multiple exposures and, more generally, the effects of mixtures of exposures, as part of the exposome concept (defined as the totality of human environmental exposures from conception onwards). Uncovering such combined effects is challenging owing to the large number of exposures, several of which are highly correlated. We performed a simulation study in an exposome context to compare the performance of several statistical methods that have been proposed to detect statistical interactions.
Keywords: Exposome; Interactions; Variable selection
Year: 2017 PMID: 28709428 PMCID: PMC5513197 DOI: 10.1186/s12940-017-0277-6
Source DB: PubMed Journal: Environ Health ISSN: 1476-069X Impact factor: 5.984
Scenarios used to generate the data ᵃ
| Subscenario | Adjusted R² | Pairwise corr. ᵇ | Interaction size (and sign) | Parameters ᶜ |
|---|---|---|---|---|
| Scenario 1. True model with no interactions | | | | |
| 1a | 0.10 (0.07, 0.16) | Mixed | | |
| 1b | 0.30 (0.23, 0.39) | Mixed | | |
| 1c | 0.11 (0.09, 0.12) | High | | |
| 1d | 0.27 (0.25, 0.28) | High | | |
| Scenario 2. True model with one two-way interaction | | | | |
| 2a | 0.09 (0.07, 0.14) | Mixed | Strong (+) | |
| 2b | 0.09 (0.06, 0.15) | Mixed | Strong (−) | |
| 2c | 0.10 (0.06, 0.15) | Mixed | Moderate (+) | |
| 2d | 0.10 (0.07, 0.14) | Mixed | Moderate (−) | |
| 2e | 0.13 (0.11, 0.14) | High | Strong (+) | |
| 2f | 0.13 (0.11, 0.15) | High | Strong (−) | |
| 2g | 0.30 (0.28, 0.32) | High | Moderate (+) | |
| 2h | 0.30 (0.28, 0.32) | High | Moderate (−) | |
| Scenario 3. True model with two two-way interactions | | | | |
| 3a | 0.11 (0.08, 0.15) | Mixed | Strong (+) | |
| 3b | 0.10 (0.08, 0.16) | Mixed | Strong (−) | |
| 3c | 0.10 (0.06, 0.14) | Mixed | Moderate (+) | |
| 3d | 0.10 (0.07, 0.14) | Mixed | Moderate (−) | |
| 3e | 0.29 (0.27, 0.32) | High | Strong (+) | |
| 3f | 0.29 (0.27, 0.31) | High | Strong (−) | |
| 3g | 0.31 (0.29, 0.33) | High | Moderate (+) | |
| 3h | 0.31 (0.28, 0.33) | High | Moderate (−) | |
ᵃ In each of the three scenarios, the outcome Y was generated as Y = F(E) + ε, where F(E) is a function of the predictors X₁, …, X₅ and ε ∼ N(0, σ). Within each scenario, subscenarios were defined by the pairwise correlation of the predictors (“Mixed”, when the predictors were selected from the whole exposome, in which case the absolute pairwise correlation ranged from 0.0000 to 1.0000; or “High”, when the predictors were selected from the subset of 13 exposome variables for which all absolute pairwise correlations were 0.62 or higher); the size of the interaction terms (“Strong”, equal in size to the main effects; or “Moderate”, half the size of “Strong”); and the sign of the interaction terms (+ or −). Values for the adjusted R² are the mean and the 2.5th and 97.5th percentiles obtained by fitting the model to 100 simulated datasets.

ᵇ The median of the mean pairwise correlation between the true predictors was 0.12 (2.5th and 97.5th percentiles: 0.05, 0.25) for “Mixed” and 0.78 (2.5th and 97.5th percentiles: 0.72, 0.87) for “High”. The median of the mean pairwise correlation between the true predictors and the other exposures was 0.13 (2.5th and 97.5th percentiles: 0.09, 0.16) for “Mixed” and 0.18 (2.5th and 97.5th percentiles: 0.17, 0.19) for “High”.

ᶜ In all scenarios, β₀ = β₁ = ⋯ = β₅ = 1.
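
The data-generating scheme described in the footnotes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `simulate_outcome`, the choice of which predictors interact (X₁·X₂, and additionally X₃·X₄ in Scenario 3), the value of σ, the sample size, and the use of a multivariate normal stand-in for the exposome are all hypothetical. Only the main-effect coefficients (all equal to 1) and the "Strong"/"Moderate" interaction sizes follow the text.

```python
import numpy as np

def simulate_outcome(X, scenario, interaction="strong", sign=1, sigma=1.0, rng=None):
    """Generate Y = F(E) + eps from five true predictors (the columns of X)."""
    rng = np.random.default_rng() if rng is None else rng
    beta0, beta = 1.0, np.ones(5)  # footnote c: all coefficients equal 1
    # "Strong" = same size as the main effects; "Moderate" = half of that
    gamma = sign * {"strong": 1.0, "moderate": 0.5}[interaction]
    # ASSUMPTION: the interacting pairs below are hypothetical choices
    pairs = {1: [], 2: [(0, 1)], 3: [(0, 1), (2, 3)]}[scenario]
    f = beta0 + X @ beta
    for i, j in pairs:
        f = f + gamma * X[:, i] * X[:, j]
    eps = rng.normal(0.0, sigma, size=X.shape[0])  # eps ~ N(0, sigma)
    return f + eps

# Stand-in predictors: equicorrelated normals (hypothetical, for illustration;
# 0.78 is the median correlation reported for the "High" setting)
rng = np.random.default_rng(0)
corr = 0.78
cov = np.full((5, 5), corr) + (1 - corr) * np.eye(5)
X = rng.multivariate_normal(np.zeros(5), cov, size=1000)
y = simulate_outcome(X, scenario=2, interaction="moderate", sign=-1, rng=rng)
```

In this sketch, larger values of σ lower the adjusted R² of the true model, which is one way the low (about 0.10) and higher (about 0.30) R² subscenarios could be obtained.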
Characteristics and performance measures available for each method
| Feature | EWAS2 | DSA1 | DSA2 | Sun3step | LASSO | GLINTERNET | BRT |
|---|---|---|---|---|---|---|---|
| Model structure | |||||||
| Provides regression coefficients | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| Able to include interaction terms | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| Able to include confounder covariates | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| Able to capture non-linear associations | ✓ | ✓ | ✓ | ||||
| Measures of performance | |||||||
| RMS | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| RNV | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Relative out-of-sample R² | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Sens | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| AltSens | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Sensvar | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Sens2 | ✓ | ✓ | ✓ | ✓ | |||
| AltSens2 | ✓ | ✓ | ✓ | ✓ | |||
| FDP | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| AltFDP | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| FDPvar | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| FDP2 | ✓ | ✓ | ✓ | ✓ | |||
| AltFDP2 | ✓ | ✓ | ✓ | ✓ | |||
Fig. 1 Performance of the compared methods in terms of number of variables in the fitted model and predictive ability. a Relative number of variables (RNV), in log scale, and b Relative out-of-sample R². Both measures are relative such that the true model corresponds to the value 1. Mean values based on 100 simulations. The vertical line separates scenarios according to the pairwise correlation between the true predictors as “Mixed” (any exposure can be selected as a true predictor regardless of correlation), or “High” (exposures are chosen so that all their pairwise correlations are above 0.6). Scenarios 1, 2 and 3 involve no interactions, one two-way interaction, and two two-way interactions, respectively
Fig. 2 Performance of the compared methods in terms of sensitivity. a Sensitivity for variables (Sensvar), b Alternative sensitivity (AltSens), and c Sensitivity for interaction terms (Sens2). Mean values based on 100 simulations. The vertical line separates scenarios according to the pairwise correlation between the true predictors as “Mixed” (any exposure can be selected as a true predictor regardless of correlation), or “High” (exposures are chosen so that all their pairwise correlations are above 0.6). Scenarios 1, 2 and 3 involve no interactions, one two-way interaction, and two two-way interactions, respectively
Fig. 3 Performance of the compared methods in terms of specificity. a False discovery proportion for variables (FDPvar), b Alternative false discovery proportion (AltFDP), and c False discovery proportion for interaction terms (FDP2). Mean values based on 100 simulations. The vertical line separates scenarios according to the pairwise correlation between the true predictors as “Mixed” (any exposure can be selected as a true predictor regardless of correlation), or “High” (exposures are chosen so that all their pairwise correlations are above 0.6). Scenarios 1, 2 and 3 involve no interactions, one two-way interaction, and two two-way interactions, respectively
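
As a rough illustration of the two families of measures shown in Figs. 2 and 3, the sketch below computes a sensitivity and a false discovery proportion from a set of selected terms and a set of true terms. It assumes the usual definitions (share of true terms that were selected; share of selected terms that are not true); the exact definitions of the variants listed in the table (Sensvar, AltSens, Sens2, and the FDP counterparts) are not given in this excerpt, and the function names are hypothetical.

```python
def sensitivity(selected, truth):
    """Share of true terms that appear among the selected terms."""
    truth = set(truth)
    if not truth:
        return float("nan")  # undefined when there is nothing to detect
    return len(truth & set(selected)) / len(truth)

def false_discovery_proportion(selected, truth):
    """Share of selected terms that are not true terms (0 if none selected)."""
    selected = set(selected)
    if not selected:
        return 0.0
    return len(selected - set(truth)) / len(selected)

# Hypothetical example: five true predictors, four terms selected by a method
truth = {"X1", "X2", "X3", "X4", "X5"}
selected = {"X1", "X2", "X3", "X7"}
sens = sensitivity(selected, truth)                # 3 of 5 true terms found
fdp = false_discovery_proportion(selected, truth)  # 1 of 4 selections is false
```

Treating interaction terms as labels such as `"X1:X2"` would let the same two functions cover an interaction-only measure, under the same assumptions.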
Cost of testing for interactions in cases where they do not exist ᵃ
| Ratio of measures | Scenario 1a: Sens | Scenario 1a: FDP | Scenario 1b: Sens | Scenario 1b: FDP | Scenario 1c: Sens | Scenario 1c: FDP | Scenario 1d: Sens | Scenario 1d: FDP |
|---|---|---|---|---|---|---|---|---|
| DSA2 to DSA1 | 1.00 | 0.98 | 1.00 | 0.96 | 0.96 | 1.07 | 1.02 | 0.94 |
| GLINTERNET to LASSO | 0.88 | 0.81 | 0.97 | 0.75 | 0.89 | 0.75 | 0.93 | 0.77 |
ᵃ Restricted to methods that have both a version for main effects only and a version for main effects and interactions. Each figure in the table is the ratio of a performance measure for the version that also looks for interaction terms (numerator) to the same measure for the version that looks only for main effects (denominator).
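
The ratios in this table can be read with a small worked example. The sketch below is illustrative only: the function name and the two sensitivity values are hypothetical, chosen so that the result matches the 0.88 reported for GLINTERNET to LASSO in Scenario 1a; the underlying sensitivities themselves are not given in this excerpt.

```python
def performance_ratio(with_interactions, main_effects_only):
    """Measure for the interaction-aware version (numerator) divided by the
    same measure for the main-effects-only version (denominator)."""
    if main_effects_only == 0:
        raise ValueError("ratio is undefined when the denominator is zero")
    return with_interactions / main_effects_only

# Hypothetical sensitivities: 0.44 with interactions, 0.50 without
ratio = performance_ratio(0.44, 0.50)
```

A sensitivity ratio below 1 means the interaction-aware version detected fewer true predictors than its main-effects-only counterpart, while an FDP ratio below 1 means it made proportionally fewer false discoveries.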