Xi Lu¹, Kun Fan¹, Jie Ren², Cen Wu¹.
Abstract
In high-throughput genetic studies, an important aim is to identify gene-environment (G×E) interactions associated with clinical outcomes. Recently, multiple marginal penalization methods have been developed and shown to be effective in G×E studies. However, marginal variable selection has received little attention within the Bayesian framework. In this study, we propose a novel marginal Bayesian variable selection method for G×E studies. In particular, our marginal Bayesian method is robust to data contamination and outliers in the outcome variables. Incorporating spike-and-slab priors, we implement a Gibbs sampler based on Markov chain Monte Carlo (MCMC). The proposed method outperforms a number of alternatives in extensive simulation studies. The utility of the marginal robust Bayesian variable selection method is further demonstrated in case studies using data from the Nurses' Health Study (NHS). Some of the main and interaction effects identified in the real data analysis have important biological implications.
Keywords: gene-environment interaction; marginal analysis; Markov chain Monte Carlo method; robust Bayesian variable selection; spike-and-slab priors
Year: 2021 PMID: 34956304 PMCID: PMC8693717 DOI: 10.3389/fgene.2021.667074
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
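The abstract pairs spike-and-slab priors with a Gibbs sampler applied to one marginal G×E model at a time. The Python sketch below illustrates that combination for a single gene under simplifying assumptions that differ from the proposed method: a Gaussian likelihood instead of the robust LAD likelihood, a point-mass spike with a normal slab of fixed variance instead of Bayesian LASSO-type priors, and a fixed prior inclusion probability `pi`. The helper names (`marginal_design`, `spike_slab_gibbs`), the hyperparameters, and the simulated data are all hypothetical.

```python
import numpy as np


def marginal_design(G, E, j):
    """Hypothetical helper: columns E_1..E_q, G_j, and G_j*E_1..G_j*E_q for gene j."""
    g = G[:, j:j + 1]                  # keep gene j as an (n, 1) column
    return np.hstack([E, g, g * E])    # shape (n, 2q + 1)


def spike_slab_gibbs(X, y, n_iter=2000, burn=500, pi=0.5, slab_var=10.0,
                     a0=1.0, b0=1.0, seed=0):
    """Gibbs sampler with a point-mass spike at 0 and a N(0, slab_var) slab.

    ASSUMPTION: Gaussian likelihood y ~ N(X beta, sigma2 I) with
    sigma2 ~ InvGamma(a0, b0); the robust method in the text uses LAD instead.
    Returns posterior inclusion probabilities and posterior means of beta.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    sigma2 = 1.0
    xtx = np.sum(X ** 2, axis=0)       # x_j' x_j for every column
    resid = y - X @ beta
    incl_sum = np.zeros(p)
    beta_sum = np.zeros(p)
    for it in range(n_iter):
        for j in range(p):
            r_j = resid + X[:, j] * beta[j]          # residual without term j
            v = 1.0 / (xtx[j] / sigma2 + 1.0 / slab_var)
            m = v * (X[:, j] @ r_j) / sigma2
            # Log posterior odds of inclusion with beta_j integrated out of the slab.
            log_odds = (np.log(pi / (1.0 - pi))
                        + 0.5 * np.log(v / slab_var) + 0.5 * m * m / v)
            p_incl = 1.0 / (1.0 + np.exp(-log_odds))
            beta[j] = rng.normal(m, np.sqrt(v)) if rng.random() < p_incl else 0.0
            resid = r_j - X[:, j] * beta[j]
        # sigma2 | beta, y ~ InvGamma(a0 + n/2, b0 + RSS/2)
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * resid @ resid))
        if it >= burn:
            incl_sum += beta != 0
            beta_sum += beta
    kept = n_iter - burn
    return incl_sum / kept, beta_sum / kept


# Simulated example: gene 0 has a main effect and one interaction (with E_1).
rng = np.random.default_rng(1)
n, q = 200, 4
G = rng.normal(size=(n, 5))
E = rng.normal(size=(n, q))
X = marginal_design(G, E, j=0)         # columns: E_1..E_4, G_0, G_0*E_1..G_0*E_4
beta_true = np.zeros(X.shape[1])
beta_true[4] = 1.5                     # main effect of G_0
beta_true[5] = 1.0                     # G_0 x E_1 interaction
y = X @ beta_true + rng.normal(size=n)
pip, post_mean = spike_slab_gibbs(X, y)
```

Because the normal slab is conjugate here, each inclusion indicator is drawn with its coefficient integrated out, which usually mixes better than updating indicator and coefficient separately. Swapping in a Laplace (LAD) likelihood, as a robust version would, is commonly handled by writing the Laplace distribution as a scale mixture of normals and adding latent scale variables to each sweep.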
Simulation results of the first setting for BL (Bayesian LASSO), BLSS (Bayesian LASSO with spike-and-slab priors), LADBL (LAD Bayesian LASSO), and LADBLSS (LAD Bayesian LASSO with spike-and-slab priors).
| Error distribution | Metric | BL | BLSS | LADBL | LADBLSS |
|---|---|---|---|---|---|
| Error 1: N(0,1) | AUC | 0.9182 | 0.9901 | 0.9258 | 0.9887 |
| | SD | 0.0052 | 0.0021 | 0.0076 | 0.0026 |
| Error 2 | AUC | 0.8332 | 0.9420 | 0.9004 | 0.9841 |
| | SD | 0.0107 | 0.0235 | 0.0078 | 0.0031 |
| Error 3: Lognormal(0,2) | AUC | 0.5343 | 0.5473 | 0.8432 | 0.9558 |
| | SD | 0.0144 | 0.0576 | 0.0115 | 0.0161 |
| Error 4: 90% N(0,1) + 10% Cauchy(0,1) | AUC | 0.8221 | 0.9124 | 0.9222 | 0.9895 |
| | SD | 0.0212 | 0.0410 | 0.0071 | 0.0024 |
| Error 5: 80% N(0,1) + 20% Cauchy(0,1) | AUC | 0.7507 | 0.8431 | 0.9192 | 0.9904 |
| | SD | 0.0217 | 0.0633 | 0.0059 | 0.0018 |
AUC: mean AUC over 100 replicates; SD: standard deviation of the AUC across the replicates. n = 200, p = 500, q = 4, and m = 3.
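The AUC above summarizes how well each method ranks true effects above null ones. The sketch below shows one standard way to compute such an AUC: the rank-based (Mann-Whitney) form, which equals the area under the ROC curve traced by sweeping a selection threshold. It also includes a sampler for the Cauchy-contaminated errors of Errors 4 and 5, read as per-observation mixtures. The source does not say how the AUC was computed, so the score used for ranking (for example, the absolute posterior median of each coefficient) and the function names are assumptions.

```python
import numpy as np


def contaminated_errors(n, cauchy_frac, rng):
    """Draw n i.i.d. errors from (1 - cauchy_frac) N(0,1) + cauchy_frac Cauchy(0,1)."""
    from_cauchy = rng.random(n) < cauchy_frac        # mixture component labels
    return np.where(from_cauchy, rng.standard_cauchy(n), rng.standard_normal(n))


def selection_auc(scores, is_true):
    """Rank-based AUC: probability that a true effect outscores a null effect,
    with ties counted as 1/2 (equal to the area under the ROC curve)."""
    scores = np.asarray(scores, dtype=float)
    is_true = np.asarray(is_true, dtype=bool)
    pos, neg = scores[is_true], scores[~is_true]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


rng = np.random.default_rng(0)
eps = contaminated_errors(200, 0.10, rng)    # an Error 4-style draw of n = 200 errors
auc = selection_auc([0.9, 0.8, 0.3, 0.1], [True, False, True, False])  # 0.75
```

The pairwise comparison costs O(true × null) memory, which is fine when only a handful of effects are truly nonzero.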
Identification results of the first setting with the Top100 method for BL (Bayesian LASSO), BLSS (Bayesian LASSO with spike-and-slab priors), LADBL (LAD Bayesian LASSO), and LADBLSS (LAD Bayesian LASSO with spike-and-slab priors).
| Error distribution | Method | Main effects | Interactions | Total |
|---|---|---|---|---|
| Error 1: N(0,1) | BL | 7.60 (0.49) | 6.80 (1.6) | 14.40 (1.73) |
| | BLSS | 7.80 (0.41) | 10.80 (0.92) | 18.60 (1.13) |
| | LADBL | 7.67 (0.55) | 6.53 (1.85) | 14.20 (1.81) |
| | LADBLSS | 7.76 (0.5) | 10.53 (1.36) | 18.30 (1.49) |
| Error 2 | BL | 6.37 (1.90) | 3.90 (2.07) | 10.27 (3.19) |
| | BLSS | 6.33 (1.63) | 8.53 (2.46) | 14.87 (3.71) |
| | LADBL | 7.43 (0.94) | 5.80 (1.71) | 13.23 (2.01) |
| | LADBLSS | 7.53 (0.51) | 9.90 (1.56) | 17.43 (1.76) |
| Error 3: Lognormal(0,2) | BL | 0.90 (1.21) | 0.50 (0.97) | 1.40 (1.45) |
| | BLSS | 0.73 (0.94) | 0.47 (0.68) | 1.20 (1.35) |
| | LADBL | 6.27 (1.55) | 3.67 (1.94) | 9.93 (2.75) |
| | LADBLSS | 6.10 (1.37) | 8.93 (2.02) | 15.03 (3.09) |
| Error 4: 90% N(0,1) + 10% Cauchy(0,1) | BL | 5.57 (2.99) | 3.63 (2.53) | 9.20 (5.05) |
| | BLSS | 6.20 (2.62) | 8.30 (3.98) | 14.50 (6.39) |
| | LADBL | 7.77 (0.43) | 7.00 (1.93) | 14.77 (1.81) |
| | LADBLSS | 7.77 (0.57) | 10.67 (1.50) | 18.23 (1.67) |
| Error 5: 80% N(0,1) + 20% Cauchy(0,1) | BL | 5.07 (2.89) | 3.00 (2.49) | 8.07 (5.01) |
| | BLSS | 4.60 (3.25) | 5.70 (4.23) | 10.30 (7.27) |
| | LADBL | 7.57 (0.57) | 6.83 (1.07) | 14.40 (1.83) |
| | LADBLSS | 7.80 (0.55) | 10.53 (1.36) | 18.33 (1.69) |
Mean (SD) based on 100 replicates. n = 200, p = 500, q = 4, and m = 3.
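One reading of the Top100 procedure: rank all candidate main and interaction effects by a score, keep the 100 highest-ranked, and count the true effects among them separately for main effects and interactions; in the table the last column is indeed the sum of the first two. The sketch below implements that counting. The source does not spell out the procedure, so the scoring rule (for example, the absolute posterior median), the helper name `top_k_hits`, and the toy inputs are assumptions.

```python
import numpy as np


def top_k_hits(scores, is_true, is_main, k=100):
    """Count true main effects, true interactions, and their total among the
    k candidate effects with the largest scores."""
    scores = np.asarray(scores, dtype=float)
    is_true = np.asarray(is_true, dtype=bool)
    is_main = np.asarray(is_main, dtype=bool)
    top = np.argsort(-scores, kind="stable")[:k]   # indices of the k largest scores
    hit = is_true[top]
    main_hits = int(np.sum(hit & is_main[top]))
    inter_hits = int(np.sum(hit & ~is_main[top]))
    return main_hits, inter_hits, main_hits + inter_hits


# Toy example: five candidate effects, keep the top 3.
counts = top_k_hits(scores=[5, 4, 3, 2, 1],
                    is_true=[True, False, True, True, False],
                    is_main=[True, True, False, False, True], k=3)
# counts == (1, 1, 2)
```

The stable sort keeps ties in their original order, so the selected set is deterministic.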
Figure 1. Potential scale reduction factor (PSRF) against iterations for the coefficients of the first genetic factor and its interactions with the environmental factors in Example 1 under error 3. Black line: the PSRF. Red dotted line: the upper limit of the 95% confidence interval for the PSRF. Blue dotted line: the threshold of 1.1. The coefficients shown are the estimated main effect of the first genetic factor and the estimated coefficients of its first three interaction effects.
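The PSRF in Figure 1 is the Gelman-Rubin convergence diagnostic: it compares between-chain and within-chain variability of parallel MCMC chains, and values close to 1 (here, below the 1.1 threshold) suggest convergence. The sketch below computes the basic form of the statistic, without the degrees-of-freedom correction some software applies, and traces it over growing prefixes of the chains. The source does not state how the plotted curve or its confidence limit were produced, so the prefix-based trace and the function names are assumptions.

```python
import numpy as np


def psrf(chains):
    """Basic Gelman-Rubin PSRF for one parameter; chains has shape (m, n)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    w = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)        # between-chain variance
    var_hat = (n - 1) / n * w + b / n              # pooled posterior variance estimate
    return float(np.sqrt(var_hat / w))


def psrf_trace(chains, step=50):
    """PSRF on the first t draws of every chain, for t = step, 2*step, ..., n."""
    chains = np.asarray(chains, dtype=float)
    ts = np.arange(step, chains.shape[1] + 1, step)
    return ts, np.array([psrf(chains[:, :t]) for t in ts])


rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))                 # 4 chains sampling the same target
stuck = mixed + 3.0 * np.arange(4)[:, None]        # chains centred at 0, 3, 6, 9
ts, trace = psrf_trace(mixed, step=500)
converged = bool(np.all(trace < 1.1))              # compare with the 1.1 threshold
```

Chains stuck in different regions inflate the between-chain term, which is what pushes the PSRF of `stuck` well above 1.1.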
The numbers of main G effects and interactions identified by different approaches and their overlaps for BL (Bayesian LASSO), BLSS (Bayesian LASSO with spike-and-slab priors), LADBL (LAD Bayesian LASSO), and LADBLSS (LAD Bayesian LASSO with spike-and-slab priors).
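| Method | Main G: BL | Main G: BLSS | Main G: LADBL | Main G: LADBLSS | Interaction: BL | Interaction: BLSS | Interaction: LADBL | Interaction: LADBLSS |
|---|---|---|---|---|---|---|---|---|
| BL | 86 | 5 | 6 | 8 | 14 | 14 | 4 | 8 |
| BLSS | | 24 | 3 | 6 | | 76 | 20 | 23 |
| LADBL | | | 20 | 12 | | | 80 | 50 |
| LADBLSS | | | | 20 | | | | 80 |

Diagonal entries: numbers of effects identified by each method; off-diagonal entries: numbers identified by both the row and the column method.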
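Per the caption, the table reports, separately for main G effects and interactions, how many effects each method identifies and how many are shared by each pair of methods. Given each method's identified effects as sets, these counts reduce to set intersections, as in the sketch below. The effect labels, the function names, and the two-method example are hypothetical.

```python
def overlap_table(selected):
    """selected maps method name -> set of identified effects.

    Returns {(a, b): |selected[a] & selected[b]|} for each pair with a listed
    at or before b, so (a, a) is the number identified by method a."""
    names = list(selected)
    return {(a, b): len(selected[a] & selected[b])
            for i, a in enumerate(names) for b in names[i:]}


def upper_rows(table, names):
    """Upper-triangular rows like the table above; cells below the diagonal are blank."""
    return [[""] * i + [table[(a, b)] for b in names[i:]]
            for i, a in enumerate(names)]


# Hypothetical interaction labels identified by two methods.
picked = {"BLSS": {"G1xE1", "G2xE3"}, "LADBLSS": {"G2xE3", "G5xE2", "G7xE4"}}
tab = overlap_table(picked)
rows = upper_rows(tab, list(picked))
# rows == [[2, 1], ["", 3]]
```

Running it once on the main-effect sets and once on the interaction sets gives the two halves of the table.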