Mauricio A Mazo Lopera, Brandon J Coombes, Mariza de Andrade.
Abstract
Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environmental variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma (PPARG) gene associated with diabetes.
Keywords: best linear unbiased predictor; family data; gene-environment interaction; generalized linear mixed model; ridge regression; score test; variance component test
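The BLUP and ridge equivalence stated in the abstract can be checked numerically in a simplified setting. The sketch below is not the authors' GLMM: it assumes a Gaussian linear model with independent subjects (no kinship term), known variance components, and simulated genotypes. The function names and the values of `sigma2_b` and `sigma2_e` are hypothetical. Under these assumptions the ridge penalty is the ratio of the residual variance to the variance of the SNP effects.

```python
import numpy as np

def blup_random_effects(Z, y, sigma2_b, sigma2_e):
    """BLUP of b in y = Z b + e, with b ~ N(0, sigma2_b I), e ~ N(0, sigma2_e I)."""
    n = Z.shape[0]
    V = sigma2_b * Z @ Z.T + sigma2_e * np.eye(n)  # marginal covariance of y
    return sigma2_b * Z.T @ np.linalg.solve(V, y)

def ridge_estimate(Z, y, lam):
    """Ridge estimator (Z'Z + lam I)^{-1} Z'y."""
    q = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(q), Z.T @ y)

# Simulated data (assumption: additive 0/1/2 genotype coding, centered columns).
rng = np.random.default_rng(0)
n, q = 40, 5
Z = rng.integers(0, 3, size=(n, q)).astype(float)
Z -= Z.mean(axis=0)

sigma2_b, sigma2_e = 0.03, 1.0  # hypothetical variance components
y = Z @ rng.normal(0.0, np.sqrt(sigma2_b), q) + rng.normal(0.0, np.sqrt(sigma2_e), n)

b_blup = blup_random_effects(Z, y, sigma2_b, sigma2_e)
b_ridge = ridge_estimate(Z, y, lam=sigma2_e / sigma2_b)

# The two estimators agree up to floating-point error.
max_abs_diff = float(np.max(np.abs(b_blup - b_ridge)))
```

Because the penalty follows directly from the estimated variance components, no cross-validation grid over the penalty is needed in this simplified setting.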
Year: 2017 PMID: 28953253 PMCID: PMC5664635 DOI: 10.3390/ijerph14101134
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Pedigree for simulated data. Circle = Female, Square = Male.
Figure 2. Correlation of the 50 simulated single nucleotide polymorphisms (SNPs) in linkage disequilibrium (LD). The vertical color bar on the right indicates the level of correlation between the SNPs. Dark blue (red) indicates high positive (high negative) correlation and light blue (red) indicates low positive (low negative) correlation.
Type I error at the 0.05 level for each method depending on the number of SNPs q and whether the SNPs are independent or in LD. Two of the SNPs in each scenario have a main effect.
| SNPs Category | q | | | | Score | MinP | VCT |
|---|---|---|---|---|---|---|---|
| Independent | 5 | 1.247 | 0.034 | 29.4 | 0.020 | 0.031 | 0.034 |
| | 10 | 1.240 | 0.017 | 58.8 | 0.023 | 0.026 | 0.024 |
| | 50 | 1.222 | 0.003 | 333 | 0.004 | 0.022 | 0.014 |
| LD | 5 | 1.243 | 0.021 | 47.6 | 0.025 | 0.026 | 0.031 |
| | 10 | 1.239 | 0.009 | 111 | 0.004 | 0.034 | 0.030 |
| | 50 | 1.222 | 0.002 | 500 | 0.000 | 0.028 | 0.022 |
Figure 3. Empirical power at the 0.05 level of the methods for q independent SNPs, of which two have a main effect. The same two SNPs have an equal interaction with the environment. (a) independent SNPs; (b) independent SNPs; (c) independent SNPs.
Figure 4. Empirical power at the 0.05 level of the methods for q correlated SNPs, of which two have a main effect. The same two SNPs have an equal interaction with the environment. (a) correlated SNPs; (b) correlated SNPs; (c) correlated SNPs.
Summary of cases per subjects and families.
| Gene | Subjects: Control | Subjects: Cases | Subjects: Total | Families: Control | Families: Cases | Families: Total |
|---|---|---|---|---|---|---|
| PPARG | 845 | 83 | 928 | 43 | 42 | 85 |
| FTO | 712 | 71 | 783 | 47 | 38 | 85 |
| CDKAL1 | 661 | 69 | 730 | 47 | 38 | 85 |
Sample size, GLMM parameters, p-values and execution times for the analysis of the Baependi dataset.
| Gene | SNPs | Total Subjects | GLMM Parameters | | | Test | p-value | Significance level | Time (s) |
|---|---|---|---|---|---|---|---|---|---|
| PPARG | 16 | 928 | 0.4463 | 0.0029 | 344.8276 | VCT | 0.028 | 0.05 | 18.420 |
| | | | | | | MinP | 0.019 * | 0.005 | 100.261 |
| | | | | | | Score | 0.595 | 0.05 | 9.025 |
| FTO | 149 | 783 | 0.3710 | 0.0033 | 303.0303 | VCT | 0.451 | 0.05 | 12.958 |
| | | | | | | MinP | 0.031 * | 0.0005 | 2675.907 |
| | | | | | | Score | 0.992 | 0.05 | 6.197 |
| CDKAL1 | 186 | 730 | 0.0918 | 0.0111 | 90.0901 | VCT | 0.635 | 0.05 | 9.907 |
| | | | | | | MinP | 0.040 * | 0.0005 | 1755.881 |
| | | | | | | Score | 0.915 | 0.05 | 4.257 |
* Compare the MinP test p-value with the corresponding corrected significance level, obtained by dividing 0.05 by the number of effective SNPs (the number of principal components needed to reach 99.5% of their total variation): 10 for PPARG, 93 for FTO and 92 for CDKAL1.
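The correction described in the footnote can be sketched as follows. This is an illustration under assumptions, not the authors' code: the principal components are taken from the SNP correlation matrix, and the function names, the simulated genotype matrix, and the `threshold` parameter are hypothetical.

```python
import numpy as np

def effective_number_of_snps(G, threshold=0.995):
    """Smallest number of principal components of the SNP correlation
    matrix whose cumulative share of total variation reaches `threshold`."""
    R = np.corrcoef(G, rowvar=False)          # q x q SNP correlation matrix
    eigvals = np.linalg.eigvalsh(R)[::-1]     # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)     # guard against tiny negative values
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum_share, threshold) + 1)

def corrected_alpha(m_eff, alpha=0.05):
    """Significance level divided by the effective number of SNPs."""
    return alpha / m_eff

# Simulated example: 3 independent SNPs, each duplicated once (perfect LD),
# so 6 SNP columns carry only 3 independent signals.
rng = np.random.default_rng(1)
base = rng.integers(0, 3, size=(200, 3)).astype(float)
G = np.repeat(base, 2, axis=1)

m_eff = effective_number_of_snps(G)
alpha_minp = corrected_alpha(m_eff)
```

With the effective numbers reported in the footnote, `corrected_alpha(10)` gives 0.005 for PPARG, and `corrected_alpha(93)` and `corrected_alpha(92)` both round to 0.0005 for FTO and CDKAL1, matching the significance levels listed for the MinP test in the table.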