Sungkyoung Choi, Sungyoung Lee, Iksoo Huh, Heungsun Hwang, Taesung Park.
Abstract
Gene-environment interaction (G×E) studies are among the most important approaches to the "missing heritability" problem in genome-wide association studies (GWAS). Although many statistical methods have been proposed for detecting and identifying G×E, most rely on single nucleotide polymorphism (SNP)-level analysis. In this study, we propose a new statistical method, Hierarchical structural CoMponent analysis of gene-based Gene-Environment interactions (HisCoM-G×E). HisCoM-G×E is based on the hierarchical structural relationship among all SNPs within a gene. By imposing a ridge penalty, it accommodates all possible SNP-level effects in a single latent variable and thus accounts for the latent G×E interaction term more efficiently. We evaluated the performance of the proposed method in simulation studies and applied it to investigate gene-alcohol intake interactions affecting systolic blood pressure (SBP), using samples from the Korea Associated REsource (KARE) consortium.
Keywords: generalized structured component analysis (GSCA); gene–environment interactions; genome-wide association study (GWAS)
Year: 2020; PMID: 32937825; PMCID: PMC7555026; DOI: 10.3390/ijms21186724
Source DB: PubMed; Journal: Int J Mol Sci; ISSN: 1422-0067; Impact factor: 5.923
Figure 1. Empirical power estimates as a function of the G×E effect size when the gene sizes are (a) 5 SNPs and (b) 20 SNPs. The empirical power estimates were calculated with 1000 replicates. Results for each method are distinguished by plotting color. Each bar is the mean, and the error bars represent the standard deviation (SD).
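As a rough illustration of how an empirical power estimate of this kind is usually computed, the sketch below takes the p-values from a set of simulation replicates and reports the fraction that fall below a significance level. The function names, the default `alpha = 0.05`, and the idea of summarizing several repeated estimates by their mean and SD are assumptions made for illustration; the caption does not state the significance level or how the SD was obtained.

```python
from statistics import mean, stdev


def empirical_power(p_values, alpha=0.05):
    """Fraction of replicates whose p-value falls below alpha.

    alpha=0.05 is an assumed default, not a value taken from the source.
    """
    if not p_values:
        raise ValueError("need at least one replicate")
    return sum(p < alpha for p in p_values) / len(p_values)


def summarize(power_estimates):
    """Mean and sample SD across repeated power estimates.

    Hypothetical helper: one plausible way to obtain a bar height
    (mean) and an error bar (SD) from several estimates.
    """
    return mean(power_estimates), stdev(power_estimates)
```

With 1000 replicates, `p_values` would hold 1000 entries per method and effect size, and `empirical_power` would be called once for each combination.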
Significant (p < 0.001; bold) gene–environment interactions that affect systolic blood pressure (SBP), according to the interaction sequence kernel association test (iSKAT), the gene–environment gene-based association test using extended Simes (GE_GATES), and HisCoM-G×E, using the Korea Associated REsource (KARE) genome-wide association studies (GWAS) dataset.
| No | CHR | GENE | # of SNPs | iSKAT | GE_GATES | HisCoM-G×E |
|---|---|---|---|---|---|---|
| 1 | 1 | | 54 | 0.5697 | 0.9902 | |
| 2 | 19 | | 7 | 0.2587 | 0.2491 | |
| 3 | 4 | | 116 | 0.0285 | 0.2064 | |
| 4 | 4 | | 90 | 0.0141 | 0.0518 | |
| 5 | 8 | | 91 | 0.1227 | 0.1713 | |
| 6 | 2 | | 8 | 0.1900 | 0.9966 | |
| 7 | 9 | | 120 | 0.4700 | 0.6905 | |
| 8 | 1 | | 107 | 0.7512 | 0.9398 | |
| 9 | 9 | | 78 | 0.6676 | 0.4259 | |
| 10 | 1 | | 119 | 0.7613 | 0.9911 | |
| 11 | 3 | | 89 | 0.1935 | 0.2530 | |
| 12 | 19 | | 41 | | 0.0262 | 0.0182 |
| 13 | 17 | | 154 | | 0.0051 | 0.0220 |
| 14 | 19 | | 11 | 0.3003 | | 0.0670 |
| 15 | 13 | | 313 | | 0.5510 | 0.0950 |
| 16 | 8 | | 107 | 0.0027 | | 0.0974 |
| 17 | 16 | | 38 | | 0.0011 | 0.1555 |
| 18 | 16 | | 60 | | 0.0010 | 0.1887 |
| 19 | 16 | | 69 | | 0.0014 | 0.2463 |
| 20 | 11 | | 62 | 0.8119 | | 0.2515 |
| 21 | 14 | | 1199 | | 0.1166 | 0.2515 |
| 22 | 11 | | 53 | | 0.0070 | 0.2633 |
| 23 | 16 | | 68 | | 0.0015 | 0.2787 |
| 24 | 4 | | 1270 | | | 0.3954 |
| 25 | 11 | | 134 | | 0.0341 | 0.4232 |
| 26 | 21 | | 230 | | 0.0060 | 0.4502 |
| 27 | 16 | | 82 | | | 0.4794 |
| 28 | 16 | | 131 | 0.0393 | | 0.5110 |
| 29 | 1 | | 1212 | | 0.2535 | 0.5270 |
| 30 | 7 | | 61 | | 0.1369 | 0.5700 |
| 31 | 19 | | 90 | | 0.0154 | 0.5736 |
| 32 | 14 | | 40 | | 0.0249 | 0.8029 |
| 33 | 2 | | 50 | | 0.1123 | 0.8119 |
| 34 | 8 | | 134 | | 0.0020 | 0.8379 |
| 35 | 4 | | 57 | 0.5246 | | 0.8798 |
Figure 2. A schematic diagram of Hierarchical structural CoMponent analysis of gene-based Gene–Environment interactions (HisCoM-G×E). The exemplary model is described with the number of single nucleotide polymorphisms (SNPs) (x) K = 2 and the number of covariates (c) P = 2. The variables ws denote the weights assigned to the latent variables and βs are coefficients of the latent variables (g and e). The g × e term represents a latent interaction term (or effect) on the phenotype (y) and ε is the error term.
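One way to read the diagram as equations is sketched below, for the exemplary case K = 2 and P = 2. This assumes that g is a weighted sum of the SNPs and e a weighted sum of the covariates. The caption names the symbols but gives no formulas, so the subscripts on w and β are illustrative notation rather than the authors' own. No intercept is shown because the caption does not mention one.

```latex
\begin{align}
  g &= w_{g,1}\, x_1 + w_{g,2}\, x_2, \\
  e &= w_{e,1}\, c_1 + w_{e,2}\, c_2, \\
  y &= \beta_g\, g + \beta_e\, e + \beta_{ge}\,(g \times e) + \varepsilon .
\end{align}
```

Under this reading, a test of G×E at the gene level corresponds to a test of the single coefficient on the g × e term, rather than one test per SNP.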
Figure 3. Linkage disequilibrium (LD) patterns of genes used in simulation studies: (a) TNFRSG10C gene (K = 5); (b) TNFRSF10C gene (K = 50); (c) PLA2G4C gene (K = 100). Numbers indicate the Dʹ values expressed as percentiles. A standard color scheme is used to display the LD pattern, with red for perfect LD (r² = 1), white for no LD (r² = 0) and shades of pink/red for intermediate LD (0 < r² < 1).