Li Ma, Andrew G Clark, Alon Keinan.
Abstract
Various methods have been developed for identifying gene-gene interactions in genome-wide association studies (GWAS). However, most methods focus on individual markers as the testing unit, and the large number of such tests drastically erodes statistical power. In this study, we propose novel interaction tests of quantitative traits that are gene-based and that confer advantages in both statistical power and biological interpretation. The framework of gene-based gene-gene interaction (GGG) tests combines marker-based interaction tests between all pairs of markers in two genes to produce a gene-level test for interaction between the two. The tests are based on an analytical formula we derive for the correlation between marker-based interaction tests due to linkage disequilibrium. We propose four GGG tests that extend the following P value combining methods: minimum P value, extended Simes procedure, truncated tail strength, and truncated P value product. Extensive simulations point to correct type I error rates of all tests and show that the two truncated tests are more powerful than the other tests in cases of markers involved in the underlying interaction not being directly genotyped and in cases of multiple underlying interactions. We applied our tests to pairs of genes that exhibit a protein-protein interaction to test for gene-level interactions underlying lipid levels using genotype data from the Atherosclerosis Risk in Communities study. We identified five novel interactions that are not evident from marker-based interaction testing and successfully replicated one of these interactions, between SMAD3 and NEDD9, in an independent sample from the Multi-Ethnic Study of Atherosclerosis. We conclude that our GGG tests show improved power to identify gene-level interactions in existing, as well as emerging, association studies.
Year: 2013 PMID: 23468652 PMCID: PMC3585009 DOI: 10.1371/journal.pgen.1003321
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Graphical illustration of the framework of a gene-based single-marker test and its generalization to a gene-based gene–gene interaction (GGG) test, as proposed in this paper.
While the former considers the P values of each single-marker test (A), a GGG test (B) is based on all P values of an interaction test between pairs of markers, one from each of the two genes. To combine these pairwise P values into a single test, a correlation matrix that concurrently accounts for linkage disequilibrium in each of the two genes needs to be estimated; we derive this matrix in Materials and Methods.
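The combination step described in this caption can be illustrated with a minimal sketch. It is not the authors' procedure: the paper derives the correlation between marker-based interaction tests analytically, whereas this sketch takes a correlation matrix `corr` as given and swaps in a Monte Carlo null distribution. The function name `combine_pairwise_pvalues`, the truncation threshold `tau`, and the number of draws are hypothetical choices made for illustration only.

```python
import numpy as np
from scipy import stats


def combine_pairwise_pvalues(pvals, corr, tau=0.05, n_draws=20000, seed=0):
    """Gene-level P values from pairwise interaction P values (illustrative).

    pvals : one P value per marker pair (one marker from each gene).
    corr  : assumed correlation matrix of the pairwise test statistics.
    tau   : hypothetical truncation threshold for the truncated product.
    """
    rng = np.random.default_rng(seed)
    pvals = np.asarray(pvals, dtype=float)

    # Null distribution: correlated standard normal statistics converted
    # to two-sided P values (Monte Carlo stand-in for an analytical null).
    z = rng.multivariate_normal(np.zeros(len(pvals)), corr, size=n_draws)
    null_p = 2.0 * stats.norm.sf(np.abs(z))

    def log_truncated_product(p):
        # Sum of log P over the P values at or below tau (0 if none qualify).
        return np.where(p <= tau, np.log(p), 0.0).sum(axis=-1)

    obs_min = pvals.min()
    obs_tprod = log_truncated_product(pvals)

    # Empirical P values with the usual +1 correction.
    p_min = (1 + np.sum(null_p.min(axis=1) <= obs_min)) / (n_draws + 1)
    p_tprod = (1 + np.sum(log_truncated_product(null_p) <= obs_tprod)) / (n_draws + 1)
    return {"minP": p_min, "tProd": p_tprod}
```

A call such as `combine_pairwise_pvalues([1e-6, 0.5, 0.5, 0.5], np.eye(4))` treats the four marker pairs as independent; a non-identity `corr` would encode the dependence induced by linkage disequilibrium.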
Table 1. Empirical, simulation-based type I error rates of proposed GGG tests.
| n | α | GG_PC | GG_minP | GG_GATES | GG_tTS | GG_tProd |
|---|---|---|---|---|---|---|
| 1000 | 0.05 | 0.0506 | 0.0502 | 0.0564 | 0.0521 | 0.0492 |
| | 0.01 | 0.0101 | 0.0099 | 0.0105 | 0.0113 | 0.0094 |
| 2000 | 0.05 | 0.0496 | 0.0474 | 0.0531 | 0.0508 | 0.0452 |
| | 0.01 | 0.0092 | 0.0087 | 0.0091 | 0.0117 | 0.0088 |
| 3000 | 0.05 | 0.0504 | 0.0493 | 0.0557 | 0.0489 | 0.0528 |
| | 0.01 | 0.0087 | 0.0082 | 0.0088 | 0.0099 | 0.0120 |
| 5000 | 0.05 | 0.0506 | 0.0485 | 0.0564 | 0.0511 | 0.0495 |
| | 0.01 | 0.0103 | 0.0086 | 0.0090 | 0.0096 | 0.0098 |
Table 2. Empirical, simulation-based statistical power of GGG tests.
| Simulation number | Interacting SNP-pairs | Type | MAFs | Effect size | n | Power, GG_PC | Power, GG_minP | Power, GG_GATES | Power, GG_tTS | Power, GG_tProd |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 30-15 | U-U | .45-.48 | 0.15 | 1k | 14.3 | 31.0 | 34.8 | 47.2 | 47.0 |
| | | | | | 2k | 27.5 | 60.0 | 65.2 | 76.3 | 76.0 |
| | | | | | 3k | 43.9 | 81.8 | 84.4 | 90.6 | 90.6 |
| | | | | | 5k | 73.3 | 94.5 | 98.0 | 98.3 | 99.3 |
| 2 | 30-17 | U-O | .45-.39 | 0.15 | 1k | 14.1 | 30.4 | 33.6 | 45.5 | 45.4 |
| | | | | | 2k | 27.1 | 60.9 | 64.5 | 73.5 | 73.4 |
| | | | | | 3k | 44.5 | 83.6 | 85.1 | 88.9 | 89.0 |
| | | | | | 5k | 75.1 | 93.4 | 98.2 | 98.2 | 98.9 |
| 3 | 29-17 | O-O | .10-.39 | 0.15 | 1k | 7.0 | 10.2 | 11.0 | 8.4 | 8.5 |
| | | | | | 2k | 10.9 | 16.8 | 19.4 | 13.6 | 14.3 |
| | | | | | 3k | 14.2 | 28.6 | 30.4 | 20.5 | 21.2 |
| | | | | | 5k | 21.5 | 51.4 | 52.9 | 33.7 | 35.0 |
| 4 | 29-17 | O-O | .10-.39 | 0.25 | 1k | 13.1 | 25.3 | 27.0 | 18.4 | 19.1 |
| | | | | | 2k | 27.6 | 56.0 | 57.9 | 35.7 | 38.8 |
| | | | | | 3k | 40.8 | 81.4 | 82.6 | 52.8 | 60.4 |
| | | | | | 5k | 69.3 | 97.7 | 98.0 | 75.8 | 85.9 |
| 5 | 30-15, 40-20, 48-27 | U-U | .45-.48, .41-.34, .30-.43 | 0.12 | 1k | 17.8 | 33.5 | 37.0 | 49.3 | 49.1 |
| | | | | | 2k | 38.8 | 64.3 | 69.3 | 80.1 | 79.9 |
| | | | | | 3k | 57.7 | 83.9 | 86.5 | 92.2 | 92.0 |
| | | | | | 5k | 87.2 | 95.3 | 97.9 | 99.4 | 98.5 |
| 6 | 30-15, 40-20, 48-27 | U-U | .45-.48, .41-.34, .30-.43 | 0.15 | 1k | 28.3 | 51.5 | 55.8 | 68.1 | 68.2 |
| | | | | | 2k | 60.5 | 85.8 | 88.2 | 94.3 | 94.3 |
| | | | | | 3k | 83.7 | 97.4 | 98.2 | 99.4 | 99.4 |
| | | | | | 5k | 98.7 | 99.9 | 99.9 | 100 | 100 |
| 7 | 29-17, 39-22, 47-25 | O-O | .10-.39, .41-.38, .29-.44 | 0.12 | 1k | 18.8 | 42.8 | 47.1 | 54.3 | 54.3 |
| | | | | | 2k | 39.3 | 77.5 | 81.0 | 84.6 | 84.7 |
| | | | | | 3k | 57.5 | 93.7 | 94.5 | 95.9 | 95.7 |
| | | | | | 5k | 88.3 | 98.8 | 99.9 | 99.9 | 100 |
| 8 | 10-5, 20-10, 30-15, 40-20, 48-27 | U-U | .10-.39, .10-.44, .45-.48, .41-.34, .30-.43 | 0.12 | 1k | 23.1 | 34.8 | 38.1 | 43.6 | 43.5 |
| | | | | | 2k | 51.1 | 67.5 | 71.8 | 79.0 | 79.0 |
| | | | | | 3k | 73.7 | 86.2 | 89.1 | 93.0 | 93.0 |
| | | | | | 5k | 96.4 | 96.6 | 98.7 | 99.2 | 99.7 |
| 9 | 6-4, 19-9, 29-17, 39-22, 47-25 | O-O | .12-.49, .32-.47, .10-.39, .41-.38, .29-.44 | 0.12 | 1k | 54.9 | 89.7 | 92.1 | 96.3 | 96.3 |
| | | | | | 2k | 92.4 | 99.7 | 99.8 | 100 | 100 |
Interacting SNP-pairs: indices of the interacting SNPs in the two loci (Figure S3). Three types of scenarios are considered, with one, three, and five pairs of interacting SNPs; for each type, at least two different sets of SNPs are considered, for a total of 7 different scenarios.
Type: U, SNP untyped; O, SNP observed. For scenarios with more than one pair of interacting SNPs, the U/O status of the first and second interacting SNP is the same across all pairs.
MAFs: minor allele frequencies of all SNPs involved in interactions, in order.
Effect size: coefficient of the interaction term in the linear model, b3, as described in Equation (1). For scenarios with more than one pair of interacting SNPs, the effect size is the same for all pairs.
Power: percentage of significant tests with P value < 0.05.
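The footnotes above define the effect size as the coefficient of the interaction term in a linear model and power as the percentage of tests with P value < 0.05. The sketch below illustrates both for a single marker pair. It assumes that the linear model has the common form y = b0 + b1·g1 + b2·g2 + b3·g1·g2 + e with unit-variance normal noise and genotypes drawn under Hardy-Weinberg equilibrium; Equation (1) is not reproduced in this excerpt, so that form is an assumption. The helper names `interaction_pvalue` and `empirical_power` are hypothetical. This is a marker-level test with both interacting SNPs observed, not one of the GGG tests, so its output is not comparable to the table.

```python
import numpy as np
from scipy import stats


def interaction_pvalue(y, g1, g2):
    """Two-sided P value for b3 in y = b0 + b1*g1 + b2*g2 + b3*g1*g2 + e."""
    X = np.column_stack([np.ones_like(y), g1, g2, g1 * g2])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance of the estimates
    t_stat = beta[3] / np.sqrt(cov[3, 3])
    return 2.0 * stats.t.sf(abs(t_stat), dof)


def empirical_power(n, maf1, maf2, b3, n_rep=200, alpha=0.05, seed=0):
    """Percentage of simulated data sets in which the b3 test has P < alpha."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        # Genotypes coded 0/1/2, drawn under Hardy-Weinberg equilibrium (assumed).
        g1 = rng.binomial(2, maf1, size=n).astype(float)
        g2 = rng.binomial(2, maf2, size=n).astype(float)
        # Phenotype with an interaction effect only and unit-variance noise (assumed).
        y = b3 * g1 * g2 + rng.standard_normal(n)
        hits += int(interaction_pvalue(y, g1, g2) < alpha)
    return 100.0 * hits / n_rep
```

Setting `b3=0` in `empirical_power` yields an empirical type I error rate instead of power, which is how a rate such as those reported for the GGG tests could be estimated in principle.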
Figure 2. Average power of GGG tests summarized from Table 2.
For each simulation scenario from Table 2, the power of each type of test is averaged across the different sample sizes (n) reported in Table 2. The method that collapses markers in each of the two genes, GG_PC, is the least powerful in all simulation scenarios. Among the four GGG tests that combine P values, GG_minP and GG_GATES are the more powerful only in simulation scenarios 3 and 4, the only cases in which we simulated a single marker-by-marker interaction with both markers available for analysis (denoted by O-O in Table 2). GG_tTS and GG_tProd are the most powerful in all other simulation scenarios.
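The averaging described in this caption is simple arithmetic over the rows of Table 2. As a small worked sketch, the snippet below copies the power values for simulation scenarios 1 and 3 from Table 2 and averages them over the four sample sizes; the names `TESTS`, `POWER`, and `average_power` are illustrative and do not come from the paper.

```python
TESTS = ["GG_PC", "GG_minP", "GG_GATES", "GG_tTS", "GG_tProd"]

# Power (%) copied from Table 2; one row per sample size (1k, 2k, 3k, 5k).
POWER = {
    1: [  # scenario 1: one interacting pair, both SNPs untyped (U-U)
        [14.3, 31.0, 34.8, 47.2, 47.0],
        [27.5, 60.0, 65.2, 76.3, 76.0],
        [43.9, 81.8, 84.4, 90.6, 90.6],
        [73.3, 94.5, 98.0, 98.3, 99.3],
    ],
    3: [  # scenario 3: one interacting pair, both SNPs observed (O-O)
        [7.0, 10.2, 11.0, 8.4, 8.5],
        [10.9, 16.8, 19.4, 13.6, 14.3],
        [14.2, 28.6, 30.4, 20.5, 21.2],
        [21.5, 51.4, 52.9, 33.7, 35.0],
    ],
}


def average_power(rows):
    """Average each test's power across the sample sizes of one scenario."""
    n_sizes = len(rows)
    return {test: sum(row[i] for row in rows) / n_sizes
            for i, test in enumerate(TESTS)}


for scenario, rows in POWER.items():
    averages = average_power(rows)
    best = max(averages, key=averages.get)
    print(scenario, best, {t: round(v, 2) for t, v in averages.items()})
```

In these two scenarios the ranking matches the caption: a truncated test has the highest average in scenario 1, GG_GATES has the highest average in scenario 3, and GG_PC has the lowest average in both.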
Table 3. Significant (P < 9.4×10⁻⁷; bolded) gene-level interactions affecting total cholesterol (TC) and high-density lipoprotein cholesterol (HDL-C) levels in data from the ARIC study.
| Trait | Gene 1 | Gene 2 | GG_minP | GG_GATES | GG_tTS | GG_tProd |
|---|---|---|---|---|---|---|
| TC | | | 1.8×10⁻² | 2.3×10⁻³ | 1.9×10⁻⁴ | |
| | | | 6.5×10⁻² | 3.9×10⁻³ | 3.5×10⁻⁶ | |
| HDL-C | | | 2.5×10⁻² | 1.2×10⁻² | | |
| | | | 6.2×10⁻³ | 8.2×10⁻⁴ | 2.2×10⁻⁴ | |
| | | | 3.7×10⁻³ | 2.2×10⁻⁴ | 2.1×10⁻⁶ | |
The interaction between SMAD3 and NEDD9 on HDL-C levels was further replicated in data from the MESA study (multiple-testing-corrected Pc = 0.01 for GG_tProd and Pc = 0.05 for GG_tTS).