Chin Lin1, Chi-Ming Chu1,2, Sui-Lung Su1,2.
Abstract
Conventional genome-wide association studies (GWAS) have proven to be a successful strategy for identifying genetic variants associated with complex human traits. However, there is still a large heritability gap between GWAS and traditional family studies. The "missing heritability" has been suggested to be due to a lack of studies focused on epistasis, also called gene-gene interactions, because individual trials have often had insufficient sample sizes. Meta-analysis is a common method for increasing statistical power. However, sufficiently detailed information is difficult to obtain. A previous study employed a meta-regression-based method to detect epistasis, but it faced the challenge of inconsistent estimates. Here, we describe a Markov chain Monte Carlo-based method, called "Epistasis Test in Meta-Analysis" (ETMA), which uses genotype summary data to obtain consistent estimates of epistasis effects in meta-analysis. We defined a series of conditions to generate simulation data and tested the power and type I error rates of ETMA, individual data analysis and the conventional meta-regression-based method. ETMA not only yielded consistent estimates but also showed acceptable type I error and higher power than conventional meta-regression. We applied ETMA to three real meta-analysis data sets. We found significant gene-gene interactions in the renin-angiotensin system and the polycyclic aromatic hydrocarbon metabolism pathway, with strong supporting evidence. In addition, glutathione S-transferase (GST) mu 1 and theta 1 were confirmed to exert independent effects on cancer. We concluded that the application of ETMA to real meta-analysis data was successful. Finally, we developed an R package, etma, for the detection of epistasis in meta-analysis [etma is available via the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/etma/index.html].
Year: 2016 PMID: 27045371 PMCID: PMC4821560 DOI: 10.1371/journal.pone.0152891
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Inconsistent estimates of interaction effects in the same data.
This figure describes a meta-regression analysis based on the data from Fang et al. [27] (detailed data are shown in S1 Table). The upper plot shows the association between the proportion of null/null GSTT1 in cases and the odds ratio of GSTM1 in cancer, and the lower plot shows the association between the proportion of null/null GSTM1 in cases and the odds ratio of GSTT1 in cancer. The solid lines denote unbiased estimators of the odds ratios, and the dashed lines show their 95% confidence intervals. According to a previous article, the slopes in meta-regression approximate interaction effects [16]. However, the estimates of the interaction effect were inconsistent when we exchanged the independent and moderator variables (0.1377 and 0.2338, respectively). This phenomenon does not occur in individual data analysis and leads to problems in interpretation.
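The claim that individual data analysis does not suffer from this asymmetry can be illustrated with a small sketch. It assumes a complete 2×2×2 table (disease status by two binary genotypes) and computes the interaction odds ratio as a ratio of stratum-specific odds ratios, once treating SNP1 as the exposure and once treating SNP2 as the exposure. The counts and the function names are hypothetical and are not taken from Fang et al. [27].

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio of a single 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Hypothetical counts, keyed by (snp1, snp2) with 1 = null type and 0 = functional type.
counts = {
    "case":    {(0, 0): 200, (1, 0): 180, (0, 1): 90, (1, 1): 110},
    "control": {(0, 0): 220, (1, 0): 170, (0, 1): 95, (1, 1): 85},
}

def stratum_or(counts, target, stratum_value):
    """OR of the target SNP (0 = SNP1, 1 = SNP2) within one level of the other SNP."""
    def key(target_value):
        # Put the target genotype in its own slot and the stratum genotype in the other.
        return (target_value, stratum_value) if target == 0 else (stratum_value, target_value)
    return odds_ratio(
        counts["case"][key(1)], counts["case"][key(0)],
        counts["control"][key(1)], counts["control"][key(0)],
    )

def interaction_or(counts, target):
    """Ratio of the target SNP's OR among carriers vs. non-carriers of the other SNP."""
    return stratum_or(counts, target, 1) / stratum_or(counts, target, 0)

via_snp1 = interaction_or(counts, target=0)  # SNP1 as exposure, SNP2 as moderator
via_snp2 = interaction_or(counts, target=1)  # SNP2 as exposure, SNP1 as moderator
print(math.isclose(via_snp1, via_snp2))
```

Both routes reduce to the same ratio of the eight cell counts, so exchanging the two SNPs cannot change the estimate. Meta-regression on study-level proportions has no such guarantee, which is consistent with the two different slopes reported in the caption.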
Fig 2. A typical analysis pipeline of the ETMA function in the 'etma' package.
This figure summarizes the pipeline of the ETMA function. The main input is a meta-analysis dataset that includes the numbers of wild-type and mutant-type subjects for SNP1 and SNP2 in the case and control groups. The main options include the length of the chains in steps 1 and 2, the maximum number of iterations, and the starting seed. The main outputs are three matrices. Matrix b contains the beta values (logarithmic ORs) of each SNP and of the interaction term, VCOV is the variance-covariance matrix of the beta values, and P is an n by 3 matrix describing three study-specific parameters (p1 = disease risk in subjects with wild-type alleles of SNP1 and SNP2; p5 = mutation frequency of SNP1; p6 = mutation frequency of SNP2).
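Because b holds logarithmic ORs and VCOV their variance-covariance matrix, the two outputs can be turned into odds ratios with confidence intervals by a standard Wald conversion. The sketch below is written in Python and does not call the etma package. The function name `wald_summary` and the numeric values are hypothetical, and the use of a normal-approximation interval is an assumption rather than something stated in the caption.

```python
import math

def wald_summary(b, vcov, z=1.959964):
    """Convert log-ORs and their covariance matrix into (OR, lower, upper) tuples.

    ASSUMPTION: a normal-approximation (Wald) 95% interval on the log scale.
    """
    rows = []
    for i, beta in enumerate(b):
        se = math.sqrt(vcov[i][i])  # standard error from the diagonal of VCOV
        rows.append((math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)))
    return rows

# Hypothetical outputs for SNP1, SNP2 and the interaction term.
b = [0.10, 0.0, 0.25]
vcov = [
    [0.0004, 0.0001, -0.0002],
    [0.0001, 0.0009, -0.0003],
    [-0.0002, -0.0003, 0.0100],
]

for name, (or_, lower, upper) in zip(["SNP1", "SNP2", "SNP1xSNP2"], wald_summary(b, vcov)):
    print(f"{name}: OR = {or_:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Only the diagonal of VCOV is needed for per-term intervals. The off-diagonal entries matter when contrasts of several terms are of interest.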
Summary of simulation conditions.
| pbaseline | ORy,SNP1 | ORy,SNP2 | ORinteraction |
|---|---|---|---|
| ~Uniform (0.001, 0.002) | 1.0 | 1.0 | 1.0 |
| ~Uniform (0.01, 0.02) | 1.2 | 1.2 | 1.2 |
| ~Uniform (0.1, 0.2) | | | 1.5 |
| | | | 2.0 |
pbaseline: the disease risk in subjects with major homozygous genotype of SNP1 and SNP2 in each simulated population.
ORy,SNP1: the main effect of SNP1.
ORy,SNP2: the main effect of SNP2.
ORinteraction: gene–gene interaction effect between SNP1 and SNP2.
The proportions of individuals with each combination of disease status, SNP1 genotype and SNP2 genotype can be calculated from pbaseline, MAF1, MAF2, ORy,SNP1, ORy,SNP2 and ORinteraction.
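| SNP1 | SNP2 | Disease | Proportion in total population |
|---|---|---|---|
| Major homozygous | Major homozygous | Control | $(1-\mathrm{MAF}_1)^2 (1-\mathrm{MAF}_2)^2 (1-q_1)$ |
| Major homozygous | Major homozygous | Case | $(1-\mathrm{MAF}_1)^2 (1-\mathrm{MAF}_2)^2 q_1$ |
| Major homozygous | Heterozygous | Control | $2(1-\mathrm{MAF}_1)^2 \mathrm{MAF}_2 (1-\mathrm{MAF}_2)(1-q_2)$ |
| Major homozygous | Heterozygous | Case | $2(1-\mathrm{MAF}_1)^2 \mathrm{MAF}_2 (1-\mathrm{MAF}_2) q_2$ |
| Major homozygous | Minor homozygous | Control | $(1-\mathrm{MAF}_1)^2 \mathrm{MAF}_2^2 (1-q_3)$ |
| Major homozygous | Minor homozygous | Case | $(1-\mathrm{MAF}_1)^2 \mathrm{MAF}_2^2 q_3$ |
| Heterozygous | Major homozygous | Control | $2\mathrm{MAF}_1 (1-\mathrm{MAF}_1)(1-\mathrm{MAF}_2)^2 (1-q_4)$ |
| Heterozygous | Major homozygous | Case | $2\mathrm{MAF}_1 (1-\mathrm{MAF}_1)(1-\mathrm{MAF}_2)^2 q_4$ |
| Heterozygous | Heterozygous | Control | $4\mathrm{MAF}_1 (1-\mathrm{MAF}_1)\mathrm{MAF}_2 (1-\mathrm{MAF}_2)(1-q_5)$ |
| Heterozygous | Heterozygous | Case | $4\mathrm{MAF}_1 (1-\mathrm{MAF}_1)\mathrm{MAF}_2 (1-\mathrm{MAF}_2) q_5$ |
| Heterozygous | Minor homozygous | Control | $2\mathrm{MAF}_1 (1-\mathrm{MAF}_1)\mathrm{MAF}_2^2 (1-q_6)$ |
| Heterozygous | Minor homozygous | Case | $2\mathrm{MAF}_1 (1-\mathrm{MAF}_1)\mathrm{MAF}_2^2 q_6$ |
| Minor homozygous | Major homozygous | Control | $\mathrm{MAF}_1^2 (1-\mathrm{MAF}_2)^2 (1-q_7)$ |
| Minor homozygous | Major homozygous | Case | $\mathrm{MAF}_1^2 (1-\mathrm{MAF}_2)^2 q_7$ |
| Minor homozygous | Heterozygous | Control | $2\mathrm{MAF}_1^2 \mathrm{MAF}_2 (1-\mathrm{MAF}_2)(1-q_8)$ |
| Minor homozygous | Heterozygous | Case | $2\mathrm{MAF}_1^2 \mathrm{MAF}_2 (1-\mathrm{MAF}_2) q_8$ |
| Minor homozygous | Minor homozygous | Control | $\mathrm{MAF}_1^2 \mathrm{MAF}_2^2 (1-q_9)$ |
| Minor homozygous | Minor homozygous | Case | $\mathrm{MAF}_1^2 \mathrm{MAF}_2^2 q_9$ |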
pbaseline: the disease risk in subjects with major homozygous genotype of SNP1 and SNP2 in each simulated population; MAF1: the minor allele frequency of SNP1; MAF2: the minor allele frequency of SNP2; ORy,SNP1: the main effect of SNP1; ORy,SNP2: the main effect of SNP2; ORinteraction: gene–gene interaction effect between SNP1 and SNP2.
q1 to q9: the disease prevalence of individuals with different genotype.
$$q_1 = p_{\text{baseline}} = \left(1 + \exp\left(-\ln\frac{p_{\text{baseline}}}{1 - p_{\text{baseline}}}\right)\right)^{-1}$$
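The calculation described by the table and footnotes above can be sketched in Python as follows. Genotype frequencies follow Hardy-Weinberg equilibrium, and each prevalence is obtained by shifting the baseline log-odds. Only the formula for q1 is given above, so the log-additive allele coding used for the remaining prevalences (0, 1 or 2 minor alleles per SNP, with the interaction entering as their product) is an assumption of this sketch, as are the function names and the example MAF values.

```python
import math
from itertools import product

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def hwe(maf):
    """Genotype frequencies for 0, 1 and 2 minor alleles under Hardy-Weinberg equilibrium."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]

def cell_proportions(p_baseline, maf1, maf2, or1, or2, or_int):
    """Population proportion of every (SNP1 genotype, SNP2 genotype, status) cell."""
    freq1, freq2 = hwe(maf1), hwe(maf2)
    cells = {}
    for g1, g2 in product(range(3), repeat=2):
        # ASSUMPTION: log-additive coding; the source only spells out the g1 = g2 = 0 case (q1).
        eta = (logit(p_baseline)
               + g1 * math.log(or1)
               + g2 * math.log(or2)
               + g1 * g2 * math.log(or_int))
        q = expit(eta)                 # disease prevalence in this genotype group
        freq = freq1[g1] * freq2[g2]   # genotype group frequency (independent SNPs)
        cells[(g1, g2, "case")] = freq * q
        cells[(g1, g2, "control")] = freq * (1 - q)
    return cells

# Hypothetical parameter values within the simulated ranges.
cells = cell_proportions(p_baseline=0.015, maf1=0.3, maf2=0.2,
                         or1=1.2, or2=1.2, or_int=1.5)
print(sum(cells.values()))
```

The 18 proportions sum to 1 by construction, and the major/major group reproduces q1 = pbaseline regardless of the odds ratios chosen.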
Type I error of individual data analysis, ETMA and conventional meta-regression.
| pbaseline | ORy,SNP1 | ORy,SNP2 | Individual data analysis | ETMA | Conventional meta-regression |
|---|---|---|---|---|---|
| ~Uniform (0.001, 0.002) | 1.0 | 1.0 | 0.047 | 0.037 | 0.050 |
| ~Uniform (0.001, 0.002) | 1.2 | 1.0 | 0.039 | 0.039 | 0.054 |
| ~Uniform (0.001, 0.002) | 1.2 | 1.2 | 0.039 | 0.052 | |
| ~Uniform (0.01, 0.02) | 1.0 | 1.0 | 0.047 | 0.037 | 0.050 |
| ~Uniform (0.01, 0.02) | 1.2 | 1.0 | 0.059 | 0.048 | |
| ~Uniform (0.01, 0.02) | 1.2 | 1.2 | 0.047 | 0.047 | |
| ~Uniform (0.1, 0.2) | 1.0 | 1.0 | 0.047 | 0.037 | 0.050 |
| ~Uniform (0.1, 0.2) | 1.2 | 1.0 | 0.055 | 0.052 | 0.059 |
| ~Uniform (0.1, 0.2) | 1.2 | 1.2 | 0.043 | 0.047 | |
pbaseline: the disease risk in subjects with major homozygous genotype of SNP1 and SNP2 in each simulated population.
ORy,SNP1: the main effect of SNP1.
ORy,SNP2: the main effect of SNP2.
The bold value denotes a significant difference compared with 0.05 (the 95% confidence interval of type I error is between 0.036 and 0.064). Each data point was based on 1,000 simulations.
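The quoted interval of 0.036 to 0.064 is what a normal approximation to the binomial gives for a nominal rate of 0.05 estimated from 1,000 simulations. The short check below reproduces it. The function name is hypothetical, and the use of the normal approximation is an assumption that happens to match the stated bounds.

```python
import math

def type1_error_interval(alpha=0.05, n_sim=1000, z=1.959964):
    """Normal-approximation 95% interval for an empirical rejection rate."""
    se = math.sqrt(alpha * (1 - alpha) / n_sim)  # binomial standard error
    return alpha - z * se, alpha + z * se

lower, upper = type1_error_interval()
print(f"{lower:.3f} to {upper:.3f}")  # prints "0.036 to 0.064"
```

An empirical type I error outside this range is therefore unlikely to arise from simulation noise alone.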
Fig 3. The statistical power of individual data analysis, ETMA and conventional meta-regression.
The x-axis shows three levels of interaction effect (ORinteraction = 1.2, 1.5 or 2.0), and the y-axis shows the statistical power of individual data analysis (black), ETMA (red) and conventional meta-regression (blue). The details of these methods are described in the Methods. Each subplot presents a comparison under different simulation parameters, and the subplot titles give the detailed settings. Each data point was based on 1,000 simulations.
The result of real data analysis using ETMA.
| Variable | OR (95% CI) | p value |
|---|---|---|
| GSTM1 (null type vs. functional type) | 1.110 (1.080–1.141) | <0.0001 |
| GSTT1 (null type vs. functional type) | 1.125 (1.073–1.180) | <0.0001 |
| GSTM1×GSTT1 (interaction term) | 0.942 (0.862–1.029) | 0.1814 |
| CYP1A1 (AC/CC vs. AA) | 0.819 (0.592–1.133) | 0.2008 |
| GSTM1 (null type vs. functional type) | 0.981 (0.717–1.340) | 0.8915 |
| CYP1A1×GSTM1 (interaction term) | 2.220 (1.166–4.225) | 0.0201 |
| ACE (D allele vs. I allele) | 0.921 (0.809–1.049) | 0.2073 |
| AGT (T allele vs. M allele) | 0.995 (0.884–1.120) | 0.9277 |
| ACE×AGT (interaction term) | 1.305 (1.048–1.624) | 0.0188 |
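As a rough plausibility check on the interaction rows, the standard error can be backed out of each reported confidence interval and a two-sided Wald p value computed from it. This is a sketch under a normal approximation, which is an assumption: the resulting p values are close to, but need not equal, the reported ones, since the reported values come from ETMA itself. The function name is hypothetical.

```python
import math

def wald_p_from_ci(or_, lower, upper, z=1.959964):
    """Back out the log-scale SE from a 95% CI and return (se, z_stat, two_sided_p)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    z_stat = math.log(or_) / se
    p = math.erfc(abs(z_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z_stat, p

# Interaction terms as reported in the table above.
interactions = {
    "GSTM1xGSTT1": (0.942, 0.862, 1.029),
    "CYP1A1xGSTM1": (2.220, 1.166, 4.225),
    "ACExAGT": (1.305, 1.048, 1.624),
}

approx_p = {name: wald_p_from_ci(*values)[2] for name, values in interactions.items()}
for name, p in approx_p.items():
    print(f"{name}: approximate p = {p:.4f}")
```

Under this approximation the CYP1A1×GSTM1 and ACE×AGT interactions fall below 0.05 and the GSTM1×GSTT1 interaction does not, which agrees with the pattern of significance in the table.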