Zihan Zhao¹, Jianjun Zhang², Qiuying Sha³, Han Hao²
1. Texas Academy of Mathematics & Science, University of North Texas, Denton, TX, United States of America.
2. Department of Mathematics, University of North Texas, Denton, TX, United States of America.
3. Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America.
Abstract
The risk of many complex diseases is determined by a complex interplay of genetic and environmental factors. Advanced next generation sequencing technology makes it possible to identify gene-environment (GE) interactions for both common and rare variants. However, most existing methods focus on testing the main effects of common and/or rare genetic variants. Few methods have been developed to test the effects of GE interactions for rare variants only, or for rare and common variants simultaneously. In this study, we develop novel approaches to test the effects of GE interactions of rare and/or common risk and/or protective variants in sequencing association studies. We propose two approaches: 1) testing the effects of an optimally weighted combination of GE interactions for rare variants (TOW-GE); 2) testing the effects of a weighted combination of GE interactions for both rare and common variants (variable weight TOW-GE, VW-TOW-GE). Extensive simulation studies based on the Genetic Analysis Workshop 17 data show that the type I error rates of the proposed methods are well controlled. Compared to the existing interaction sequence kernel association test (ISKAT), TOW-GE is more powerful when there are GE interaction effects for rare risk and/or protective variants; VW-TOW-GE is more powerful when there are GE interaction effects for both rare and common risk and protective variants. Both TOW-GE and VW-TOW-GE are robust to the directions of effects of causal GE interactions. We demonstrate the application of TOW-GE and VW-TOW-GE using imputed data from the COPDGene Study.
Introduction
The etiology of many diseases is characterized by the interplay between genetic and environmental factors. For example, anthracyclines are one of the most effective classes of chemotherapeutic agents currently available for cancer treatment. The therapeutic potential of anthracyclines, however, is limited because of their strong dose-dependent relation with progressive and irreversible cardiomyopathy leading to congestive heart failure. Both gene hyaluronan synthase 3 (HAS3) and gene CUGBP Elav-like family member 4 (CELF4) modify the effect of anthracycline exposure on the development of anthracycline-related cardiomyopathy [1, 2]. A genome-wide gene-environment interaction analysis indicates that gene EBF1 interacts with stress in its association with cardiovascular disease. Additionally, gene EBF1 shows gene-by-stress interaction effects not only for hip circumference but also for waist circumference, body mass index (BMI), fasting glucose, type II diabetes, and common carotid intimal medial thickness (CCIMT) [3].

To date, most of the successful findings in gene-environment (GE) interactions are for common genetic variants. There has been very limited success in finding GE interactions for rare variants. This is often attributed to study design issues, such as sample size or population heterogeneity [4]. The lack of statistical methodology for rare variant GE interactions also contributes to these limitations.

Rare variants, which are usually defined as genetic variants with minor allele frequency (MAF) less than 5% (or 1%), may play an important role in the etiology of complex human diseases. Numerous statistical methods have been developed for testing the main effects of rare variants, such as the sequence kernel association test (SKAT) [5], the combined multivariate and collapsing (CMC) method [6], the weighted sum statistic (WSS) [7], and Testing the effect of an Optimally Weighted combination of variants (TOW) [8].

To our knowledge, few methods have been developed for testing GE interactions in sequencing association studies. Existing methods for assessing common variant by environment interactions, such as the gene-environment interactions association test (GESAT) [9], are less powerful when naively applied to rare variants [10]. To test rare variant by environment interactions, the interaction sequence kernel association test (ISKAT) was developed in [10]. When there is no prior information, the recommended weights for ISKAT are Beta(MAF; 1, 25), the beta distribution density function with parameters 1 and 25 evaluated at the sample MAF. With these weights, ISKAT may lose power when the MAFs of causal variants are not in the range (0.01, 0.035) [11].

In this article, to test for rare and/or common variant by environment interactions in sequencing association studies, we develop two novel methods: 1) Testing the Optimally Weighted combination of GE interactions for rare variants (TOW-GE); 2) testing the effects of a weighted combination of GE interactions for both rare and common variants (variable weight TOW-GE, referred to as VW-TOW-GE). Both TOW-GE and VW-TOW-GE are robust to the directions of effects of causal GE interactions. We evaluate the performance of the proposed methods via simulation studies and real data analysis using the imputed sequencing data from the COPDGene Study.
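To make the Beta(MAF; 1, 25) weighting concrete, the following minimal sketch evaluates that density at a few MAF values. It relies only on the closed form of the Beta(1, 25) density, 25(1 − MAF)^24; the function name `beta_1_25_density` and the MAF values in the loop are illustrative assumptions and are not taken from the ISKAT software.

```python
def beta_1_25_density(maf):
    """Beta(1, 25) density evaluated at maf.

    For parameters (1, 25) the density x**0 * (1 - x)**24 / B(1, 25)
    simplifies to 25 * (1 - x)**24 because B(1, 25) = 1 / 25.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie in [0, 1]")
    return 25.0 * (1.0 - maf) ** 24


# Illustrative MAF values (hypothetical, chosen to span rare to common).
for maf in (0.001, 0.01, 0.035, 0.05, 0.3):
    print(f"MAF = {maf:<5}  weight = {beta_1_25_density(maf):.4f}")
```

The weight decreases monotonically in the MAF, so common variants receive much smaller weights than rare ones under this default.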
Methods
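Consider $n$ unrelated individuals sequenced in a testing region with $m$ genetic variants. In the testing region, we are interested in testing the effects of $p$ rare variants ($p < m$) by environment interactions on a trait, which can be a quantitative or a qualitative trait. For ease of presentation, we only consider a single environmental factor. The method can be easily extended to the case when there are multiple environmental factors. For individuals $i = 1, \dots, n$, let $y_i$ denote the trait, $\mathbf{X}_i = (x_{i1}, \dots, x_{iq})^T$ denote the $q$ covariates, $\mathbf{G}_i = (g_{i1}, \dots, g_{ip})^T$ denote genotypes for the $p$ rare variants in a genomic region (a gene or a pathway), and $E_i$ denote the environmental factor. Let $\mathbf{S}_i = (E_i g_{i1}, \dots, E_i g_{ip})^T$ be a vector of variant by environment interaction terms for the $i$th individual.

We use the generalized linear model (GLM), with intercept $\alpha_0$, to model the relationship between the trait values $y_i$ and covariates $\mathbf{X}_i$, genotypes $\mathbf{G}_i$, environmental factor $E_i$ and GE interactions $\mathbf{S}_i$:

$$g\big(E(y_i \mid \mathbf{X}_i, \mathbf{G}_i, E_i, \mathbf{S}_i)\big) = \alpha_0 + \boldsymbol{\alpha}_1^T \mathbf{X}_i + \alpha_2 E_i + \boldsymbol{\alpha}_3^T \mathbf{G}_i + \boldsymbol{\beta}^T \mathbf{S}_i, \qquad (1)$$

where $g(\cdot)$ is a canonical link function. Two commonly used models under the generalized linear model framework are the linear model with the identity link for a continuous or quantitative trait, and the logistic regression model with the logit link for a binary trait. $\boldsymbol{\alpha}_1$, $\alpha_2$, $\boldsymbol{\alpha}_3$, and $\boldsymbol{\beta}$ are defined as the $q \times 1$ coefficient vector of covariates, the coefficient of the environmental factor, the $p \times 1$ coefficient vector of genotypes, and the $p \times 1$ coefficient vector of GE interactions, respectively. Let $\mathbf{Z}_i = (\mathbf{X}_i^T, E_i, \mathbf{G}_i^T)^T$ and $\boldsymbol{\alpha} = (\boldsymbol{\alpha}_1^T, \alpha_2, \boldsymbol{\alpha}_3^T)^T$. Testing the association between the trait and the rare variant by environment interactions is equivalent to testing the null hypothesis $H_0: \boldsymbol{\beta} = 0$.

We develop a score test by treating $\boldsymbol{\alpha}$ as nuisance parameters and then adjust both the trait value $y_i$ and $\mathbf{S}_i$ for the covariates $\mathbf{X}_i$, the genotypic score $\mathbf{G}_i$, and the environmental variable $E_i$ by applying linear regression. Denote $\tilde{y}_i$ as the residual of $y_i$ and $\tilde{\mathbf{S}}_i = (\tilde{S}_{i1}, \dots, \tilde{S}_{ip})^T$ as the residual of $\mathbf{S}_i$, regressed on $\mathbf{Z}_i$. Then, the relationship between $\tilde{y}_i$ and $\tilde{\mathbf{S}}_i$ can be modeled by the GLM:

$$g\big(E(\tilde{y}_i \mid \tilde{\mathbf{S}}_i)\big) = \beta_0^{*} + \boldsymbol{\beta}^{*T} \tilde{\mathbf{S}}_i. \qquad (2)$$

Testing $H_0: \boldsymbol{\beta} = 0$ in (1) is equivalent to testing $H_0: \boldsymbol{\beta}^{*} = 0$ in (2) (Sha et al., [8]). Here, we utilize a weight selection scheme proposed by Sha et al. [8] on our new model to test the effect of a weighted combination of GE interactions, $S_i^{w} = \sum_{j=1}^{p} w_j \tilde{S}_{ij}$. Following Sha et al. [12], we propose the following score test statistic under the generalized linear model:

$$S(w_1, \dots, w_p) = n \, \frac{\left[\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(S_i^{w} - \bar{S}^{w})\right]^2}{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2 \, \sum_{i=1}^{n} (S_i^{w} - \bar{S}^{w})^2},$$

where $\bar{\tilde{y}}$ and $\bar{S}^{w}$ denote sample means. Because GE interactions for rare variants are essentially independent, we have:

$$\sum_{i=1}^{n} (S_i^{w} - \bar{S}^{w})^2 \approx \sum_{j=1}^{p} w_j^2 \sum_{i=1}^{n} (\tilde{S}_{ij} - \bar{\tilde{S}}_j)^2.$$

Thus, as a function of $(w_1, \dots, w_p)$, the score test statistic $S(w_1, \dots, w_p)$ reaches its maximum when $w_j = w_j^{o}$, $j = 1, \dots, p$, where

$$w_j^{o} = \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{S}_{ij} - \bar{\tilde{S}}_j)}{\sum_{i=1}^{n} (\tilde{S}_{ij} - \bar{\tilde{S}}_j)^2}.$$

Similarly, we define the statistic to Test the effect of the Optimally Weighted combination of GE interactions (TOW-GE), $S_i^{o} = \sum_{j=1}^{p} w_j^{o} \tilde{S}_{ij}$, as:

$$T_{\text{TOW-GE}} = \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(S_i^{o} - \bar{S}^{o}),$$

which is equivalent to $S(w_1^{o}, \dots, w_p^{o}) = n \, T_{\text{TOW-GE}} / \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2$ under the approximation above, where $\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})^2$ can be viewed as a constant when we use a permutation test to evaluate p-values.

The optimal weight $w_j^{o}$ is equivalent to $\rho(\tilde{y}, \tilde{S}_j) / \sqrt{\mathrm{var}(\tilde{S}_j)}$, where $\rho(\tilde{y}, \tilde{S}_j)$ is the correlation coefficient between $\tilde{y}$ and $\tilde{S}_j$. From the expression of $w_j^{o}$, we can see that it is proportional to $\rho(\tilde{y}, \tilde{S}_j)$ and thus will put large weights on the GE interactions that have strong associations with the trait and also adjust for the direction of the association. Simultaneously, $w_j^{o}$ is proportional to $1 / \sqrt{\mathrm{var}(\tilde{S}_j)}$ and will put large weights on GE interactions with small variations, which are common in GE interactions for rare variants.

TOW-GE focuses primarily on rare variant by environment interactions and it may lose power because of the small weights on common variant by environment interactions. Thus, to test the GE interaction effects of both rare and common variants, we propose the following variable weight TOW-GE, denoted as VW-TOW-GE. We first divide GE interactions into two parts based on rare or common variants and then we apply TOW-GE to the two parts separately. Let

$$T_{\lambda} = \lambda \, \frac{T_r}{\sqrt{\mathrm{var}(T_r)}} + (1 - \lambda) \, \frac{T_c}{\sqrt{\mathrm{var}(T_c)}},$$

where $T_r$ and $T_c$ denote the test statistics of TOW-GE for GE interaction effects of rare and common variants, respectively, and $\lambda$ is a tuning parameter. Denote $p_{\lambda}$ as the p-value of $T_{\lambda}$; then the test statistic of VW-TOW-GE is defined as $T_{\text{VW-TOW-GE}} = \min_{0 \le \lambda \le 1} p_{\lambda}$. In this study, we use a simple grid search method to choose the tuning parameter $\lambda$ and minimize the p-value. Divide the interval $[0, 1]$ into $K$ subintervals of equal length. Let $\lambda_k = k/K$ for $k = 0, 1, \dots, K$. Then, $T_{\text{VW-TOW-GE}} = \min_{0 \le k \le K} p_{\lambda_k}$.

The p-value of $T_{\text{VW-TOW-GE}}$ can be evaluated by permutation tests, following similar permutation tests for variable weight TOW (VW-TOW) proposed by [8]. Suppose that we perform $B$ permutations. In each permutation, we randomly shuffle the trait values. Let $T_r^{(b)}$ and $T_c^{(b)}$ denote the values of $T_r$ and $T_c$, respectively, based on the $b$th permuted data, where $b = 0$ represents the original data. Based on $T_r^{(b)}$ and $T_c^{(b)}$ ($b = 0, 1, 2, \dots, B$), we can calculate

$$T_{\lambda_k}^{(b)} = \lambda_k \, \frac{T_r^{(b)}}{\sqrt{\mathrm{var}(T_r)}} + (1 - \lambda_k) \, \frac{T_c^{(b)}}{\sqrt{\mathrm{var}(T_c)}}$$

for $b = 0, 1, 2, \dots, B$ and $k = 0, 1, 2, \dots, K$, where $\mathrm{var}(T_r)$ and $\mathrm{var}(T_c)$ are estimated using $T_r^{(b)}$ and $T_c^{(b)}$ ($b = 0, 1, 2, \dots, B$). Then, we transfer $T_{\lambda_k}^{(b)}$ to $p_{\lambda_k}^{(b)}$ by

$$p_{\lambda_k}^{(b)} = \frac{\#\{d : T_{\lambda_k}^{(d)} > T_{\lambda_k}^{(b)},\ d = 0, 1, \dots, B\}}{B}.$$

Let $p^{(b)} = \min_{0 \le k \le K} p_{\lambda_k}^{(b)}$. Then, the p-value of $T_{\text{VW-TOW-GE}}$ is given by

$$\frac{\#\{b : p^{(b)} < p^{(0)},\ b = 1, 2, \dots, B\}}{B}.$$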
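The procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R program: it assumes the trait and the GE interaction columns have already been residualized on the covariates, genotypes, and environmental factor; it computes the optimally weighted statistic under the independence approximation; and it combines the rare and common statistics over the grid λ = k/K with a min-p permutation scheme in which each permuted statistic is ranked against all B + 1 values. The names `tow_ge_stat` and `vw_tow_ge_pvalue` and all default arguments are hypothetical.

```python
import random
from bisect import bisect_right
from statistics import mean, pvariance


def centered(values):
    """Subtract the sample mean from every element."""
    m = mean(values)
    return [v - m for v in values]


def tow_ge_stat(y_res, s_cols):
    """Optimally weighted statistic from residualized inputs.

    Returns sum_j (sum_i yc_i * sc_ij)**2 / sum_i sc_ij**2, where yc and sc
    are the centered trait residuals and interaction residuals (assumption:
    interaction columns are treated as approximately independent).
    """
    yc = centered(y_res)
    total = 0.0
    for col in s_cols:
        sc = centered(col)
        ss = sum(s * s for s in sc)
        if ss > 0.0:  # a constant column carries no information
            total += sum(a * b for a, b in zip(yc, sc)) ** 2 / ss
    return total


def vw_tow_ge_pvalue(y_res, rare_cols, common_cols, n_perm=1000, K=10, seed=0):
    """Min-p permutation p-value over the grid lambda_k = k / K."""
    rng = random.Random(seed)
    y = list(y_res)
    t_r, t_c = [], []
    for b in range(n_perm + 1):
        if b > 0:  # b = 0 is the original, unpermuted data
            rng.shuffle(y)
        t_r.append(tow_ge_stat(y, rare_cols))
        t_c.append(tow_ge_stat(y, common_cols))
    # Standard deviations estimated from all B + 1 values; fall back to 1
    # when a statistic is constant (for example, an empty variant set).
    sd_r = pvariance(t_r) ** 0.5 or 1.0
    sd_c = pvariance(t_c) ** 0.5 or 1.0
    p_min = [1.0] * (n_perm + 1)
    for k in range(K + 1):
        lam = k / K
        t_lam = [lam * r / sd_r + (1.0 - lam) * c / sd_c
                 for r, c in zip(t_r, t_c)]
        ordered = sorted(t_lam)
        for b, t in enumerate(t_lam):
            # number of values strictly greater than the b-th statistic
            n_greater = len(ordered) - bisect_right(ordered, t)
            p_min[b] = min(p_min[b], n_greater / n_perm)
    n_smaller = sum(1 for b in range(1, n_perm + 1) if p_min[b] < p_min[0])
    return n_smaller / n_perm
```

Because each column contributes a squared cross-product, flipping the sign of an interaction column leaves `tow_ge_stat` unchanged, which is the sense in which the statistic is robust to effect directions. Note that this counting rule can return a p-value of exactly zero when the observed data are the most extreme; whether to add a small-sample correction is a design choice left open here.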
Simulation
We compared the performance of our proposed methods with the interaction sequence kernel association test (ISKAT) [10], the modified WSS for testing the effects of GE interactions [7], and the modified CMC method for testing the effects of GE interactions [6]. In this study, the rank sum test used by WSS and the $T^2$ test used by CMC were replaced with the score test based on the residuals of the trait and of the GE interaction terms. The empirical Mini-Exome genotype data provided by the Genetic Analysis Workshop 17 (GAW17) are used for the simulation studies. The dataset contains genotypes of 697 unrelated individuals on 3,205 genes. Because gene ELAVL4 was used in GAW17 to simulate a GE interaction effect on the quantitative trait Q1, which follows a normal distribution, we chose gene ELAVL4 for our simulation study. Gene ELAVL4 has 10 variants: 8 rare variants and 2 common variants. Rare variants in the simulation are defined as variants with MAF < 0.05.

To evaluate type I error, we generate trait values independent of the GE interactions (i.e., with all GE interaction coefficients set to zero) by using a linear model in which the error term $\epsilon_1$ follows a normal distribution with mean 0, $\alpha_1 = 0.015$, and the GE interaction terms comprise the GE interactions for the rare variants and the GE interaction for a common variant. We consider two covariates: a standard normal covariate $X_1$ and a binary covariate $X_2$ with $P(X_2 = 1) = 0.5$. The environmental factor $E$ is assumed to be continuous, following a standard normal distribution.

For type I error evaluation, we consider two different cases: 1) testing the effects of GE interactions for rare variants; 2) testing the effects of GE interactions for both rare and common variants. For each case, we consider two scenarios: (a) with main effects; (b) without main effects in the model. When main effects exist, we set the magnitude of each element of the main effect vector $\boldsymbol{\alpha}_2$ to 0.3, and the sign of each coefficient is randomly sampled from (−1, 1). When main effects do not exist, we set $\boldsymbol{\alpha}_2 = 0$.

For power comparisons, the phenotype is generated under the same settings as in the type I error evaluation, except that GE interaction effects are present. We compare the power of TOW-GE, ISKAT, WSS and CMC to test rare variant GE interaction effects under two scenarios: (a) including main effects, $\boldsymbol{\alpha}_2 \neq 0$ for rare variants; (b) no main effects, $\boldsymbol{\alpha}_2 = 0$ for rare variants. We vary the number of non-zero elements in the GE interaction coefficient vector, the proportion of non-zero elements that are positive, and the magnitudes of the non-zero β's. We set the magnitudes of the non-zero β's as |β| = c, and increase c from 0.1 to 0.5. In each simulation scenario, p-values are estimated by 10,000 permutations and 1,000 replicated samples.
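The effect-size settings described above (number of non-zero interaction coefficients, proportion that are positive, common magnitude c, and main effects of magnitude 0.3 with random signs) can be sketched as follows. This is an illustrative sketch only: the helper names `make_interaction_effects` and `make_main_effects` are hypothetical, and choosing the causal positions uniformly at random is an assumption, since the text does not say which variants carry the non-zero effects.

```python
import random


def make_interaction_effects(p, n_nonzero, prop_positive, c, rng):
    """Build a length-p coefficient list with n_nonzero entries of size c.

    A fraction prop_positive of the non-zero entries is positive and the
    rest are negative. Causal positions are drawn at random (assumption).
    """
    if not 0 <= n_nonzero <= p:
        raise ValueError("n_nonzero must be between 0 and p")
    n_pos = round(n_nonzero * prop_positive)
    signs = [1.0] * n_pos + [-1.0] * (n_nonzero - n_pos)
    causal = rng.sample(range(p), n_nonzero)
    beta = [0.0] * p
    for idx, sign in zip(causal, signs):
        beta[idx] = sign * c
    return beta


def make_main_effects(p, magnitude, rng):
    """Main effects with a fixed magnitude and a random sign per variant."""
    return [rng.choice((-1.0, 1.0)) * magnitude for _ in range(p)]


rng = random.Random(2020)  # arbitrary seed for reproducibility
beta = make_interaction_effects(p=8, n_nonzero=6, prop_positive=0.5, c=0.3, rng=rng)
alpha2 = make_main_effects(p=8, magnitude=0.3, rng=rng)
print(beta)
print(alpha2)
```

With p = 8 rare variants, the three configurations in the text correspond to `n_nonzero` of 2, 6 and 8, and `prop_positive` of 0.5 or 1.0.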
Simulation results
The empirical type I error rates are shown in Tables 1 and 2. For 10,000 replicated samples, the 95% confidence intervals for type I error rates at the nominal levels 0.05, 0.01 and 0.001 are (0.046, 0.054), (0.008, 0.012) and (0.0004, 0.0016), respectively. When there are (a) main effects, i.e., $\boldsymbol{\alpha}_2 \neq 0$, TOW-GE, VW-TOW-GE, ISKAT and WSS control type I error rates well, and the burden test CMC tends to have very conservative type I error rates (top panels of Tables 1 and 2). When there are (b) no main effects, i.e., $\boldsymbol{\alpha}_2 = 0$, all methods control type I error rates well (bottom panels of Tables 1 and 2).
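The confidence intervals quoted above are consistent with a normal-approximation binomial interval, α ± 1.96·sqrt(α(1 − α)/N) with N = 10,000 replicates. That this is the formula the authors used is an assumption; the short sketch below simply shows that it reproduces the quoted bounds after rounding. The function name `type1_error_interval` is hypothetical.

```python
import math


def type1_error_interval(alpha, n_replicates, z=1.96):
    """Normal-approximation 95% interval for an empirical rejection rate."""
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / n_replicates)
    return alpha - half_width, alpha + half_width


for alpha in (0.05, 0.01, 0.001):
    low, high = type1_error_interval(alpha, 10_000)
    print(f"alpha = {alpha}: ({low:.4f}, {high:.4f})")
```

An empirical type I error rate falling outside the corresponding interval suggests inflation (above the upper bound) or conservativeness (below the lower bound).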
Table 1
Type I error rates for testing the effects of GE interactions of rare variants in the presence of main effects (top panel) and in the absence of main effects (bottom panel) (n = 2000).

| Scenario | α-level | TOW-GE | ISKAT | WSS | CMC |
|---|---|---|---|---|---|
| With main effect | 0.05 | 0.042 | 0.066 | 0.047 | 0.027 |
| With main effect | 0.01 | 0.010 | 0.017 | 0.011 | 0.005 |
| With main effect | 0.001 | 0.001 | 0.009 | 0.000 | 0.000 |
| Without main effect | 0.05 | 0.054 | 0.066 | 0.050 | 0.043 |
| Without main effect | 0.01 | 0.006 | 0.014 | 0.012 | 0.012 |
| Without main effect | 0.001 | 0.000 | 0.004 | 0.000 | 0.000 |
Table 2
Type I error rates for testing the effects of GE interactions for both rare and common variants in the presence of main effects (top panel) and in the absence of main effects (bottom panel) (n = 2000).

| Scenario | α-level | TOW-GE | ISKAT | WSS | CMC | VW-TOW-GE |
|---|---|---|---|---|---|---|
| With main effect | 0.05 | 0.053 | 0.062 | 0.055 | 0.040 | 0.052 |
| With main effect | 0.01 | 0.007 | 0.013 | 0.017 | 0.009 | 0.011 |
| With main effect | 0.001 | 0.002 | 0.002 | 0.001 | 0.000 | 0.002 |
| Without main effect | 0.05 | 0.051 | 0.056 | 0.048 | 0.049 | 0.058 |
| Without main effect | 0.01 | 0.006 | 0.012 | 0.012 | 0.011 | 0.014 |
| Without main effect | 0.001 | 0.001 | 0.003 | 0.001 | 0.001 | 0.002 |
The results for testing the effects of GE interactions of rare variants with and without main effects are given in Figs 1 and 2, respectively. In both scenarios, the sample size is 2000 and there is no GE interaction of a common variant. We do not apply VW-TOW-GE here because it is designed for settings with GE interaction effects of both common and rare variants. The top, middle, and bottom panels in Figs 1 and 2 provide results for three cases, i.e., when there are 2, 6 and 8 non-zero β's, respectively. The left and right panels of Figs 1 and 2 present two cases, i.e., 50% of the β's are positive and 100% of the β's are positive, respectively. For each plot, we vary c, the magnitude of the non-zero β's. As shown in the plots for the case when 50% of the β's are positive, TOW-GE is more powerful than the other three tests. For the case when 100% of the β's are positive, WSS is somewhat more powerful than TOW-GE because all the GE interactions have the same direction of effect, while TOW-GE is more powerful than the other two tests (ISKAT and CMC). However, WSS is very sensitive to the directions of effects because it aggregates the GE interactions directly. Among the four tests (TOW-GE, ISKAT, WSS and CMC) in the two different cases, CMC is the least powerful test. CMC loses power because it gives GE interactions of common variants large weights, and thus GE interactions of common neutral variants introduce large noise.
Fig 1
Power comparisons of the four tests (TOW-GE, ISKAT, WSS and CMC) for testing GE interaction effects for rare variants on a continuous outcome when there are main effects (n = 2000 and the significance level of α = 0.05).
Fig 2
Power comparisons of the four tests (TOW-GE, ISKAT, WSS and CMC) for testing GE interaction effects of rare variants on a continuous outcome when there are no main effects (n = 2000, significance level of α = 0.05).
Power comparisons of the five tests (TOW-GE, VW-TOW-GE, ISKAT, WSS and CMC) for testing GE interaction effects for both rare and common variants are given in Fig 3. For each scenario in Fig 3, we vary c from 0.02 to 0.1 and set 50% of the β's as positive. In addition, we set the coefficient of the common variant by environment interaction to be positive, with magnitude twice that of β, the coefficient of a rare variant by environment interaction. From Fig 3, we can see that VW-TOW-GE is the most powerful test. CMC is the second most powerful test, because CMC puts large weights on GE interactions of common variants and gains power when the GE interaction of a common variant plays an important role as the causal effect. WSS is the least powerful test; it loses power because it puts a very small weight on the GE interaction of the common variant.
Fig 3
Power comparisons of the five tests (TOW-GE, ISKAT, WSS, CMC and VW-TOW-GE) for testing GE interaction effects for both rare and common variants on a continuous outcome (n = 2000 and the significance level of α = 0.05).
Left panel: With main effect; Right panel: With no main effect.
TOW-GE, VW-TOW-GE, and ISKAT can all be considered quadratic statistics, which have reasonable power across a wide range of alternative hypotheses. The three methods are robust to the different directions of the GE interaction effects. We performed a further assessment of these three methods; Fig 4 shows the results. When there are causal effects of GE interactions for both common and rare variants, VW-TOW-GE outperforms TOW-GE and ISKAT. TOW-GE is more powerful than ISKAT except when the magnitude of the GE interactions is less than 0.04.
Fig 4
Power comparisons of the three quadratic tests (TOW-GE, ISKAT, and VW-TOW-GE) for testing GE interaction effects of both rare and common variants on a continuous outcome (n = 2000 and the significance level of α = 0.05).
Left panel: With main effect; Right panel: Without main effect.
Real data analysis
Chronic obstructive pulmonary disease (COPD) is one of the most common lung diseases, characterized by long-term poor airflow, and is a major public health problem [13]. It is a complex disease that is influenced by genetic factors, environmental influences, and genotype-environment interactions. Cigarette smoking is known to be the major environmental determinant of COPD [14]. Several genes have been suggested to play a role in the presence of a gene-by-smoking interaction term. Specifically, it was reported in [15] that the 30-repeat allele of HMOX1 was associated with COPD in the presence of a gene-by-smoking (pack-years) interaction term. It was shown in [14] that the GSTM1 gene was associated with severe chronic bronchitis in heavy smokers, and an association of the TNF-308A allele with COPD was found in a Taiwanese population. It was also reported in [15] that the SFTPB Thr131Ile polymorphism was associated with COPD, but only in the presence of a gene-by-environment interaction. The SNP rs2292566 in gene EPHX1 was associated with COPD only in the presence of a gene-by-smoking (pack-years) interaction. It was shown in [16] that two SNPs in the promoter region of TGFB1 (rs2241712 and rs1800469) and one SNP in exon 1 of TGFB1 (rs1982073) were significantly associated with COPD among smokers in a COPD case-control study.

The COPDGene Study is a multi-center genetic and epidemiologic investigation to study COPD [17]. Participants in the COPDGene Study gave consent for the use of data collected during the study in downstream analyses. This study is sufficiently large and appropriately designed for analysis of COPD. In this study, we consider more than 5,000 non-Hispanic White (NHW) participants who have completed a detailed protocol, including questionnaires, pre- and post-bronchodilator spirometry, high-resolution CT scanning of the chest, exercise capacity (assessed by six-minute walk distance), and blood samples for genotyping.
The participants were genotyped using the Illumina OmniExpress platform. The genotype data have gone through standard quality-control procedures for genome-wide association analysis, detailed at http://www.copdgene.org/sites/default/files/GWAS_QC_Methodology_20121115.pdf. We imputed the COPD genotype data using the EUR haplotypes from the 1000 Genomes Project as references.

Based on the COPD literature [18, 19], we selected 7 key quantitative COPD-related phenotypes, including FEV1 (% predicted FEV1), Emphysema (Emph), Emphysema Distribution (EmphDist), Gas Trapping (GasTrap), Airway Wall Area (Pi10), Exacerbation frequency (ExacerFreq), and Six-minute walk distance (6MWD), and one qualitative phenotype (case-control disease status, denoted as COPD in the following tables). Three covariates (BMI, age, and sex) and one environmental factor (pack-years) were considered in our analysis. EmphDist is the ratio of emphysema at -950 HU in the upper 1/3 of the lung fields to that in the lower 1/3 of the lung fields; following [18], we applied a log transformation to EmphDist in the analysis. Participants with missing data in any of these phenotypes were excluded.

To evaluate the performance of our proposed methods on a real data set, we applied all 5 methods (TOW-GE, ISKAT, WSS, CMC, and VW-TOW-GE) to six COPD-associated genes (HMOX1, GSTM1, TGFB1, TNF, SFTPB, and EPHX1) to test for an interaction with cigarette smoking. In the analysis, we removed extremely rare SNPs (MAF < 0.001) and excluded participants with missing values in any of the 7 phenotypes or 3 covariates. We considered three different scenarios: (1) main effect; (2) gene-by-smoking interaction with main effect; and (3) gene-by-smoking interaction without main effect. When we considered only the main effect, we used five existing methods (TOW, SKAT, WSS, CMC, and VW-TOW), which are specifically designed for testing the main effect of a gene.
We used 10^4 permutations for our methods and 0.05 as the significance level.

The results for testing association between COPD and genes HMOX1 and GSTM1 are summarized in Tables 3 and 4, respectively. The results for testing association between COPD and genes TGFB1, TNF, SFTPB, and EPHX1 are summarized in S1–S4 Tables. At gene HMOX1, both TOW-GE and the modified WSS verified significant GE interaction effects without main effect for the two traits Emph and Pi10. ISKAT and VW-TOW-GE verified significant GE interaction effects without main effect for trait Emph. At gene GSTM1, TOW-GE, VW-TOW-GE and ISKAT verified a GE interaction effect without main effect for trait EmphDist, while the other methods did not. At gene TGFB1, TOW-GE, VW-TOW-GE and ISKAT verified a GE interaction effect without main effect for trait ExacerFreq (S1 Table). Gene TNF was only identified by the modified CMC method and the modified WSS method for gene-by-smoking interaction with main effect (S2 Table). Gene EPHX1 was only identified by the modified WSS method for gene-by-smoking interaction with main effect (S4 Table). Four genes with gene-by-smoking interaction effects (GSTM1, HMOX1, SFTPB, and TGFB1) were identified by our methods (S1 and S3 Tables, Tables 3 and 4).
Table 3
Summary results of association analysis for HMOX1 based on the COPD dataset.
The p-values are shown for testing the gene's main effect (top panel), gene-by-smoking interaction with main effect (middle panel), gene-by-smoking interaction without main effect (bottom panel).

Gene's main effect

| trait | TOW | SKAT | WSS | CMC | VW-TOW |
|---|---|---|---|---|---|
| GasTrap | 0.3917 | 0.542 | 0.2618 | 0.9038 | 0.5539 |
| ExacerFreq | 0.7050 | 0.6149 | 0.1922 | 0.9845 | 0.8155 |
| Emph | **0.0204** | **0.0328** | **0.0036** | 0.9155 | **0.0360** |
| Pi10 | **0.0283** | **0.0266** | **0.0457** | 0.4501 | 0.0559 |
| EmphDist | 0.8083 | 0.7363 | 0.5172 | 0.7520 | 0.8888 |
| 6MWD | 0.8299 | 0.8451 | 0.8526 | 0.9985 | 0.8642 |
| FEV1 | 0.6825 | 0.6928 | 0.7057 | 0.6906 | 0.7691 |
| COPD | 0.8637 | 0.8277 | 0.8345 | 0.7540 | 0.8677 |

Gene-by-smoking interaction with main effect

| trait | TOW-GE | ISKAT | WSS | CMC | VW-TOW-GE |
|---|---|---|---|---|---|
| GasTrap | 0.7432 | 0.8001 | 0.2610 | 0.8894 | 0.8033 |
| ExacerFreq | 0.5883 | 0.2389 | 0.2768 | 0.9964 | 0.3921 |
| Emph | 0.4024 | 0.2718 | 0.1140 | 0.9861 | 0.5696 |
| Pi10 | 0.1208 | 0.4084 | 0.0821 | 0.9948 | 0.0651 |
| EmphDist | 0.5315 | 0.4794 | 0.6006 | 0.9892 | 0.4886 |
| 6MWD | 0.6174 | 0.3624 | 0.4211 | 0.9929 | 0.6793 |
| FEV1 | 0.8656 | 0.7748 | 0.4419 | 0.9575 | 0.9178 |
| COPD | 0.2302 | 0.3029 | 0.9089 | 0.9424 | 0.3394 |

Gene-by-smoking interaction without main effect

| trait | TOW-GE | ISKAT | WSS | CMC | VW-TOW-GE |
|---|---|---|---|---|---|
| GasTrap | 0.3388 | 0.5724 | 0.1207 | 0.6040 | 0.4967 |
| ExacerFreq | 0.3818 | 0.2810 | 0.0915 | 0.9320 | 0.4513 |
| Emph | **0.0189** | **0.0487** | **0.0011** | 0.8288 | **0.0349** |
| Pi10 | **0.0304** | 0.0532 | **0.0118** | 0.5587 | 0.0571 |
| EmphDist | 0.8166 | 0.8062 | 0.7610 | 0.8066 | 0.8217 |
| 6MWD | 0.7253 | 0.3463 | 0.6810 | 0.9929 | 0.6811 |
| FEV1 | 0.5604 | 0.7387 | 0.4519 | 0.3043 | 0.7280 |
| COPD | 0.8869 | 0.8877 | 0.8657 | 0.3204 | 0.9300 |

Note: The bold numbers represent p-values of significant tests (significance level = 0.05).
Table 4
Summary results of association analysis for GSTM1 based on the COPD dataset.
The p-values are shown for testing the gene's main effect (top panel), gene-by-smoking interaction with main effect (middle panel), gene-by-smoking interaction without main effect (bottom panel).

Gene's main effect

| trait | TOW | SKAT | WSS | CMC | VW-TOW |
|---|---|---|---|---|---|
| GasTrap | 0.2309 | 0.6152 | 0.6163 | 0.2479 | 0.2848 |
| ExacerFreq | 0.7198 | 0.7823 | 0.7677 | 0.9138 | 0.6594 |
| Emph | 0.1401 | 0.3901 | 0.8496 | 0.1686 | 0.2355 |
| Pi10 | **0.0177** | 0.1069 | 0.1705 | 0.0749 | **0.0151** |
| EmphDist | **0.0077** | **0.0256** | 0.3545 | **0.0487** | **0.0082** |
| 6MWD | 0.6011 | 0.6856 | 0.8401 | 0.9260 | 0.5707 |
| FEV1 | 0.1190 | 0.5920 | 0.9013 | 0.2301 | 0.0866 |
| COPD | 0.2144 | 0.2178 | 0.4699 | 0.2078 | 0.3243 |

Gene-by-smoking interaction with main effect

| trait | TOW-GE | ISKAT | WSS | CMC | VW-TOW-GE |
|---|---|---|---|---|---|
| GasTrap | 0.8652 | 0.7482 | 0.5358 | 0.1096 | 0.9158 |
| ExacerFreq | 0.7417 | 0.0867 | 0.3860 | 0.0599 | 0.5606 |
| Emph | 0.6829 | 0.9901 | 0.6833 | 0.2927 | 0.7207 |
| Pi10 | 0.2757 | 0.5465 | 0.4808 | 0.1164 | 0.4506 |
| EmphDist | 0.1287 | 0.2314 | 0.6639 | 0.6781 | 0.1126 |
| 6MWD | 0.8144 | 0.8769 | 0.8893 | 0.3781 | 0.8384 |
| FEV1 | 0.9389 | 0.4145 | 0.6640 | 0.1277 | 0.9169 |
| COPD | 0.9944 | 0.8870 | 0.7842 | 0.2098 | 0.9878 |

Gene-by-smoking interaction without main effect

| trait | TOW-GE | ISKAT | WSS | CMC | VW-TOW-GE |
|---|---|---|---|---|---|
| GasTrap | 0.5160 | 0.2723 | 0.8413 | 0.6725 | 0.5769 |
| ExacerFreq | 0.5887 | 0.7348 | 0.6194 | 0.2701 | 0.6796 |
| Emph | 0.1041 | 0.1114 | 0.6514 | 0.4787 | 0.1691 |
| Pi10 | 0.0697 | 0.1112 | 0.1282 | 0.0844 | 0.0631 |
| EmphDist | **0.0071** | **0.0229** | 0.6162 | 0.1078 | **0.0131** |
| 6MWD | 0.7759 | 0.9342 | 0.9903 | 0.8683 | 0.7867 |
| FEV1 | 0.2833 | 0.4709 | 0.6934 | 0.2673 | 0.2254 |
| COPD | 0.3641 | 0.1615 | 0.4693 | 0.5593 | 0.4934 |

Note: The bold numbers represent p-values of significant tests (significance level = 0.05).
Discussion
Recent evidence shows that gene-environment interactions of rare variants may play an important role in explaining the etiology of a complex disease. However, there are limited methods that can be employed to test the effects of GE interactions for rare variants. In this study, we propose two new methods for testing GE interactions for rare variants only or for both rare and common variants. We employ a generalized linear model to model the relationship between the trait and the GE interactions. Our model focuses on GE interactions by first adjusting for genetic main effects, environmental main effects, and possible covariates. The two methods are designed for different scenarios through specific weight-selection mechanisms. TOW-GE assigns the majority of the weight to rare variant by environment interactions. VW-TOW-GE balances common and rare variants by performing weight assignments separately for common variant by environment interactions and rare variant by environment interactions. Both methods aim to maximize power through an adaptive weight selection procedure.

In the application, we tested genetic associations for 7 traits of COPD. Our proposed methods verified the most significant GE interactions, especially for gene-by-smoking interactions without main effect, and performed the best compared to the other methods. In simulation studies, we also demonstrated that our proposed methods perform better in different scenarios, both with and without main effects. Our results show that the proposed methods TOW-GE or VW-TOW-GE demonstrate better power in most cases compared with competing methods.

The power of a test varies according to the number of GE interactions of rare or common variants, the effect directions of GE interactions, and the MAFs of variants. When a substantial proportion of GE interactions have opposite directions of effects, the quadratic statistics TOW-GE, VW-TOW-GE, and ISKAT are powerful.
When effects of GE interactions of common variants play a primary role, CMC is more powerful than ISKAT, WSS, and has similar power to VW-TOW-GE.In our proposed method, the optimal weights of TOW-GE are derived analytically; thus the computation cost is relatively small. On the other hand, TOW-GE is flexible and allows for prior biological information to be incorporated by using flexible weights, such as weights derived from the expression quantitative trait locus (eQTL), which may further improve the power of TOW-GE. In addition, TOW-GE allows for adjustment of covariates. The covariates could be demographic variables, environmental variables, clinical variables, and/or principle components of genotype scores. The adjustment of covariates makes TOW-GE not only able to eliminate the effect of confounders but also able to correct for possible population stratification in admixed populations. One possible advantage of TOW-GE compared to ISKAT is that TOW-GE utilizes the residuals of both the trait value and the GE interactions, which are obtained by adjusting for covariates from linear regression models, respectively, while ISKAT utilizes only the residual of the trait value.The proposed test statistic TOW-GE does not have an asymptotic distribution and a permutation procedure is needed to estimate its p-value, which is time consuming compared to methods with asymptotic distributions. To save time when applying the proposed methods in genetic association studies, we can use the “step-up” procedure [20, 21] to determine the number of permutations. This can show evidence of association based on a small number of permutations first (e.g.1,000) and then a large number of permutations are used to test the selected potentially significant genes. Specifically, the computation time of p-value estimation of TOW-GE and VW-TOW-GE for a gene in the real data analysis was about 30 seconds using our R program on 6 Dell PowerEdge C6320 servers. 
Each server has two 2.4 GHz Intel Xeon E5-2680 v4 fourteen-core processors and 600 MB average memory. We have uploaded the R program onto GitHub at https://github.com/Jianjun-CN/Single-GE.

4 Dec 2019

PONE-D-19-27761
Testing gene-environment interactions for rare and/or common variants in sequencing association studies
PLOS ONE

Dear Dr. Hao,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Jan 18 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version.
This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,
Heming Wang, PhD
Academic Editor
PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Acknowledgments Section of your manuscript:

The Genetic Analysis workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. Preparation of the Genetic Analysis Workshop 17 Simulated Exome Data Set was supported in part by NIH R01 MH059490 and used sequencing data from the 1000 Genomes Project (www.1000genomes.org). This research used data generated by the COPDGene study (phs000179/HMB and phs000179/DS-CS-RD), which was supported by National Institutes of Health (NIH) grants U01HL089856 and U01HL089897. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.
The COPDGene project is also supported by the COPD Foundation through contributions made by an Industry Advisory Board comprised of Pfizer, AstraZeneca, Boehringer Ingelheim, Novartis, and Sunovion.

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

Q Sha was supported by the National Human Genome Research Institute (https://www.genome.gov/) of the National Institutes of Health under Award Number R15HG008209. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

3. We note you have included a table to which you do not refer in the text of your manuscript. Please ensure that you refer to Table 4 in your text; if accepted, production will need this reference to link the reader to the Table.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this manuscript, authors proposed methods to detect gene-environment (GE) interactions for rare and/or common variants. A generalized linear model was developed and a score test was established for testing GE interactions. Permutation tests were used for obtaining p-values. Simulation studies were conducted to validate the properties of the test.
It was showed that the proposed test was more powerful than existing ones in this area. The application of proposed model has been demonstrated with an empirical analysis of data for Chronic obstructive pulmonary disease. It is a nice work and provides biomedical researchers with useful methodology to address scientific questions related to GE interactions, particularly of rare variants.

My questions are about the computation.

1. I understand permutation tests have high computational cost when sample size is large and/or the number of features/genes is large. It was mentioned that HPC infrastructure had been utilized for this study. I wonder if it is feasible to perform the proposed test on a personal computer as it is more convenient for average users.

2. Are the code of your model/test available to public?

Reviewer #2: In this paper, Zhao and colleagues describe a novel method for rare and common variants gene-environment interaction testing in sequencing association studies. The method builds upon substantial existing work and is more powerful than existing methods by using an adaptive weight selection procedure. The text overall is very clearly written.

However, I do have some minor concerns that I would like the authors to address.

(1) The permutation procedure for the p-value calculation is time-consuming. Barnett et.al (JASA, 2017) provided a scheme for permutation test to save computation time. It would be good to incorporate it in the paper.

(2) In the real data application, the authors only test for association of two known genes. It would be great for the authors to show the results of the gene-based analysis of the whole genome.

Reference

[1] Barnett, I., Mukherjee, R. and Lin, X., 2017. The generalized higher criticism for testing SNP-set effects in genetic association studies. Journal of the American Statistical Association, 112(517), pp.64-76.

6. PLOS authors have the option to publish the peer review history of their article.
If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

7 Jan 2020

Dear Dr. Wang,

We thank you for your encouraging evaluation for our manuscript “Testing gene-environment interactions for rare and/or common variants in sequencing association studies” (Manuscript Number: PONE-D-19-27761R1). We appreciate the editor's patient work in organizing the reviewing process. After thoroughly and carefully addressing the reviewers' comments and making corresponding revisions in the manuscript highlighted in red, we are now resubmitting the paper for your further consideration. Please find our point-by-point response to the reviewers' comments in the letter named Response to Reviewers.

Thank you for the helpful and constructive review.
We feel that the manuscript is much improved and hope you now find it suitable for publication in PLOS ONE.

Han Hao

Submitted filename: Response to Reviewers.docx

3 Feb 2020

Testing gene-environment interactions for rare and/or common variants in sequencing association studies
PONE-D-19-27761R1

Dear Dr. Hao,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,
Heming Wang, PhD
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes
Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes
Reviewer #2: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes
Reviewer #2: Yes

6. Review Comments to the Author

Reviewer #1: (No Response)
Reviewer #2: (No Response)

7. Do you want your identity to be public for this peer review?

Reviewer #1: No
Reviewer #2: No

20 Feb 2020

PONE-D-19-27761R1
Testing gene-environment interactions for rare and/or common variants in sequencing association studies

Dear Dr. Hao:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Heming Wang
Academic Editor
PLOS ONE