
A partition-based approach to identify gene-environment interactions in genome wide association studies.

Ruixue Fan¹, Chien-Hsun Huang¹, Inchi Hu², Haitian Wang³, Tian Zheng¹, Shaw-Hwa Lo¹.

Abstract

It is believed that almost all common diseases are the consequence of complex interactions between genetic markers and environmental factors. However, few such interactions have been documented to date. Conventional statistical methods for detecting gene and environmental interactions are often based on the linear regression model, which assumes a linear interaction effect. In this study, we propose a nonparametric partition-based approach that is able to capture complex interaction patterns. We apply this method to the real data set of hypertension provided by Genetic Analysis Workshop 18. Compared with the linear regression model, the proposed approach is able to identify many additional variants with significant gene-environmental interaction effects. We further investigate one single-nucleotide polymorphism identified by our method and show that its gene-environmental interaction effect is, indeed, nonlinear. To adjust for the family dependence of phenotypes, we apply different permutation strategies and investigate their effects on the outcomes.


Year: 2014 PMID: 25519395 PMCID: PMC4143762 DOI: 10.1186/1753-6561-8-S1-S60

Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561

Background

Genome-wide association studies (GWAS) have successfully discovered many common variants associated with complex diseases, but the single-nucleotide polymorphisms (SNPs) identified so far account for only a small proportion of the total heritability in quantitative traits [1]. Increasing evidence shows that gene-environment (G×E) interactions are widely involved in the etiology of complex diseases, including diabetes, cancer, and psychiatric disorders [2,3]. The investigation of G×E interactions will not only facilitate the identification of novel genes whose marginal effects are undetectable, but also provide insights into disease etiology and hence greatly benefit drug development and personalized therapy. The commonly applied methods to detect G×E interactions are based on linear or logistic regression models [4]. In particular, for quantitative outcomes, a linear model is considered in the form of

$$y = \beta_0 + \beta_1 G + \beta_2 E + \beta_3 (G \times E) + \varepsilon,$$

where G is the genotype of a SNP, E is the environmental factor, ε is a normally distributed random error, and $\beta_3$ is the coefficient corresponding to the interaction term. If $\beta_3 = 0$, the conditional effect of the SNP is constant across different levels of the environmental factor, and we conclude that there is no G×E interaction. This model assumes a linear interaction effect: given G, the outcome y is linearly related to E. In practice, however, interaction schemes are likely to be more complicated, in which case the linear model may fail to capture the interaction effect. Therefore, there is a pressing need to develop novel statistical approaches for genome-wide G×E interaction studies. Here we propose a nonparametric partition-based approach to detect G×E interactions and conduct a GWAS for hypertension using the real data set provided by Genetic Analysis Workshop 18 (GAW18).
For each SNP, both the linear regression model and the proposed method are used to evaluate its interaction effect with each of the 4 environmental factors: age, gender, smoking status, and medicine. We note that, compared with the linear model, the proposed method is able to identify many additional SNPs. We further study the interaction pattern between SNP rs17206492 and medicine, and find that this interaction effect is, indeed, nonlinear. We also investigate different permutation strategies in the presence or absence of pedigree dependence of the phenotype.
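To illustrate the limitation of the linear interaction term, the following minimal sketch uses a hypothetical toy data set (not GAW18 data) in which the outcome rises and then falls with E for genotype 0, and falls and then rises for genotype 1. It assumes a binary genotype code (0/1), in which case the least-squares interaction coefficient equals the difference between the within-genotype slopes of y on E. The function names are illustrative only.

```python
def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def interaction_coefficient(g, e, y):
    """Interaction coefficient of y ~ G + E + G*E for binary G (assumption).

    With G coded 0/1 the fitted model is a separate line per genotype,
    so the coefficient equals slope(G = 1) - slope(G = 0).
    """
    e0 = [ei for gi, ei in zip(g, e) if gi == 0]
    y0 = [yi for gi, yi in zip(g, y) if gi == 0]
    e1 = [ei for gi, ei in zip(g, e) if gi == 1]
    y1 = [yi for gi, yi in zip(g, y) if gi == 1]
    return slope(e1, y1) - slope(e0, y0)


# Hypothetical toy data: opposite, non-monotone trends in the two genotypes.
g = [0, 0, 0, 1, 1, 1]
e = [0, 1, 2, 0, 1, 2]
y = [80.0, 90.0, 80.0, 90.0, 80.0, 90.0]

beta3 = interaction_coefficient(g, e, y)
```

In this toy example both within-genotype slopes are zero, so the interaction coefficient is zero even though the two genotypes respond to E in opposite ways.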

Methods

Data set

The GAW18 data set consists of GWAS data and whole genome sequence data with longitudinal phenotypes for hypertension and related traits from Type 2 Diabetes Genetic Exploration by Next-generation sequencing in Ethnic Samples (T2D-GENES) Project 2. There are 939 individuals in total, and we include in our analysis only the 849 individuals with both phenotype data and imputed sequence information. Each individual has measurements for up to 4 time points. At each visit, systolic blood pressure (SBP) and diastolic blood pressure (DBP) were measured; covariates including age, use of antihypertensive medication, and current tobacco smoking status were also recorded. Gender and pedigree are known for each subject. Genotypes of odd-numbered chromosomes are provided. In our study, we focused on chromosome 3 as suggested by the workshop organizer for the sake of comparison. Although we had access to the answers for the simulated data set, we used only the real data set in our analysis.

A general framework: a partition-based association measure

Suppose there are n independent subjects that can be separated by a partition Π. An association measure between the outcome Y and the partition Π is defined as

$$I_\Pi = \frac{1}{n s^2} \sum_{i \in \Pi} n_i^2 (\bar{Y}_i - \bar{Y})^2,$$

where $n_i$ is the number of subjects in partition i, $\bar{Y}_i$ is the average of the outcome Y for subjects in partition i, and $\bar{Y}$ and $s^2$ are the mean and variance of Y from all subjects. It has been shown that, under the null hypothesis that Π has no influence on Y, $I_\Pi$ converges asymptotically to a weighted sum of $\chi^2$ distributions [5]. The measure has higher power than linear regression or logistic regression models, even in sparse partitions.
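As a minimal sketch of the partition-based measure, assume it takes the form I = Σ nᵢ²(Ȳᵢ − Ȳ)² / (n s²), with s² taken as the sample variance (denominator n − 1, which is an assumption). The function name `i_score` and the toy values are hypothetical.

```python
from collections import defaultdict


def i_score(y, labels):
    """Partition-based association score between outcome y and a partition.

    labels[k] identifies the partition cell of subject k.
    Assumes the form sum_i n_i^2 (mean_i - mean)^2 / (n * s^2).
    """
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)  # sample variance (assumption)
    cells = defaultdict(list)
    for value, label in zip(y, labels):
        cells[label].append(value)
    total = 0.0
    for values in cells.values():
        n_i = len(values)
        total += n_i ** 2 * (sum(values) / n_i - mean) ** 2
    return total / (n * var)


# Hypothetical toy outcome split into two cells of three subjects each.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score_informative = i_score(y, [0, 0, 0, 1, 1, 1])
score_trivial = i_score(y, [0, 0, 0, 0, 0, 0])
```

A partition that separates low from high outcomes gives a positive score, while a single-cell partition gives zero because its cell mean equals the overall mean.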

G×E association measure I

Consider a marker G and an environmental factor E. Suppose G has 3 genotypes, AA, Aa, and aa (A refers to the major allele and a the minor allele), coded as 0, 1, and 2. Suppose E is divided into 3 categories: 0, 1, and 2. G and E together then create 9 partitions for all subjects (Table 1). From the general framework in the last section, an association measure that evaluates the total effect of G and E on the phenotype is

$$I_{G,E} = \frac{1}{n_{..} s^2} \sum_{i=0}^{2} \sum_{j=0}^{2} n_{ij}^2 (\bar{y}_{ij} - \bar{y})^2,$$

where all the terms are defined as before and y denotes the phenotype: $\bar{y}_{ij}$ is the mean phenotype in partition ij, and $\bar{y}$ and $s^2$ are the mean and variance of y over all subjects. The marginal effects of G and E can be obtained in a similar fashion:

$$I_G = \frac{1}{n_{..} s^2} \sum_{i=0}^{2} n_{i.}^2 (\bar{y}_{i.} - \bar{y})^2, \qquad I_E = \frac{1}{n_{..} s^2} \sum_{j=0}^{2} n_{.j}^2 (\bar{y}_{.j} - \bar{y})^2.$$

The test statistic that measures the G×E interaction effect is defined as the difference between the total effect and the maximum of the two marginal effects:

$$I_{G \times E} = I_{G,E} - \max(I_G, I_E).$$

The significance of $I_{G \times E}$ is evaluated by permutation.

Table 1

Partitions created by genotypic and environmental factors

	E = 0	E = 1	E = 2	Total
G = 0	n₀₀	n₀₁	n₀₂	n₀.
G = 1	n₁₀	n₁₁	n₁₂	n₁.
G = 2	n₂₀	n₂₁	n₂₂	n₂.
Total	n.₀	n.₁	n.₂	n..

n.., total number of subjects; nᵢⱼ, number of subjects in partition ij; nᵢ., number of subjects in group G = i; n.ⱼ, number of subjects in group E = j.
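The interaction statistic (total effect minus the larger of the two marginal effects) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the score is assumed to have the form Σ n²(ȳ_cell − ȳ)² / (n s²) with the sample variance, and the names `i_score` and `gxe_interaction` and the toy data are hypothetical.

```python
from collections import defaultdict


def i_score(y, labels):
    """Assumed form: sum over cells of n_cell^2 (mean_cell - mean)^2 / (n * s^2)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)  # sample variance (assumption)
    cells = defaultdict(list)
    for value, label in zip(y, labels):
        cells[label].append(value)
    total = sum(len(v) ** 2 * (sum(v) / len(v) - mean) ** 2 for v in cells.values())
    return total / (n * var)


def gxe_interaction(y, g, e):
    """Total score on the joint (G, E) partition minus the larger marginal score."""
    i_total = i_score(y, list(zip(g, e)))  # cells defined by (genotype, environment)
    i_g = i_score(y, g)                    # marginal partition by genotype
    i_e = i_score(y, e)                    # marginal partition by environment
    return i_total - max(i_g, i_e)


# Hypothetical toy data with opposite non-monotone trends in the two genotypes.
g = [0, 0, 0, 1, 1, 1]
e = [0, 1, 2, 0, 1, 2]
y = [80.0, 90.0, 80.0, 90.0, 80.0, 90.0]

interaction = gxe_interaction(y, g, e)
```

For these toy values the environmental marginal score is zero and the genotype marginal score is small, so the statistic is positive.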

Permutation strategies

We consider 3 permutation strategies in our analysis: global permutation, local permutation, and residual permutation. Let $y_{ij}$ denote the phenotype of the jth individual in the ith pedigree. Global permutation permutes phenotypes over all individuals. For local permutation, the phenotypes are permuted within each pedigree. In residual permutation, we first compute the residual for each individual, $e_{ij} = y_{ij} - \bar{y}_{i.}$, where $\bar{y}_{i.}$ is the average phenotype for pedigree i, then permute $e_{ij}$ over all subjects to obtain a permuted residual $e^*_{ij}$ for each individual. The permuted Y values are obtained by $y^*_{ij} = \bar{y}_{i.} + e^*_{ij}$. Both local permutation and residual permutation assume $y_{ij} = \mu_i + e_{ij}$, where $\mu_i$ and $e_{ij}$ are independent. Residual permutation further assumes that the $e_{ij}$ have the same distribution.
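The three strategies can be sketched as follows. This is a minimal illustration with hypothetical function names and toy phenotypes; pedigree membership is assumed to be given as one label per individual.

```python
import random
from collections import defaultdict


def global_permutation(y, rng):
    """Shuffle phenotypes over all individuals."""
    permuted = list(y)
    rng.shuffle(permuted)
    return permuted


def local_permutation(y, pedigree, rng):
    """Shuffle phenotypes only among members of the same pedigree."""
    members = defaultdict(list)
    for idx, ped in enumerate(pedigree):
        members[ped].append(idx)
    permuted = list(y)
    for idxs in members.values():
        values = [y[i] for i in idxs]
        rng.shuffle(values)
        for i, v in zip(idxs, values):
            permuted[i] = v
    return permuted


def residual_permutation(y, pedigree, rng):
    """Shuffle pedigree-centered residuals over all individuals, then add
    each individual's own pedigree mean back."""
    members = defaultdict(list)
    for idx, ped in enumerate(pedigree):
        members[ped].append(idx)
    ped_mean = {ped: sum(y[i] for i in idxs) / len(idxs)
                for ped, idxs in members.items()}
    residuals = [yi - ped_mean[ped] for yi, ped in zip(y, pedigree)]
    rng.shuffle(residuals)
    return [ped_mean[ped] + r for ped, r in zip(pedigree, residuals)]


# Hypothetical phenotypes for two pedigrees.
y = [70.0, 72.0, 74.0, 90.0, 95.0]
pedigree = ["A", "A", "A", "B", "B"]
rng = random.Random(0)  # fixed seed for reproducibility

y_global = global_permutation(y, rng)
y_local = local_permutation(y, pedigree, rng)
y_resid = residual_permutation(y, pedigree, rng)
```

Local permutation keeps every phenotype inside its own pedigree, whereas residual permutation lets residuals move between pedigrees while anchoring each permuted value at the pedigree mean.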

Results

Partitions created by environmental factors

The real data set from GAW18 contains the records of 4 environmental factors: age, gender, smoking status, and antihypertensive medication usage (medicine). Because gender is a binary variable, it partitions all individuals into 2 groups. Although this data set provides longitudinal measurements of age, smoking, and medicine, the records have many missing values (only 187 subjects have complete measurements for all 4 visits). Therefore, for each individual, we summarized these covariates by either the averaged value (for age) or the sum (for smoking and medicine) across different time points from available records and used these summarized quantities in our analysis. Similarly, averaged SBP and averaged DBP were considered as outcomes. Here we created 3 partitions by each of age, smoking, and medicine (Table 2).

Table 2

Partitions based on the summarized quantities of age, smoking status, or medicine

By age*	By smoking	By medicine
16~33.44 → Partition 0	0 → Partition 0	0 → Partition 0
33.45~50.30 → Partition 1	1 → Partition 1	1 → Partition 1
50.31~94.20 → Partition 2	2,3,4 → Partition 2	2,3,4 → Partition 2

* The age group is divided by the 33% quantile (33.44) and 67% quantile (50.30). The minimum age is 16 and the maximum age is 94.2.
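The summarization and partition rules in Table 2 can be sketched as follows. The cut points 33.44 and 50.30 come from the table. The function names are hypothetical, and missing visits are assumed to be encoded as `None`, with at least one record available per individual.

```python
def mean_available(values):
    """Average of the non-missing records (used for age)."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


def sum_available(values):
    """Sum of the non-missing records (used for smoking and medicine)."""
    return sum(v for v in values if v is not None)


def age_partition(mean_age, q33=33.44, q67=50.30):
    """Tertile-based partition of averaged age, following Table 2."""
    if mean_age <= q33:
        return 0
    if mean_age <= q67:
        return 1
    return 2


def count_partition(total):
    """Partition of a summed 0/1 covariate: 0 -> 0, 1 -> 1, 2 or more -> 2."""
    return min(total, 2)


# Hypothetical individual with two missing visits for age and one for smoking.
ages = [30.0, None, 32.5, None]
smoking = [1, None, 1, 0]

age_cell = age_partition(mean_available(ages))
smoking_cell = count_partition(sum_available(smoking))
```

Summing rather than averaging the binary covariates means that an individual with fewer recorded visits can reach at most a correspondingly smaller total.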


SNPs with significant G×E interaction effects

In the GWAS data set provided by GAW18, there are 62,915 SNPs on chromosome 3. For each SNP, we evaluated its interaction effect with each of the 4 environmental factors on both SBP and DBP using the linear regression model (LRM) and the proposed partition-based score I (PBI). p Values of LRM were derived from the asymptotic distribution of the regression coefficient, and p values of PBI were computed from 10⁷ permutations using global, local, or residual permutation procedures. Table 3 lists the number of SNPs with p values less than the Bonferroni-corrected significance level (7.9 × 10⁻⁷) for all interactions under consideration. Compared with LRM, PBI identified many additional significant SNPs, especially when testing the G×E interaction effects with medicine. The reason, we believe, is that the interaction modeled by LRM is restricted to the linear form, whereas PBI is able to capture nonlinear and complicated interaction patterns. To examine this hypothesis, we further analyzed the SNP rs17206492, which was identified by PBI (using any of the 3 permutation strategies) to have a strong G×Medicine interaction effect on DBP, but was not selected by LRM. The left panel of Figure 1 shows that the averaged values of DBP in individuals not carrying the minor allele (genotype 0) and in individuals carrying the minor allele (genotype 1) are almost the same, indicating that rs17206492 does not have a strong marginal effect. However, as medication usage increases, DBP first decreases and then increases when the genotype is 1 (middle panel of Figure 1), whereas it first increases and then decreases when the genotype is 0 (right panel of Figure 1). This nonlinear interaction scheme cannot be detected by LRM, but is captured by our model-free test statistic PBI.
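The significance threshold and the number of permutations are linked by simple arithmetic. The sketch below assumes a family-wise level of 0.05 (not stated explicitly in the text, but 0.05/62,915 reproduces the reported 7.9 × 10⁻⁷) and ten million permutations. It estimates a permutation p value as the plain proportion of permuted statistics at least as large as the observed one, which is one common convention and not necessarily the one used in this study.

```python
n_snps = 62915           # SNPs on chromosome 3 (from the text)
alpha = 0.05             # assumed family-wise significance level
threshold = alpha / n_snps

n_perm = 10 ** 7                   # number of permutations per test
smallest_nonzero_p = 1 / n_perm    # finest resolution of the estimate


def permutation_p_value(observed, permuted_stats):
    """Proportion of permuted statistics that are >= the observed statistic."""
    hits = sum(1 for s in permuted_stats if s >= observed)
    return hits / len(permuted_stats)


# Hypothetical null statistics, only to show the call.
example_p = permutation_p_value(2.0, [1.0, 2.0, 3.0, 0.5])
```

Because the smallest nonzero estimate (1/10⁷) lies below the threshold, this number of permutations is enough to resolve p values at the Bonferroni-corrected level.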

Table 3

Number of significant SNPs with p value less than 7.9 × 10⁻⁷*

Environmental factor	DBP				SBP
	LRM	PBI (GP)	PBI (LP)	PBI (RP)	LRM	PBI (GP)	PBI (LP)	PBI (RP)
Age	0	4	7	3	6	16	33	20
Smoke	0	6	3	3	0	0	0	0
Gender	0	42	37	36	0	1	1	1
Medicine	4	80	53	33	1	65	65	57

GP, Global permutation; LP, local permutation; LRM, linear regression model; PBI, partition-based I; RP, residual permutation.

*7.9 × 10⁻⁷ is the Bonferroni-corrected p value threshold.

Figure 1

G×E interaction effect of SNP rs17206492. The marginal effect of the genotype (left), the medication effect when genotype is 1 (middle), and the medication effect when genotype is 0 (right).


Effect of different permutation strategies

There are 20 pedigrees in the GAW18 data set. Both the analysis of variance (ANOVA) test and the nonparametric Kruskal-Wallis test indicate that the mean DBP values differ across pedigrees, whereas the mean SBP values do not differ significantly (Table 4). When evaluating the p values of PBI, we performed 3 types of permutation: global (GP), local (LP), and residual (RP) permutations. Both LP and RP adjust for familial relatedness between individuals. For SBP, except for the environmental factor age, the results from the 3 permutation methods coincide substantially (see Table 3 and Figure 2), which is consistent with the conclusion from the ANOVA and Kruskal-Wallis tests. In contrast, for DBP, the results of GP are quite different from the results of LP or RP, especially when assessing the interaction effect with medicine (see Table 3 and Figure 2). In this situation, the results from LP or RP are more reliable because they take into account the family dependence of the phenotype. In addition, LP tends to select more markers than RP; this may be because the data violate the assumption that the residuals have the same distribution. Moreover, SNPs identified by LP and RP overlap considerably, and the consistency of results from these two permutation strategies can be an indicator of true signal.

Table 4

p Values for testing the pedigree dependence of SBP and DBP

	ANOVA test	Kruskal-Wallis test
SBP	0.155	0.433
DBP	0.000625	0.0004226
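As a rough sketch of the pedigree-dependence check, the one-way ANOVA F statistic compares the variation of pedigree means with the variation inside pedigrees. The function name and toy groups below are hypothetical. Converting F to a p value additionally requires the F distribution (or a permutation null), which is omitted here, and the Kruskal-Wallis test is not shown.

```python
def anova_f(groups):
    """One-way ANOVA F statistic; groups is a list of lists of phenotypes,
    one inner list per pedigree."""
    all_values = [v for grp in groups for v in grp]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(grp) / len(grp) for grp in groups]
    # Between-group sum of squares, weighted by group size.
    ss_between = sum(len(grp) * (m - grand_mean) ** 2
                     for grp, m in zip(groups, group_means))
    # Within-group sum of squares around each group's own mean.
    ss_within = sum((v - m) ** 2
                    for grp, m in zip(groups, group_means) for v in grp)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical phenotypes for three small pedigrees.
f_stat = anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
```

A large F statistic indicates that pedigree means differ more than within-pedigree noise would suggest, which is the situation in which local or residual permutation is preferable to global permutation.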

Figure 2

Positions of SNPs identified to have significant G×E interaction effects.


Discussion

In this paper, we have proposed a partition-based approach, PBI, to detect G×E interactions, which is nonparametric and model-free. The test statistic is derived from a partition-based measure I, and the interaction information score $I_{G \times E}$ is defined as the difference between the total score $I_{G,E}$ and the maximum of the marginal scores. Intuitively, if the genetic and the environmental factors have a strong interaction effect, $I_{G,E}$ will be far greater than both marginal scores; hence $I_{G \times E}$ will be positive and large. If not, $I_{G,E}$ will be no greater than at least 1 of the marginal scores. Therefore, $I_{G \times E}$ evaluates the amount of influence of the G×E interactions on the phenotype. When applied to the real data set on hypertension provided by GAW18, PBI identified many more markers than the traditional linear regression method. Because our approach is model-free, it is able to capture complicated interaction patterns that are difficult to detect in a linear model. The significance of $I_{G \times E}$ is evaluated by permutation. LP and RP adjust effectively for the family dependence of the phenotype. Although the proposed procedure selects more SNPs than linear regression, there is very little experimental evidence of G×E interactions for hypertension in the current literature to verify our findings. Therefore, biological studies will be required to investigate our results. Modifications of PBI have successfully identified gene-gene interactions and constructed genetic networks for breast cancer [6] and rheumatoid arthritis [7]. Moreover, PBI can be extended to evaluate the interaction effects between rare variants and environmental factors. Because of the low frequencies of rare variants (<1%), we can apply a gene-based approach by collapsing rare variants in a gene [8-11] and creating partitions based on the collapsed information.
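The rare-variant extension mentioned above can be sketched with one simple collapsing rule: a carrier indicator that is 1 when a subject carries at least one minor allele at any rare variant in the gene. The text does not specify which collapsing rule would be used, so this rule, the function names, and the toy genotypes are assumptions for illustration.

```python
def collapse_rare_variants(genotypes):
    """Collapse rare variants in one gene into a carrier indicator.

    genotypes: one row per subject, each row holding minor-allele counts
    (0, 1, or 2) at the rare variants of the gene.
    Returns 1 for subjects with at least one minor allele, else 0.
    """
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


def joint_partition(carrier, e):
    """Partition label of each subject: (carrier status, environmental category)."""
    return list(zip(carrier, e))


# Hypothetical genotypes of three subjects at three rare variants.
genotypes = [[0, 0, 1], [0, 0, 0], [2, 0, 0]]
environment = [0, 1, 2]

carrier = collapse_rare_variants(genotypes)
cells = joint_partition(carrier, environment)
```

The resulting labels define the cells of a partition, so the same partition-based score can be applied to gene-level carrier status in place of a single SNP genotype.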

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

SHL and RF designed the study. RF, CHH and SHL performed the study. RF, CHH, IH, HW, TZ and SHL contributed to analysis of the data. RF and SHL drafted the manuscript. All authors read and approved the final manuscript.

10 in total

1. A demonstration and findings of a statistical approach through reanalysis of inflammatory bowel disease data.

Authors: Shaw-Hwa Lo; Tian Zheng
Journal: Proc Natl Acad Sci U S A Date: 2004-07-01 Impact factor: 11.205

2. Exploiting gene-environment interaction to detect genetic associations.

Authors: Peter Kraft; Yu-Chun Yen; Daniel O Stram; John Morrison; W James Gauderman
Journal: Hum Hered Date: 2007-02-02 Impact factor: 0.444

3. Rare-variant association testing for sequencing data with the sequence kernel association test.

Authors: Michael C Wu; Seunggeun Lee; Tianxi Cai; Yun Li; Michael Boehnke; Xihong Lin
Journal: Am J Hum Genet Date: 2011-07-07 Impact factor: 11.025

4. Incorporating biological information into association studies of sequencing data.

Authors: Gary K Chen; Gary Chen; Peng Wei; Anita L DeStefano
Journal: Genet Epidemiol Date: 2011 Impact factor: 2.135

5. Statistical analysis of rare sequence variants: an overview of collapsing methods.

Authors: Carmen Dering; Claudia Hemmelmann; Elizabeth Pugh; Andreas Ziegler
Journal: Genet Epidemiol Date: 2011 Impact factor: 2.135

6. Finding the missing heritability of complex diseases.

Authors: Teri A Manolio; Francis S Collins; Nancy J Cox; David B Goldstein; Lucia A Hindorff; David J Hunter; Mark I McCarthy; Erin M Ramos; Lon R Cardon; Aravinda Chakravarti; Judy H Cho; Alan E Guttmacher; Augustine Kong; Leonid Kruglyak; Elaine Mardis; Charles N Rotimi; Montgomery Slatkin; David Valle; Alice S Whittemore; Michael Boehnke; Andrew G Clark; Evan E Eichler; Greg Gibson; Jonathan L Haines; Trudy F C Mackay; Steven A McCarroll; Peter M Visscher
Journal: Nature Date: 2009-10-08 Impact factor: 49.962

7. Genome-wide gene-environment study identifies glutamate receptor gene GRIN2A as a Parkinson's disease modifier gene via interaction with coffee.

Authors: Taye H Hamza; Honglei Chen; Erin M Hill-Burns; Shannon L Rhodes; Jennifer Montimurro; Denise M Kay; Albert Tenesa; Victoria I Kusel; Patricia Sheehan; Muthukrishnan Eaaswarkhanth; Dora Yearout; Ali Samii; John W Roberts; Pinky Agarwal; Yvette Bordelon; Yikyung Park; Liyong Wang; Jianjun Gao; Jeffery M Vance; Kenneth S Kendler; Silviu-Alin Bacanu; William K Scott; Beate Ritz; John Nutt; Stewart A Factor; Cyrus P Zabetian; Haydeh Payami
Journal: PLoS Genet Date: 2011-08-18 Impact factor: 5.917

8. Identifying rare disease variants in the Genetic Analysis Workshop 17 simulated data: a comparison of several statistical approaches.

Authors: Ruixue Fan; Chien-Hsun Huang; Shaw-Hwa Lo; Tian Zheng; Iuliana Ionita-Laza
Journal: BMC Proc Date: 2011-11-29

9. Rheumatoid arthritis-associated gene-gene interaction network for rheumatoid arthritis candidate genes.

Authors: Chien-Hsun Huang; Lei Cong; Jun Xie; Bo Qiao; Shaw-Hwa Lo; Tian Zheng
Journal: BMC Proc Date: 2009-12-15

10. Non-replication of genome-wide based associations between common variants in INSIG2 and PFKP and obesity in studies of 18,014 Danes.

Authors: Camilla H Andreasen; Mette S Mogensen; Knut Borch-Johnsen; Annelli Sandbaek; Torsten Lauritzen; Thorkild I A Sørensen; Lars Hansen; Katrine Almind; Torben Jørgensen; Oluf Pedersen; Torben Hansen
Journal: PLoS One Date: 2008-08-06 Impact factor: 3.240

