
A method to associate all possible combinations of genetic and environmental factors using GxE landscape plot.

Satoshi Nagaie, Soichi Ogishima, Jun Nakaya, Hiroshi Tanaka.

Abstract

Genome-wide association studies (GWAS) and linkage analysis have identified many single nucleotide polymorphisms (SNPs) related to disease. Many SNPs remain unknown whose minor allele frequencies (MAFs) are as low as 0.005 and which have intermediate effects, with odds ratios between 1.5 and 3.0. Low-frequency variants with intermediate effects on disease pathogenesis are believed to have complex interactions with environmental factors, called gene-environment interactions (GxE). Hence, we describe a model that uses a 3D Manhattan plot, called the GxE landscape plot, to visualize association p-values for GxE. We used the Gene-Environment iNteraction Simulator 2 (GENS2) program to simulate interactions between two genetic loci and one environmental factor. The training dataset contains disease status, gender, 20 environmental exposures and 100 genotypes for 170 subjects, and p-values were calculated by the Cochran-Mantel-Haenszel chi-squared test on these known data. Subsequently, we created a 3D GxE landscape plot of the negative logarithm of the association p-values for all possible combinations of genetic and environmental factors, together with their hierarchical clustering. Thus, the GxE landscape plot is a valuable model for predicting association p-values for GxE and the similarity among genotypes and environments in the context of disease pathogenesis.

ABBREVIATIONS: GxE - Gene-environment interactions, GWAS - Genome-wide association study, MAFs - Minor allele frequencies, SNPs - Single nucleotide polymorphisms, EWAS - Environment-wide association study, FDR - False discovery rate, JPT+CHB - HapMap population of Japanese in Tokyo, Japan, and Han Chinese in Beijing.


Year:  2015        PMID: 25914450      PMCID: PMC4404419          DOI: 10.6026/97320630011161

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Two main research methodologies have led to the identification of hundreds of genetic variants associated with disease onset. One is GWAS, which has emerged as a powerful and successful tool for identifying common human disease alleles using high-throughput genotyping technology [1]. GWAS aims to detect common variants with small effects. The other is linkage analysis, which aims to establish linkage between genes using family relationships [2]. The linkage approach is often used to discover rare variants with major effects. However, low-frequency variants with intermediate effects have not been captured by either GWAS or linkage analysis, because their frequencies and effect sizes are insufficient for those methods [3]. These variants are expected to have complex interactions with environmental factors, called gene-environment interactions (GxE). GxE are not merely additive or synergistic interactions but complex ones. For a very specific GxE, the association is highly significant and the risk of disease is high. Le Marchand et al. showed that the relative risk of colorectal cancer was 8.8 for a specific combination of environmental factors (smoking and preference for well-done meat) and metabolic enzymes (CYP1A2 or NAT2 rapid or slow metabolizers), unlike conventional methods in which the risk is calculated as the product of individual factors [4]. In contrast to GWAS, where disease risk is attributed to genetic factors, Butte et al. proposed an environment-wide association study (EWAS) approach [5]. When both the environmental risk factors obtained from EWAS and the risk SNPs from GWAS were used, the combination of a SNP (rs13266634) and trans-β-carotene was a significant GxE item correlated with type 2 diabetes. The per-risk-allele effect sizes in subjects with low serum levels of trans-β-carotene were 40% greater than the marginal effect size.
EWAS offers an unbiased consideration of environmental and genetic factors that is useful for identifying larger and more relevant effect sizes for disease associations [6]. However, Butte et al. showed a Manhattan plot for genome-wide genetic factors but for only a very limited number of environmental factors. We describe a prediction model based on a 3D Manhattan plot, called the GxE landscape plot, which displays the negative logarithm of the significance values associated with disease pathogenesis for genotype and environmental factors together with their hierarchical clustering. The GxE landscape plot enables us to comprehensively visualize the negative logarithm of p-values for combinations of associated genetic and environmental factors. Using hierarchical clustering, the model is useful for predicting the similarity of genotypes and environments in the context of association p-values with disease.

Methodology

Overview of methodology for creating GxE landscape plot:

There are 4 steps to create the GxE landscape plot (Figure 1). (Step 1) Dataset creation. The data set should contain disease status, gender, environmental exposures and genotypes for each subject. In this study, to demonstrate the validity of our prediction model, we generated a data set using simuGWAS, which was developed in simuPOP [7], and the GENS2 program [8]. We iterated this simulation 20 times using GENS2. (Step 2) Calculation of p-values. The association p-values with disease were calculated for all possible combinations of genetic and environmental factors using the Cochran-Mantel-Haenszel chi-squared test on 3 × 2 × 2 contingency tables. (Step 3) Clustering. Clustering analysis was applied to the calculated negative logarithms of the p-values for genotypes and environmental factors. (Step 4) Visualization. The GxE landscape plot was generated by plotting the negative logarithm of the significance level for all possible combinations of genotype and environmental factors. In the GxE landscape plot, significant p-values appear as peaks and non-significant p-values as plateaus.
Figure 1

Overview of methodology for creating GxE landscape plot. Chart of the 4 steps used to generate the GxE landscape plot with data simulated by the GENS2 program: (1) preparation of the data set by the GENS2 program, (2) calculation of p-values by the Cochran-Mantel-Haenszel chi-squared test on 3 × 2 × 2 contingency tables, (3) hierarchical clustering of genetic and environmental factors, (4) visualization of association p-values for all combinations of genetic and environmental factors.

Training dataset creation by simulation of gene-gene and gene-environment interactions:

GENS2 is a program based on data with realistic patterns of linkage disequilibrium, and it places no constraints on the number of individuals to be simulated or on the number of non-predisposing genetic or environmental factors to be considered. GENS2 can simulate both gene-environment and gene-gene interactions. We simulated interactions among genetic loci and environmental factors using GENS2. Population data for the genomic model (JPT+CHB chr7) were downloaded from the HapMap3 [9] database and converted by simuGWAS for use in the GENS2 program. These population data were used as the initial population for the simulation of genomic factors. In the simulation process, we used a sample size of 170. A gene-environment model was also used, with the following parameters: disease predisposing loci (DPL): rs1881690, rs1979600, rs4960568, rs6972501, rs7793905, rs936997; high and low risk allele; dominance parameter; relative risk for high risk homozygote; environmental parameters odds ratio; and disease environmental variable distribution parameters: mean, standard deviation, disease penetrance. If the disease status value was greater than the 75th percentile, we set the value to 1; otherwise, to 0. Of 75,320 SNPs, we removed 23,277 in which not all 3 genotypes were observed. In total, 100 SNPs were used for our analysis; 6 SNPs were associated with disease and the other 94 SNPs were randomly selected. We iterated this simulation 20 times to obtain 20 populations. We assumed that the 20 populations were divided into two major population groups, and we gave each major population group a specific gene-environment interaction that causes disease.
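The two preprocessing rules described above (dichotomizing disease status at the 75th percentile and discarding SNPs that lack one of the 3 genotypes) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the 0/1/2 genotype coding, and the choice of the "inclusive" quantile definition are assumptions, since the source does not specify them.

```python
from statistics import quantiles


def binarize_at_75th(values):
    """Return 1 where a value exceeds the 75th percentile of `values`, else 0."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    # The "inclusive" quantile definition is an assumption made for this sketch.
    q3 = quantiles(values, n=4, method="inclusive")[2]
    return [1 if v > q3 else 0 for v in values]


def snps_with_all_genotypes(genotypes):
    """Keep only SNPs for which all three genotype codes 0, 1 and 2 occur.

    `genotypes` maps a SNP identifier to a list of per-subject codes.
    The 0/1/2 coding (copies of one allele) is hypothetical.
    """
    return {snp: calls for snp, calls in genotypes.items()
            if {0, 1, 2} <= set(calls)}


# Toy usage with made-up values and made-up SNP names.
status = binarize_at_75th([1, 2, 3, 4, 5, 6, 7, 8])
kept = snps_with_all_genotypes({"snp_a": [0, 1, 2, 1], "snp_b": [0, 0, 1, 1]})
```

With a different percentile definition the cut point can shift slightly, which matters for a sample as small as 170 subjects.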

Calculation of p-values:

To detect SNPs associated with environmental factors, we applied the Cochran-Mantel-Haenszel chi-squared test to estimate the statistical significance of gene-environment interactions, using the R 3.1.2 statistical software with the Bioconductor package [10, 11]. Because multiple hypothesis tests were performed, the resulting p-values were adjusted by controlling the false discovery rate (FDR) [12].
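As a rough illustration of this step, the sketch below implements the generalized Cochran-Mantel-Haenszel statistic for 3 × 2 × K tables and a Benjamini-Hochberg adjustment in plain Python. It is not the authors' R code. The layout of each table (three genotype rows, columns for cases and controls, one table per stratum) is an assumption, because the source does not state which variable occupies which axis; the function names are hypothetical. With 2 degrees of freedom the chi-squared tail probability has the closed form exp(-x/2), which avoids any external library.

```python
import math


def cmh_3x2xk(strata):
    """Generalized Cochran-Mantel-Haenszel test for 3 x 2 x K tables.

    `strata` is a list of K tables; each table has three rows (assumed to be
    genotypes) of [cases, controls]. Returns (statistic, p_value), df = 2.
    """
    d = [0.0, 0.0]                     # observed minus expected, rows 0-1, column 0
    v = [[0.0, 0.0], [0.0, 0.0]]       # covariance matrix summed over strata
    for table in strata:
        rows = [sum(r) for r in table]
        c1 = sum(r[0] for r in table)
        n = sum(rows)
        if n < 2:
            continue                   # a stratum this small carries no information
        scale = c1 * (n - c1) / (n * n * (n - 1))
        for i in range(2):
            d[i] += table[i][0] - rows[i] * c1 / n
            for j in range(2):
                cov = rows[i] * (n - rows[i]) if i == j else -rows[i] * rows[j]
                v[i][j] += scale * cov
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    if det <= 0:
        raise ValueError("covariance matrix is singular; the test is undefined")
    # Quadratic form d' V^-1 d for a symmetric 2 x 2 matrix.
    stat = (d[0] ** 2 * v[1][1] - 2 * d[0] * d[1] * v[0][1] + d[1] ** 2 * v[0][0]) / det
    return stat, math.exp(-stat / 2)   # chi-squared survival function for df = 2


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, idx in enumerate(order):
        rank = m - rank_from_top       # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In the setting described here, one such test would be run per SNP and environmental factor pair, and the adjustment applied to the full collection of resulting p-values.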

Creating GxE landscape plot:

We created a novel 3D Manhattan plot, called a GxE landscape plot, of the negative logarithm of the significance values associated with disease pathogenesis for genotype and environmental factors, together with their hierarchical clustering, as a prediction model. Hierarchical clustering was applied to 100 genetic factors (SNPs) and 20 environmental factors using the R statistical software, with Pearson's correlation coefficient as the similarity index and complete linkage as the agglomeration method. The p-values were corrected for multiple comparisons using Benjamini and Hochberg's method before their negative logarithms were plotted.
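The clustering step can be sketched in plain Python as agglomerative clustering with complete linkage. The source names Pearson's correlation as the similarity index but does not say how it was turned into a distance; the sketch assumes the common choice 1 - r. The function names and the option to stop at a fixed number of clusters are hypothetical and only serve the illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def complete_linkage(profiles, n_clusters=2):
    """Cluster rows of `profiles` using distance 1 - r and complete linkage.

    `profiles` holds one numeric vector per factor, for example the
    -log10(p) values of one SNP across all environmental factors.
    Returns the clusters as sorted lists of row indices.
    """
    m = len(profiles)
    dist = [[0.0 if i == j else 1.0 - pearson(profiles[i], profiles[j])
             for j in range(m)] for i in range(m)]
    clusters = [[i] for i in range(m)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)
```

The same function would be applied twice, once to the SNP profiles and once to the environmental-factor profiles. The search over cluster pairs is cubic in the number of rows, which is acceptable for 100 SNPs and 20 environmental factors.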

Discussion

Figure 2 shows a prediction model with a 3D Manhattan plot, called the GxE landscape plot, showing the negative logarithm of p-values for all possible combinations of genotypes and environmental factors. The genotypes and environmental factors were hierarchically clustered. The genetic and environmental factors were roughly clustered into two groups, depicted as two peaks surrounded by yellow ellipses in Figure 2A and Figure 2B. We iterated the GENS2 simulation 20 times to obtain 20 populations. The 20 populations were divided into two major population groups, and the simulation was set up so that each major population group had a specific gene-environment interaction that causes the disease.
Figure 2

GxE landscape plot. 3D GxE landscape plot of the negative logarithm of the association p-values for all possible combinations of genetic and environmental factors with their hierarchical clustering. Genetic and environmental factors were roughly clustered into two groups, depicted as two peaks surrounded by yellow ellipses. (A) GxE landscape plot with hierarchical clustering of genetic and environmental factors. (B) Hierarchical clustering of genetic factors (SNPs) and 20 environmental factors. The data are shown in a table format, in which rows represent individual environmental factors and columns represent individual SNPs. The color of each cell reflects the negative logarithm of the association p-value for that combination of genetic and environmental factors. Among genetic factors, all SNPs except 2 were clustered into two groups, surrounded by blue and red squares. All environmental factors were also clearly clustered into two groups, surrounded by blue and red squares.
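The table layout described in this caption (rows for environmental factors, columns for SNPs, cell value equal to the negative logarithm of the p-value) can be sketched as a small helper that reorders a matrix of p-values by given row and column orders, for instance the leaf orders of the two dendrograms. The function name, the base-10 logarithm, and the `floor` guard against p-values of exactly zero are assumptions made for this sketch.

```python
import math


def landscape_grid(pvalues, row_order, col_order, floor=1e-300):
    """Return -log10(p) values with rows and columns arranged in the given orders.

    `pvalues[e][s]` is the (adjusted) p-value for environmental factor e and SNP s.
    `row_order` and `col_order` are index orders, e.g. dendrogram leaf orders.
    `floor` avoids log10(0); its value is an arbitrary choice.
    """
    return [[-math.log10(max(pvalues[e][s], floor)) for s in col_order]
            for e in row_order]


# Toy usage with made-up p-values: 2 environmental factors x 2 SNPs.
grid = landscape_grid([[0.1, 0.001], [1.0, 0.01]], row_order=[1, 0], col_order=[1, 0])
```

The resulting grid is what a surface or heat-map routine would take as its height or color values.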

Of the 6 disease-associated SNPs, all except 2 were clustered into two groups; that is, 4 of 6 SNPs (66.7%) were correctly clustered on the GxE landscape plot. Clustering of genetic factors is considered to reflect linkage disequilibrium among genes. All environmental factors were clearly clustered into two groups; 20 of 20 environmental factors (100%) were correctly clustered on the GxE landscape plot. Thus, our model can predict the underlying clusters of populations with high accuracy (100% for environmental factors, 66.7% for genetic factors). This shows that the model predicts the similarity of genetic factors and environmental factors associated with the disease. It should be noted that the p-values obtained in this study rest on the premise that confounding factors between genotype and environment have been removed. The GxE landscape plot allows us to clearly visualize the distinctive features of the populations and to detect novel interactions between genetic and environmental factors. Butte's EWAS approach is a comprehensive way of testing and screening for gene-environment interactions [5], but its Manhattan-plot visualization covers only a limited set of environmental factors. In contrast, our method provides a comprehensive visualization of association p-values for all possible genomic and environmental factors. The GxE landscape plot gives an overview of the landscape of association significance levels for GxE. One limitation lies in the dichotomization of the data at the 75th percentile: for the statistical test, the disease status value is set to 1 if it is greater than the 75th percentile and to 0 otherwise. For a more accurate representation of the data, it would be more desirable to use dispersion values instead, because they provide more detailed information about the disease status.
The slope of the landscape corresponds to the rate of change of the p-values. It is possible to make inferences about disease pathogenesis from changes of slope in the landscape. For example, if one peak of the landscape is steep and the others are low, the subject can be advised to avoid the environmental factor that produces the high peak.

Conclusion

GWAS and linkage analysis have identified many SNPs related to several diseases. However, many SNPs with low MAFs and intermediate effects remain unknown. Low-frequency variants with intermediate effects on disease pathogenesis are believed to have complex GxE. We describe a model that uses a 3D Manhattan plot, called the GxE landscape plot, to visualize association p-values for GxE. We used GENS2 to simulate interactions between two genetic loci and one environmental factor, and p-values were calculated by the Cochran-Mantel-Haenszel chi-squared test on the simulated data. We thus created a 3D GxE landscape plot of the negative logarithm of the association p-values for all possible combinations of genetic and environmental factors, together with their hierarchical clustering. The GxE landscape plot is a valuable model for predicting the similarity among genotypes and environments in the context of association p-values with disease pathogenesis.
  9 in total

1.  simuPOP: a forward-time population genetics simulation environment.

Authors:  Bo Peng; Marek Kimmel
Journal:  Bioinformatics       Date:  2005-07-14       Impact factor: 6.937

2.  Combined effects of well-done red meat, smoking, and rapid N-acetyltransferase 2 and CYP1A2 phenotypes in increasing colorectal cancer risk.

Authors:  L Le Marchand; J H Hankin; L R Wilkens; L M Pierce; A Franke; L N Kolonel; A Seifried; L J Custer; W Chang; A Lum-Jones; T Donlon
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2001-12       Impact factor: 4.254

3.  The future of genetic studies of complex human diseases.

Authors:  N Risch; K Merikangas
Journal:  Science       Date:  1996-09-13       Impact factor: 47.728

4.  Integrating common and rare genetic variation in diverse human populations.

Authors:  David M Altshuler; Richard A Gibbs; Leena Peltonen; David M Altshuler; Richard A Gibbs; Leena Peltonen; Emmanouil Dermitzakis; Stephen F Schaffner; Fuli Yu; Leena Peltonen; Emmanouil Dermitzakis; Penelope E Bonnen; David M Altshuler; Richard A Gibbs; Paul I W de Bakker; Panos Deloukas; Stacey B Gabriel; Rhian Gwilliam; Sarah Hunt; Michael Inouye; Xiaoming Jia; Aarno Palotie; Melissa Parkin; Pamela Whittaker; Fuli Yu; Kyle Chang; Alicia Hawes; Lora R Lewis; Yanru Ren; David Wheeler; Richard A Gibbs; Donna Marie Muzny; Chris Barnes; Katayoon Darvishi; Matthew Hurles; Joshua M Korn; Kati Kristiansson; Charles Lee; Steven A McCarrol; James Nemesh; Emmanouil Dermitzakis; Alon Keinan; Stephen B Montgomery; Samuela Pollack; Alkes L Price; Nicole Soranzo; Penelope E Bonnen; Richard A Gibbs; Claudia Gonzaga-Jauregui; Alon Keinan; Alkes L Price; Fuli Yu; Verneri Anttila; Wendy Brodeur; Mark J Daly; Stephen Leslie; Gil McVean; Loukas Moutsianas; Huy Nguyen; Stephen F Schaffner; Qingrun Zhang; Mohammed J R Ghori; Ralph McGinnis; William McLaren; Samuela Pollack; Alkes L Price; Stephen F Schaffner; Fumihiko Takeuchi; Sharon R Grossman; Ilya Shlyakhter; Elizabeth B Hostetter; Pardis C Sabeti; Clement A Adebamowo; Morris W Foster; Deborah R Gordon; Julio Licinio; Maria Cristina Manca; Patricia A Marshall; Ichiro Matsuda; Duncan Ngare; Vivian Ota Wang; Deepa Reddy; Charles N Rotimi; Charmaine D Royal; Richard R Sharp; Changqing Zeng; Lisa D Brooks; Jean E McEwen
Journal:  Nature       Date:  2010-09-02       Impact factor: 49.962

5.  An Environment-Wide Association Study (EWAS) on type 2 diabetes mellitus.

Authors:  Chirag J Patel; Jayanta Bhattacharya; Atul J Butte
Journal:  PLoS One       Date:  2010-05-20       Impact factor: 3.240

6.  Systematic identification of interaction effects between genome- and environment-wide associations in type 2 diabetes mellitus.

Authors:  Chirag J Patel; Rong Chen; Keiichi Kodama; John P A Ioannidis; Atul J Butte
Journal:  Hum Genet       Date:  2013-01-20       Impact factor: 4.132

Review 7.  Finding the missing heritability of complex diseases.

Authors:  Teri A Manolio; Francis S Collins; Nancy J Cox; David B Goldstein; Lucia A Hindorff; David J Hunter; Mark I McCarthy; Erin M Ramos; Lon R Cardon; Aravinda Chakravarti; Judy H Cho; Alan E Guttmacher; Augustine Kong; Leonid Kruglyak; Elaine Mardis; Charles N Rotimi; Montgomery Slatkin; David Valle; Alice S Whittemore; Michael Boehnke; Andrew G Clark; Evan E Eichler; Greg Gibson; Jonathan L Haines; Trudy F C Mackay; Steven A McCarroll; Peter M Visscher
Journal:  Nature       Date:  2009-10-08       Impact factor: 49.962

8.  Simulating gene-gene and gene-environment interactions in complex diseases: Gene-Environment iNteraction Simulator 2.

Authors:  Michele Pinelli; Giovanni Scala; Roberto Amato; Sergio Cocozza; Gennaro Miele
Journal:  BMC Bioinformatics       Date:  2012-06-14       Impact factor: 3.169

9.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Authors:  Danielle Welter; Jacqueline MacArthur; Joannella Morales; Tony Burdett; Peggy Hall; Heather Junkins; Alan Klemm; Paul Flicek; Teri Manolio; Lucia Hindorff; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

