Pathway-based joint effects analysis of rare genetic variants using Genetic Analysis Workshop 17 exon sequence data.

Pingzhao Hu, Wei Xu, Lu Cheng, Xiang Xing, Andrew D Paterson.

Abstract

Pathway-based analysis has been recently used in joint tests of association between disease and a group of common genetic variants. Here we explore this idea for the joint effects analysis of rare genetic variants and their association with quantitative traits and disease. We accumulate multiple rare minor alleles in a genetic risk score for each individual in a given pathway; this score is then used to assess association with quantitative phenotypes and disease. We demonstrate that this approach may be better than studying single rare variants or a gene risk score for identifying individuals with significantly greater risk.

Year:  2011        PMID: 22373371      PMCID: PMC3287882          DOI: 10.1186/1753-6561-5-S9-S45

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

In the past few years, genome-wide association studies have been widely used to identify genetic risk factors for complex diseases, and this paradigm has led to significant progress. For example, many variants have been discovered that are associated with common diseases, such as type 2 diabetes [1]. To date, however, the identified genetic markers explain only a small proportion of the genetic variance for many complex diseases, which limits their utility for improving disease risk prediction. This missing heritability might be explained by common variants with weak effects and/or additional rare variants with strong or weak effects, acting additively and/or interacting with other genetic and environmental factors [2]. For the common variants explanation, multilocus-based genetic risk score and pathway-based methods have been developed. For example, the use of a multilocus genetic risk score has been proposed to evaluate the risk of breast cancer and its subtypes [3]. A pathway-based analysis strategy has been used to search for related genes and common single-nucleotide polymorphisms (SNPs) that contribute to Parkinson’s disease [4]. For rare variants, methods for statistical analysis are still limited. Here, we integrate the multilocus genetic risk score and pathway analysis strategies used for common variants to analyze rare genetic factors and evaluate their association with quantitative phenotypes and disease.

Methods

Data description

We use replicate 1 with 697 unrelated individuals from the Genetic Analysis Workshop 17 (GAW17) data. There are 24,487 autosomal SNPs. All SNPs and samples have a genotype call rate of 100%. We define rare SNPs as those with a minor allele frequency (MAF) less than 1%. For the phenotype data, there are three quantitative risk factors (Q1, Q2, and Q4), which are simulated as normally distributed phenotypes, and one disease risk factor, for which there are 209 case subjects and 488 control subjects. Genes influencing Q1 and the disease are primarily from the vascular endothelial growth factor (VEGF) pathway, whereas those influencing Q2 are primarily associated with cardiovascular disease risk and inflammation. There are no causal genes or pathways related to Q4 [5].
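
The rare-variant definition above can be sketched in a few lines. This is a minimal illustration, assuming genotypes are stored as allele counts (0, 1, or 2) per individual; the function names and the toy SNP are hypothetical and are not taken from the GAW17 files.

```python
def minor_allele_frequency(genotypes):
    """MAF for one biallelic SNP; genotypes are counts (0/1/2) of one allele."""
    n_alleles = 2 * len(genotypes)
    freq = sum(genotypes) / n_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1.0 - freq)


def is_rare(genotypes, threshold=0.01):
    """Rare SNP as defined in the text: MAF strictly below 1%."""
    return minor_allele_frequency(genotypes) < threshold


# Toy SNP (invented data): 697 individuals, 6 heterozygous carriers.
toy_snp = [1] * 6 + [0] * 691
print(minor_allele_frequency(toy_snp), is_rare(toy_snp))
```

Note that under this definition a monomorphic site (MAF of 0) would also count as rare; whether such sites are kept is a choice the sketch leaves open.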

Univariate SNP association analysis

For each rare SNP, we performed linear regression analysis for association between genotypes and Q1, Q2, and Q4 and logistic regression analysis for association between genotypes and disease status. We adjusted for sex, age, smoking, and population stratification by generating two dummy covariates for the three populations: Asian, African, and European. Genotypes were coded additively. We obtained the corresponding adjusted odds ratio (OR) and P-value for each rare SNP.
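
The adjusted linear regression for a quantitative trait can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the reference population for the two dummy covariates (European here) is an assumption, the P-value uses a normal approximation to the Wald test rather than a t distribution, and the logistic model for disease status is not shown. All function names are hypothetical.

```python
import math


def design_row(genotype, sex, age, smoker, population):
    """Intercept, additive genotype (0/1/2), covariates, two population dummies.

    Using "European" as the reference level is an assumption of this sketch.
    """
    return [1.0, float(genotype), float(sex), float(age), float(smoker),
            1.0 if population == "African" else 0.0,
            1.0 if population == "Asian" else 0.0]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Ordinary least squares: return (coefficients, standard errors)."""
    n, p = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    sigma2 = rss / (n - p)  # residual variance estimate
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(p)]
    return beta, se


def two_sided_p(beta, se):
    """Two-sided Wald P-value under a normal approximation (an assumption)."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


# Toy check (invented data) with an intercept and one genotype column only.
beta, se = ols([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.3])
print(beta[1], se[1], two_sided_p(beta[1], se[1]))
```

In a full analysis each row would come from `design_row`, and the coefficient at index 1 would be the genotype effect tested for each rare SNP.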

Joint effects analysis of rare genetic variants

We analyzed the joint effects of rare genetic variants at both the gene and the pathway level. The objective was to test for association of an aggregation of rare minor alleles with quantitative phenotypes and disease by combining genetic information across multiple variants within a given gene or pathway, respectively. To do this, we first mapped rare SNPs to genes and pathways as follows. In step 1, we mapped rare SNPs to genes. We obtained the nearest gene name for each rare SNP from the snp_info file provided by GAW17. In step 2, we mapped the genes from step 1 to the c2 curated canonical pathways (version 3) from the Broad Institute (http://www.broadinstitute.org/gsea/msigdb/). This database includes 888 gene sets, among them 186 pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/), 430 pathways from Reactome, and 217 pathways from BioCarta. We kept only the pathways with at least five genes in our data set, which left 472 pathways for follow-up analysis. We note that each of these three databases includes the VEGF causal pathway. In the next step, we defined a genetic risk score for each individual as the count of minor alleles of all rare variants in a given gene or pathway [6]. Finally, we performed linear and logistic regression analyses to test for the association of the count of minor alleles with the traits (Q1, Q2, Q4, and disease) in each gene and pathway, respectively, adjusting for sex, age, smoking, and population stratification, as in the univariate SNP association analysis.
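
The mapping and scoring steps above can be sketched as follows. This is a minimal illustration: the data structures, function names, SNP identifiers, and genotypes are hypothetical, and only the logic (count minor alleles over all rare SNPs mapped to a pathway; keep pathways with at least five genes in the data) follows the text.

```python
def pathway_risk_scores(genotypes, snp_to_gene, pathway_genes):
    """Count rare minor alleles per individual across all SNPs in a pathway.

    genotypes: dict, SNP id -> list of minor-allele counts (0/1/2) per individual
    snp_to_gene: dict, SNP id -> gene name (step 1 in the text)
    pathway_genes: set of gene names in the pathway (step 2 in the text)
    """
    snps = [s for s, g in snp_to_gene.items()
            if g in pathway_genes and s in genotypes]
    if not snps:
        return []
    n = len(genotypes[snps[0]])
    return [sum(genotypes[s][i] for s in snps) for i in range(n)]


def filter_pathways(pathways, genes_in_data, min_genes=5):
    """Keep pathways with at least `min_genes` genes present in the data set."""
    return {name: genes & genes_in_data
            for name, genes in pathways.items()
            if len(genes & genes_in_data) >= min_genes}


# Toy example (invented SNP ids and genotypes for four individuals).
genotypes = {"snp1": [0, 1, 0, 0], "snp2": [1, 0, 0, 0],
             "snp3": [0, 1, 0, 2], "snp4": [1, 1, 1, 1]}
snp_to_gene = {"snp1": "FLT1", "snp2": "FLT1", "snp3": "KDR", "snp4": "OTHER"}
print(pathway_risk_scores(genotypes, snp_to_gene, {"FLT1", "KDR"}))
```

Passing a single-gene set as `pathway_genes` gives the gene-level score, so one function covers both levels of the analysis.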

Results

We removed 1,314 SNPs that had a Hardy-Weinberg equilibrium test P-value smaller than 1 × 10−6 in control subjects, leaving 23,173 SNPs. We performed multi-dimensional scaling analysis using these remaining SNPs and identified seven outlier samples (Figure 1): One outlier is European (red circles) and clusters with the Asian group (green circles), and the other six outliers are African (black circles) and are separate from the major African cluster (lower right-hand corner). The seven outlier samples include five control and two case subjects. We removed these outliers in subsequent analyses.
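
The text does not state which Hardy-Weinberg test was used. As an illustration only, the sketch below applies a 1-degree-of-freedom chi-square goodness-of-fit test to genotype counts in control subjects and flags SNPs below the 1 × 10−6 cutoff; the function names are hypothetical. Chi-square approximations can be unreliable when expected genotype counts are small, so an exact test may be preferable in practice.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: nothing to test
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def fails_hwe(n_aa, n_ab, n_bb, cutoff=1e-6):
    """True when the SNP would be removed under the cutoff used in the text."""
    return hwe_chi_square_p(n_aa, n_ab, n_bb) < cutoff


print(fails_hwe(25, 50, 25), fails_hwe(50, 0, 50))
```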
Figure 1

Multidimensional scaling analysis of 697 samples from the 1000 Genomes Project. We used 23,173 SNPs in the multidimensional scaling analysis and removed seven outlier samples (one European and six Africans) in the subsequent analysis. Red circles, Europeans; green circles, Asians; black circles, Africans.

We focused on 18,094 rare variants (MAF < 1%) from the 23,173 SNPs that passed quality control. Overall, we did not find any rare variants with significant association for Q2, Q4, and disease at the Bonferroni-corrected significance level of 0.05 (corresponding to an unadjusted P-value of 2.8 × 10−6). We did find three rare variants that were significantly associated with Q1 (Table 1). One of these (C13S524) is in causal gene FLT1, whereas the other two are not in any causal gene.
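
The Bonferroni arithmetic behind these thresholds is a single division of the family-wise level by the number of tests. The sketch below (function name hypothetical) applies it to the numbers of rare SNPs, genes, and pathways reported in this section; the results come out close to the unadjusted P-value thresholds quoted in the text, up to rounding.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise level alpha."""
    return alpha / n_tests


# Numbers of tests as reported in the Results: rare SNPs, genes, pathways.
for label, n_tests in [("rare SNPs", 18094), ("genes", 2439), ("pathways", 472)]:
    print(f"{label}: {bonferroni_threshold(0.05, n_tests):.2e}")
```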
Table 1

Identified significant rare genetic variants for Q1

SNP | Chromosome | Position | MAF | P-value | β (standard error) | In causal gene?
C13S524 | 13 | 27,899,915 | 0.0043 | 2.33 × 10−7 | 1.92 (0.37) | Yes (FLT1)
C2S2355 | 2 | 112,864,155 | 0.0087 | 6.43 × 10−7 | 1.31 (0.26) | No (RGPD8)
C2S2174 | 2 | 107,855,174 | 0.0094 | 2.66 × 10−6 | 1.04 (0.22) | No (RGPD4)

The 18,094 rare variants mapped to 2,439 genes. Of these 2,439 genes, 911 had 1 rare SNP in each gene, 655 had 2–5 rare SNPs, and 873 had more than 5 rare SNPs. The joint effects analysis of rare genetic variants at the gene level did not identify significant association for Q2, Q4, or disease at the Bonferroni-corrected significance level of 0.05 (corresponding to an unadjusted P-value of 2.1 × 10−5). We found four genes that were significantly associated with Q1 (Table 2), two of which (FLT1 and KDR) were causal genes and two of which (RGPD8 and EPHB1) were not causal genes.
Table 2

Significant association of genes with Q1

Gene | Chromosome | Number of rare SNPs | P-value | β (standard error) | Causal gene?
RGPD8 | 2 | 3 | 2.55 × 10−8 | 1.30 (0.23) | No
FLT1 | 13 | 25 | 7.37 × 10−8 | 0.67 (0.12) | Yes
KDR | 4 | 14 | 8.52 × 10−7 | 0.88 (0.18) | Yes
EPHB1 | 3 | 6 | 1.53 × 10−5 | 0.81 (0.19) | No

The 2,439 genes that include at least one rare variant were mapped to 472 canonical pathways. We did not find significant association of pathways with either Q2 or Q4 at a Bonferroni-corrected significance level of 0.05 (corresponding to an unadjusted P-value of 1.0 × 10−4). The most significant pathway for Q2 was Reactome Phase II Conjugation (P = 6.5 × 10−4), which did not include causal genes. As described before, each of the three pathway databases (KEGG, Reactome, and BioCarta) includes the VEGF causal pathway for Q1 and disease. We report the association results of this causal pathway in Table 3, which shows that the VEGF causal pathway in the three databases is significantly associated with Q1 but is significantly associated with disease only in the Reactome and BioCarta databases. The likely reason for this is that the VEGF pathway genes in KEGG include proportionally fewer causal genes for either Q1 or disease than those in either Reactome or BioCarta. All 5 genes of the VEGF pathway in the Reactome database are causal genes for Q1; 9 of 11 genes of the VEGF pathway in the BioCarta database are causal for either Q1 (6) or disease (3); and only 7 of the 17 genes in the VEGF pathway in KEGG are causal for either Q1 (2) or disease (5).
Table 3

Association of rare SNPs in the VEGF pathway with traits in three databases

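Trait | Database | Number of rare SNPs | Number of genes in GAW17 data set | Number of causal genes | P-value | Rank of significance
Q1 | Reactome | 53 | 5 | 5 | 1.77 × 10−15* | 1
Q1 | BioCarta | 95 | 11 | 6 | 6.81 × 10−14* | 2
Q1 | KEGG | 66 | 17 | 2 | 1.83 × 10−7* | 11
Disease | Reactome | 53 | 5 | 0 | 4.95 × 10−5* | 3
Disease | BioCarta | 95 | 11 | 3 | 4.69 × 10−5* | 2
Disease | KEGG | 66 | 17 | 5 | 2.17 × 10−4 | 4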

* Significant at corrected significance level of 0.05.

Because the VEGF pathway is significantly associated with disease, as shown in Table 3, we present in Figure 2 the distributions of minor allele count in case and control subjects for this pathway in the BioCarta and Reactome databases. The figure clearly shows that the rare minor allele count is significantly greater in case subjects than in control subjects.
Figure 2

Distribution of minor allele counts in case and control subjects. The frequencies of minor alleles in different bins were estimated for the VEGF pathway in the Reactome and BioCarta databases. Red bars, case subjects; blue bars, control subjects.

To evaluate whether the significance of the causal pathways in different databases is driven by only one rare SNP in a given pathway, we did a leave-one-out cross-validation (LOOCV). For each pathway, we removed one rare SNP at a time and recalculated the genetic risk score for each individual as the accumulation of minor alleles for all remaining rare variants in the pathway. We then performed the regression analysis described earlier, and we repeated these steps for every rare SNP in the pathway. Table 4 shows the results for the VEGF causal pathways for Q1 and disease. The table clearly shows that the significance of the VEGF causal pathway for Q1 is not affected by one single rare SNP but by multiple rare minor alleles in the pathway, whereas the disease risk is driven by a small number of truly associated rare SNPs. For example, all five SNPs in the BioCarta database and all six SNPs in the Reactome database whose removal made the disease association nonsignificant are in causal genes for either Q1 or disease.
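
The leave-one-out procedure can be sketched as a loop that drops each rare SNP in turn, recomputes the pathway score, and retests. This is an illustration under stated assumptions: the names are hypothetical, the toy data are invented, and the test passed in as `p_value_fn` is an unadjusted two-sample z-test standing in for the covariate-adjusted regression described in the Methods.

```python
import math


def leave_one_snp_out(genotypes, snps, p_value_fn, threshold):
    """Retest the pathway score with each SNP removed in turn.

    Returns (times P <= threshold, times P > threshold).
    """
    n = len(genotypes[snps[0]])
    full = [sum(genotypes[s][i] for s in snps) for i in range(n)]
    kept = lost = 0
    for left_out in snps:
        # Score without the left-out SNP: subtract its allele counts.
        score = [f - g for f, g in zip(full, genotypes[left_out])]
        if p_value_fn(score) <= threshold:
            kept += 1
        else:
            lost += 1
    return kept, lost


def welch_z_p(score, is_case):
    """Stand-in test: unadjusted z-test on mean score, cases vs controls."""
    def mean_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    cases = [s for s, c in zip(score, is_case) if c]
    controls = [s for s, c in zip(score, is_case) if not c]
    m1, v1 = mean_var(cases)
    m0, v0 = mean_var(controls)
    se = math.sqrt(v1 / len(cases) + v0 / len(controls))
    if se == 0.0:
        return 1.0 if m1 == m0 else 0.0
    return math.erfc(abs(m1 - m0) / se / math.sqrt(2.0))


# Toy data (invented): three SNPs, four cases followed by four controls.
is_case = [True] * 4 + [False] * 4
toy = {"snp_a": [1, 1, 0, 1, 0, 0, 0, 0],
       "snp_b": [0, 1, 1, 0, 0, 0, 1, 0],
       "snp_c": [1, 0, 1, 1, 0, 0, 0, 0]}
print(leave_one_snp_out(toy, list(toy), lambda s: welch_z_p(s, is_case), 0.05))
```

The two returned counts correspond to the two "one SNP removed at a time" columns of Table 4.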
Table 4

LOOCV rare SNP results for VEGF pathway with trait in three databases

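Trait | Database | Number of rare SNPs | P-value (all SNPs) | One SNP removed at a time: number of times P ≤ 1 × 10−4 (a) | One SNP removed at a time: number of times P > 1 × 10−4
Q1 | BioCarta | 95 | 6.81 × 10−14 | 95 | 0
Q1 | Reactome | 53 | 1.77 × 10−15 | 53 | 0
Q1 | KEGG | 66 | 1.83 × 10−7 | 66 | 0
Disease | BioCarta | 95 | 4.69 × 10−5 | 90 | 5
Disease | Reactome | 53 | 4.95 × 10−5 | 47 | 6
Disease | KEGG | 66 | 2.17 × 10−4 | 1 | 65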

a At corrected significance level 0.05.

Because our analysis has narrowed the associations to the VEGF pathway in the three databases, we further evaluated the associations between SNPs in the pathway with Q1 and disease using a smaller Bonferroni correction based on the number of SNPs in the pathway in each of the three databases (see P-value column in Table 5). Based on the significance levels, we counted the number of true positives (causal SNPs significantly associated with Q1 or disease) and estimated the power for Q1 and disease in the three databases. Our results show that the two-stage approach (identifying the causal pathway and then detecting SNP associations in the causal pathway) has slightly larger power than trying to detect SNP associations directly for Q1 (see Tables 2 and 5).
Table 5

Power of rare SNPs in the VEGF pathway

Trait | Database | Number of rare SNPs | Number of causal rare SNPs | P-value (a) | Power (%) (true positives)
Q1 | BioCarta | 95 | 33 | 9.34 × 10−4 | 9.1 (3)
Q1 | Reactome | 53 | 25 | 5.26 × 10−4 | 16.0 (4)
Q1 | KEGG | 66 | 11 | 7.58 × 10−4 | 18.2 (2)
Disease | BioCarta | 95 | 5 | 9.34 × 10−4 | 0 (0)
Disease | Reactome | 53 | 0 | 5.26 × 10−4 | 0 (0)
Disease | KEGG | 66 | 7 | 7.58 × 10−4 | 0 (0)

a Significant at corrected significance level 0.05.
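
The Q1 power values in Table 5 are consistent with dividing the number of true positives by the number of causal rare SNPs in the pathway. That reading is an inference from the table rather than a stated formula, and the function name below is hypothetical. When a pathway contains no causal SNPs for the trait, the ratio is undefined and the sketch returns None.

```python
def empirical_power(true_positives, n_causal):
    """Percentage of causal SNPs detected; None when there are no causal SNPs."""
    if n_causal == 0:
        return None
    return 100.0 * true_positives / n_causal


# (true positives, causal rare SNPs) for Q1, taken from Table 5.
q1 = {"BioCarta": (3, 33), "Reactome": (4, 25), "KEGG": (2, 11)}
for database, (tp, causal) in q1.items():
    print(database, round(empirical_power(tp, causal), 1))
```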


Discussion and conclusions

In this study, we evaluated the associations between rare genetic variants and quantitative traits or disease status at the SNP, gene, and pathway levels. Overall, we did not find significant associations at the SNP, gene, and pathway levels for Q2 and Q4, but we found that the VEGF causal pathway is significantly associated with both Q1 and disease status. We also found that one causal SNP and two causal genes are significantly associated with Q1. We further confirmed that these enriched pathway signals are not driven by a single rare SNP but by multiple rare variants in multiple genes in the pathways for Q1. We assumed in our analysis that all minor alleles influenced each trait with the same direction of effect, because the minor allele in the simulated data is associated with higher means of quantitative traits (Q1 and Q2) and liability (disease). However, this assumption may not hold in real data, resulting in decreased power to detect causal genes. We have also observed that, although most of the causal genes for Q1 are not the causal genes for disease, those causal genes for Q1 actually play a key role in disease (such as the five genes of the VEGF pathway in the Reactome database). Our results show that a simple but efficient pathway-based analysis of rare genetic variants can identify potential genetic risk factors that were missed in the SNP- and gene-level analysis. Our study does have some limitations. One is that our analysis was focused on only one simulated data set. A better strategy would be to analyze all simulated data sets. Another limitation is that we assumed that there was a linear relationship between the traits (Q1, Q2, and Q4) and the covariates used in the study. We observed that some of the covariates had a nonlinear relationship with the traits. Therefore nonlinear regression models may be more suitable. We will explore these models in detail in the future.

Competing interests

The authors declare that there are no competing interests.

Authors’ contributions

PH designed the study, performed the data analysis and drafted the manuscript. WX and ADP participated in designing the study. ADP supervised the study. All authors participated in the statistical analysis and helped to draft the manuscript. All authors read and approved the final manuscript.

References

Review 1.  Rare variant association analysis methods for complex traits.

Authors:  Jennifer Asimit; Eleftheria Zeggini
Journal:  Annu Rev Genet       Date:  2010       Impact factor: 16.830

2.  Pathway-based approaches for analysis of genomewide association studies.

Authors:  Kai Wang; Mingyao Li; Maja Bucan
Journal:  Am J Hum Genet       Date:  2007-12       Impact factor: 11.025

Review 3.  Statistical analysis of rare sequence variants: an overview of collapsing methods.

Authors:  Carmen Dering; Claudia Hemmelmann; Elizabeth Pugh; Andreas Ziegler
Journal:  Genet Epidemiol       Date:  2011       Impact factor: 2.135

4.  Incidence of breast cancer and its subtypes in relation to individual and multiple low-penetrance genetic susceptibility loci.

Authors:  Gillian K Reeves; Ruth C Travis; Jane Green; Diana Bull; Sarah Tipper; Krys Baker; Valerie Beral; Richard Peto; John Bell; Diana Zelenika; Mark Lathrop
Journal:  JAMA       Date:  2010-07-28       Impact factor: 56.272

5.  Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis.

Authors:  Benjamin F Voight; Laura J Scott; Valgerdur Steinthorsdottir; Andrew P Morris; Christian Dina; Ryan P Welch; Eleftheria Zeggini; Cornelia Huth; Yurii S Aulchenko; Gudmar Thorleifsson; Laura J McCulloch; Teresa Ferreira; Harald Grallert; Najaf Amin; Guanming Wu; Cristen J Willer; Soumya Raychaudhuri; Steve A McCarroll; Claudia Langenberg; Oliver M Hofmann; Josée Dupuis; Lu Qi; Ayellet V Segrè; Mandy van Hoek; Pau Navarro; Kristin Ardlie; Beverley Balkau; Rafn Benediktsson; Amanda J Bennett; Roza Blagieva; Eric Boerwinkle; Lori L Bonnycastle; Kristina Bengtsson Boström; Bert Bravenboer; Suzannah Bumpstead; Noisël P Burtt; Guillaume Charpentier; Peter S Chines; Marilyn Cornelis; David J Couper; Gabe Crawford; Alex S F Doney; Katherine S Elliott; Amanda L Elliott; Michael R Erdos; Caroline S Fox; Christopher S Franklin; Martha Ganser; Christian Gieger; Niels Grarup; Todd Green; Simon Griffin; Christopher J Groves; Candace Guiducci; Samy Hadjadj; Neelam Hassanali; Christian Herder; Bo Isomaa; Anne U Jackson; Paul R V Johnson; Torben Jørgensen; Wen H L Kao; Norman Klopp; Augustine Kong; Peter Kraft; Johanna Kuusisto; Torsten Lauritzen; Man Li; Aloysius Lieverse; Cecilia M Lindgren; Valeriya Lyssenko; Michel Marre; Thomas Meitinger; Kristian Midthjell; Mario A Morken; Narisu Narisu; Peter Nilsson; Katharine R Owen; Felicity Payne; John R B Perry; Ann-Kristin Petersen; Carl Platou; Christine Proença; Inga Prokopenko; Wolfgang Rathmann; N William Rayner; Neil R Robertson; Ghislain Rocheleau; Michael Roden; Michael J Sampson; Richa Saxena; Beverley M Shields; Peter Shrader; Gunnar Sigurdsson; Thomas Sparsø; Klaus Strassburger; Heather M Stringham; Qi Sun; Amy J Swift; Barbara Thorand; Jean Tichet; Tiinamaija Tuomi; Rob M van Dam; Timon W van Haeften; Thijs van Herpt; Jana V van Vliet-Ostaptchouk; G Bragi Walters; Michael N Weedon; Cisca Wijmenga; Jacqueline Witteman; Richard N Bergman; Stephane Cauchi; Francis S Collins; Anna L Gloyn; Ulf Gyllensten; Torben Hansen; 
Winston A Hide; Graham A Hitman; Albert Hofman; David J Hunter; Kristian Hveem; Markku Laakso; Karen L Mohlke; Andrew D Morris; Colin N A Palmer; Peter P Pramstaller; Igor Rudan; Eric Sijbrands; Lincoln D Stein; Jaakko Tuomilehto; Andre Uitterlinden; Mark Walker; Nicholas J Wareham; Richard M Watanabe; Gonçalo R Abecasis; Bernhard O Boehm; Harry Campbell; Mark J Daly; Andrew T Hattersley; Frank B Hu; James B Meigs; James S Pankow; Oluf Pedersen; H-Erich Wichmann; Inês Barroso; Jose C Florez; Timothy M Frayling; Leif Groop; Rob Sladek; Unnur Thorsteinsdottir; James F Wilson; Thomas Illig; Philippe Froguel; Cornelia M van Duijn; Kari Stefansson; David Altshuler; Michael Boehnke; Mark I McCarthy
Journal:  Nat Genet       Date:  2010-07       Impact factor: 38.330

6.  Genetic Analysis Workshop 17 mini-exome simulation.

Authors:  Laura Almasy; Thomas D Dyer; Juan Manuel Peralta; Jack W Kent; Jac C Charlesworth; Joanne E Curran; John Blangero
Journal:  BMC Proc       Date:  2011-11-29