Rosemary Braun, Kenneth Buetow.
Abstract
Genome-wide association studies (GWAS) have become increasingly common due to advances in technology and have permitted the identification of differences in single nucleotide polymorphism (SNP) alleles that are associated with diseases. However, while typical GWAS analysis techniques treat markers individually, complex diseases (cancers, diabetes, and Alzheimer's disease, among others) are unlikely to have a single causative gene. Thus, there is a pressing need for multi-SNP analysis methods that can reveal system-level differences between cases and controls. Here, we present a novel multi-SNP GWAS analysis method called Pathways of Distinction Analysis (PoDA). The method uses GWAS data and known pathway-gene and gene-SNP associations to identify pathways that permit, ideally, the distinction of cases from controls. The technique is based upon the hypothesis that, if a pathway is related to disease risk, cases will appear more similar to other cases than to controls (or vice versa) for the SNPs associated with that pathway. By systematically applying the method to all pathways of potential interest, we can identify those for which the hypothesis holds true, i.e., pathways containing SNPs for which the samples exhibit greater similarity within classes than across classes. Importantly, PoDA improves on existing single-SNP and SNP-set enrichment analyses in that it does not require the SNPs in a pathway to exhibit independent main effects. This permits PoDA to reveal pathways in which epistatic interactions drive risk. In this paper, we detail the PoDA method and apply it to two GWAS: one of breast cancer and the other of liver cancer. The results obtained strongly suggest that there exist pathway-wide genomic differences that contribute to disease susceptibility. PoDA thus provides an analytical tool that is complementary to existing techniques and has the power to enrich our understanding of disease genomics at the systems level.
Year: 2011 PMID: 21695280 PMCID: PMC3111473 DOI: 10.1371/journal.pgen.1002101
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Procedure for Pathways of Distinction Analysis.
| 1. | For each pathway |
| 2. | For each gene on the pathway, select associated SNPs (e.g., using dbSNP) and choose the one with the strongest association with case status, determined using Fisher's exact test; |
| 3. | For each sample |
| 4. | Compare the distribution of |
| 5. | Repeat steps 2–5 using permuted case/control labels, and normalize |
| 6. | Compare the distinction score |
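The core idea of the procedure (score each sample by how much closer it lies to other cases than to other controls, then check the class difference against permuted labels) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the symbols and exact statistics in the table above were lost in extraction, so the genotype coding (0/1/2 minor-allele counts), the distance (`genotype_distance`), the per-sample score, and the summary in `distinction` are all hypothetical choices. The per-gene SNP selection by Fisher's exact test (step 2) and the pathway-length normalization are omitted.

```python
import random
from statistics import mean

def genotype_distance(a, b):
    """Assumed distance: summed absolute difference in minor-allele counts."""
    return sum(abs(x - y) for x, y in zip(a, b))

def relative_distance_scores(genotypes, labels):
    """Per-sample score: mean distance to controls minus mean distance to cases.

    Each sample is left out of its own reference group, so a higher score means
    the sample sits closer to the other cases than to the other controls.
    Requires at least two samples in each class.
    """
    scores = []
    for i, g in enumerate(genotypes):
        d_case = [genotype_distance(g, h) for j, h in enumerate(genotypes)
                  if j != i and labels[j] == 1]
        d_ctrl = [genotype_distance(g, h) for j, h in enumerate(genotypes)
                  if j != i and labels[j] == 0]
        scores.append(mean(d_ctrl) - mean(d_case))
    return scores

def distinction(genotypes, labels):
    """Assumed summary: mean score in cases minus mean score in controls."""
    scores = relative_distance_scores(genotypes, labels)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    ctrls = [s for s, y in zip(scores, labels) if y == 0]
    return mean(cases) - mean(ctrls)

def permutation_p_value(genotypes, labels, n_perm=200, seed=0):
    """Compare the observed summary with summaries under shuffled labels."""
    rng = random.Random(seed)
    observed = distinction(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if distinction(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(genotypes, labels)` with one genotype row per sample and labels of 1 (case) or 0 (control) returns the observed summary and its permutation p-value. Because the score depends on the joint genotype across all pathway SNPs, a pathway can stand out even when no single SNP shows a main effect.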
Figure 1. PoDA applied to simulated data.
Alleles at 50 loci for 250 cases and 250 controls were simulated such that each SNP was in Hardy-Weinberg equilibrium (HWE) and not associated with case status, but being homozygous minor (red) at both loci 1 and 2 or at both loci 1 and 3 yielded a three-fold relative risk (a). A 12-SNP pathway comprising SNPs 1–12 shows differential distributions (b); a random 12-SNP pathway does not (c). Boxplots are overlaid on the scatterplots for clarity.
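A data set of this general shape can be generated with a simple rejection-sampling scheme, sketched below. The minor allele frequency (`maf`), the baseline risk (`base_risk`), and all function names are assumptions for illustration; the caption does not give them. This simple scheme also does not enforce the absence of marginal single-SNP association that the caption describes, so it should be read as an approximation of the design rather than a reproduction of it.

```python
import random

def hwe_genotype(rng, maf):
    """Minor-allele count (0, 1 or 2) from two independent allele draws (HWE)."""
    return (rng.random() < maf) + (rng.random() < maf)

def at_risk(g):
    """Homozygous minor at loci 1 and 2, or at loci 1 and 3 (1-based loci)."""
    return g[0] == 2 and (g[1] == 2 or g[2] == 2)

def simulate(n_cases=250, n_controls=250, n_loci=50, maf=0.3,
             base_risk=0.1, relative_risk=3.0, seed=0):
    """Draw individuals until both the case and control quotas are filled."""
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        g = [hwe_genotype(rng, maf) for _ in range(n_loci)]
        # Carriers of the epistatic genotype get a three-fold higher risk.
        risk = base_risk * (relative_risk if at_risk(g) else 1.0)
        affected = rng.random() < risk
        if affected and len(cases) < n_cases:
            cases.append(g)
        elif not affected and len(controls) < n_controls:
            controls.append(g)
    return cases, controls
```

`simulate()` returns two lists of genotype vectors; taking the first 12 columns gives a pathway that contains the interacting loci, while 12 columns drawn from loci 4–50 give a null pathway for comparison.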
Figure 2. PoDA applied to four highly-significant SNPs.
Shown is the distribution of per-sample values in CGEMS cases (red) and controls (black) for a SNP-set comprising four highly-significant SNPs located in the gene [4]. As expected, there is a substantial difference between case and control values, with the cases having higher values (i.e., lying closer to other cases) than the controls. The discreteness of the distributions is due to the fact that, with four SNPs, only a limited number of distinct values is possible.
PID pathways with significant distinction scores in the CGEMS breast cancer GWAS.
| Pathway | Source | Length | Distinction score | Resampled p-value | O.R. | FDR |
| Purine metabolism | Kegg | 136 | 1.86 | 6.36e-03 | 1.59 | 4.15e-21 |
| Calcium signaling pathway | Kegg | 100 | 1.38 | 1.82e-03 | 1.55 | 6.99e-20 |
| Melanogenesis | Kegg | 84 | 2.36 | 4.55e-03 | 1.53 | 1.47e-18 |
| Gap junction | Kegg | 80 | 1.54 | 5.45e-03 | 1.49 | 1.49e-16 |
| ErbB signaling pathway | Kegg | 81 | 1.36 | 1.45e-02 | 1.46 | 4.68e-15 |
| Long-term potentiation | Kegg | 60 | 1.71 | 9.09e-04 | 1.45 | 4.34e-15 |
| GnRH signaling pathway | Kegg | 79 | 1.36 | 1.18e-02 | 1.44 | 1.32e-14 |
| TCR signaling in naive CD4+ T cells | NCI-Nature | 60 | 2.11 | 5.45e-03 | 1.42 | 7.80e-13 |
| Prostate cancer | Kegg | 75 | 1.45 | 4.09e-02 | 1.38 | 4.37e-11 |
| PKC-catalyzed phosphorylation myosin phosphatase | BioCarta | 20 | 1.97 | | 1.30 | 5.82e-09 |
| CCR3 signaling in eosinophils | BioCarta | 21 | 1.59 | 1.09e-02 | 1.29 | 8.86e-08 |
| Biosynthesis of unsaturated fatty acids | Kegg | 18 | 1.69 | 2.45e-02 | 1.26 | 1.38e-06 |
| Attenuation of GPCR signaling | BioCarta | 11 | 1.75 | 1.09e-02 | 1.25 | 2.41e-06 |
| Stathmin and breast cancer resistance to antimicrotubule agents | BioCarta | 18 | 1.84 | 4.82e-02 | 1.24 | 4.96e-06 |
| Visual signal transduction: Cones | NCI-Nature | 20 | 1.56 | 4.73e-02 | 1.24 | 2.24e-06 |
| Dentatorubropallidoluysian atrophy (DRPLA) | Kegg | 11 | 1.84 | 2.73e-03 | 1.24 | 2.24e-06 |
| Intrinsic prothrombin activation pathway | BioCarta | 22 | 1.35 | 3.18e-02 | 1.23 | 4.61e-06 |
| Eicosanoid metabolism | BioCarta | 19 | 1.69 | 1.91e-02 | 1.23 | 3.44e-06 |
| Effects of botulinum toxin | NCI-Nature | 7 | 1.44 | 2.27e-02 | 1.20 | 3.50e-05 |
| Activation of PKC through G-protein coupled receptors | BioCarta | 10 | 1.50 | 9.09e-03 | 1.20 | 8.42e-06 |
| Streptomycin biosynthesis | Kegg | 9 | 1.36 | 3.55e-02 | 1.17 | 1.89e-04 |
| PECAM1 interactions | Reactome | 6 | 2.70 | 5.45e-03 | 1.17 | 7.28e-05 |
| HDL-mediated lipid transport | Reactome | 8 | 1.47 | 2.00e-02 | 1.14 | 1.59e-03 |
| Granzyme A mediated apoptosis pathway | BioCarta | 8 | 1.97 | 1.73e-02 | 1.12 | 6.60e-04 |
(Pathways with over 60% of their SNPs covered by another pathway have been removed; for the complete list, see Table S1.) Pathway-length-based resampled p-values are given for significant pathways, along with the odds ratios and associated FDRs from a logistic regression model.
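The redundancy filter mentioned in the note above can be sketched as follows. The note only states the 60% threshold; the rest is assumed here for illustration: pathways are processed in ranked order, and a pathway is dropped when more than 60% of its SNPs fall inside a single pathway that has already been kept. The function name `filter_overlapping` and the SNP identifiers in the usage example are hypothetical.

```python
def filter_overlapping(ranked_pathways, max_overlap=0.6):
    """Drop pathways largely covered by a better-ranked, already-kept pathway.

    ranked_pathways: list of (name, set_of_snp_ids), best-ranked first.
    """
    kept = []
    for name, snps in ranked_pathways:
        if not snps:
            continue  # nothing to score for an empty SNP set
        covered = any(len(snps & other) / len(snps) > max_overlap
                      for _, other in kept)
        if not covered:
            kept.append((name, snps))
    return kept

# Usage with made-up SNP identifiers: B shares 3 of its 4 SNPs with A.
ranked = [("A", {1, 2, 3, 4, 5}), ("B", {1, 2, 3, 9}), ("C", {7, 8})]
print([name for name, _ in filter_overlapping(ranked)])
```

Measuring overlap relative to the smaller, lower-ranked pathway is one of several reasonable conventions; a symmetric measure such as the Jaccard index would filter differently.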
Figure 3. Four significant pathways in breast cancer data.
Scatter plots of the per-sample values for each pathway, overlaid with boxplots, are given in the left panel; higher values indicate that the sample is closer to other cases than it is to other controls. Distributions of the values for cases (red) and controls (black) are given to the right. A significant shift toward higher values is seen in the cases. Odds ratios and the corresponding FDR-adjusted values are given.
PID pathways with significant distinction scores in the liver cancer GWAS.
| Pathway | Source | Length | Distinction score | Resampled p-value | O.R. | FDR |
| Cell adhesion molecules (CAMs) | Kegg | 86 | 1.57 | 9.09e-03 | 1.66 | 3.56e-13 |
| ErbB signaling pathway | Kegg | 76 | 1.45 | 3.45e-02 | 1.61 | 2.59e-10 |
| Signaling events mediated by Stem cell factor receptor (c-Kit) | NCI-Nature | 40 | 2.35 | 5.45e-03 | 1.58 | 7.31e-10 |
| Neurotrophic factor-mediated Trk receptor signaling | NCI-Nature | 50 | 1.60 | 2.36e-02 | 1.55 | 2.49e-08 |
| Lissencephaly gene (LIS1) in neuronal migration and development | NCI-Nature | 21 | 2.02 | 7.27e-03 | 1.52 | 1.44e-07 |
| Angiopoietin receptor Tie2-mediated signaling | NCI-Nature | 40 | 2.36 | 1.36e-02 | 1.51 | 5.77e-08 |
| Reelin signaling pathway | NCI-Nature | 28 | 1.62 | 5.45e-03 | 1.46 | 7.35e-08 |
| Syndecan-4-mediated signaling events | NCI-Nature | 27 | 1.74 | 1.64e-02 | 1.46 | 1.19e-06 |
| Galactose metabolism | Kegg | 19 | 1.65 | 2.27e-02 | 1.44 | 5.01e-06 |
| Vibrio cholerae infection | Kegg | 35 | 1.84 | 2.64e-02 | 1.43 | 6.67e-07 |
| Paxillin-independent events mediated by a4b1 and a4b7 | NCI-Nature | 19 | 2.14 | 1.00e-02 | 1.40 | 6.67e-07 |
| Antigen processing and presentation | Kegg | 34 | 3.26 | 1.36e-02 | 1.40 | 3.71e-08 |
| Corticosteroids and Cardioprotection | BioCarta | 21 | 1.98 | 3.55e-02 | 1.39 | 1.24e-05 |
| Lissencephaly gene (Lis1) in neuronal migration and development | BioCarta | 15 | 1.60 | 1.36e-02 | 1.37 | 2.52e-05 |
| IL12 signaling mediated by STAT4 | NCI-Nature | 25 | 1.93 | 4.55e-02 | 1.37 | 1.58e-05 |
| Biosynthesis of unsaturated fatty acids | Kegg | 13 | 1.76 | 1.64e-02 | 1.36 | 6.44e-05 |
| Growth hormone signaling pathway | BioCarta | 18 | 1.75 | 3.18e-02 | 1.36 | 7.46e-05 |
| Canonical Wnt signaling pathway | NCI-Nature | 28 | 1.92 | 4.73e-02 | 1.35 | 9.36e-06 |
| NO2-dependent IL-12 pathway in NK cells | BioCarta | 8 | 1.82 | 2.73e-03 | 1.32 | 5.83e-05 |
| Signaling events mediated by HDAC Class III | NCI-Nature | 19 | 2.12 | 3.91e-02 | 1.32 | 4.19e-05 |
| Removal of aminoterminal propeptides from | Reactome | 7 | 3.12 | 5.45e-03 | 1.29 | 8.46e-05 |
| Aminophosphonate metabolism | Kegg | 13 | 1.91 | 3.36e-02 | 1.26 | 8.17e-04 |
| Antigen processing and presentation | BioCarta | 6 | 2.61 | 1.82e-03 | 1.22 | 3.36e-05 |
| Classical complement pathway | BioCarta | 12 | 2.27 | 1.55e-02 | 1.19 | 1.67e-04 |
| Chylomicron-mediated lipid transport | Reactome | 7 | 1.94 | 3.27e-02 | 1.16 | 1.49e-02 |
(Pathways with over 60% of their SNPs covered by another pathway have been removed; for the complete list, see Table S2.) Pathway-length-based resampled p-values are given for significant pathways, along with the odds ratios and associated FDRs from a logistic regression model.
Figure 4. Four significant pathways in liver cancer data.
Scatter plots of the per-sample values for each pathway, overlaid with boxplots, are given in the left panel; higher values indicate that the sample is closer to other cases than it is to other controls. Distributions of the values for cases (red) and controls (black) are given to the right. A significant shift toward higher values is seen in the cases. Odds ratios and the corresponding FDR-adjusted values are given.
PoDA results for successive unions of significant pathways in the CGEMS breast cancer data.
| Pathway | Length | Resampled p-value | O.R. | FDR |
| Top-2 | 318 | | 2.02 | 1.63e-46 |
| Top-3 | 397 | 1.00e-04 | 2.19 | 2.07e-54 |
| Top-4 | 474 | | 2.33 | 3.65e-62 |
| Top-5 | 522 | | 2.45 | 6.83e-66 |
| Top-6 | 544 | | 2.44 | 8.51e-66 |
| Top-7 | 558 | 2.00e-04 | 2.47 | 1.22e-67 |
| Top-8 | 626 | | 2.59 | 1.01e-73 |
| Top-9 | 658 | | 2.64 | 9.84e-75 |
| Top-10 | 700 | | 2.77 | 9.72e-79 |
| Top-11 | 710 | | 2.80 | 1.42e-79 |
| Top-12 | 723 | | 2.82 | 2.06e-80 |
| Top-13 | 739 | | 2.89 | 3.31e-82 |
| Top-14 | 744 | | 2.93 | 2.86e-83 |
| Top-15 | 770 | | 2.96 | 6.41e-85 |
| Top-16 | 774 | | 2.97 | 5.10e-85 |
| Top-17 | 791 | | 2.95 | 2.43e-85 |
| Top-18 | 800 | | 3.06 | 1.15e-87 |
| Top-19 | 814 | | 3.14 | 1.19e-89 |
| Top-20 | 832 | | 3.26 | 4.51e-92 |
| Top-21 | 837 | | 3.28 | 2.92e-92 |
| Top-22 | 839 | | 3.29 | 2.41e-92 |
| Top-23 | 845 | | 3.34 | 1.45e-93 |
| Top-24 | 854 | | 3.38 | 4.62e-95 |
Pathway-length-based resampled p-values are given, along with the odds ratios and associated FDRs from a logistic regression model.
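The "Top-k" rows in the table above are cumulative SNP sets. A minimal sketch of how such successive unions could be assembled is given below; the ranking that orders the pathways is not specified in this excerpt, so the input order, the function name `successive_unions`, and the SNP identifiers in the usage example are assumptions.

```python
def successive_unions(ranked_pathways):
    """Return [("Top-2", snps), ("Top-3", snps), ...] as cumulative unions.

    ranked_pathways: list of (name, set_of_snp_ids), best-ranked first.
    """
    unions, seen = [], set()
    for k, (_, snps) in enumerate(ranked_pathways, start=1):
        seen |= snps
        if k >= 2:
            # Freeze a copy so later updates to `seen` do not alter this row.
            unions.append((f"Top-{k}", frozenset(seen)))
    return unions

# Usage with made-up SNP identifiers; the "Length" column is the union size.
ranked = [("A", {1, 2}), ("B", {2, 3}), ("C", {3, 4, 5})]
for label, snps in successive_unions(ranked):
    print(label, len(snps))
```

Because shared SNPs are counted once, the length of a union grows by less than the length of each added pathway whenever the pathways overlap.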
PoDA results for successive unions of significant pathways in the liver cancer data.
| Pathway | Length | Resampled p-value | O.R. | FDR |
| Top-2 | 321 | 5.38e-02 | 2.37 | 1.20e-27 |
| Top-3 | 402 | 2.80e-03 | 2.63 | 1.40e-34 |
| Top-4 | 474 | 1.10e-03 | 2.86 | 6.50e-38 |
| Top-5 | 539 | 9.00e-04 | 3.22 | 4.03e-42 |
| Top-6 | 560 | 1.00e-04 | 3.39 | 1.19e-43 |
| Top-7 | 580 | | 3.50 | 1.39e-44 |
| Top-8 | 589 | 6.00e-04 | 3.50 | 1.35e-44 |
| Top-9 | 603 | 4.00e-04 | 3.52 | 1.23e-44 |
| Top-10 | 624 | | 3.60 | 1.33e-45 |
| Top-11 | 640 | | 3.73 | 3.69e-47 |
| Top-12 | 646 | | 3.78 | 1.68e-47 |
| Top-13 | 667 | | 3.81 | 9.29e-48 |
| Top-14 | 709 | 3.00e-04 | 3.88 | 1.90e-48 |
| Top-15 | 751 | | 4.09 | 2.11e-49 |
| Top-16 | 761 | | 4.09 | 1.76e-49 |
| Top-17 | 797 | | 4.45 | 1.29e-50 |
| Top-18 | 805 | | 4.46 | 5.24e-51 |
| Top-19 | 823 | | 4.56 | 2.20e-51 |
| Top-20 | 838 | | 4.56 | 1.73e-51 |
Pathway-length-based resampled p-values are given, along with the odds ratios and associated FDRs from a logistic regression model.
Figure 5. Union of top three pathways.
SNPs from the top three pathways are combined to compute the per-sample values for the breast cancer data (a) and the liver cancer data (b). Distributions of the values for cases (red) and controls (black) are given to the right. A significant shift toward higher values is seen in the cases.