Burcu Bakir-Gungor, Osman Ugur Sezerman.
Abstract
Genome-wide association studies (GWAS) with hundreds of thousands of single nucleotide polymorphisms (SNPs) are popular strategies to reveal the genetic basis of human complex diseases. Despite many successes of GWAS, it is well recognized that new analytical approaches have to be integrated to achieve their full potential. Starting with a list of SNPs found to be associated with disease in GWAS, here we propose a novel methodology to reveal functionally important KEGG pathways through the identification of genes within these pathways, where these genes are obtained from SNP analysis. Our methodology is based on functionalization of important SNPs to identify affected genes and disease-related pathways. We have tested our methodology on the WTCCC Rheumatoid Arthritis (RA) dataset and identified: i) previously known RA-related KEGG pathways (e.g., Toll-like receptor signaling, Jak-STAT signaling, Antigen processing, Leukocyte transendothelial migration and MAPK signaling pathways); ii) additional KEGG pathways (e.g., Pathways in cancer, Neurotrophin signaling, Chemokine signaling pathways) as associated with RA. Furthermore, these newly found pathways included genes that are targets of RA-specific drugs. Whereas GWAS analysis alone identifies 14 of those 83 drug target genes, the newly found functionally important KEGG pathways led to the discovery of 25 of the 83 genes known to be used as drug targets for the treatment of RA. Within the previously known pathways (e.g., Antigen processing and presentation, Tight junction), we identified additional genes associated with RA. Importantly, within these pathways, the associations between RA and some of these additionally found genes, such as HLA-C, HLA-G, PRKCQ, PRKCZ, TAP1 and TAP2, were verified either by the OMIM database or by literature retrieved from the NCBI PubMed module.
With whole-genome sequencing on the horizon, we show that the full potential of GWAS can be achieved by integrating pathway and network-oriented analysis and prior knowledge from functional properties of a SNP.
Year: 2011 PMID: 22046267 PMCID: PMC3201947 DOI: 10.1371/journal.pone.0026277
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Outline of our assessment process.
In Step 1, a gene-wise Pw-value for association with disease is computed by integrating functional information. In Step 2, significant Pw-values are loaded as two separate attributes of the genes in a PPI network and visualized using Cytoscape [23]. At this step, active sub-networks of interacting gene products that are also associated with the disease are identified using the jActive Modules plugin [24]. In Step 3, genes in an identified active sub-network are tested for membership in functionally important KEGG pathways.
Description of data sources used in our regional score.
| Functional Category | Tool | Description | Meta-tool |
| Protein Coding | LS-SNP, SNPs3D, SIFT, SNPeffect | SNP annotation tool, Impact of nsSNPs on protein function, Prediction of amino acid substitution effects, SNP annotation with human disease | F-SNP |
| Protein Coding | PolyPhen | Prediction of amino acid substitution effects | SPOT, F-SNP |
| Protein Coding, Splicing Regulation, Transcriptional Regulation | Ensembl | Extensive genomic database including SNPs and gene transcripts | F-SNP |
| Splicing Regulation | ESEfinder, ESRSearch, PESX, RescueESE | Exonic splice sites, Exonic-splicing regulatory (ESR) sequences, Exon splicing enhancers/silencers, Exonic splice sites | F-SNP |
| Transcriptional Regulation | Consite, TFSearch | Conserved transcription factor binding sites, Transcription factor binding sites | F-SNP |
| Transcriptional Regulation | SNPnexus | Conserved transcription factor binding sites | SNPnexus |
| Transcriptional Regulation, Conserved Region | GoldenPath | MicroRNA, cpgIslands, evolutionary conserved regions | F-SNP |
| Conserved Region | ECRBase | Evolutionary conserved regions | SPOT |
| Post-translation | KinasePhos, OGPET, Sulfinator | Phosphorylation sites, Prediction of O-glycosylation sites in proteins, Tyrosine sulfation sites | F-SNP |
| Genomic Coordinates | dbSNP | General SNP/gene transcript properties | SPOT |
| Genomic Coordinates | UCSC | Extensive genomic database including SNPs and gene transcripts | F-SNP |
| LD estimation | HapMap, Haploview | Dense genotyping on multiple populations, useful for LD estimates; Estimation of r² LD coefficients for each population | SPOT |
Figure 2. The highest scoring sub-network.
a. This sub-network is composed of 275 nodes and 778 edges (as found in Step 2 of PANOGA). Node size is proportional to node degree. b. Zoomed-in view of the highest scoring sub-network. The 20 genes known in the literature to be associated with RA are shown in green. Blue denotes genes in our highest scoring sub-network for which no association with RA could be found in the literature.
Figure 3. Node degree distributions of our highest scoring sub-network vs. random network.
a. Our sub-network follows a power law ($P(k) = a k^{-\gamma}$, $a = 120.03$, $\gamma = 1.353$, $R^2 = 0.773$, correlation = 0.891 on a log-log scale), showing that our network displays scale-free properties, as expected from a biological network. b. The random network is obtained by randomizing our highest scoring sub-network using the Erdős-Rényi algorithm.
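The power-law fit reported in panel a can be illustrated with a short sketch. It assumes that P(k) is the number of nodes with degree k and that the fit is an ordinary least-squares line on log-log axes; the caption does not give the fitting procedure, and the function name `fit_power_law` and its input list are hypothetical.

```python
import math
from collections import Counter


def fit_power_law(degrees):
    """Fit P(k) = a * k**(-gamma) by least squares on log-log degree counts.

    `degrees` is a list of node degrees (hypothetical input; zeros are ignored).
    Returns (a, gamma, r_squared), where r_squared refers to the log-log fit.
    """
    # Count how many nodes have each positive degree.
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(n) for n in counts.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct degrees")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)

    # Slope of the log-log line is -gamma; the intercept is log10(a).
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return 10 ** intercept, -slope, r_squared
```

A degree list taken from any network (for example, one value per node of the sub-network) can be passed directly; a scale-free network is expected to give a clearly positive γ and a reasonably high R², whereas an Erdős-Rényi random graph typically fits a power law poorly.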
Overrepresented KEGG Pathways found in the highest scoring sub-network as associated with RA.
| KEGG Term | Num. of Genes Found | Associated Genes (%) | Term P-value (Bonferroni-corrected) | Associated Genes Found |
| | 30 | 14.9 | 9.33E-11 | ACTB, ACTG1, AKT1, COL4A4, CRKL, CTNNB1, EGF, EGFR, FLNA, FLNB, FLT4, FYN, GRLF1, ITGA5, |
| | 20 | 22.9 | 2.13E-10 | AKT1, |
| | 22 | 16.4 | 1.80E-08 | ACTG1, ACTN2, CASK, CTNNB1, EPB41L1, EPB41L2, EPB41L3, GNAI1, INADL, KRAS, LLGL1, MAGI1, MAGI3, PARD3, PRKCE, PRKCI, |
| | 26 | 13.7 | 2.31E-08 | ADCY2, ADCY5, |
| | 17 | 22.6 | 1.16E-07 | ACTB, BAIAP2, CREBBP, CTNNB1, |
| Bacterial invasion of epithelial cells | 15 | 20.5 | 1.57E-07 | ACTB, ACTG1, CBL, CLTC, CTNNB1, CTTN, DNM3, ELMO1, |
| | 20 | 15.8 | 2.36E-07 | ARHGDIB, CALM1, CALM3, |
| | 15 | 21.4 | 3.67E-07 | |
| | 32 | 9.7 | 1.12E-06 | |
| | 14 | 19.1 | 1.44E-06 | CBL, CRK, CRKL, |
| | 18 | 13.2 | 1.42E-05 | CD226, |
| | 17 | 11 | 1.72E-05 | ACTG1, ACTN2, CTNNB1, EZR, GNAI1, GRLF1, |
| | 16 | 14.8 | 2.70E-05 | CBL, |
| | 13 | 12.7 | 1.97E-03 | |
| | 11 | 13.9 | 2.08E-03 | CALR, CANX, HLA-B, |
| | 8 | 20 | 2.16E-03 | |
| | 20 | 7.4 | 6.13E-03 | CACNA1A, |
| | 8 | 17.3 | 6.24E-03 | |
| | 11 | 12.5 | 6.84E-03 | CAPN1, |
| | 15 | 9.6 | 7.41E-03 | CBL, CREBBP, |
Bold formatting denotes experimentally verified RA-associated genes and pathways; italic formatting denotes computationally found RA-associated genes and pathways; bold and italic formatting denotes both experimental and computational verification of susceptibility to RA.
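The term p-values in the table above are Bonferroni-corrected. The enrichment test behind them is not stated here, so the sketch below assumes a one-sided hypergeometric test, a common choice for pathway overrepresentation. The function names and all numbers in the usage lines are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n belong to the pathway.

    k: pathway genes observed in the sub-network
    M: size of the background gene universe (assumption: chosen by the analyst)
    n: number of genes annotated to the pathway
    N: number of genes in the sub-network
    """
    upper = min(n, N)
    total = comb(M, N)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper hits.
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical example: three pathways tested against one sub-network.
    raw = [
        hypergeom_sf(k=12, M=20000, n=200, N=275),
        hypergeom_sf(k=5, M=20000, n=150, N=275),
        hypergeom_sf(k=2, M=20000, n=90, N=275),
    ]
    for p_raw, p_adj in zip(raw, bonferroni(raw)):
        print(f"raw={p_raw:.3e}  bonferroni={p_adj:.3e}")
```

Bonferroni correction is conservative: with many KEGG terms tested, only strongly enriched pathways stay below a fixed significance threshold.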
Comparison of found KEGG pathways with previous studies in terms of number of genes associated within each KEGG term.
| KEGG Term | Number of Genes Found: Baranzini et al. | Number of Genes Found: Martin et al. | Number of Genes Found: Wu et al. | Number of Genes Found: Zhang et al. | Number of Genes Found: PANOGA (only GWAS p-values) | Number of Genes Found: PANOGA (w/ 2 attributes, SPOT Pw and F-SNP Pw) | Term P-value (Bonferroni-corrected) |
| | 0 | 0 | 36 | 32 | 22 | 30 | 9.33E-11 |
| | 0 | 0 | 23 | 0 | 18 | 20 | 2.13E-10 |
| | 0 | 0 | 0 | 5 | 20 | 22 | 1.80E-08 |
| | 0 | 0 | 0 | 0 | 24 | 26 | 2.31E-08 |
| | 0 | 0 | 0 | 18 | 16 | 17 | 1.16E-07 |
| Bacterial invasion of epithelial cells | 0 | 0 | 0 | 0 | 15 | 16 | 1.57E-07 |
| | 0 | 0 | 0 | 0 | 20 | 20 | 2.36E-07 |
| | 0 | 22 | 0 | 7 | 14 | 15 | 3.67E-07 |
| | 0 | 0 | 0 | 0 | 29 | 32 | 1.12E-06 |
| | 4 | 0 | 21 | 18 | 10 | 14 | 1.44E-06 |
| | 8 | 26 | 0 | 10 | 12 | 18 | 1.42E-05 |
| | 0 | 24 | 14 | 0 | 17 | 17 | 1.72E-05 |
| | 4 | 21 | 16 | 16 | 13 | 16 | 2.70E-05 |
| | 0 | 0 | 22 | 6 | 7 | 13 | 1.97E-03 |
| | 6 | 0 | 0 | 3 | 11 | 11 | 2.08E-03 |
| Allograft rejection | 0 | 0 | 0 | 0 | 8 | 8 | 2.16E-03 |
| | 0 | 0 | 43 | 34 | 16 | 20 | 6.13E-03 |
| | 5 | 0 | 0 | 1 | 8 | 8 | 6.24E-03 |
| | 0 | 18 | 12 | 11 | 6 | 11 | 6.84E-03 |
| | 0 | 25 | 0 | 16 | 13 | 15 | 7.41E-03 |
| | 0 | 0 | 22 | 0 | 10 | 11 | 5.04E-02 |
| | 0 | 35 | 0 | 4 | 15 | 16 | 1.63E-01 |
| | 3 | 0 | 15 | 13 | 8 | 9 | 2.71E-01 |
| Total | 30 | 171 | 224 | 194 | 332 | 385 | |
Italic formatting denotes computationally found pathways, bold formatting denotes experimentally verified RA associated pathways, bold and italic formatting denotes both experimental and computational verification.
Figure 4. Functionally grouped annotation network of our highest scoring sub-network.
The relationships between the KEGG terms (nodes) are based on the similarity of their associated genes. Node size reflects the statistical significance of the term (term p-values corrected with Bonferroni). Edges represent the existence of shared genes. Edge thickness is proportional to the number of genes shared and is calculated using kappa statistics, in a similar way to that described in [35]. Terms grouped according to their kappa scores are shown in the same color.
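The kappa-based similarity between two KEGG terms can be sketched as follows. This is the standard Cohen's kappa computed on binary gene membership over a shared gene universe; the exact variant used in [35] is not given in this caption, so the function name `cohen_kappa`, the choice of universe, and the handling of the degenerate case are assumptions.

```python
def cohen_kappa(genes_a, genes_b, universe):
    """Cohen's kappa between two terms' gene sets.

    Every gene in `universe` counts as one binary observation:
    is it annotated to term A, and is it annotated to term B?
    """
    universe = set(universe)
    a = set(genes_a) & universe
    b = set(genes_b) & universe
    n = len(universe)
    if n == 0:
        raise ValueError("universe must not be empty")

    both = len(a & b)            # genes annotated to both terms
    neither = n - len(a | b)     # genes annotated to neither term
    observed = (both + neither) / n

    # Agreement expected by chance from the marginal frequencies.
    expected = (len(a) * len(b) + (n - len(a)) * (n - len(b))) / (n * n)
    if expected == 1.0:
        # Degenerate case (both sets empty or both full): treated here as
        # perfect agreement, which is a convention rather than a given.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

Terms whose pairwise kappa exceeds a chosen threshold can then be linked and grouped. The result depends on the universe passed in, so the same universe should be used for every pair of terms.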
Figure 5. Zoomed-in view of the entire functional annotation network.
Within each group, the pathway term with the lowest term p-value (the group leading term) is shown in bold in the group-specific color.
Figure 6. Comparison of KEGG pathway terms for literature-verified RA genes (green) and our gene set (red).
Nodes represent the pathway terms identified from either of the two sets. The color gradient shows the proportion of genes from each set associated with the term; white represents equal proportions from the two sets. Node size reflects the statistical significance of the term (term p-values corrected with Bonferroni). Following the convention in Figure 4, edges represent the existence of shared genes between pathway terms, and node border colors map to the group colors. Panel b shows a zoomed-in view of panel a.