Usha Kuppuswamy, Seshan Ananthasubramanian, Yanli Wang, Narayanaswamy Balakrishnan, Madhavi K Ganapathiraju.
Abstract
BACKGROUND: The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to determine the mechanism of action of these genes. However, GWAS often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are associated with breast cancer, but their molecular functions are unknown.
Year: 2014 PMID: 24708602 PMCID: PMC4124845 DOI: 10.1186/1748-7188-9-10
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. Rows from the color-coded HTML file showing the CC and MF terms for gene CHRNB2: cholinergic receptor, nicotinic, beta 2 (neuronal) (Entrez id: 1141). GO terms were ranked according to relative association scores for the human genes with at least one interaction.
Figure 2. Evaluation of the specificity of the proposed method for gene annotation predictions. The plots show the number of associated genes versus the GO term index (A) for all five sets (Set1 to Set5) of 100 genes each at a threshold of 30, and (B) for one set of 100 genes at five thresholds: 10, 20, 30, 40 and 50.
Figure 3. Evaluation of the proposed method for gene annotation predictions. The top panel shows precision versus threshold for 5 sets of 100 genes each for (A) CC GO terms and (B) MF GO terms. The thresholds ranged from 5 to 50 in steps of 5. The bottom panel shows recall versus threshold for the same 5 sets of 100 genes each for (C) CC GO terms and (D) MF GO terms.
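The precision and recall values plotted against threshold can be illustrated with a minimal sketch. It assumes that the threshold is the number of top-ranked GO terms kept per gene, which the caption does not state explicitly; the function name and the GO identifiers in the usage lines are hypothetical placeholders, not data from the paper.

```python
def precision_recall_at_threshold(ranked_terms, known_terms, threshold):
    """Precision and recall for the top `threshold` ranked GO terms of one gene.

    Assumption: `threshold` is a rank cutoff (top-N predicted terms).
    """
    predicted = set(ranked_terms[:threshold])  # keep only the top-N predictions
    known = set(known_terms)                   # existing annotations of the gene
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Hypothetical ranked predictions and known annotations for a single gene.
ranked = ["GO:A", "GO:B", "GO:C", "GO:D"]
known = ["GO:B", "GO:D", "GO:X"]
precision, recall = precision_recall_at_threshold(ranked, known, threshold=2)
```

Averaging these per-gene values over a gene set at each threshold would give one point per threshold on curves of the kind described in the caption.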
Figure 4. Comparison of the proposed method with a randomized PPI network. Panels (A) and (B) compare precision-recall curves between the proposed method (for each of the five gene sets) and a randomized PPI network constructed by exploiting the GO DAG structure, for (A) CC GO terms and (B) MF GO terms. Panels (C) and (D) compare average precision-recall curves between the proposed method (averaged over the 5 gene sets) and a randomized PPI network for (C) CC GO terms and (D) MF GO terms. Panels (E) and (F) compare F-score versus threshold curves between the proposed method and a randomized PPI network for (E) CC GO terms and (F) MF GO terms. The F-score is calculated as the harmonic mean of precision and recall.
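The F-score definition in the caption, the harmonic mean of precision and recall, can be written out as a short sketch. The function name is illustrative, and returning 0.0 when both inputs are zero is an assumption made here to avoid division by zero; the source does not say how that case is handled.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        # Assumed convention: no correct predictions gives an F-score of 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only, not values from the paper.
score = f_score(0.5, 0.25)
```

Because the harmonic mean is dominated by the smaller of the two values, a method cannot reach a high F-score by maximizing precision or recall alone.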
Semi-quantitative comparison of the probabilistic approach with a standard function prediction approach [22] using the Mnaimneh dataset
| GO term | Fraction correctly predicted | Genes |
| --- | --- | --- |
| GO:0008213 | 18/20 | YLR333C, YOR182C, YGR162W, YHR021C, YMR282C, YJL189W, YLR287C-A, YJR056C, YLR455W, YLR185W, YNL313C, YDL002C, YNL132W, YMR031C, YFR032C-A, YNL162W, YML017W, YEL054C |
| GO:0001510 | 18/20 | YDR161W, YHR052W, YGR162W, YIL091C, YDR101C, YDL063C, YLR009W, YIL096C, YJR032W, YCR016W, YLR287C, YKL078W, YGR071C, YOL077C, YPL226W, YOR361C, YGR173W, YPL193W |
| GO:0018193 | 17/20 | YOR182C, YHR021C, YJL189W, YLR287C-A, YNL313C, YLR185W, YNL162W, YDL075W, YJL136C, YMR282C, YHR141C, YOL098C, YGR034W, YLR406C, YGR162W, YFR032C-A, YIL069C |
| GO:0050790 | 18/20 | YGR162W, YLR455W, YNL031C, YMR124W, YLR019W, YGL140C, YEL025C, YMR031C, YBR079C, YFR016C, YJL084C, YBL002W, YJR056C, YDR334W, YPL282C, YNL301C, YMR144W, YLR419W, YKL219W |
| GO:0009966 | 18/20 | YHR155W, YJL084C, YDR520C, YHL004W, YER033C, YNL289W, YKR104W, YKL209C, YBR071W, YGR162W, YOR077W, YIR016W, YHL029C, YGR054W, YDL123W, YCR030C, YOR166C, YNL208W |
The table shows the fraction of genes correctly predicted using the probabilistic approach. The dataset consisted of 1622 yeast genes and 138 BP GO terms.
Figure 5. Number of diseases associated with GWAS genes. Individual GWAS genes are plotted on the x-axis and the number of diseases each gene is associated with on the y-axis. The 4,485 GWAS genes are arranged in descending order of the number of disease/trait associations.
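The ordering described in this caption, genes sorted in descending order of their number of disease/trait associations, can be sketched as follows. The input format (a list of gene and disease pairs), the function name, and the placeholder identifiers are assumptions for illustration; the source does not describe how the associations were stored.

```python
from collections import defaultdict


def rank_genes_by_disease_count(associations):
    """Return (gene, n_diseases) pairs sorted by descending disease count.

    `associations` is an iterable of (gene, disease) pairs; repeated pairs
    are counted once. Ties are broken alphabetically by gene name.
    """
    diseases_by_gene = defaultdict(set)
    for gene, disease in associations:
        diseases_by_gene[gene].add(disease)
    counts = [(gene, len(diseases)) for gene, diseases in diseases_by_gene.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Placeholder gene and disease labels, not data from the paper.
pairs = [
    ("GENE_A", "disease_1"),
    ("GENE_B", "disease_1"),
    ("GENE_A", "disease_2"),
    ("GENE_A", "disease_1"),  # duplicate association, counted once
]
ranking = rank_genes_by_disease_count(pairs)
```

Plotting the second element of each pair against its position in `ranking` would give a curve of the shape the caption describes.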
Figure 6. Distribution of GWAS genes among GO categories: Cellular Component, Biological Process and Molecular Function. (A) Venn diagram showing the distribution of all identified GWAS genes; about 273 genes were identified with no CC, MF or BP components. (B) Venn diagram showing the distribution of GWAS genes listed in the human protein-protein interactions downloaded from the HPRD website. About 31 genes were identified with no CC, MF or BP components.
Figure 7. Rows from the color-coded HTML file showing the CC and MF terms for GWAS gene TRA@ (T cell receptor alpha locus; Entrez id: 6955), associated with narcolepsy. GO terms were ranked according to relative association scores for the human GWAS genes with at least one known interaction.