Gaston K Mazandu, Nicola J Mulder.
Abstract
The wide coverage and biological relevance of the Gene Ontology (GO), confirmed through its successful use in protein function prediction, have led to growth in its popularity. To exploit the extent of biological knowledge that GO offers in describing genes or groups of genes, there is a need for an efficient, scalable similarity measure for GO terms and GO-annotated proteins. While several GO similarity measures exist, none adequately addresses all issues surrounding the design and usage of the ontology. We introduce a new metric for measuring the distance between two GO terms using the intrinsic topology of the GO-DAG, thus enabling the measurement of functional similarities between proteins based on their GO annotations. We assess the performance of this metric using a ROC analysis on human protein-protein interaction datasets and correlation coefficient analysis on a selected set of protein pairs from the CESSM online tool. This metric achieves good performance compared to the existing annotation-based GO measures. We used this new metric to assess functional similarity between orthologues, and show that it is effective at determining whether orthologues are annotated with similar functions and at identifying cases where annotation is inconsistent between orthologues.
Year: 2012 PMID: 22666244 PMCID: PMC3361142 DOI: 10.1155/2012/975783
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1: Fictitious hierarchical structure illustrating the computation of term semantic values. Terms are nodes with “r” as a root.
Figure 2: Hierarchical structure illustrating how our approach works. Nodes are represented by integers from 0 to 11 with 0 as a root. The numbers beside each node represent its topological position characteristic and information content.
Figure 3: Subgraph of the GO BP. Each box represents a GO term with GO ID, D value (Zhang et al. measure). This is used to illustrate our approach and compare its effectiveness to the Zhang et al. approach.
Names and characteristics of GO terms in Figure 3, including the topological position characteristic μ and information content IC from our approach, and the IC and normalized IC from the Zhang et al. approach.
| GO Id | Level | μ (our approach) | IC (our approach) | IC (Zhang et al.) | Normalized IC (Zhang et al.) |
|---|---|---|---|---|---|
| GO:0042770 | 6 | 0.0456910e-27 | 6.525565e+01 | 10.11006 | 0.71747 |
| GO:0042772 | 7 | 0.1142274e-28 | 6.664195e+01 | 12.30729 | 0.87340 |
| GO:0030330 | 7 | 0.1142274e-28 | 6.664195e+01 | 11.20867 | 0.79544 |
| GO:0000077 | 7 | 0.0171747e-34 | 8.235221e+01 | 10.92099 | 0.77502 |
| GO:0008630 | 10 | 0.0335723e-86 | 2.014164e+02 | 12.30729 | 0.87340 |
| GO:0006978 | 8 | 0.0434930e-57 | 1.343825e+02 | 12.30729 | 0.87340 |
| GO:0006977 | 9 | 0.0419985e-79 | 1.850743e+02 | 12.30729 | 0.87340 |
| GO:0042771 | 11 | 0.1278292e-116 | 2.691569e+02 | 12.30729 | 0.87340 |
| GO:0031571 | 8 | 0.1103023e-50 | 1.173338e+02 | 12.30729 | 0.87340 |
| GO:0031572 | 8 | 0.0735349e-50 | 1.177393e+02 | 12.30729 | 0.87340 |
| GO:0031573 | 8 | 0.4293676e-36 | 8.373851e+01 | 12.30729 | 0.87340 |
| GO:0031574 | 8 | 0.2206046e-50 | 1.166406e+02 | 12.30729 | 0.87340 |
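
The μ and IC values reported for our approach in the table above appear to be related by IC = −ln μ. This relation is inferred from the tabulated numbers and is not stated in this excerpt, so the following is only a minimal sketch that checks it against three rows of the table; the function name `information_content` is a hypothetical helper introduced for illustration.

```python
import math

# (GO ID, mu, IC) copied from three rows of the table above.
rows = [
    ("GO:0042770", 0.0456910e-27, 6.525565e+01),
    ("GO:0042772", 0.1142274e-28, 6.664195e+01),
    ("GO:0000077", 0.0171747e-34, 8.235221e+01),
]


def information_content(mu: float) -> float:
    """Assumed relation (inferred from the table): IC = -ln(mu)."""
    return -math.log(mu)


for go_id, mu, ic_table in rows:
    ic = information_content(mu)
    print(f"{go_id}: -ln(mu) = {ic:.5f}, table IC = {ic_table:.5f}")
    # The recomputed value should agree with the tabulated one up to rounding.
    assert abs(ic - ic_table) < 1e-3
```

Under this assumption, a smaller μ (a term positioned deeper or more specifically in the DAG) yields a larger information content, which matches the trend across levels in the table.
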
Semantic similarity values for the parent-child term pairs in Figure 3, comparing the Wang et al. and Zhang et al. approaches with our approach. S refers to the semantic similarity between two GO terms obtained using the Wang et al. semantic similarity approach from the G-SESAME (Gene Semantic Similarity Analysis and Measurements) tools. D values, S , S , and S refer to the Zhang et al. approach, and S_GO refers to the semantic similarity approach developed here.
| Parent GO Id | Child GO Id | | | | | |
|---|---|---|---|---|---|---|
| GO:0042770 | GO:0042772 | 0.97920 | 0.940 | 10.11006 | 0.71747 | 0.90199 |
| GO:0042770 | GO:0030330 | 0.97920 | 0.940 | 10.11006 | 0.71747 | 0.94847 |
| GO:0042770 | GO:0008630 | 0.32398 | 0.704 | 10.11006 | 0.71747 | 0.90199 |
| GO:0042770 | GO:0000077 | 0.79240 | 0.802 | 10.11006 | 0.71747 | 0.96144 |
| GO:0042772 | GO:0006978 | 0.49591 | 0.882 | 12.30729 | 0.87340 | 1.00000 |
| GO:0030330 | GO:0006978 | 0.49591 | 0.889 | 11.20867 | 0.79544 | 0.95328 |
| GO:0030330 | GO:0006977 | 0.36008 | 0.615 | 11.20867 | 0.79544 | 0.95328 |
| GO:0030330 | GO:0042771 | 0.24760 | 0.696 | 11.20867 | 0.79544 | 0.95328 |
| GO:0008630 | GO:0042771 | 0.74832 | 0.931 | 12.30729 | 0.87340 | 1.00000 |
| GO:0000077 | GO:0031571 | 0.70186 | 0.830 | 10.92099 | 0.77502 | 0.94032 |
| GO:0000077 | GO:0031572 | 0.69945 | 0.850 | 10.92099 | 0.77502 | 0.94032 |
| GO:0000077 | GO:0031573 | 0.98344 | 0.948 | 10.92099 | 0.77502 | 0.94032 |
| GO:0000077 | GO:0031574 | 0.70603 | 0.870 | 10.92099 | 0.77502 | 0.94032 |
| GO:0031571 | GO:0006977 | 0.63398 | 0.774 | 12.30729 | 0.87340 | 1.00000 |
Figure 4: ROC evaluations of functional similarity approaches based on the human PPI dataset derived from different PPI databases.
Area under ROC curves (AUCs) and precision for the human PPI dataset. For each group, the top score is in bold.
| Approaches | AUC (excluding IEA) | AUC (including IEA) | Precision (excluding IEA) | Precision (including IEA) | Accuracy (excluding IEA) | Accuracy (including IEA) |
|---|---|---|---|---|---|---|
| GO-universal | | | | | | |
| Resnik | 0.933 | 0.931 | 0.724 | 0.701 | 0.713 | 0.739 |
| Lin | 0.763 | 0.691 | 0.610 | 0.568 | 0.481 | 0.549 |
| | | | | | | |
| SimUIC | | | | 0.916 | | |
| SimGIC | | | 0.922 | | 0.974 | 0.974 |
| SimUI | 0.975 | 0.978 | 0.866 | 0.845 | 0.926 | 0.937 |
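
As a reminder of what the AUC column measures, the following is a minimal sketch of an AUC computation, assuming functional similarity scores are available for a set of interacting (positive) and non-interacting (negative) protein pairs. The scores and the function name `roc_auc` are hypothetical and are not taken from the paper's datasets or code.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive pair scores higher than a
    negative pair, counting ties as one half (rank-based definition)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical similarity scores, for illustration only.
interacting = [0.92, 0.85, 0.70, 0.66]
non_interacting = [0.75, 0.40, 0.31, 0.10]

print(roc_auc(interacting, non_interacting))  # 14 of 16 pairs ranked correctly -> 0.875
```

An AUC close to 1 means the similarity measure ranks almost every interacting pair above almost every non-interacting pair, while 0.5 corresponds to random ranking.
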
Comparison of the performance of our approach with the Wang et al., Zhang et al., and annotation-based approaches, using Pearson's correlation with Enzyme Commission (EC), Pfam, and sequence similarity, and resolution. Results are obtained from the CESSM online tool. For each ontology, the top two scores among 12 approaches are in bold.
| Ontology | Approach | EC correlation | PFAM correlation | Seq Sim correlation | Resolution |
|---|---|---|---|---|---|
| BP | GO-Universal (BMA) | | | | |
| | Wang et al. | 0.43266 | | 0.63356 | |
| | Zhang et al. | 0.21944 | 0.26495 | 0.20270 | 0.30148 |
| | Resnik (Avg) | 0.30218 | 0.32324 | 0.40685 | 0.33673 |
| | Resnik (Max) | 0.30756 | 0.26268 | 0.30273 | 0.64522 |
| | Resnik (BMA) | | 0.45878 | 0.73973 | 0.90041 |
| | SimUIC (term-based) | 0.38458 | 0.43693 | 0.74410 | 0.84503 |
| | SimGIC (term-based) | 0.39811 | 0.45470 | | 0.83730 |
| MF | GO-Universal (BMA) | | 0.60285 | 0.55163 | 0.52905 |
| | Wang et al. | | 0.49101 | 0.37101 | 0.33109 |
| | Zhang et al. | 0.49753 | 0.41147 | 0.32235 | 0.39865 |
| | Resnik (Avg) | 0.39635 | 0.44038 | 0.50143 | 0.41490 |
| | Resnik (Max) | 0.45393 | 0.18152 | 0.12458 | 0.38056 |
| | Resnik (BMA) | 0.60271 | 0.57183 | | |
| | SimUIC (term-based) | 0.65826 | | 0.60512 | |
| | SimGIC (term-based) | 0.62196 | | | 0.95590 |
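
The correlation columns above are Pearson correlations between a GO-based similarity score and a reference similarity (EC, Pfam, or sequence similarity) over a set of protein pairs. The following is a minimal sketch of that statistic on hypothetical data; the values and the function name `pearson` are illustrative assumptions and do not reproduce how CESSM computes its results.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-pair scores, for illustration only.
go_similarity = [0.1, 0.4, 0.5, 0.9]
sequence_similarity = [0.2, 0.3, 0.6, 0.8]

print(round(pearson(go_similarity, sequence_similarity), 3))
```

A higher coefficient indicates that the GO-based measure tracks the reference similarity more closely across the evaluated protein pairs.
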
Percentage of human-mouse orthologue pairs sharing high functional similarity.
| Approach | BP (using all GO evidence codes) | MF (using all GO evidence codes) | BP (leaving out IEA and ISS) | MF (leaving out IEA and ISS) |
|---|---|---|---|---|
| GO-Universal | 76 | 82 | 12 | 49 |
| Resnik | 76 | 80 | 13 | 38 |
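
A percentage of this kind can be obtained by counting the orthologue pairs whose functional similarity score reaches a chosen cutoff. The following is a minimal sketch under that assumption; the cutoff of 0.7, the scores, and the function name `percent_above` are hypothetical, since this excerpt does not state the threshold used to define "high" similarity.

```python
def percent_above(scores, threshold):
    """Percentage of scores that are greater than or equal to the threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    hits = sum(1 for s in scores if s >= threshold)
    return 100.0 * hits / len(scores)


# Hypothetical similarity scores for five orthologue pairs.
pair_scores = [0.95, 0.80, 0.72, 0.40, 0.10]

# The threshold is an assumption chosen for illustration only.
print(percent_above(pair_scores, threshold=0.7))  # 3 of 5 pairs -> 60.0
```
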
Some human-mouse protein orthologue pairs without GO-based functional similarity.
| Ontology | Protein ID | Organism | GO ID | GO name | Code | Source |
|---|---|---|---|---|---|---|
| BP | A1Z1Q3 | Homo sapiens | GO:0042278 | Purine nucleoside metabolic process | IDA | UniProtKB |
| | Q3UYG8 | Mus musculus | GO:0007420 | Brain development | IEP | UniProtKB |
| | Q96EQ8 | Homo sapiens | GO:0032480 | Negative regulation of type I interferon production | TAS | Reactome |
| | | | GO:0045087 | Innate immune response | TAS | Reactome |
| | Q9D9R0 | Mus musculus | GO:0016567 | Protein ubiquitination | EXP | GOC |
| | O00451 | Homo sapiens | GO:0007169 | Transmembrane receptor protein tyrosine kinase signaling pathway | TAS | PINC |
| | | | GO:0035860 | Glial cell-derived neurotrophic factor receptor signaling pathway | TAS | GOC |
| | O08842 | Mus musculus | GO:0007399 | Nervous system development | IMP | MGI |
| | Q9BS16 | Homo sapiens | GO:0000087 | M phase of mitotic cell cycle | TAS | Reactome |
| | | | GO:0000236 | Mitotic prometaphase | TAS | Reactome |
| | | | GO:0000278 | Mitotic cell cycle | TAS | Reactome |
| | | | GO:0006334 | Nucleosome assembly | TAS | Reactome |
| | | | GO:0034080 | CenH3-containing nucleosome assembly at centromere | TAS | Reactome |
| | Q9ESN5 | Mus musculus | GO:0045944 | Positive regulation of transcription from RNA polymerase II promoter | IDA | MGI |
| | O15347 | Homo sapiens | GO:0006310 | DNA recombination | ISS | UniProtKB |
| | | | GO:0007275 | Multicellular organismal development | TAS | PINC |
| | O54879 | Mus musculus | GO:0045578 | Negative regulation of B cell differentiation | IDA | MGI |
| | | | GO:0045638 | Negative regulation of myeloid cell differentiation | IDA | MGI |
| | Q9NP31 | Homo sapiens | GO:0001525 | Angiogenesis | IEA | UniProtKB |
| | | | GO:0007165 | Signal transduction | TAS | PINC |
| | | | GO:0007275 | Multicellular organismal development | IEA | UniProtKB |
| | | | GO:0030154 | Cell differentiation | IEA | UniProtKB |
| | Q9QXK9 | Mus musculus | GO:0008283 | Cell proliferation | IMP | occurs_in (CL:0000084) |
| | Q9C035 | Homo sapiens | GO:0009615 | Response to virus | IEA | UniProtKB |
| | | | GO:0044419 | Interspecies interaction between organisms | IEA | UniProtKB |
| | | | GO:0070206 | Protein trimerization | IDA | UniProtKB:Q9C035-1 |
| | P15533 | Mus musculus | GO:0006351 | Transcription, DNA-dependent | IEA | UniProtKB |
| | | | GO:0006355 | Regulation of transcription, DNA-dependent | IEA | UniProtKB |
| MF | Q86XR7 | Homo sapiens | GO:0004871 | Signal transducer activity | IMP | UniProtKB |
| | Q8BJQ4 | Mus musculus | GO:0005515 | Protein binding | IPI | BHF-UCL |
| | Q99218 | Homo sapiens | GO:0030345 | Structural constituent of tooth enamel | IDA | BHF-UCL |
| | P63277 | Mus musculus | GO:0005515 | Protein binding | IPI | MGI, BHF-UCL |
| | | | GO:0008083 | Growth factor activity | IMP | BHF-UCL |
| | | | GO:0042802 | Identical protein binding | IPI | BHF-UCL |
| | | | GO:0043498 | Cell surface binding | IMP | BHF-UCL |
| | | | GO:0046848 | Hydroxyapatite binding | IDA | BHF-UCL |
| | P45379 | Homo sapiens | GO:0003779 | Actin binding | IDA | UniProtKB |
| | | | GO:0005523 | Tropomyosin binding | IDA | UniProtKB |
| | | | GO:0030172 | Troponin C binding | IPI | UniProtKB |
| | | | GO:003113 | Troponin I binding | IPI | UniProtKB |
| | | | GO:0016887 | ATPase activity | IDA | UniProtKB:P45379-1-6-7-8 |
| | P50752 | Mus musculus | GO:0005200 | Structural constituent of cytoskeleton | IDA | occurs_in (CL:0000193) |
| | Q9H0E3 | Homo sapiens | GO:0003713 | Transcription coactivator activity | IDA | UniProtKB |
| | | | GO:0004402 | Histone acetyltransferase activity | IDA | UniProtKB |
| | Q8BIH0 | Mus musculus | GO:0005515 | Protein binding | IPI | UniProtKB |
| | Q5T9L3 | Homo sapiens | GO:0004871 | Signal transducer activity | ISS | UniProtKB |
| | Q6DID7 | Mus musculus | GO:0005515 | Protein binding | IPI | UniProtKB |
| | | | GO:0017147 | Wnt-protein binding | IDA | UniProtKB |
| | A8CG34 | Homo sapiens | GO:0005515 | Protein binding | IPI | UniProtKB |
| | Q8K3Z9 | Mus musculus | GO:0017056 | Structural constituent of nuclear pore | IEA | ENSEMBL |
| | O15446 | Homo sapiens | GO:0003899 | DNA-directed RNA polymerase activity | IEA | UniProtKB |
| | Q76KJ5 | Mus musculus | GO:0005515 | Protein binding | IPI | MGI |