Gaston K Mazandu, Nicola J Mulder.
Abstract
Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches have contributed to improving protein analyses at the functional level. Given their recent proliferation, a unified theory in a well-defined mathematical framework is needed to provide a theoretical basis for validating them. We review the existing IC-based ontological similarity approaches developed in the biomedical and bioinformatics fields and propose a general framework and unified description of all these measures. We conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides a theoretical basis for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term's specificity in the GO DAG.
Year: 2013 PMID: 24078912 PMCID: PMC3775452 DOI: 10.1155/2013/292063
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: Illustrating the inconsistency of the Resnik approach.
Figure 2: Flowchart of different families, approaches, and categories of existing IC-based GO term semantic similarity measures.
Comparison of different IC-based approach parameters. In the GO term semantic similarity approaches, c is the MICA between GO terms a and b, and n is the number of disjunctive common ancestors between terms a and b, the nth being the MICA between a and b.
| Family | Approach | Parameters | | | | |
|---|---|---|---|---|---|---|
| Annotation | Lin | ∞ | 0 | … | 1 | … |
| | Relevance | ∞ | 0 | … | 1 − … | … |
| | Li et al. | ∞ | 0 | … | 1 − (1+IC(… | … |
| | GraSM | ∞ | 0 | … | … | … |
| Topology | Wang et al. | 1 | 0 | 1,1 | 1 | … |
| | Zhang et al. | ∞ | 0 | 1,1 | 1 | … |
| | GO-Universal | ∞ | 0 | 1,0 or 0,1 | 1 | … |
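The annotation-based measures compared above all build on a term's IC and on the MICA of a term pair. The following is a minimal sketch of that idea, not the authors' implementation: it uses the standard definitions of annotation-based IC (negative log of a term's annotation frequency), Resnik similarity (the IC of the MICA), and Lin similarity (twice the IC of the MICA over the sum of both terms' IC). The toy DAG, the annotation counts, and all function names are hypothetical, and the `resnik_nmax` helper assumes that normalizing by the highest IC value simply means dividing by the largest IC in the ontology.

```python
import math

# Hypothetical toy DAG (child -> parents); these are not real GO terms.
PARENTS = {"root": set(), "A": {"root"}, "B": {"root"}, "C": {"A"}, "D": {"A", "B"}}
# Hypothetical counts of proteins annotated directly to each term
# (assumed to be distinct proteins, so counts can simply be summed).
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 3, "C": 4, "D": 1}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """Annotation-based IC: -log of the share of annotations at or below a term."""
    cumulative = dict.fromkeys(PARENTS, 0)
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    # Assumes every term has at least one annotation at or below it.
    return {term: -math.log(n / total) for term, n in cumulative.items()}


def mica(a, b, ic):
    """Most informative common ancestor: the shared ancestor with the highest IC."""
    return max(ancestors(a) & ancestors(b), key=ic.__getitem__)


def resnik(a, b, ic):
    """Resnik similarity: the IC of the MICA."""
    return ic[mica(a, b, ic)]


def resnik_nmax(a, b, ic):
    """Resnik score divided by the highest IC value (assumed normalization)."""
    return resnik(a, b, ic) / max(ic.values())


def lin(a, b, ic):
    """Lin similarity; returns 0.0 by convention when both terms have zero IC."""
    denom = ic[a] + ic[b]
    return 2 * resnik(a, b, ic) / denom if denom > 0 else 0.0


IC = information_content()
print(mica("C", "D", IC), resnik("C", "D", IC), lin("C", "D", IC))
```

In this toy graph the unnormalized Resnik score of a term with itself equals that term's IC, so it differs from term to term, whereas the Lin and normalized scores of a term with itself are 1.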
Figure 3: Snapshot of the term GO:0004003 in the molecular function ontology.
Comparison of performance of different approaches for GO BP ontology. This comparison is done using Pearson's correlation with enzyme commission (EC), Pfam and sequence similarity, and resolution. Results are obtained from the CESSM online tool. The best scores among each group are in bold, and Nmax, Nunif, and Nunivers are suffixes indicating different IC normalization strategies, namely, the highest IC value, uniform, and GO-universal strategies, respectively.
| Family | Approach | EC correlation | Pfam correlation | Seq Sim correlation | Resolution |
|---|---|---|---|---|---|
| Annotation | Resnik-Nmax | 0.41166 | 0.29151 | 0.54563 | … |
| | Resnik-Nunif | 0.41166 | 0.29151 | 0.54563 | 0.49265 |
| | Nunivers | … | … | … | 0.48490 |
| | Lin | 0.48032 | 0.38900 | 0.57956 | 0.43343 |
| | Li et al. | … | … | … | … |
| | Relevance | 0.48188 | 0.38682 | 0.57550 | 0.43823 |
| | GraSM-Lin | 0.48673 | … | 0.61739 | 0.51701 |
| | GraSM-Nmax | 0.44826 | 0.35941 | 0.63497 | 0.54996 |
| | GraSM-Nunif | 0.44826 | 0.35941 | 0.63497 | 0.48491 |
| | GraSM-Nunivers | … | 0.44158 | … | … |
| | XGraSM-Lin | 0.39811 | … | 0.68669 | … |
| | XGraSM-Nmax | 0.45493 | 0.37152 | 0.69892 | 0.53910 |
| | XGraSM-Nunif | 0.45493 | 0.37152 | 0.69892 | 0.47533 |
| | XGraSM-Nunivers | … | 0.45220 | … | 0.91425 |
| Topology | Wang et al. | 0.45451 | 0.47867 | 0.65214 | … |
| | Zhang et al. | … | 0.45527 | 0.61862 | 0.44350 |
| | GO-universal | 0.45958 | … | … | 0.43772 |
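The correlation columns above report Pearson's correlation between a semantic similarity measure and a reference similarity (EC, Pfam, or sequence similarity). As an illustration only, the sketch below computes Pearson's correlation in plain Python; it is not the CESSM implementation, and the function name and the two score lists are hypothetical placeholders for per-protein-pair scores.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the root of each sum of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (dev_x * dev_y)


# Hypothetical scores for four protein pairs (illustrative values only).
go_similarity = [0.10, 0.40, 0.55, 0.90]
sequence_similarity = [0.20, 0.35, 0.70, 0.80]
print(pearson(go_similarity, sequence_similarity))
```

A value close to 1 would indicate that the semantic similarity measure ranks protein pairs in much the same way as the reference similarity does.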
Comparison of performance of different approaches for GO MF ontology. This comparison is done using Pearson's correlation with enzyme commission (EC), Pfam and sequence similarity, and resolution. Results are obtained from the CESSM online tool. The best scores among each group are in bold, and Nmax, Nunif, and Nunivers are suffixes indicating different IC normalization strategies, namely, the highest IC value, uniform, and GO-universal strategies, respectively.
| Family | Approach | EC correlation | Pfam correlation | Seq Sim correlation | Resolution |
|---|---|---|---|---|---|
| Annotation | Resnik-Nmax | 0.64381 | … | … | … |
| | Resnik-Nunif | 0.64381 | 0.49101 | 0.59662 | 0.28872 |
| | Nunivers | … | 0.47693 | 0.40945 | 0.41671 |
| | Lin | 0.67404 | 0.42844 | 0.36060 | 0.36583 |
| | Li et al. | … | … | … | … |
| | Relevance | 0.67618 | 0.42112 | 0.35081 | 0.39798 |
| | GraSM-Lin | 0.68125 | 0.44009 | 0.37243 | 0.38321 |
| | GraSM-Nmax | 0.65180 | … | … | 0.37213 |
| | GraSM-Nunif | 0.65180 | 0.49844 | 0.60405 | 0.28859 |
| | GraSM-Nunivers | … | 0.48638 | 0.41889 | … |
| | XGraSM-Lin | 0.70480 | 0.53732 | 0.47682 | 0.43007 |
| | XGraSM-Nmax | 0.67136 | … | … | 0.36781 |
| | XGraSM-Nunif | 0.67136 | 0.58792 | 0.70911 | 0.28524 |
| | XGraSM-Nunivers | … | 0.55251 | 0.48988 | … |
| Topology | Wang et al. | 0.64327 | 0.46102 | 0.37272 | 0.34873 |
| | Zhang et al. | … | 0.43453 | 0.35581 | 0.38646 |
| | GO-universal | 0.67661 | … | … | … |
Comparison of performance of different approaches for GO CC ontology. This comparison is done using Pearson's correlation with enzyme commission (EC), Pfam and sequence similarity, and resolution. Results are obtained from the CESSM online tool. The best scores among each group are in bold, and Nmax, Nunif, and Nunivers are suffixes indicating different IC normalization strategies, namely, the highest IC value, uniform, and GO-universal strategies, respectively.
| Family | Approach | EC correlation | Pfam correlation | Seq Sim correlation | Resolution |
|---|---|---|---|---|---|
| Annotation | Resnik-Nmax | 0.34355 | … | … | 0.40935 |
| | Resnik-Nunif | 0.34355 | 0.43796 | 0.55437 | 0.33651 |
| | Nunivers | 0.32079 | 0.40931 | 0.53991 | … |
| | Lin | 0.29912 | 0.38851 | 0.4998 | … |
| | Li et al. | … | … | … | 0.95511 |
| | Relevance | 0.30183 | 0.39132 | 0.50435 | 0.96131 |
| | GraSM-Lin | 0.30463 | 0.38749 | 0.51142 | … |
| | GraSM-Nmax | 0.36341 | … | … | 0.40677 |
| | GraSM-Nunif | 0.36341 | 0.45946 | 0.60546 | 0.33439 |
| | GraSM-Nunivers | … | 0.41170 | 0.55626 | 0.95247 |
| | XGraSM-Lin | 0.30812 | 0.39642 | 0.57390 | … |
| | XGraSM-Nmax | … | … | … | 0.39428 |
| | XGraSM-Nunif | 0.37079 | 0.47364 | 0.68564 | 0.32412 |
| | XGraSM-Nunivers | 0.32451 | 0.41225 | 0.59762 | 0.94673 |
| Topology | Wang et al. | … | … | … | 0.94019 |
| | Zhang et al. | 0.00477 | 0.00246 | 0.00188 | 0.36749 |
| | GO-universal | 0.15787 | 0.19982 | 0.13119 | … |