Daniel Faria, Andreas Schlicker, Catia Pesquita, Hugo Bastos, António E N Ferreira, Mario Albrecht, André O Falcão.
Abstract
Despite the structure and objectivity provided by the Gene Ontology (GO), the annotation of proteins is a complex task that is subject to errors and inconsistencies. Electronically inferred annotations in particular are widely considered unreliable. However, given that manual curation of all GO annotations is unfeasible, it is imperative to improve the quality of electronically inferred annotations. In this work, we analyze the full GO molecular function annotation of UniProtKB proteins, and discuss some of the issues that affect their quality, focusing particularly on the lack of annotation consistency. Based on our analysis, we estimate that 64% of the UniProtKB proteins are incompletely annotated, and that inconsistent annotations affect 83% of the protein functions and at least 23% of the proteins. Additionally, we present and evaluate a data mining algorithm, based on the association rule learning methodology, for identifying implicit relationships between molecular function terms. The goal of this algorithm is to assist GO curators in updating GO and correcting and preventing inconsistent annotations. Our algorithm predicted 501 relationships with an estimated precision of 94%, whereas the basic association rule learning methodology predicted 12,352 relationships with a precision below 9%.
Year: 2012 PMID: 22848383 PMCID: PMC3405096 DOI: 10.1371/journal.pone.0040519
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Manual evaluation of the 20 most supported rules selected by our GO relationship learning algorithm.
| Subject Term | Predicate Term | Support | Confidence | Agreement | Evaluation |
| --- | --- | --- | --- | --- | --- |
| GTPase activity | GTP binding | 62218 | 100% | 95% | True |
| ribonucleoside binding | DNA-directed RNA polymerase activity | 18893 | 100% | 100% | Reverse |
| DNA topoisomerase (ATP-hydrolyzing) activity | ATP binding | 18778 | 97% | 82% | True |
| phosphopantetheine binding | acyl carrier activity | 8433 | 100% | 94% | Reverse |
| 1-aminocyclopropane-1-carboxylate synthase activity | pyridoxal phosphate binding | 7101 | 100% | 97% | True |
| adenylate kinase activity | ATP binding | 5514 | 99% | 86% | True |
| tRNA dihydrouridine synthase activity | FAD binding | 4559 | 100% | 100% | True |
| 5-formyltetrahydrofolate cyclo-ligase activity | ATP binding | 4427 | 100% | 94% | True |
| glycine-tRNA ligase activity | ATP binding | 4073 | 99% | 88% | True |
| holo-[acyl-carrier-protein] synthase activity | magnesium ion binding | 4017 | 100% | 89% | True |
| arginine-tRNA ligase activity | ATP binding | 4005 | 99% | 88% | True |
| cysteine synthase activity | pyridoxal phosphate binding | 4001 | 100% | 97% | True |
| copper-exporting ATPase activity | ATP binding | 3993 | 99% | 89% | True |
| shikimate kinase activity | ATP binding | 3947 | 99% | 91% | True |
| histidine-tRNA ligase activity | ATP binding | 3692 | 99% | 83% | True |
| alanine-tRNA ligase activity | ATP binding | 3630 | 99% | 82% | True |
| tetrahydrofolylpolyglutamate synthase activity | ATP binding | 3585 | 100% | 91% | True |
| cysteine desulfurase activity | pyridoxal phosphate binding | 3512 | 98% | 80% | True |
| D-alanine-D-alanine ligase activity | ATP binding | 3477 | 99% | 85% | True |
| lysine-tRNA ligase activity | ATP binding | 3460 | 99% | 82% | True |
Each association is classified as: true if evidence for a relationship between the terms was found; reverse if the reverse rule is true; unknown if no conclusive evidence was found for or against the association; and false if a counterexample was found. The support is given in number of co-annotations.
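To make the Support and Confidence columns concrete, the following minimal Python sketch computes both for every rule "subject term implies predicate term" over a toy annotation set. It assumes the textbook association-rule definitions: support is the number of proteins annotated with both terms (matching "number of co-annotations" above), and confidence is support divided by the number of proteins annotated with the subject term. The function name `rule_stats` and the protein identifiers `P1` to `P3` are hypothetical, and the exact definitions used in the paper may differ. The Agreement column is not computed, because its definition is not given in this record.

```python
from collections import Counter
from itertools import permutations


def rule_stats(annotations):
    """Return {(subject, predicate): (support, confidence)} for all term pairs.

    annotations: dict mapping a protein ID to the set of terms annotated to it.
    Assumes the standard association-rule definitions, not necessarily the
    paper's exact ones.
    """
    term_counts = Counter()  # number of proteins annotated with each term
    pair_counts = Counter()  # number of proteins annotated with both terms
    for terms in annotations.values():
        term_counts.update(terms)
        # Ordered pairs, so both A -> B and B -> A are counted.
        pair_counts.update(permutations(sorted(terms), 2))
    return {
        (subject, predicate): (n, n / term_counts[subject])
        for (subject, predicate), n in pair_counts.items()
    }


# Toy data with hypothetical protein IDs (not taken from UniProtKB).
annotations = {
    "P1": {"GTPase activity", "GTP binding"},
    "P2": {"GTPase activity", "GTP binding"},
    "P3": {"GTP binding"},
}
stats = rule_stats(annotations)
print(stats[("GTPase activity", "GTP binding")])
print(stats[("GTP binding", "GTPase activity")])
```

In this toy set, both directions of the rule have the same support but different confidence. That asymmetry shows why a mined rule can turn out to hold in the opposite direction, which is the situation the "Reverse" evaluation in the table records.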