Hugo P Bastos, Luka A Clarke, Francisco M Couto.
Abstract
Protein functional annotation consists of associating proteins with textual descriptors that elucidate their biological roles. The bulk of annotation is done by automated procedures that ultimately rely on annotation transfer. Despite the large number of existing protein annotation procedures, the ever-growing protein space is never completely annotated. One facet of annotation incompleteness derives from annotation uncertainty: when a protein's function cannot be predicted with enough specificity, it is often conservatively annotated with more generic terms instead. For protein families or functionally related (or even dissimilar) sets, this makes it harder to use annotations to compare the extent of functional relatedness among all family or set members. However, we postulate that identifying sub-sets of functionally coherent proteins annotated at a very specific level can help extend the annotation of other, incompletely annotated proteins within the same family or functionally related set. As an example, we analyse the annotation status of a set of CAZy families belonging to the Polysaccharide Lyase (PL) class. We show that visualization methods and semantic-similarity-based metrics make it possible to identify families, and the annotation terms within them, that are suitable for annotation extension. Based on this analysis, we propose a semi-automatic methodology for extending single annotation terms within these partially annotated protein sets or families.
Keywords: annotation extension; annotation metrics; functional annotation; gene ontology; protein annotation coherence
Year: 2013 PMID: 24130572 PMCID: PMC3795322 DOI: 10.3389/fgene.2013.00201
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
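The annotation-extension idea described in the abstract can be illustrated with a minimal sketch. It assumes a toy, simplified GO fragment (a child-to-parents map) and per-protein sets of directly annotated terms; the flagging rule, the `min_support` threshold, and the function names `ancestors` and `extension_candidates` are hypothetical and are not the authors' methodology. A protein is flagged when enough family members carry a specific term and the protein itself has only more generic ancestors of that term.

```python
from collections import Counter

# Toy, simplified GO fragment (child -> parents); hypothetical, intermediate terms omitted.
PARENTS = {
    "hyaluronate lyase activity": {"carbon-oxygen lyase activity, acting on polysaccharides"},
    "carbon-oxygen lyase activity, acting on polysaccharides": {"lyase activity"},
    "lyase activity": {"catalytic activity"},
    "catalytic activity": set(),
}


def ancestors(term, parents):
    """Return all proper ancestors of `term` by walking the parent map."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def extension_candidates(annotations, parents, min_support=0.5):
    """Hypothetical rule: suggest (protein, term) pairs where `term` is held by
    at least `min_support` of the family and every term the protein already has
    is a more generic ancestor of `term`."""
    n = len(annotations)
    counts = Counter(t for terms in annotations.values() for t in terms)
    suggestions = []
    for term, c in counts.items():
        if c / n < min_support:
            continue  # too few members share this term to call the sub-set coherent
        anc = ancestors(term, parents)
        for protein, terms in annotations.items():
            if term not in terms and terms and terms <= anc:
                suggestions.append((protein, term))
    return sorted(suggestions)


family = {
    "P1": {"hyaluronate lyase activity"},
    "P2": {"hyaluronate lyase activity"},
    "P3": {"lyase activity"},
    "P4": {"catalytic activity"},
}
print(extension_candidates(family, PARENTS))
# [('P3', 'hyaluronate lyase activity'), ('P4', 'hyaluronate lyase activity')]
```

The output is a list of suggestions for curator review, in keeping with a semi-automatic workflow; raising `min_support` trades recall for more confidence that the specifically annotated sub-set is genuinely coherent.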
Figure 1. Sub-graph of the Gene Ontology (GO).
Figure 2. Common approaches in sequence-based functional annotation systems.
GO annotation scores (GOscore and GOoccurrence) and size (number of annotated proteins) of each CAZy family in the PL enzyme class.
| Size (annotated proteins) | 391 | 34 | 228 | 43 | 37 | 21 | 63 | 184 | 89 | 77 | 44 | 19 | 5 | 9 | 3 | 22 | 30 | 3 | 1 | 29 |
| GOoccurrence | 0.146 | 0.798 | 0.306 | 0.373 | 1.000 | 0.405 | 0.288 | 0.303 | 0.128 | 0.261 | 0.325 | 0.586 | 0.550 | 0.420 | 1.000 | 1.000 | 1.000 | 0.667 | 1.000 | 0.880 |
| GOscore | 0.196 | 0.511 | 0.593 | 0.309 | 0.599 | 0.192 | 0.202 | 0.508 | 0.166 | 0.202 | 0.129 | 0.202 | 0.718 | 0.180 | 0.202 | 0.640 | 0.599 | 0.202 | 0.202 | 0.577 |
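GOscore and GOoccurrence are not defined in this record. As a stand-in, here is a minimal sketch of one common semantic-similarity-based way to summarise how coherent a family's annotations are: the average pairwise simGIC, an information-content (IC) weighted Jaccard index over ancestor-closed GO term sets. The IC values, the toy family, and the names `sim_gic` and `family_coherence` are hypothetical, and this is not claimed to be how GOscore is computed.

```python
from itertools import combinations

# Hypothetical information-content (IC) values; more specific terms get higher IC.
# Toy, simplified GO terms (intermediate terms omitted).
IC = {
    "catalytic activity": 0.2,
    "lyase activity": 0.4,
    "pectate lyase activity": 1.0,
}


def sim_gic(a, b, ic):
    """simGIC: sum of IC over shared terms divided by sum of IC over all terms.
    `a` and `b` must already include every ancestor of their annotated terms."""
    union = sum(ic[t] for t in a | b)
    if union == 0:
        return 0.0
    return sum(ic[t] for t in a & b) / union


def family_coherence(family, ic):
    """Average pairwise simGIC over all protein pairs in a family."""
    pairs = list(combinations(family.values(), 2))
    if not pairs:
        return 1.0  # convention used here: fewer than two proteins is trivially coherent
    return sum(sim_gic(a, b, ic) for a, b in pairs) / len(pairs)


# Ancestor-closed annotation sets for a hypothetical three-protein family.
toy_family = {
    "P1": {"catalytic activity", "lyase activity", "pectate lyase activity"},
    "P2": {"catalytic activity", "lyase activity", "pectate lyase activity"},
    "P3": {"catalytic activity", "lyase activity"},
}
print(round(family_coherence(toy_family, IC), 3))  # 0.583
```

Weighting by IC means that sharing a specific term (here, pectate lyase activity) contributes far more to the score than sharing a generic one, which is why a single generically annotated member pulls the family average down noticeably.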
Figure 3. Graph subsuming the GO molecular function annotations of CAZy family PL4.
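A subsuming graph like this can be derived by closing each protein's annotations over their GO ancestors and counting proteins per term. The sketch below assumes a simplified, hypothetical GO fragment, and the names `closure`, `propagated_counts`, and `subsuming_edges` are illustrative. With this propagation (the GO true-path rule), a term's count is never smaller than the count of any of its descendants.

```python
from collections import Counter

# Simplified toy GO fragment (child -> parents); hypothetical, intermediate terms omitted.
PARENTS = {
    "carbon-oxygen lyase activity, acting on polysaccharides": {"lyase activity"},
    "lyase activity": {"catalytic activity"},
    "carboxypeptidase activity": {"catalytic activity"},
    "catalytic activity": set(),
}


def closure(terms, parents):
    """Return the given terms plus all of their ancestors."""
    out = set()
    stack = list(terms)
    while stack:
        t = stack.pop()
        if t not in out:
            out.add(t)
            stack.extend(parents.get(t, ()))
    return out


def propagated_counts(annotations, parents):
    """Count, for each term, the proteins annotated with it or with a descendant."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(closure(terms, parents))
    return counts


def subsuming_edges(counts, parents):
    """Edges (child, parent) of the sub-graph induced by the counted terms."""
    return sorted((c, p) for c in counts for p in parents.get(c, ()) if p in counts)


annotations = {
    "P1": {"carbon-oxygen lyase activity, acting on polysaccharides"},
    "P2": {"lyase activity"},
    "P3": {"carboxypeptidase activity"},
}
counts = propagated_counts(annotations, PARENTS)
print(counts["catalytic activity"], counts["lyase activity"])  # 3 2
```

The edge list returned by `subsuming_edges`, together with the per-term counts, is enough to draw a graph in which node size reflects how many family members reach each term.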
GO term enrichment for CAZy family PL4, with Benjamini-Yekutieli corrected p-values.
| Carbohydrate binding | 1.87e-47 | 0.658 | 43 |
| Carbon-oxygen lyase activity, acting on polysaccharides | 7.50e-23 | 0.699 | 16 |
| Carboxypeptidase activity | 3.43e-12 | 0.813 | 7 |
| Lyase activity | 1.76e-10 | 0.404 | 25 |
| Calcium ion binding | 4.75e-01 | 1.000 | 1 |
| Catalytic activity | 1.00e+00 | 0.166 | 32 |
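The enrichment tables report Benjamini-Yekutieli (BY) corrected values. This record does not state which test or background set was used, so the sketch below assumes a one-sided hypergeometric test against a hypothetical background of annotated proteins, using only the Python standard library; `hypergeom_sf`, `by_adjust`, and the counts are illustrative. BY adjusts the i-th smallest of m p-values to the minimum over j ≥ i of m·c(m)·p(j)/j, capped at 1, where c(m) = 1 + 1/2 + … + 1/m; the extra c(m) factor keeps false-discovery-rate control valid under arbitrary dependence between tests, which matters because nested GO terms are strongly dependent.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N proteins, K with the term, n drawn)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def by_adjust(pvalues):
    """Benjamini-Yekutieli adjusted p-values (step-up, monotone, capped at 1)."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction for dependence
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: term -> (family proteins with term, background proteins with term).
family_size, background_size = 40, 2000
terms = {"term A": (30, 100), "term B": (5, 300), "term C": (2, 40)}
raw = [hypergeom_sf(k, background_size, K, family_size) for k, K in terms.values()]
for name, p in zip(terms, by_adjust(raw)):
    print(f"{name}: {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids overflow for realistic background sizes; for very large backgrounds a library routine (for example a hypergeometric survival function from a statistics package) would be faster.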
Figure 4. Graph subsuming the GO molecular function annotations of CAZy family PL8.
GO term enrichment for CAZy family PL8, with Benjamini-Yekutieli corrected p-values.
| Carbon-oxygen lyase activity, acting on polysaccharides | 8.15e-306 | 0.699 | 180 |
| Carbohydrate binding | 1.18e-186 | 0.658 | 178 |
| Hyaluronate lyase activity | 2.00e-95 | 1.000 | 31 |
| Chondroitin-sulfate-ABC endolyase activity | 3.69e-11 | 1.000 | 4 |
| Chondroitin AC lyase activity | 1.24e-05 | 1.000 | 2 |
| Chondroitin-sulfate-ABC exolyase activity | 5.49e-03 | 1.000 | 1 |
| Xanthan lyase activity | 5.49e-03 | 1.000 | 1 |
| Lyase activity | 3.37e-02 | 0.404 | 181 |
| Metal ion binding | 1.00e+00 | 0.687 | 2 |
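One way to read the PL8 table with annotation extension in mind is to shortlist terms that are both significantly enriched and highly specific. The sketch below copies the rows of the table; it assumes, without confirmation from this record, that the third column is a term-specificity score in [0, 1] (1.000 for the most specific terms) and the fourth is the number of annotated proteins. The `shortlist` function and its cut-offs (alpha = 0.05, specificity ≥ 0.9, at least two proteins) are hypothetical.

```python
# Rows copied from the PL8 enrichment table:
# (GO term, BY-corrected p-value, assumed specificity score, assumed protein count)
PL8_ROWS = [
    ("Carbon-oxygen lyase activity, acting on polysaccharides", 8.15e-306, 0.699, 180),
    ("Carbohydrate binding", 1.18e-186, 0.658, 178),
    ("Hyaluronate lyase activity", 2.00e-95, 1.000, 31),
    ("Chondroitin-sulfate-ABC endolyase activity", 3.69e-11, 1.000, 4),
    ("Chondroitin AC lyase activity", 1.24e-05, 1.000, 2),
    ("Chondroitin-sulfate-ABC exolyase activity", 5.49e-03, 1.000, 1),
    ("Xanthan lyase activity", 5.49e-03, 1.000, 1),
    ("Lyase activity", 3.37e-02, 0.404, 181),
    ("Metal ion binding", 1.00e+00, 0.687, 2),
]


def shortlist(rows, alpha=0.05, min_specificity=0.9, min_count=2):
    """Hypothetical filter: significant, specific terms backed by >= min_count proteins."""
    return [(term, count) for term, p, spec, count in rows
            if p < alpha and spec >= min_specificity and count >= min_count]


for term, count in shortlist(PL8_ROWS):
    print(term, count)
```

Requiring at least two proteins excludes terms held by a single family member, which on their own offer little evidence of a functionally coherent sub-set; lowering `min_count` to 1 would also admit the chondroitin-sulfate-ABC exolyase and xanthan lyase terms.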
Figure 5. Outline of the proposed methodology for annotation extension.