Arunachalam Vinayagam, Coral del Val, Falk Schubert, Roland Eils, Karl-Heinz Glatting, Sándor Suhai, Rainer König.
Abstract
BACKGROUND: Vast progress in sequencing projects has called for annotation on a large scale. A number of methods have been developed to address this challenging task. These methods, however, either apply to specific subsets, or their predictions are not formalised, or they do not provide precise confidence values for their predictions. DESCRIPTION: We recently established a learning system for automated annotation, trained with a broad variety of different organisms to predict the standardised annotation terms from Gene Ontology (GO). This method has now been made available to the public via our web service GOPET (Gene Ontology term Prediction and Evaluation Tool). It supplies annotation for sequences of any organism. For each predicted term an appropriate confidence value is provided. The basic method was developed for predicting molecular function GO terms. It has now been expanded to predict biological process terms. This web service is available via http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar
Year: 2006 PMID: 16549020 PMCID: PMC1434778 DOI: 10.1186/1471-2105-7-161
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of the prediction performance for molecular function and biological process.
| | Molecular function | Biological process |
| Number of sequences used for SVM training and validation | 36,771 | 27,109 |
| Number of instances used in training and validation sets | 856,632 | 1,342,270 |
| Positive instances | 31% | 12% |
| Recall at 80% precision | 65% | 76% |
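The "recall at 80% precision" figure in the table above can be read as the largest recall reachable at any score cutoff whose precision is still at least 80%. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `recall_at_precision` and the toy scores are assumptions made for illustration only.

```python
from typing import List, Tuple


def recall_at_precision(scored: List[Tuple[float, bool]],
                        min_precision: float = 0.80) -> float:
    """Highest recall over all score cutoffs with precision >= min_precision.

    `scored` holds (classifier score, is_true_annotation) pairs.
    Illustrative helper; not taken from the GOPET implementation.
    """
    total_positives = sum(1 for _, is_pos in scored if is_pos)
    if total_positives == 0:
        return 0.0

    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    true_positives = 0
    for i, (score, is_pos) in enumerate(ranked, start=1):
        true_positives += int(is_pos)
        # Evaluate only at real cutoffs: skip while the next item ties on score.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        precision = true_positives / i
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall


# Toy data (made up): four true annotations among six scored instances.
toy = [(0.9, True), (0.8, True), (0.7, False),
       (0.6, True), (0.5, False), (0.4, True)]
print(recall_at_precision(toy))
```

On the toy data only the two top-scoring cutoffs keep precision at or above 80%, so half of the true annotations are recovered.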
Comparison of our system (GOPET) with the annotation systems GOtcha and GOFigure in predicting the molecular function of eight Xenopus laevis sequences. In general, the first four hits are shown; for GOPET and GOtcha, only hits with confidence values ≥ 80% are listed.
| Contig | GOPET GO ID | GOPET confidence | GOPET GO term | GOtcha GO ID | GOtcha estimated likelihood | GOtcha GO term | GOFigure GO ID | GOFigure GO term |
| TC212171 | 0008233 | 100% | peptidase activity | 0003824 | 98% | enzyme activity | 0004263 | chymotrypsin activity |
| 0004175 | 100% | endopeptidase activity | 0008233 | 98% | peptidase activity | 0004295 | trypsin activity | |
| 0008236 | 98% | serine-type peptidase activity | 0016787 | 98% | hydrolase activity | |||
| 0016787 | 98% | hydrolase activity | 0008236 | 98% | serine-type peptidase activity | |||
| TC196381 | 0004175 | 100% | endopeptidase activity | 0003824 | 93% | enzyme activity | 0004263 | chymotrypsin activity |
| 0016787 | 98% | hydrolase activity | 0004295 | trypsin activity | ||||
| 0008233 | 98% | peptidase activity | ||||||
| 0008236 | 97% | serine-type peptidase activity | ||||||
| TC209487 | 0003824 | 100% | enzyme activity | 0003824 | 90% | enzyme activity | 0004177 | aminopeptidase activity |
| 0016787 | 100% | hydrolase activity | 0004301 | epoxide hydrolase activity | ||||
| 0004177 | 90% | aminopeptidase activity | ||||||
| 0017171 | 85% | serine hydrolase activity | ||||||
| TC187949 | 0004888 | 100% | transmembrane receptor activity | 0004872 | 93% | receptor activity | 0004888 | transmembrane receptor activity |
| 0004872 | 97% | receptor activity | 0004888 | 93% | transmembrane receptor activity | |||
| 0004871 | 93% | signal transducer activity | ||||||
| 0004930 | 93% | G-protein coupled receptor activity | ||||||
| TC194305 | 0003824 | 100% | enzyme activity | - | - | - | 0004674 | protein serine/threonine kinase activity |
| 0016740 | 99% | transferase activity | 0005524 | ATP binding | ||||
| 0016301 | 99% | kinase activity | ||||||
| 0004672 | 97% | protein kinase activity | ||||||
| TC210151 | 0004872 | 100% | receptor activity | 0004872 | 98% | receptor activity | 0004926 | non-G-protein coupled 7TM receptor activity |
| 0004888 | 97% | transmembrane receptor activity | 0004871 | 98% | signal transducer activity | 0004930 | G-protein coupled receptor activity | |
| 0004928 | 82% | frizzled receptor activity | 0004888 | 98% | transmembrane receptor activity | |||
| 0004930 | 98% | G-protein coupled receptor activity | ||||||
| 0004926 | 92% | non-G-protein coupled 7TM receptor activity | ||||||
| 0004928 | 80% | frizzled receptor activity | ||||||
| TC199713 | 0004602 | 100% | glutathione peroxidase activity | 0003824 | 99% | enzyme activity | 0004601 | peroxidase activity |
| 0016491 | 98% | oxidoreductase activity | 0004601 | 99% | peroxidase activity | 0004602 | glutathione peroxidase activity | |
| 0004601 | 85% | peroxidase activity | 0016491 | 99% | oxidoreductase activity | |||
| 0016684 | 99% | oxidoreductase, acting on peroxide as acceptor activity | ||||||
| 0004602 | 97% | glutathione peroxidase activity | ||||||
| TC190605 | 0003824 | 100% | enzyme activity | 0003824 | 92% | enzyme activity | 0004518 | nuclease activity |
| 0016787 | 100% | hydrolase activity | ||||||
| 0017171 | 87% | serine hydrolase activity | ||||||
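The selection rule stated in the table caption (keep hits with confidence ≥ 80%, then show at most the first four) can be sketched as below. This is an illustration under stated assumptions, not GOPET's own code: the `Hit` type and `top_hits` function are hypothetical names, the four real entries are the GOPET hits listed for contig TC194305, and the fifth entry is a made-up placeholder added only to show the cutoff in action.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    go_id: str        # seven-digit GO identifier, without the "GO:" prefix
    confidence: int   # confidence value in percent
    term: str


def top_hits(hits: List[Hit], min_confidence: int = 80, limit: int = 4) -> List[Hit]:
    """Keep hits at or above the confidence cutoff, best first, at most `limit`."""
    kept = [h for h in hits if h.confidence >= min_confidence]
    kept.sort(key=lambda h: h.confidence, reverse=True)  # stable for ties
    return kept[:limit]


hits = [
    Hit("0003824", 100, "enzyme activity"),
    Hit("0016740", 99, "transferase activity"),
    Hit("0016301", 99, "kinase activity"),
    Hit("0004672", 97, "protein kinase activity"),
    Hit("0000000", 55, "placeholder term (hypothetical, not from the table)"),
]

for hit in top_hits(hits):
    print(f"GO:{hit.go_id}\t{hit.confidence}%\t{hit.term}")
```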
Figure 1. General workflow of the GOPET web server.
Figure 2. Example output of GOPET: the first column shows the GO ID, followed by its aspect (molecular function), the confidence value of the prediction and the short description of the GO term. The example here shows the query results for sequence Xfz4 (Xenopus frizzled 4). Note that Xfz4 is a maternal mRNA whose carboxyl-terminal half contains putative transmembrane segments. Furthermore, it is homologous to the murine gene product Mfz4, a frizzled transmembrane protein [34]. It has been shown elsewhere that the C-terminal cytoplasmic Lys-Thr-X-X-X-Trp motif in frizzled receptors mediates Wnt/beta-catenin signalling [35].
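The four output columns described for Figure 2 (GO ID, aspect, confidence value, short description) could be read into a small record type for downstream use. The sketch below assumes a tab-separated text line, which is a guess and not a documented GOPET export format; the `Prediction` class, the `parse_prediction` function and the sample line are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Prediction:
    go_id: str          # normalised to the form "GO:0004928"
    aspect: str
    confidence: float   # percent
    description: str


def parse_prediction(line: str) -> Prediction:
    """Parse one assumed tab-separated line: GO ID, aspect, confidence, description."""
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    go_id, aspect, confidence, description = fields

    digits = go_id[3:] if go_id.upper().startswith("GO:") else go_id
    # GO identifiers carry exactly seven digits; reject anything else.
    if not re.fullmatch(r"\d{7}", digits):
        raise ValueError(f"malformed GO ID: {go_id!r}")

    return Prediction(
        go_id="GO:" + digits,
        aspect=aspect,
        confidence=float(confidence.rstrip("%")),
        description=description,
    )


# Illustrative line only; the layout is an assumption, not real GOPET output.
sample = "0004928\tmolecular function\t82%\tfrizzled receptor activity"
print(parse_prediction(sample))
```

Rejecting identifiers that do not have seven digits, rather than padding them silently, makes transcription slips in GO IDs visible early.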