Benjamin Audit, Emmanuel D Levy, Wally R Gilks, Leon Goldovsky, Christos A Ouzounis.
Abstract
Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method correctly re-annotates 91% of all Enzyme Classification (EC) classes (755 out of 827) with high coverage. Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps between enzyme classes; these occur for the few classes where homologous proteins are annotated inconsistently, mostly because of differences in function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as for putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web server called CORRIE (Correspondence Indicator Estimation). CORRIE allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures derived from the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at: http://www.genomes.org/services/corrie/.
Year: 2007 PMID: 17570146 PMCID: PMC1892082 DOI: 10.1186/1471-2105-8-S4-S3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic view of the CORRIE annotation framework. The only requirement for CORRIE is a classification of sequences. Here, we start with the classification of enzymes found in SwissProt. This enables us to create two tables, one for sequences and one for classes. From pairwise sequence comparisons we derive a score table, which lists all the classes hit by each sequence. BLAST scores are then integrated into correspondence indicators (CIs), which describe the relationship each sequence has with the classes it hits. Next, CIs are used to compute the probability that a sequence belongs to a particular class. The table "CI reference" is central to the framework, as it constitutes the reference against which new proteins are compared and classified. This is illustrated in Figure 2.
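The score table described in the caption can be pictured with a small sketch. The sequence identifiers, class labels, and scores below are hypothetical, and the helper `build_score_table` is an illustrative name rather than part of CORRIE. The sketch only groups pairwise scores by the class of the sequence that was hit; it does not attempt to compute CIs, whose definition is not given here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sequence table: sequence ID -> class label.
seq_class: Dict[str, str] = {"P1": "A", "P2": "A", "P3": "B"}

# Hypothetical pairwise comparison results: (query, subject, score).
hits: List[Tuple[str, str, float]] = [
    ("P1", "P2", 250.0),
    ("P1", "P3", 90.0),
    ("P2", "P1", 250.0),
    ("P3", "P1", 90.0),
]


def build_score_table(
    hits: List[Tuple[str, str, float]], seq_class: Dict[str, str]
) -> Dict[str, Dict[str, List[float]]]:
    """For each query, collect the scores of its hits grouped by subject class."""
    table: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for query, subject, score in hits:
        if query == subject:
            continue  # ignore self-hits
        table[query][seq_class[subject]].append(score)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {q: dict(classes) for q, classes in table.items()}


score_table = build_score_table(hits, seq_class)
print(score_table["P1"])
```

In this toy input, `P1` hits both class `A` (through `P2`) and class `B` (through `P3`), which is the situation in which a per-class indicator is needed to decide between the two.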
Figure 2. Illustration of the probability calculation implemented in CORRIE. To annotate a new sequence s, s is first aligned against all proteins in CORRIE. Here, s is similar to proteins from two distinct classes, A and B. The CIs between s and A, and between s and B, are calculated [10]. The probability that s belongs to A (i.e. that s has function A) is estimated by comparing the CI between s and A with the CIs of reference proteins that do or do not belong to A. In this case, the ten proteins closest to s in the CI space are shown in the red dotted rectangle. Since all ten proteins truly belong to A, CORRIE estimates the probability that s truly belongs to A as P = 1. For class B, none of the ten proteins closest to s in the CI space belongs to B. Therefore, CORRIE estimates the probability that s truly belongs to B as P = 0. In this case, s would be annotated as having function A with probability 1.
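The neighbourhood-based estimate in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the CORRIE implementation: the CI is treated as a single number per protein and class, distance in CI space is taken as the absolute difference, and the reference values are invented. The function name `knn_class_probability` is hypothetical.

```python
from typing import List, Tuple


def knn_class_probability(
    query_ci: float, reference: List[Tuple[float, bool]], k: int = 10
) -> float:
    """Fraction of the k reference proteins nearest to query_ci that are true members.

    Each reference entry is (CI with the class, whether the protein truly belongs).
    """
    if not reference:
        raise ValueError("reference set is empty")
    # Assumption: closeness in CI space is the absolute difference of CI values.
    nearest = sorted(reference, key=lambda entry: abs(entry[0] - query_ci))[:k]
    members = sum(1 for _, is_member in nearest if is_member)
    return members / len(nearest)


# Invented reference CIs for class A: true members have high CIs, non-members low.
ref_a = [(0.90 + 0.01 * i, True) for i in range(10)]
ref_a += [(0.10 + 0.01 * i, False) for i in range(10)]

# Invented reference CIs for class B, laid out the same way.
ref_b = [(0.80 + 0.01 * i, True) for i in range(10)]
ref_b += [(0.05 + 0.01 * i, False) for i in range(10)]

p_a = knn_class_probability(0.95, ref_a)  # s has a high CI with class A
p_b = knn_class_probability(0.08, ref_b)  # s has a low CI with class B
print(p_a, p_b)
```

With these invented values, the ten nearest neighbours of s are all members for class A and all non-members for class B, reproducing the P = 1 and P = 0 outcomes described in the caption.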
Local error rate per EC class, for those cases where there is more than one error.

| EC | Errors | Assignments | Error % | Description |
|---|---|---|---|---|
| 3.2.1.4 | 3 | 22 | 13.64 | Cellulase |
| 3.2.1.8 | 3 | 29 | 10.34 | Endo-1,4-beta-xylanase |
| 2.4.1.21 | 4 | 99 | 4.04 | Starch synthase |
| 1.6.5.3 | 9 | 457 | 1.97 | NADH dehydrogenase (ubiquinone) |
| 2.7.11.1 | 14 | 819 | 1.71 | Non-specific Ser/Thr protein kinase |
| 1.1.1.37 | 2 | 208 | 0.96 | Malate dehydrogenase |
| 3.6.3.14 | 14 | 1904 | 0.74 | H+-transporting two-sector ATPase |
| 4.2.1.33 | 2 | 310 | 0.65 | 3-isopropylmalate dehydratase |
| 2.7.7.6 | 4 | 1673 | 0.24 | DNA-directed RNA polymerase |
Column names: EC – EC number assignment by CORRIE; Errors – number of errors assigned to this class; Assignments – total number of assignments to this class; Error % – the local error rate; Description – the description of the corresponding EC reaction.
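The local error rate in the table appears to be the number of errors divided by the number of assignments, expressed as a percentage. The short check below recomputes that column from the Errors and Assignments values listed above; the function name `local_error_rate` is an illustrative choice.

```python
from typing import List, Tuple

# (EC number, errors, assignments, reported error %) copied from the table above.
rows: List[Tuple[str, int, int, float]] = [
    ("3.2.1.4", 3, 22, 13.64),
    ("3.2.1.8", 3, 29, 10.34),
    ("2.4.1.21", 4, 99, 4.04),
    ("1.6.5.3", 9, 457, 1.97),
    ("2.7.11.1", 14, 819, 1.71),
    ("1.1.1.37", 2, 208, 0.96),
    ("3.6.3.14", 14, 1904, 0.74),
    ("4.2.1.33", 2, 310, 0.65),
    ("2.7.7.6", 4, 1673, 0.24),
]


def local_error_rate(errors: int, assignments: int) -> float:
    """Errors as a percentage of all assignments to a class, rounded to 2 decimals."""
    if assignments <= 0:
        raise ValueError("assignments must be positive")
    return round(100.0 * errors / assignments, 2)


# Recompute each rate and compare it with the reported value.
recomputed = {ec: local_error_rate(err, total) for ec, err, total, _ in rows}
mismatches = [
    ec for ec, _, _, reported in rows if abs(recomputed[ec] - reported) > 1e-9
]
print(recomputed)
print("mismatches:", mismatches)
```

The classes with the highest local error rates (3.2.1.4 and 3.2.1.8) are also those with the fewest assignments, so a handful of errors weighs heavily on the percentage.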
Overlapping EC classes, for those cases where there are more than two errors from a true EC class to an assigned EC class.

| True EC | True name | Assigned EC | Assigned name | Common activity | Difference |
|---|---|---|---|---|---|
| 2.7.7.7 | DNA-directed DNA polymerase | 2.7.7.6 | DNA-directed RNA polymerase | DNA-dependent nucleotidyltransferase | Substrate: DNA or RNA |
| 1.6.99.5 | NADH dehydrogenase (quinone) | 1.6.5.3 | NADH dehydrogenase (ubiquinone) | NADH dehydrogenase | Electron acceptor: quinone or ubiquinone |
| 3.2.1.91 | Cellulose 1,4-beta-cellobiosidase | 3.2.1.4 | Cellulase | Hydrolysis of 1,4-beta-D-glucosidic linkages | Exo-hydrolysis or endo-hydrolysis |
| 2.7.1.137 | Phosphatidylinositol 3-kinase | 2.7.11.1 | Non-specific Ser/Thr protein kinase | Kinase | Substrate: PI3 or Ser/Thr |
| 2.4.1.242 | NDP-glucose – starch glucosyltransferase | 2.4.1.21 | Starch synthase | Starch glucosyltransferase | Substrate: NDP-glucose or ADP-glucose |
| 3.6.3.15 | Sodium-transporting two-sector ATPase | 3.6.3.14 | H+-transporting two-sector ATPase | Ion-transporting two-sector ATPase | Ion specificity: Na+ or H+ |
Column names: True EC/Name – the real EC number/name; Assigned EC/Name – the EC number/name assigned by CORRIE; Common activity/Difference – similarities and differences in substrate specificity and mechanism for the corresponding reaction pairs.