C Bannert, A Welfle, C Aus dem Spring, D Schomburg.
Abstract
BACKGROUND: Models for the simulation of metabolic networks require the accurate prediction of enzyme function. Based on a genomic sequence, the enzymatic functions of gene products are today mainly predicted by sequence database searching and operon analysis. Other methods can support these techniques. We have developed an automatic method, "BrEPS", that creates highly specific sequence patterns for the functional annotation of enzymes.
Year: 2010 PMID: 21122127 PMCID: PMC3009691 DOI: 10.1186/1471-2105-11-589
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. BrEPS protocol: Outline. Outline of the BrEPS protocol, which comprises four steps. See the main text for a detailed description.
Pattern verification: Example cases
| | Pattern ECs/ | Target ECs/ | Case |
|---|---|---|---|
| 1. | | | TP |
| 2. | | | FP |
| 3. | | | NA |
| 4. | | | TP |
| 5. | | | TP |
| 6. | | | TP |
| 7. | | | FP |
| 8. | | | TP |
| 9. | | | TP |
Table 1 shows nine examples explaining how the True Positive (TP), False Positive (FP) and Undecided (NA) status is assigned to a pattern that matches an enzyme in the database. A detailed explanation is provided in the main text.
Figure 2. Verification: True and False Positives at varying pattern lengths. Figure 2 shows the sum of True Positive annotations (TP) and False Positive annotations (FP) at pattern lengths between 9 and 200. At small pattern lengths of around ten, the sum of FPs is often close to the number of TPs. This changes rapidly with increasing pattern length: most patterns of more than 50 positions are specific.
Enzyme classes with extreme clustering properties
| EC number | Trees | Sequences | Seqs/Trees |
|---|---|---|---|
| 3.1.21.4 | 73 | 94 | 1.29 |
| 3.1.6.1 | 5 | 11 | 2.20 |
| 1.2.7.7 | 6 | 15 | 2.50 |
| 3.2.1.37 | 6 | 15 | 2.50 |
| 5.4.99.5 | 7 | 18 | 2.57 |
| 1.1.99.- | 5 | 13 | 2.60 |
| 2.1.1.113 | 8 | 21 | 2.63 |
| 3.2.1.73 | 5 | 14 | 2.80 |
| 3.2.1.55 | 9 | 26 | 2.89 |
| 1.6.99.- | 5 | 15 | 3.00 |
| 2.4.1.198 | 8 | 24 | 3.00 |
| 3.4.16.4 | 6 | 18 | 3.00 |
| 4.2.1.75 | 5 | 15 | 3.00 |
| 4.2.2.10 | 4 | 12 | 3.00 |
| 2.8.1.8 | 1 | 389 | 389.00 |
| 2.1.2.3 | 1 | 392 | 392.00 |
| 1.1.1.267 | 1 | 393 | 393.00 |
| 4.1.1.37 | 1 | 426 | 426.00 |
| 6.1.1.10 | 1 | 436 | 436.00 |
| 2.2.1.7 | 1 | 440 | 440.00 |
| 2.6.1.9 | 1 | 446 | 446.00 |
| 2.5.1.7 | 1 | 458 | 458.00 |
| 2.1.2.11 | 1 | 468 | 468.00 |
| 4.2.1.19 | 1 | 472 | 472.00 |
| 4.2.1.9 | 1 | 496 | 496.00 |
Table 2 displays 20 enzyme classes with extreme clustering properties. Sequences within the upper ten enzyme classes have low sequence similarity and therefore cluster in multiple trees, whereas the sequences in the lower ten enzyme classes are so similar that they cluster in only one tree.
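The Seqs/Trees column is simply the number of sequences divided by the number of trees they cluster into. As a minimal sketch, using a few rows copied from the table (the function name `seqs_per_tree` and the variable names are illustrative, not part of BrEPS):

```python
# A few (EC number, trees, sequences) rows copied from the clustering table.
rows = [
    ("3.1.21.4", 73, 94),
    ("3.1.6.1", 5, 11),
    ("2.8.1.8", 1, 389),
    ("4.2.1.9", 1, 496),
]


def seqs_per_tree(trees: int, sequences: int) -> float:
    """Average number of sequences per tree, rounded to two decimals."""
    if trees <= 0:
        raise ValueError("an enzyme class must have at least one tree")
    return round(sequences / trees, 2)


# Map each EC number to its ratio and order classes from most to least fragmented.
ratios = {ec: seqs_per_tree(trees, seqs) for ec, trees, seqs in rows}
ordered = sorted(ratios, key=ratios.get)

for ec in ordered:
    print(f"{ec}\t{ratios[ec]:.2f}")
```

A low ratio indicates a class whose sequences are spread over many trees, while a class with a single tree has a ratio equal to its sequence count.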
Microorganisms used in comparing BrEPS to PRIAM
| Species | NCBI Accession | SOT size (Strict) | SOT size (Loose) |
|---|---|---|---|
| | | 97 | 273 |
| | | 1034 | 2476 |
| | | 368 | 742 |
| | | 119 | 208 |
| | | 142 | 284 |
Table 3 shows the five microorganisms (with their NCBI Accession IDs) that were used to compare BrEPS and PRIAM. The sizes of the "Strict" and "Loose" standards of truth (SOT) that we used are displayed on the right.
Sensitivity of BrEPS and PRIAM
| "Strict" SOT | | | | | |
|---|---|---|---|---|---|
| % True Positives | | | % TP incl. Sub-Subclass Hits | | |
| BrEPS A | BrEPS B | PRIAM | BrEPS A | BrEPS B | PRIAM |
| 45.4 | 52.6 | 55.7 | 76.3 | 90.7 | 92.8 |
| 61.7 | 69.8 | 64.5 | 67.3 | 79.3 | 67.9 |
| 45.9 | 53.0 | 54.1 | 64.1 | 74.5 | 73.6 |
| 36.1 | 41.2 | 42.0 | 58.0 | 68.9 | 68.1 |
| 43.7 | 50.0 | 52.8 | 66.9 | 79.6 | 80.3 |
| 34.8 | 42.1 | 44.3 | 48.0 | 59.3 | 63.0 |
| 30.5 | 38.6 | 29.1 | 32.0 | 41.1 | 29.6 |
| 34.4 | 41.2 | 41.6 | 42.2 | 52.0 | 51.3 |
| 26.4 | 33.2 | 30.8 | 40.9 | 51.4 | 48.6 |
| 32.4 | 38.4 | 40.1 | 45.4 | 53.9 | 56.3 |
Table 4 shows the results of comparing BrEPS and PRIAM with two different standards of truth (SOT). Since the complete enzyme content of the five displayed microorganisms is not known, we only evaluate the sensitivity of both methods, i.e. the percentage of true positives (TP). BrEPS A is a strictly defined set of BrEPS patterns and a subset of BrEPS B, which is more loosely defined. The results are discussed in the main text.
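The sensitivity measure can be illustrated with a short sketch. It assumes that sensitivity is the share of EC numbers in the SOT that a method recovers, and that a "sub-subclass hit" means a prediction agreeing with an SOT entry in the first three EC levels; both readings are assumptions made for this example, and the function names and EC sets below are invented toy data, not results from the paper.

```python
def sub_subclass(ec: str) -> str:
    """Return the first three levels of an EC number, e.g. '2.7.1.1' -> '2.7.1'."""
    return ".".join(ec.split(".")[:3])


def sensitivity(predicted: set, sot: set) -> float:
    """Percentage of SOT EC numbers that were predicted exactly."""
    if not sot:
        raise ValueError("standard of truth must not be empty")
    return 100.0 * len(predicted & sot) / len(sot)


def sensitivity_incl_sub_subclass(predicted: set, sot: set) -> float:
    """Percentage of SOT EC numbers matched at least at the sub-subclass level.

    An exact match always shares its sub-subclass, so exact hits are included.
    """
    if not sot:
        raise ValueError("standard of truth must not be empty")
    predicted_prefixes = {sub_subclass(ec) for ec in predicted}
    hits = {ec for ec in sot if sub_subclass(ec) in predicted_prefixes}
    return 100.0 * len(hits) / len(sot)


# Toy data (hypothetical): four SOT entries, four predictions.
sot = {"1.1.1.1", "2.7.1.1", "3.2.1.37", "4.2.1.9"}
predicted = {"1.1.1.1", "2.7.1.2", "4.2.1.9", "6.1.1.10"}

exact = sensitivity(predicted, sot)
relaxed = sensitivity_incl_sub_subclass(predicted, sot)
print(exact, relaxed)
```

In the toy data, the prediction 2.7.1.2 misses the SOT entry 2.7.1.1 exactly but counts once sub-subclass hits are allowed, which is why the relaxed figure is higher.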
Contribution of true positive EC numbers by BrEPS and PRIAM
| Species | BrEPS A | PRIAM | Both | BrEPS B | PRIAM | Both |
|---|---|---|---|---|---|---|
| | 5.3 | 24.6 | 70.2 | 8.5 | 13.6 | 78.0 |
| | 5.0 | 9.7 | 85.3 | 8.6 | 1.2 | 90.1 |
| | 9.5 | 23.6 | 66.8 | 12.3 | 14.5 | 73.1 |
| | 7.4 | 20.4 | 72.2 | 12.3 | 15.8 | 71.9 |
| | 6.3 | 25.0 | 68.8 | 9.6 | 14.5 | 75.9 |
| | 5.5 | 25.8 | 68.8 | 12.9 | 17.3 | 69.8 |
| | 14.5 | 10.5 | 75.1 | 25.9 | 1.7 | 72.3 |
| | 10.4 | 26.1 | 63.5 | 15.8 | 16.6 | 67.6 |
| | 9.9 | 22.5 | 67.6 | 22.0 | 15.9 | 62.2 |
| | 9.5 | 27.8 | 62.7 | 13.6 | 17.4 | 68.9 |
Table 5 shows the percentage of unique and common EC numbers that BrEPS and PRIAM contributed to the joint set of all true positive hits. BrEPS contributed between 5% and 15% unique hits, while PRIAM generally contributed between 10% and 25% unique hits, except for the E. coli rows in the rightmost PRIAM column of Table 5, where PRIAM contributed less than 2%.
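The split into unique and common contributions can be sketched with plain set operations. The example assumes that each percentage is taken relative to the union of both methods' true positive EC numbers, as the caption describes; the function name `contributions` and the EC sets are hypothetical toy data rather than values from the study.

```python
def contributions(breps_tp: set, priam_tp: set) -> dict:
    """Percentages of the joint true positive set found by one or both methods."""
    joint = breps_tp | priam_tp
    if not joint:
        raise ValueError("at least one method must report a true positive")

    def pct(subset: set) -> float:
        return 100.0 * len(subset) / len(joint)

    return {
        "breps_only": pct(breps_tp - priam_tp),
        "priam_only": pct(priam_tp - breps_tp),
        "both": pct(breps_tp & priam_tp),
    }


# Toy data (hypothetical): five distinct EC numbers in the joint set.
breps_tp = {"1.1.1.1", "2.7.1.1", "4.2.1.9"}
priam_tp = {"2.7.1.1", "4.2.1.9", "6.1.1.10", "2.5.1.7"}

shares = contributions(breps_tp, priam_tp)
print(shares)
```

Because the three subsets partition the joint set, the three percentages always add up to 100 apart from rounding, which matches the row sums in the table.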