Dhwani K Desai, Soumyadeep Nandi, Prashant K Srivastava, Andrew M Lynn.
Abstract
Various enzyme identification protocols involving homology transfer by sequence-sequence or profile-sequence comparisons have been devised which utilise Swiss-Prot sequences associated with EC numbers as the training set. A profile HMM constructed for a particular EC number might select sequences that perform a different enzymatic function, because certain fold-specific residues are conserved in enzymes sharing a common fold. We describe a protocol, ModEnzA (HMM-ModE Enzyme Annotation), which generates profile HMMs that are highly specific at the functional level defined by EC numbers by incorporating information from negative training sequences. We enrich the training dataset by mining sequences from the NCBI Non-Redundant database for increased sensitivity. We compare our method with other enzyme identification methods, both for assigning EC numbers to a genome and for identifying protein sequences associated with an enzymatic activity. We report a sensitivity of 88% and a specificity of 95% in identifying EC numbers and annotating enzymatic sequences from the E. coli genome, which is higher than that of the other methods compared. With next-generation sequencing methods producing huge amounts of sequence data, the development and use of fully automated yet accurate protocols such as ModEnzA is warranted for rapid annotation of newly sequenced genomes and metagenomic sequences.
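The abstract does not spell out how the negative training sequences are used. As a generic illustration only (not the published HMM-ModE/ModEnzA procedure), the sketch below shows one common way negatives sharpen specificity: choosing the profile score cutoff that best separates positives (same EC number) from negatives (same fold, different EC number). Maximizing the Matthews correlation coefficient (MCC) is an assumed criterion here, and the scores and the name `best_cutoff` are hypothetical.

```python
import math

# Generic illustration; not the published HMM-ModE/ModEnzA algorithm.

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_cutoff(pos_scores, neg_scores):
    """Return (cutoff, mcc) for the threshold that best separates positive
    from negative training scores (a score >= cutoff counts as a hit)."""
    best = (None, -1.0)
    # Only observed scores need to be tried as candidate cutoffs.
    for cut in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(s >= cut for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= cut for s in neg_scores)
        tn = len(neg_scores) - fp
        m = mcc(tp, fp, tn, fn)
        if m > best[1]:
            best = (cut, m)
    return best

# Hypothetical bit scores against one EC-number profile.
positives = [210.0, 185.5, 172.3, 160.8, 150.2]  # same EC number
negatives = [155.0, 120.4, 98.7, 90.1]           # same fold, different EC number
cutoff, score = best_cutoff(positives, negatives)
print(cutoff, round(score, 3))
```

In this toy case the high-scoring negative (155.0) pushes the cutoff up to 160.8, trading one weak positive for zero false positives, which is the kind of specificity gain negatives are meant to provide.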
Year: 2011 PMID: 21541071 PMCID: PMC3085309 DOI: 10.1155/2011/743782
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1: Flow diagram of the ModEnzA protocol.
Figure 2: ROC curves for genome-wide enzyme identification using the ModEnzA profiles. (a) Classification of the complete genomes of the four organisms. (b) A fraction of the EC number profiles (284 of the 2075 Tier I ModEnzA profiles) retrained with an older version of the ENZYME database and compared with PRIAM and MetaShark. ModEnzA-RT: retrained ModEnzA profiles.
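Figure 2 reports performance as ROC curves. As a reminder of how such a curve is built (a generic sketch, not the authors' plotting code), the functions below sweep a score threshold over labelled predictions and return (false-positive rate, true-positive rate) points plus the trapezoidal area under the curve. The input format, a list of (score, is_correct) pairs, is an assumption.

```python
def roc_points(scored: list[tuple[float, bool]]) -> list[tuple[float, float]]:
    """Return ROC points (FPR, TPR) from (0, 0) to (1, 1).
    Assumes at least one positive and one negative label."""
    P = sum(1 for _, y in scored if y)
    N = len(scored) - P
    pts = [(0.0, 0.0)]
    tp = fp = 0
    # Sort by descending score; tied scores are consumed together.
    ordered = sorted(scored, key=lambda t: -t[0])
    i = 0
    while i < len(ordered):
        s = ordered[i][0]
        while i < len(ordered) and ordered[i][0] == s:
            if ordered[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        pts.append((fp / N, tp / P))
    return pts

def auc(pts: list[tuple[float, float]]) -> float:
    """Trapezoidal area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical scores; True marks a correct enzyme assignment.
example = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False)]
pts = roc_points(example)
print(pts)
print(auc(pts))
```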
Genome-wide enzyme identification for three bacterial genomes (E. coli, B. aphidicola, and M. pneumoniae) and one eukaryotic genome (P. falciparum) by ModEnzA and EFICAz.
| Organism / measure | Annotated in benchmark | EFICAz | ModEnzA (Tier I) | ModEnzA (Tier I+II) |
|---|---|---|---|---|
| Annotation benchmark: HAMAP | | | | |
| E. coli | | | | |
| Sequences | 1012 | 859 (1051) | 902 (1021) | 930 (1082) |
| Sensitivity (%) | | 84.88 | 89.13 | 91.89 |
| Specificity (%) | | 81.73 | 88.34 | 85.95 |
| EC numbers | 755 | 653 (728) | 663 (697) | 699 (775) |
| Sensitivity (%) | | 86.49 | 87.81 | 92.58 |
| Specificity (%) | | 89.69 | 95.12 | 90.19 |
| B. aphidicola | | | | |
| Sequences | 273 | 257 (273) | 264 (271) | 265 (273) |
| Sensitivity (%) | | 94.13 | 96.7 | 97.07 |
| Specificity (%) | | 94.13 | 97.41 | 97.07 |
| EC numbers | 245 | 226 (238) | 225 (229) | 225 (233) |
| Sensitivity (%) | | 92.24 | 91.83 | 91.83 |
| Specificity (%) | | 94.95 | 98.25 | 96.56 |
| M. pneumoniae | | | | |
| Sequences | 147 | 119 (149) | 126 (139) | 126 (139) |
| Sensitivity (%) | | 80.95 | 85.71 | 85.71 |
| Specificity (%) | | 79.86 | 90.64 | 90.64 |
| EC numbers | 127 | 101 (122) | 115 (122) | 115 (122) |
| Sensitivity (%) | | 79.52 | 90.55 | 90.55 |
| Specificity (%) | | 82.78 | 94.26 | 94.26 |
| Annotation benchmark: PlasmoDB | | | | |
| P. falciparum | | | | |
| Sequences | 771 | 341 (480) | 350 (415) | 358 (431) |
| Sensitivity (%) | | 44.22 | 45.39 | 46.43 |
| Specificity (%) | | 71.04 | 84.33 | 83.06 |
| EC numbers | 410 | 217 (247) | 212 (234) | 215 (242) |
| Sensitivity (%) | | 52.92 | 51.7 | 52.43 |
| Specificity (%) | | 87.85 | 90.59 | 88.84 |
Numbers within parentheses indicate the total number of sequences or EC numbers identified by each method.
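The percentages in this table can be reproduced from the counts: each sensitivity equals the number of correct identifications (outside parentheses) divided by the benchmark count, and each specificity equals the correct identifications divided by the total number identified (in parentheses). This reading is inferred from the numbers rather than stated in the table; "specificity" in this sense corresponds to what is often called precision. The values also appear to be truncated, not rounded, to two decimals (for example 257/273 = 94.139% is listed as 94.13), which the sketch below mimics; the function names are hypothetical.

```python
def pct_truncated(numerator: int, denominator: int) -> float:
    """Percentage truncated (not rounded) to two decimal places,
    using integer arithmetic to avoid floating-point edge cases."""
    return (numerator * 10000 // denominator) / 100

def sens_spec(correct: int, benchmark: int, identified: int) -> tuple[float, float]:
    """Sensitivity = correct / benchmark; specificity = correct / identified."""
    return pct_truncated(correct, benchmark), pct_truncated(correct, identified)

# E. coli, EC numbers, ModEnzA (Tier I): 663 correct of 697 identified, 755 in benchmark
print(sens_spec(663, 755, 697))  # (87.81, 95.12)
```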
Conflicting annotations for the 22 EC numbers predicted as belonging to the P. falciparum genome by ModEnzA but not annotated in PlasmoDB.
| EC No. | Sequence | KEGG | PlasmoCyc |
|---|---|---|---|
| 1.1.5.3 | PFC0275w | FAD-dependent glycerol-3-phosphate dehydrogenase, putative | FAD-dependent glycerol-3-phosphate dehydrogenase, putative |
| 1.17.7.1 | PF10_0221 | (E)-4-Hydroxy-3-methylbut-2-enyl-diphosphate synthase | Methylerythritol phosphate pathway |
| 1.3.1.8 | PF11_0370 | — | — |
| 1.3.5.2 | PFF0160c | — | Uridine-5′-phosphate biosynthesis |
| 2.1.1.48 | PF14_0156 | — | — |
| 2.3.1.180 | PFB0505c | 3-Oxoacyl-(acyl carrier protein) synthase III, putative | Fatty acid biosynthesis initiation I |
| 2.3.1.181 | MAL8P1.37 | Lipoyl(octanoyl) transferase | — |
| 2.4.1.141 | MAL8P1.133 | Beta-1,4-N-acetylglucosaminyltransferase | Dolichyl-diphosphooligosaccharide biosynthesis |
| 2.7.12.1 | PF14_0431 | Dual-specificity kinase | — |
| 2.7.1.90 | PFI0755c | 6-phosphofructokinase | ATP-dependent phosphofructokinase, putative |
| 2.7.7.64 | PFE0875c | — | — |
| 2.8.1.8 | MAL13P1.220 | Lipoic acid synthetase | — |
| 3.1.13.4 | MAL8P1.104, PFE0980c | — | — |
| 3.1.21.2 | PF13_0176 | — | — |
| 3.4.21.10 | PFE0340c, PF11_0149, MAL8P1.16, PF14_0110 | — | — |
| 3.5.1.88 | PFI0380c | — | — |
| 3.6.1.1 | PF14_0541, PFL1700c, PFC0710w-a, PFC0710w-b | Inorganic pyrophosphatase | Inorganic pyrophosphatase, putative, V-type H(+)-translocating pyrophosphatase, putative |
| 3.6.1.7 | PF11_0121 | — | — |
| 3.6.3.44 | PFE1150w | — | ABC transporter, putative |
| 3.6.4.3 | PF14_0548 | — | — |
| 3.6.4.6 | PFC0140c | Vesicle-fusing ATPase | — |
| 3.6.5.5 | PF10_0368 | Dynamin GTPase | — |
“—”: annotation not present in the corresponding database (KEGG or PlasmoCyc).
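To summarize how much external support these 22 predictions have, the KEGG and PlasmoCyc columns can be tallied. The sketch below transcribes each row as booleans (True means the database has an entry for that row, False means a dash) and counts the combinations; the data structure and the function `support` are only a reading aid, not part of the original analysis.

```python
from collections import Counter

# (EC number, KEGG entry present?, PlasmoCyc entry present?) transcribed from the table
rows = [
    ("1.1.5.3", True, True), ("1.17.7.1", True, True), ("1.3.1.8", False, False),
    ("1.3.5.2", False, True), ("2.1.1.48", False, False), ("2.3.1.180", True, True),
    ("2.3.1.181", True, False), ("2.4.1.141", True, True), ("2.7.12.1", True, False),
    ("2.7.1.90", True, True), ("2.7.7.64", False, False), ("2.8.1.8", True, False),
    ("3.1.13.4", False, False), ("3.1.21.2", False, False), ("3.4.21.10", False, False),
    ("3.5.1.88", False, False), ("3.6.1.1", True, True), ("3.6.1.7", False, False),
    ("3.6.3.44", False, True), ("3.6.4.3", False, False), ("3.6.4.6", True, False),
    ("3.6.5.5", True, False),
]

def support(kegg: bool, plasmocyc: bool) -> str:
    """Classify a row by which databases carry an entry for it."""
    if kegg and plasmocyc:
        return "both"
    if kegg:
        return "KEGG only"
    if plasmocyc:
        return "PlasmoCyc only"
    return "neither"

tally = Counter(support(k, p) for _, k, p in rows)
print(len(rows), dict(tally))
```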
Three-digit annotations for the sequences selected from P. falciparum by Tier II and Tier III profiles.
| Gene | PlasmoDB product description* | PlasmoDB EC* | ModEnzA EC | EC description# |
|---|---|---|---|---|
| PF07_0059 | 4-nitrophenylphosphatase, putative | 3.1.3.- (Phosphoric monoester hydrolases); 3.1.3.41 (4-nitrophenylphosphatase) | T2-3.1.3.41 | 4-nitrophenylphosphatase |
| PF08_0108 | Pepsinogen, putative | 3.4.23.1 | T2-3.4.23.2 | Pepsin B |
| PF10_0329 | Aspartyl protease, putative; Plasmepsin VII | None | T2-3.4.23.2 | Pepsin B |
| PF11_0161 | Falcipain-2 precursor, putative | 3.4.22.- | T3-3.4.22.32 | Stem bromelain |
| PF11_0162 | Falcipain-3 | 3.4.22.- | T3-3.4.22.32 | Stem bromelain |
| PF11_0165 | Falcipain 2 precursor | 3.4.22.- | T3-3.4.22.32 | Stem bromelain |
| PF11_0295 | Farnesyl pyrophosphate synthase, putative | 2.5.1.10 Geranyltranstransferase; 2.5.1.1 Dimethylallyltranstransferase | T2-2.5.1.67 | Chrysanthemyl diphosphate synthase |
| PF14_0075 | Plasmepsin, putative | 3.4.23.38 (Plasmepsin I) | T2-3.4.23.39 | Plasmepsin II |
| PF14_0076 | Plasmepsin 1 precursor | 3.4.23.38 (Plasmepsin I) | T2-3.4.23.39 | Plasmepsin II |
| PF14_0077 | Plasmepsin 2 | 3.4.23.39 (Plasmepsin II) | T2-3.4.23.39 | Plasmepsin II |
| PF14_0078 | HAP protein; Plasmepsin III | 3.4.23.- Aspartic endopeptidases | T2-3.4.23.39 | Plasmepsin II |
| PF14_0281 | Aspartyl protease, putative | None | T2-3.4.23.2 | Pepsin B |
| PF14_0334 | NAD(P)H-dependent glutamate synthase, putative | 1.4.7.1 Glutamate synthase (ferredoxin); 1.4.1.14 Glutamate synthase (NADH) | T2-1.4.1.14 | Glutamate synthase |
| PF14_0553 | Cysteine proteinase falcipain-1 | None | T3-3.4.22.32 | Stem bromelain |
| PF14_0625 | Hypothetical protein | 3.4.2.3; Transferred entry: 3.4.17.4 | T2-3.4.23.2 | Pepsin B |
| PFC0495w | Aspartyl protease, putative | 3.4.23.- Aspartic endopeptidases | T2-3.4.23.2 | Pepsin B |
| PFF0530w | Transketolase, putative | 2.2.1.1 Transketolase | T2-2.2.1.3 | Formaldehyde transketolase |
| PFI1125c | 3-oxoacyl-(acyl-carrier protein) reductase, putative | 1.1.1.100 (3-oxoacyl-[acyl-carrier-protein] reductase); 2.3.1.85 (Fatty-acid synthase) | T2-1.1.1.140 | Sorbitol-6-phosphate 2-dehydrogenase |
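*Gene product descriptions and EC annotations obtained from PlasmoDB. #IUBMB EC description. In the ModEnzA EC column, the prefixes T2- and T3- indicate sequences selected by Tier II and Tier III profiles, respectively.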
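Because the table compares annotations at the first three levels of the EC hierarchy, a small helper that counts how many leading EC fields two numbers share makes the comparison explicit. The sketch below treats "-" as an unspecified field that never matches and strips the T2-/T3- tier prefix used in the table; the function names are hypothetical.

```python
def parse_ec(ec: str) -> list[str]:
    """Split an EC number into its fields, dropping a tier prefix
    such as 'T2-' or 'T3-' if present."""
    if ec[:1] == "T" and "-" in ec:
        ec = ec.split("-", 1)[1]
    return ec.strip().split(".")

def shared_levels(a: str, b: str) -> int:
    """Number of leading EC fields on which a and b agree ('-' never matches)."""
    n = 0
    for x, y in zip(parse_ec(a), parse_ec(b)):
        if x == "-" or y == "-" or x != y:
            break
        n += 1
    return n

# Pairs taken from the table (PlasmoDB EC, ModEnzA EC)
print(shared_levels("3.4.23.1", "T2-3.4.23.2"))    # 3
print(shared_levels("1.1.1.100", "T2-1.1.1.140"))  # 3
print(shared_levels("3.4.17.4", "T2-3.4.23.2"))    # 2
print(shared_levels("3.4.23.39", "T2-3.4.23.39"))  # 4
```

Under this measure, most rows agree with PlasmoDB at three levels while differing at the fourth, which is why only three-digit annotations are reported for these Tier II and Tier III hits.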