Germán Plata, Tobias Fuhrer, Tzu-Lin Hsiao, Uwe Sauer, Dennis Vitkup.
Abstract
Annotation of organism-specific metabolic networks is one of the main challenges of systems biology. Importantly, owing to the inherent uncertainty of computational annotations, predictions of biochemical function need to be treated probabilistically. We present a global probabilistic approach to annotate genome-scale metabolic networks that integrates sequence homology and context-based correlations under a single principled framework. The developed method for global biochemical reconstruction using sampling (GLOBUS) not only provides annotation probabilities for each functional assignment but also suggests likely alternative functions. GLOBUS is based on statistical Gibbs sampling of probable metabolic annotations and is able to make accurate functional assignments even in cases of low sequence identity to known enzymes. We apply GLOBUS to the genomes of Bacillus subtilis and Staphylococcus aureus and validate the method's predictions by experimentally demonstrating the 6-phosphogluconolactonase activity of YkgB and the role of the Sps pathway in rhamnose biosynthesis in B. subtilis.
Year: 2012 PMID: 22960854 PMCID: PMC3696893 DOI: 10.1038/nchembio.1063
Source DB: PubMed Journal: Nat Chem Biol ISSN: 1552-4450 Impact factor: 15.040
Figure 1. Overview of the GLOBUS method
(a) A generic Enzyme Commission (EC) network, in which nodes represent all known biochemical activities and edges indicate metabolites shared between activities. (b) For a genome of interest, the potential network locations of each gene are assigned based on sequence homology to known enzymes. (c) Each gene is initially assigned at random to one of its possible locations. A fitness function is defined such that assignments to locations with high sequence identity and good context correlations with neighboring genes yield higher fitness values (higher probability). (d) Gibbs sampling is used to sample from the space of possible assignments of genes to their candidate network locations. At each step of a Gibbs chain, a random gene is selected and re-assigned to one of its possible locations (arrows). The marginal probabilities of assigning each gene to each candidate network location are derived from converged Gibbs chains.
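The sampling scheme in panels (a) to (d) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration: the toy reactions, the gene names (geneA to geneD), the identity values, the pairwise context scores and the weights `w_seq` and `w_ctx` are hypothetical, and the fitness function (weighted sequence identity plus a context score for every gene pair placed at adjacent network nodes) is a simplified stand-in, not the published GLOBUS scoring. Each re-assignment is drawn from the conditional distribution proportional to exp(local fitness), which is the standard Gibbs update for a joint distribution proportional to exp(total fitness).

```python
import math
import random
from collections import Counter, defaultdict

# (a) Hypothetical toy reactions: EC number -> metabolites it uses or produces.
# Cofactors such as NAD(P)H are omitted for simplicity.
REACTIONS = {
    "2.7.7.24": {"glucose-1-phosphate", "dTTP", "dTDP-glucose"},
    "4.2.1.46": {"dTDP-glucose", "dTDP-4-dehydro-6-deoxy-glucose"},
    "5.1.3.13": {"dTDP-4-dehydro-6-deoxy-glucose", "dTDP-4-dehydro-rhamnose"},
    "1.1.1.133": {"dTDP-4-dehydro-rhamnose", "dTDP-rhamnose"},
    "4.2.1.45": {"CDP-glucose", "CDP-4-dehydro-6-deoxy-glucose"},
}

# (b) Hypothetical candidate locations per gene with sequence identity (fraction).
CANDIDATES = {
    "geneA": {"2.7.7.24": 0.44, "4.2.1.45": 0.30},
    "geneB": {"4.2.1.46": 0.48, "4.2.1.45": 0.40},
    "geneC": {"5.1.3.13": 0.33},
    "geneD": {"1.1.1.133": 0.40},
}

# Hypothetical symmetric context scores (e.g. co-expression) between gene pairs.
CONTEXT = {
    frozenset(("geneA", "geneB")): 1.0,
    frozenset(("geneB", "geneC")): 1.0,
    frozenset(("geneC", "geneD")): 1.0,
}


def build_ec_network(reactions):
    """Connect two EC activities whenever they share at least one metabolite."""
    neighbors = defaultdict(set)
    ecs = list(reactions)
    for i, a in enumerate(ecs):
        for b in ecs[i + 1:]:
            if reactions[a] & reactions[b]:
                neighbors[a].add(b)
                neighbors[b].add(a)
    return neighbors


def local_fitness(gene, loc, assignment, neighbors, w_seq, w_ctx):
    """Fitness terms that involve `gene` when it is placed at `loc`."""
    score = w_seq * CANDIDATES[gene][loc]
    for other, other_loc in assignment.items():
        if other != gene and other_loc in neighbors[loc]:
            score += w_ctx * CONTEXT.get(frozenset((gene, other)), 0.0)
    return score


def gibbs_marginals(neighbors, n_steps=20000, burn_in=2000,
                    w_seq=5.0, w_ctx=2.0, seed=0):
    rng = random.Random(seed)
    genes = list(CANDIDATES)
    # (c) Random initial assignment of every gene to one of its candidates.
    assignment = {g: rng.choice(list(CANDIDATES[g])) for g in genes}
    counts = defaultdict(Counter)
    for step in range(n_steps):
        # (d) Pick a random gene and resample its location from the
        # conditional distribution p(loc) proportional to exp(local fitness).
        gene = rng.choice(genes)
        locs = list(CANDIDATES[gene])
        scores = [local_fitness(gene, loc, assignment, neighbors, w_seq, w_ctx)
                  for loc in locs]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        assignment[gene] = rng.choices(locs, weights=weights)[0]
        if step >= burn_in:
            for g in genes:
                counts[g][assignment[g]] += 1
    kept = n_steps - burn_in
    # Marginal probability = fraction of post-burn-in samples at each location.
    return {g: {loc: counts[g][loc] / kept for loc in CANDIDATES[g]}
            for g in genes}


if __name__ == "__main__":
    network = build_ec_network(REACTIONS)
    for gene, probs in gibbs_marginals(network).items():
        print(gene, {loc: round(p, 3) for loc, p in probs.items()})
```

In this toy setting, geneB ends up with a high marginal probability at 4.2.1.46 because that location both has the higher identity and places it next to geneA and geneC, which illustrates how context can sharpen a homology-only call.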
Figure 2. GLOBUS precision–recall performance
Using available metabolic models (iBsu1103[33] for B. subtilis and iSB619[35] for S. aureus), we compared GLOBUS predictions to predictions made using sequence homology; predictions for B. subtilis are shown on top and predictions for S. aureus on the bottom. (a) Precision–recall curves for GLOBUS (black lines) were calculated by ranking genes by assignment probability. Precision–recall curves for homology (red lines) were calculated by ranking genes by sequence identity. (b) Recall of known metabolic genes (at 70% precision) as a function of sequence identity to the closest enzymes from other species with the annotated functions. (c) Prediction precision (at 90% recall) for known metabolic genes as a function of sequence identity to the closest enzymes from other species with the annotated functions. Error bars represent the s.e.m.
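Panels (a) to (c) rest on ranking genes by a score and reading off precision and recall. The following is a minimal Python sketch under stated assumptions: precision is the fraction of accepted predictions that match the reference model, recall is the fraction of the reference model's metabolic genes recovered, and "recall at 70% precision" is taken as the largest recall among curve points with precision of at least 0.7 (one common convention; the paper's exact interpolation is not given here). The function names and toy data are hypothetical.

```python
def precision_recall_curve(scores, is_correct, n_reference):
    """Rank predictions by decreasing score and return (precision, recall)
    after each successive prediction is accepted.

    scores: one score per gene (e.g. assignment probability or % identity)
    is_correct: whether that gene's top prediction matches the reference model
    n_reference: number of annotated metabolic genes in the reference model
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve = []
    true_positives = 0
    for k, i in enumerate(order, start=1):
        if is_correct[i]:
            true_positives += 1
        curve.append((true_positives / k, true_positives / n_reference))
    return curve


def recall_at_precision(curve, min_precision):
    """Largest recall among curve points whose precision meets the threshold."""
    return max((r for p, r in curve if p >= min_precision), default=0.0)


def precision_at_recall(curve, min_recall):
    """Largest precision among curve points whose recall meets the threshold."""
    return max((p for p, r in curve if r >= min_recall), default=0.0)


# Hypothetical toy data for four genes.
curve = precision_recall_curve([0.9, 0.8, 0.7, 0.6], [True, True, False, True], 4)
print(recall_at_precision(curve, 0.7))   # 0.75
print(precision_at_recall(curve, 0.5))   # 1.0
```

The same functions apply to both curves in panel (a): only the score changes (assignment probability for GLOBUS, sequence identity for homology).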
Prediction of gene function in B. subtilis
The table shows predictions without prior experimental validation that have GLOBUS-assigned probabilities above 0.5 and protein sequence identity to known enzymes below 50%. The first three activities in the table were experimentally validated in this study. The remaining annotations are ordered by the average of two ranks: the rank by decreasing annotation probability and the rank by decreasing sequence distance to known enzymes (that is, by increasing sequence identity). The last column shows the average Z-score of phylogenetic correlations, gene clustering and gene co-expression when all sequences are assigned to their most probable locations. The Z-score for each type of data was calculated using the maximum context correlation between a gene and its immediate network neighbors (see Methods).
| Gene | EC | Enzyme name | Probability | Identity (%) | Average Z-score |
|---|---|---|---|---|---|
| spsI | 2.7.7.24 | glucose-1-phosphate thymidylyltransferase | 0.93 | 44.4 | 11.6 |
| spsJ | 4.2.1.46 | dTDP-glucose-4,6-dehydratase | 0.97 | 48.0 | 12.0 |
| ykgB | 3.1.1.31 | 6-phosphogluconolactonase | 0.51 | 30.4 | 2.6 |
|  | 6.3.2.10 | UDP-N-acetylmuramoyl-tripeptide-D-alanyl-D-alanine ligase | 0.98 | 32.8 | 9.0 |
|  | 5.1.3.13 | dTDP-4-dehydrorhamnose-3,5-epimerase | 0.95 | 33.1 | 8.4 |
|  | 1.5.99.8 | proline dehydrogenase | 0.76 | 25.6 | 3.6 |
|  | 4.2.1.45 | CDP-glucose-4,6-dehydratase | 0.76 | 27.5 | 11.0 |
|  | 6.3.4.15 | biotin-[acetyl-CoA-carboxylase] ligase | 0.77 | 31.7 | 2.3 |
|  | 1.4.4.2 | glycine dehydrogenase (decarboxylating) | 0.97 | 41.5 | 12.3 |
|  | 4.1.1.36 | phosphopantothenoylcysteine decarboxylase | 0.99 | 44.5 | 2.6 |
|  | 2.7.1.56 | 1-phosphofructokinase | 0.88 | 40.4 | 10.9 |
|  | 1.1.1.133 | dTDP-4-dehydrorhamnose reductase | 0.87 | 39.6 | 8.4 |
|  | 1.1.1.158 | UDP-N-acetylmuramate dehydrogenase | 0.97 | 43.0 | 5.2 |
|  | 2.7.6.3 | 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine diphosphokinase | 0.99 | 45.3 | 8.0 |
|  | 2.5.1.15 | dihydropteroate synthase | 0.99 | 47.0 | 8.2 |
|  | 2.1.1.13 | methionine synthase | 0.54 | 30.6 | 2.1 |
|  | 2.7.1.69 | protein-Npi-phosphohistidine-sugar phosphotransferase | 0.85 | 40.5 | 11.3 |
|  | 6.3.2.5 | phosphopantothenate-cysteine ligase | 0.97 | 44.5 | 2.9 |
|  | 4.2.1.51 | prephenate dehydratase | 0.69 | 36.1 | 6.7 |
|  | 4.1.1.21 | phosphoribosylaminoimidazole carboxylase | 0.89 | 43.5 | 13.3 |
|  | 3.6.1.15 | nucleoside-triphosphatase | 0.56 | 33.3 | 7.7 |
|  | 4.4.1.5 | lactoylglutathione lyase | 0.60 | 35.2 | 3.6 |
|  | 1.2.3.14 | abscisic-aldehyde oxidase | 0.62 | 35.8 | 1.0 |
|  | 2.7.1.4 | fructokinase | 0.77 | 41.5 | 5.3 |
|  | 2.7.7.33 | glucose-1-phosphate cytidylyltransferase | 0.88 | 43.2 | 11.0 |
|  | 3.2.1.52 | beta-N-acetylhexosaminidase | 0.52 | 33.1 | 3.1 |
|  | 6.4.1.4 | methylcrotonoyl-CoA carboxylase | 0.64 | 36.2 | 8.6 |
|  | 2.3.1.29 | glycine C-acetyltransferase | 0.97 | 49.0 | 9.4 |
|  | 2.5.1.3 | thiamine-phosphate diphosphorylase | 0.70 | 40.6 | 6.6 |
|  | 4.1.3.27 | anthranilate synthase | 0.74 | 42.8 | 8.6 |
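The ordering rule and the Z-score column of the table above can be sketched in Python as follows. The sketch assumes ordinal ranks with ties broken by input order, treats "decreasing sequence distance" as increasing percent identity, and computes each Z-score against a background distribution of correlations (for example, correlations between random gene pairs) using the population standard deviation. The background choice, the helper names and the toy correlation values are hypothetical; the actual procedure is described in the paper's Methods. The four example rows are taken from the table, but ranks computed on a subset will not in general reproduce the table's full ordering.

```python
import statistics


def ordinal_ranks(values, descending):
    """1-based ranks; ties keep their input order (Python's sort is stable)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def order_by_average_rank(rows):
    """rows: list of (label, probability, identity_percent).

    Rank 1 = highest probability; rank 1 = lowest identity (largest
    sequence distance). Rows are returned sorted by the mean of both ranks.
    """
    prob_ranks = ordinal_ranks([r[1] for r in rows], descending=True)
    ident_ranks = ordinal_ranks([r[2] for r in rows], descending=False)
    averaged = [((p + q) / 2, row) for p, q, row in zip(prob_ranks, ident_ranks, rows)]
    averaged.sort(key=lambda pair: pair[0])
    return [(avg, row[0]) for avg, row in averaged]


def context_z(neighbor_correlations, background):
    """Z-score of the maximum correlation with immediate network neighbors,
    relative to an assumed background distribution of correlations."""
    best = max(neighbor_correlations)
    return (best - statistics.mean(background)) / statistics.pstdev(background)


def average_context_z(per_data_type):
    """per_data_type: data type -> (neighbor correlations, background)."""
    return statistics.mean(context_z(c, bg) for c, bg in per_data_type.values())


rows = [("6.3.2.10", 0.98, 32.8), ("1.5.99.8", 0.76, 25.6),
        ("4.1.1.36", 0.99, 44.5), ("2.1.1.13", 0.54, 30.6)]
print(order_by_average_rank(rows))
# [(2.0, '1.5.99.8'), (2.5, '6.3.2.10'), (2.5, '4.1.1.36'), (3.0, '2.1.1.13')]

# Hypothetical correlations for one gene across the three data types.
z = average_context_z({
    "phylogenetic": ([0.05, 0.4], [0.1, 0.3]),
    "clustering": ([0.3], [0.0, 0.2]),
    "co-expression": ([0.1], [0.0, 0.2]),
})
print(round(z, 3))  # 1.333
```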
Figure 3. In vitro biochemical assays used to characterize the activities of SpsI and SpsJ using high-precision mass spectrometry
(a) Reaction diagram. (b) Mass spectra showing intensities at the masses corresponding to dTDP-glucose and dTDP-4-dehydro-6-deoxy-glucose, the products of the reactions catalyzed by SpsI and SpsJ, respectively (black arrows, detailed in panel c). Observed masses deviated by less than 0.001 atomic mass units (amu) from the corresponding reference masses. Spectra were recorded from two independent assays. (c, d) Bar plots show the dependence of dTDP-glucose and dTDP-4-dehydro-6-deoxy-glucose accumulation on the protein concentration of SpsI and SpsJ, respectively. As a negative control (n.c.), the protein-free filtrate of a 6.99 μM SpsI or 203.01 μM SpsJ solution was used. Error bars represent standard deviations calculated from two independent assays.
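Checking that an observed mass lies within 0.001 amu of its reference value requires the reference monoisotopic mass of each product. A minimal Python sketch computes these from elemental formulas. The formulas are standard (C16H26N2O16P2 for dTDP-glucose; the 4,6-dehydratase product is the same molecule minus H2O), but the ion species actually measured is not stated in this caption, so the [M-H]- helper is an assumption, and all function names are hypothetical.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}
PROTON_MASS = 1.007276  # u, rounded


def formula_mass(formula):
    """Neutral monoisotopic mass of a simple formula such as 'C16H26N2O16P2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return mass


def deprotonated_mz(formula):
    """m/z of the singly deprotonated [M-H]- ion (an assumed ion species)."""
    return formula_mass(formula) - PROTON_MASS


def within_tolerance(observed, reference, tol=0.001):
    """True if an observed mass lies within `tol` u of the reference mass."""
    return abs(observed - reference) < tol


DTDP_GLUCOSE = "C16H26N2O16P2"                    # SpsI product
DTDP_4_DEHYDRO_6_DEOXY_GLUCOSE = "C16H24N2O15P2"  # SpsJ product (minus H2O)

for name, formula in [("dTDP-glucose", DTDP_GLUCOSE),
                      ("dTDP-4-dehydro-6-deoxy-glucose",
                       DTDP_4_DEHYDRO_6_DEOXY_GLUCOSE)]:
    print(f"{name}: M = {formula_mass(formula):.4f}, "
          f"[M-H]- = {deprotonated_mz(formula):.4f}")
```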
Figure 4. In vitro biochemical assays used to characterize the 6-phosphogluconolactonase activity of YkgB
(a) Reaction diagram for 6-phosphogluconolactonase. (b) Time courses of lactone degradation at different YkgB concentrations, recorded by direct flow injection analysis. Different symbols represent replicate assays. (c) Relative intensity increase from initial to final lactone intensities as a function of YkgB concentration. As a negative control (n.c.), the protein-free filtrate of a 223.2 μM YkgB solution was used. Error bars represent standard deviations calculated from two independent assays.
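Panel (b) records lactone intensity over time. One simple way to summarize such time courses, assumed here rather than taken from the paper, is to fit a first-order decay to each course and compare the fitted rate constants across enzyme concentrations, because 6-phosphogluconolactone also hydrolyzes without enzyme. The Python sketch below uses synthetic data only; the rate constants, concentrations and helper names are hypothetical.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    return sxy / sxx


def first_order_rate(times, intensities):
    """Rate constant k assuming I(t) = I0 * exp(-k t), from a log-linear fit."""
    return -ols_slope(times, [math.log(v) for v in intensities])


# Hypothetical synthetic time courses: k_obs = k_spont + k_enz * [YkgB].
K_SPONT, K_ENZ = 0.05, 0.1        # assumed values (1/min and 1/(min*uM))
times = [float(t) for t in range(11)]
concentrations = [0.0, 1.0, 2.0]  # uM, hypothetical
fitted = []
for c in concentrations:
    k_true = K_SPONT + K_ENZ * c
    course = [100.0 * math.exp(-k_true * t) for t in times]
    fitted.append(first_order_rate(times, course))

print([round(k, 3) for k in fitted])                # [0.05, 0.15, 0.25]
print(round(ols_slope(concentrations, fitted), 3))  # 0.1
```

The slope of the fitted rate constants against enzyme concentration separates the enzyme-dependent component from the spontaneous hydrolysis rate (the intercept) under this assumed linear model.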