Juan C Triviño, Florencio Pazos.
Abstract
BACKGROUND: Global studies of the protein repertories of organisms are providing important information on the characteristics of the protein space. Many of these studies entail classification of the protein repertory on the basis of structure and/or sequence similarities. The situation is different for metabolism. Because there is no good way of measuring similarities between chemical reactions, there is a barrier to the development of global classifications of "metabolic space" and subsequent studies comparable to those done for protein sequences and structures.
Year: 2010 PMID: 20406431 PMCID: PMC2883543 DOI: 10.1186/1752-0509-4-46
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Vectorial representation of chemical compounds and enzymatic activities. (A) Example of an oxidation reaction (R-OH → R=O) included in KEGG. The components of the enzyme vector (which is the difference between the vectors of the product and substrate) to some extent reflect the nature of the transformation: loss of 2 HCC, 1 OCH and 1 HOC triplets. On the left, a 3D projection of the 60-dimensional space defined by the compound and enzyme vectors is depicted, with some compounds, reactions and pathways shown. (B) Representation of the vectors within the KEGG dataset. Both reactions and compounds are defined by 60-D vectors where each component represents an atom triplet.
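The vector construction described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a triplet is taken to be any two bonds sharing a central atom, bond order is ignored, each triplet is written with its end atoms in alphabetical order (so the caption's HCC, OCH and HOC appear here as CCH, HCO and COH), and a sparse dictionary stands in for the fixed 60-D vector. The helper names `build`, `triplet_counts` and `reaction_vector` are hypothetical, and 2-propanol → acetone is used as a stand-in for R-OH → R=O.

```python
from collections import Counter
from itertools import combinations


def build(heavy_atoms, heavy_bonds, hydrogens):
    """Assemble a molecular graph with explicit hydrogens (hypothetical helper).

    heavy_atoms: dict atom_id -> element symbol
    heavy_bonds: list of (atom_id, atom_id) pairs
    hydrogens:   dict atom_id -> number of attached H atoms
    """
    atoms = dict(heavy_atoms)
    bonds = list(heavy_bonds)
    for atom_id, n_h in hydrogens.items():
        for i in range(n_h):
            h_id = f"{atom_id}_H{i}"
            atoms[h_id] = "H"
            bonds.append((atom_id, h_id))
    return atoms, bonds


def triplet_counts(atoms, bonds):
    """Count bonded triplets A-B-C, where B is the central atom."""
    neighbours = {atom_id: [] for atom_id in atoms}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    counts = Counter()
    for centre, nbrs in neighbours.items():
        for left, right in combinations(nbrs, 2):
            # Assumption: A-B-C and C-B-A are the same triplet, so sort the ends.
            ends = sorted((atoms[left], atoms[right]))
            counts[ends[0] + atoms[centre] + ends[1]] += 1
    return counts


def reaction_vector(substrate, product):
    """Product counts minus substrate counts, keeping non-zero components only."""
    keys = set(substrate) | set(product)
    diff = {k: product[k] - substrate[k] for k in keys}
    return {k: v for k, v in diff.items() if v != 0}


heavy = {"C1": "C", "C2": "C", "C3": "C", "O": "O"}
skeleton = [("C1", "C2"), ("C2", "C3"), ("C2", "O")]

# 2-propanol: CH3-CH(OH)-CH3; acetone: CH3-C(=O)-CH3 (bond order ignored)
propanol = build(heavy, skeleton, {"C1": 3, "C2": 1, "C3": 3, "O": 1})
acetone = build(heavy, skeleton, {"C1": 3, "C3": 3})

enzyme_vector = reaction_vector(triplet_counts(*propanol), triplet_counts(*acetone))
print(enzyme_vector)
```

Under these assumptions the non-zero components are CCH = -2, HCO = -1 and COH = -1, which corresponds to the loss of 2 HCC, 1 OCH and 1 HOC triplets stated in the caption.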
Parameters of the multivariate linear regression between three chemical properties and the components of the compounds vectors.
| R | R² | |
|---|---|---|
| 0.902 | 0.814 | |
| 0.977 | 0.955 | |
| 0.977 | 0.954 | |
R is the Pearson correlation coefficient.
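To make the quantities in this table concrete, the sketch below fits a multivariate linear model to compound vectors and reports the Pearson R between predicted and observed values. It is an illustration only: the source does not say how the regression was computed, so solving the normal equations is an assumption, the function names are hypothetical, and the two-component vectors and property values are invented, noise-free toy data rather than anything from the paper.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coef, X):
    """Apply the fitted model to each row of X."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sx * sy)


# Toy data (invented): two vector components per compound, exact linear property.
X = [[2, 0], [4, 1], [6, 1], [8, 3], [3, 2], [5, 4]]
y = [0.5 + 0.3 * x1 - 0.2 * x2 for x1, x2 in X]

coef = fit_linear(X, y)
r = pearson_r(predict(coef, X), y)
print(coef, r, r ** 2)
```

With real, noisy data R falls below 1. The second column of the table is consistent with the square of the first: 0.902² ≈ 0.814 and 0.977² ≈ 0.955.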
Figure 2. Correlation between experimental and predicted hydrophobicity. Correlation between the experimental and predicted hydrophobicity (quantified by the logP value) for the dataset of 4407 molecules.
Figure 3. Hierarchical clustering of the reaction vectors. The tree is constructed using the UPGMA method based on the Euclidean distances between the reaction vectors. The main EC class of the associated enzymes is represented by a color code. The representation was generated with iTOL [38].
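The clustering named in the Figure 3 caption, UPGMA on Euclidean distances, can be sketched in a few lines. This is a minimal pure-Python version for illustration, not the code used for the figure: the function names are hypothetical, the four 2-D vectors are invented stand-ins for the 60-D reaction vectors, and each merge records the average inter-cluster distance rather than a branch length.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def upgma(vectors):
    """Average-linkage (UPGMA) clustering.

    Returns a list of merges (members_a, members_b, distance), where members
    are tuples of indices into `vectors`.
    """
    clusters = {i: (i,) for i in range(len(vectors))}  # cluster id -> members
    dist = {
        frozenset((i, j)): euclidean(vectors[i], vectors[j])
        for i, j in combinations(clusters, 2)
    }
    merges = []
    next_id = len(vectors)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        merges.append((clusters[a], clusters[b], dist[pair]))
        size_a, size_b = len(clusters[a]), len(clusters[b])
        for c in clusters:
            if c not in (a, b):
                d_ac = dist.pop(frozenset((a, c)))
                d_bc = dist.pop(frozenset((b, c)))
                # Size-weighted mean keeps the distance equal to the average
                # over all pairs of original members.
                dist[frozenset((next_id, c))] = (
                    size_a * d_ac + size_b * d_bc
                ) / (size_a + size_b)
        del dist[pair]
        clusters[next_id] = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        next_id += 1
    return merges


# Toy reaction vectors (invented, 2-D for readability).
toy_vectors = [[0, 0], [0, 1], [5, 0], [5, 2]]
merges = upgma(toy_vectors)
for members_a, members_b, d in merges:
    print(members_a, members_b, round(d, 3))
```

The two tight pairs are merged first, and the final merge distance is the mean of the four distances between members of the two pairs, which is the defining property of average linkage.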
Fisher's z-scores representing the dependence between a given clustering and a set of keywords.
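| EC level | GO: clusters | GO: z-score | GO: z-score | Interpro: clusters | Interpro: z-score | Interpro: z-score | BRENDA: clusters | BRENDA: z-score | BRENDA: z-score | Prosite: clusters | Prosite: z-score | Prosite: z-score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|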
| 1st (X.-.-.-) | 6 | 223.10 | 48.62 | 6 | 322.88 | 42.53 | 6 | 118.29 | 65.73 | 6 | 178.76 | 61.90 |
| 2nd (X.X.-.-) | 36 | 1458.93 | 611.98 | 39 | 2163.93 | 905.54 | 39 | 334.23 | 329.99 | 33 | 455.45 | 316.99 |
| 3rd (X.X.X.-) | 79 | 2325.21 | 1660.40 | 79 | 3438.31 | 1948.35 | 108 | 1236.48 | 1017.59 | 67 | 783.31 | 600.11 |
The optimal reactome clustering with 21 groups, for which no equivalent exists in EC, is also included. Some EC groupings result in clusters without any enzyme associated with keywords in that particular dataset. These are excluded from the analysis and are responsible for the small variations in the number of clusters (e.g. 33, 36, 39 for the 2nd level). For comparison, the reactome is clustered with the same (reduced) number of clusters.