Abstract
BACKGROUND: Accurate assignment of genes to pathways is essential in order to understand the functional role of genes and to map the existing pathways in a given genome. Existing algorithms predict pathways by extrapolating experimental data in one organism to other organisms for which this data is not available. However, current systems classify all genes that belong to a specific EC family to all the pathways that contain the corresponding enzymatic reaction, and thus introduce ambiguity.
Year: 2005 PMID: 16135255 PMCID: PMC1239907 DOI: 10.1186/1471-2105-6-217
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Pathway graphs. Left: the pathway relation graph. Each pathway is represented as a node, and an edge is drawn between two pathways for each reaction that they share. Middle: the pathway conflict graph. Thick edges represent conflicts (i.e. the same gene was assigned to catalyze the same reaction in both pathways connected by the edge). Right: the final conflict graph. The edge between pathways P9 and P10 is a flat edge (no alternative assignments exist for that reaction) and therefore it is unmarked. At the end we are left with only two connected components with possibly solvable conflicts.
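The three graphs described in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the pathway names, reaction labels (`r1`, ...), gene labels (`g1`, ...), and all function names are hypothetical, and a conflict is reduced to "the same gene is assigned to a shared reaction in both pathways".

```python
from collections import defaultdict
from itertools import combinations

def relation_graph(pathway_reactions):
    """Map each pathway pair to the set of reactions the two pathways share."""
    edges = {}
    for p, q in combinations(sorted(pathway_reactions), 2):
        shared = pathway_reactions[p] & pathway_reactions[q]
        if shared:
            edges[(p, q)] = shared
    return edges

def conflict_edges(edges, assignment, candidates):
    """Return (p, q, reaction) triples that are conflicts and are not flat.

    A flat edge is one whose reaction has a single candidate gene, so no
    alternative assignment exists and the conflict cannot be resolved.
    """
    out = []
    for (p, q), shared in edges.items():
        for r in sorted(shared):
            same_gene = assignment[p][r] == assignment[q][r]
            has_alternative = len(candidates[r]) > 1
            if same_gene and has_alternative:
                out.append((p, q, r))
    return out

def connected_components(nodes, pairs):
    """Group nodes into connected components using a depth-first search."""
    adj = defaultdict(set)
    for p, q in pairs:
        adj[p].add(q)
        adj[q].add(p)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Hypothetical toy data: P1/P2 share r2 (two candidate genes),
# P9/P10 share r5 (one candidate gene, hence a flat edge).
pathway_reactions = {
    "P1": {"r1", "r2"}, "P2": {"r2", "r3"}, "P3": {"r4"},
    "P9": {"r5"}, "P10": {"r5"},
}
candidates = {"r1": ["g1"], "r2": ["g2", "g3"], "r3": ["g4"],
              "r4": ["g5"], "r5": ["g6"]}
assignment = {
    "P1": {"r1": "g1", "r2": "g2"}, "P2": {"r2": "g2", "r3": "g4"},
    "P3": {"r4": "g5"}, "P9": {"r5": "g6"}, "P10": {"r5": "g6"},
}

edges = relation_graph(pathway_reactions)
conflicts = conflict_edges(edges, assignment, candidates)
components = connected_components(pathway_reactions,
                                  [(p, q) for p, q, _ in conflicts])
solvable = [c for c in components if len(c) > 1]
print(conflicts)
print(solvable)
```

In this toy input the P9/P10 edge drops out of the final conflict graph because its reaction has no alternative gene, which mirrors the flat edge described in the caption.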
Summary of pathway assignments. For each pathway in the test set we report the number of reactions, the number of assignments considered, the number of curated (SGD verified) assignments, and the maximal and minimal assignment scores. The score reported is the weighted average score per pair of compared enzymes, and it reflects the average significance of a pairwise relation within a pathway. The larger the score, the more significant the relation. Negative scores suggest anti-correlation, and near-zero scores provide no evidence that the two genes are related. Pathways are sorted by assignment score.
| Pathway | Number of reactions | Number of assignments | Number of curated assignments | Max (Min) Normalized Score |
| methionine and S-adenosylmethionine synthesis | 2 | 2 | 2 | 10.45 (7.34) |
| isoleucine biosynthesis I | 5 | 12 | 4 | 10.32 (3.00) |
| leucine biosynthesis | 4 | 4 | 4 | 10.08 (4.01) |
| valine biosynthesis | 4 | 4 | 4 | 10.14 (4.99) |
| asparagine biosynthesis I | 2 | 4 | 4 | 8.80 (-4.88) |
| proline biosynthesis I | 3 | 1 | 1 | 8.43 (8.43) |
| homoserine methionine biosynthesis | 2 | 1 | 1 | 7.33 (7.33) |
| tryptophan biosynthesis | 5 | 2 | 2 | 5.29 (4.13) |
| aspartate biosynthesis II | 2 | 4 | 4 | 4.85 (0.75) |
| non-oxidative branch of the pentose phosphate pathway | 5 | 8 | 8 | 4.82 (0.84) |
| folic acid biosynthesis | 11 | 48 | 32 | 4.63 (1.26) |
| glutamate biosynthesis I | 2 | 2 | 2 | 4.08 (-4.88) |
| glutathione biosynthesis | 2 | 1 | 1 | 4.03 (4.03) |
| glutamate degradation VIII | 5 | 1 | 1 | 3.92 (3.92) |
| serine biosynthesis | 3 | 2 | 2 | 3.58 (-0.58) |
| purine biosynthesis 2 | 14 | 16 | 8 | 2.50 (2.06) |
| homocysteine and cysteine interconversion | 3 | 2 | 1 | 2.35 (2.01) |
| biotin biosynthesis I | 3 | 1 | 1 | 2.27 (2.27) |
| homocysteine degradation I | 2 | 1 | 1 | 2.01 (2.01) |
| threonine biosynthesis from homoserine | 2 | 1 | 1 | 0.87 (0.87) |
| glutamine – glutamate pathway II | 1 | 1 | 1 | 0.00 (0.00) |
| tyrosine biosynthesis I | 3 | 2 | 2 | -0.53 (-0.58) |
| glycine biosynthesis I | 2 | 2 | 2 | -0.91 (-3.60) |
| phenylalanine biosynthesis I | 3 | 2 | 2 | -2.09 (-2.80) |
Figure 2. The isoleucine biosynthesis pathway diagram. The pathway layout is retrieved from the MetaCyc database. For each reaction we list the genes that can catalyze the reaction. A plus or minus sign indicates whether the gene was assigned to the pathway in SGD. The expression profiles and their similarity scores are shown for selected pairs of genes. The mapping between gene names and Biozon identifiers is given in Table 6.
Assignments for the pathway isoleucine biosynthesis I. Only reactions with alternative assignments are listed (last column), and the selection number refers to Figure 2. For example, the top assignment selects the second gene (ILV1) to catalyze reaction 4.3.1.19. Assignments are sorted by normalized score. The second column marks which assignments are true assignments (+) and which are considered false assignments (-). For each assignment we list the total number of pairwise similarities, the number of positive-scoring and negative-scoring pairs, and the number of zero-scoring pairs (when no expression data is available).
| Number | Match | Normalized Score | Number of Pairs | Positive Pairs | Negative Pairs | Zero Pairs | Assignments |
| 1 | + | 10.32 | 10 | 10 | 0 | 0 | 4.3.1.19 : 2 |
| 2 | + | 9.54 | 10 | 10 | 0 | 0 | 4.3.1.19 : 2 |
| 3 | + | 7.37 | 10 | 10 | 0 | 0 | 4.3.1.19 : 2 |
| 4 | - | 6.49 | 10 | 8 | 2 | 0 | 4.3.1.19 : 3 |
| 5 | + | 6.30 | 10 | 9 | 1 | 0 | 4.3.1.19 : 2 |
| 6 | - | 6.08 | 10 | 6 | 0 | 4 | 4.3.1.19 : 1 |
| 7 | - | 5.52 | 10 | 6 | 0 | 4 | 4.3.1.19 : 1 |
| 8 | - | 5.00 | 10 | 7 | 3 | 0 | 4.3.1.19 : 3 |
| 9 | - | 4.78 | 10 | 8 | 2 | 0 | 4.3.1.19 : 3 |
| 10 | - | 3.84 | 10 | 6 | 0 | 4 | 4.3.1.19 : 1 |
| 11 | - | 3.00 | 10 | 6 | 4 | 0 | 4.3.1.19 : 3 |
| 12 | - | 3.00 | 10 | 5 | 1 | 4 | 4.3.1.19 : 1 |
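The table above can be read as an exhaustive enumeration: every combination of candidate genes over the pathway's reactions is one assignment, and each assignment is scored over all gene pairs (5 reactions give 10 pairs, matching the "Number of Pairs" column). A minimal sketch follows, assuming a plain unweighted mean in place of the weighted average used in the paper, and treating pairs without expression data as zero-scoring. The reaction labels, gene labels, similarity values, and function names are hypothetical.

```python
from itertools import combinations, product

def enumerate_assignments(candidates):
    """Yield one {reaction: gene} dict per combination of candidate genes."""
    reactions = sorted(candidates)
    for genes in product(*(candidates[r] for r in reactions)):
        yield dict(zip(reactions, genes))

def score_assignment(assignment, similarity):
    """Return (mean score, positive pairs, negative pairs, zero pairs).

    Assumption: an unweighted mean over all gene pairs; a pair missing from
    `similarity` has no expression data and contributes a score of 0.
    """
    genes = [assignment[r] for r in sorted(assignment)]
    scores = [similarity.get(frozenset(pair), 0.0)
              for pair in combinations(genes, 2)]
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    zero = len(scores) - positive - negative
    mean = sum(scores) / len(scores) if scores else 0.0
    return mean, positive, negative, zero

# Hypothetical pathway: three reactions with 3 x 1 x 2 = 6 possible assignments.
candidates = {"rA": ["g1", "g2", "g3"], "rB": ["g4"], "rC": ["g5", "g6"]}
similarity = {
    frozenset(("g1", "g4")): 2.0,
    frozenset(("g1", "g5")): 4.0,
    frozenset(("g4", "g5")): 6.0,
    frozenset(("g2", "g4")): -1.0,
}

ranked = sorted(enumerate_assignments(candidates),
                key=lambda a: score_assignment(a, similarity)[0],
                reverse=True)
for rank, a in enumerate(ranked, start=1):
    print(rank, a, score_assignment(a, similarity))
```

Because the number of assignments is the product of the candidate counts per reaction, this brute-force listing is only practical for pathways with few alternatives.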
Figure 3. The isoleucine biosynthesis pathway from the reconstructed metabolic network of Saccharomyces cerevisiae [18]. Reproduced with permission from Cold Spring Harbor Laboratory ©2004 (Duarte et al. 2004 [18]). The EC numbers and the genes associated with the reactions were added to the diagram. The part that overlaps with the MetaCyc isoleucine biosynthesis pathway is circled.
Figure 4. The folic acid biosynthesis pathway diagram. See Figure 2 for description. Note that FOL1, ADE3 and MIS1 are multi-functional enzymes.
Figure 5. The folic acid biosynthesis pathway from the reconstructed metabolic network of Saccharomyces cerevisiae [18]. Reproduced with permission from Cold Spring Harbor Laboratory ©2004 (Duarte et al. 2004 [18]). The EC numbers and the genes associated with the reactions were added to the diagram. The parts that overlap with the MetaCyc folic acid biosynthesis pathway are circled. The green circles indicate consistency, while the red one indicates inconsistency.
Figure 6. The asparagine biosynthesis pathway. See Figure 2 for description. Both ASN1 and ASN2 are correlated with AAT2 but are anti-correlated with AAT1 (selected pairwise similarities are shown). The latter is localized to a different cellular compartment than the others, and is likely to be involved in other pathways (see text for details).
Figure 7. The asparagine biosynthesis pathway from the reconstructed metabolic network of Saccharomyces cerevisiae [18]. Reproduced with permission from Cold Spring Harbor Laboratory ©2004 (Duarte et al. 2004 [18]). The part that overlaps with the MetaCyc asparagine biosynthesis pathway is circled.
Distribution of pathway assignment scores. For each data set we ran our algorithm for pathway assignment. The algorithm considers all pathways simultaneously attempting to maximize expression similarity while minimizing the number of conflicts. The final normalized pathway assignment scores Score(A(P)) are divided into four categories based on the average expression similarity of their genes: strongly correlated genes (4 ≤ Score), mildly correlated genes (1
| Data Set | Assignment score < -1 | -1 ≤ score ≤ 1 | 1 < score < 4 | 4 ≤ score |
| Time-series | 4 | 5 | 11 | 32 |
| Rosetta | 2 | 2 | 7 | 41 |
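The binning behind this table can be sketched as follows. The caption states the strongly correlated category (4 ≤ Score) and begins the mildly correlated one at 1; the two lower boundaries used here (at -1 and 1) are an assumption inferred from the surviving fragments of the table header. The function name and the sample scores are illustrative only and do not reproduce the counts in the table.

```python
from collections import Counter

def score_category(score):
    """Place a normalized assignment score in one of four bins.

    The 4 <= score bin follows the caption; the remaining boundaries
    are assumed from the table header.
    """
    if score >= 4:
        return "4 <= score"
    if score > 1:
        return "1 < score < 4"
    if score >= -1:
        return "-1 <= score <= 1"
    return "score < -1"

# Illustrative input: a handful of normalized scores, including the
# boundary values 4.0 and 1.0 to show which bin each edge falls in.
sample_scores = [10.32, 3.49, 0.14, -2.09, 4.0, 1.0]
distribution = Counter(score_category(s) for s in sample_scores)
for category, count in distribution.items():
    print(category, count)
```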
Genome wide analysis. Connected components' scores before and after resolving conflicts. For each component we list the names of the constituent pathways, the number of conflicts (shared assignments) and the component score. Note that not all conflicts are solvable. For example, the first connected component contains three pathways, and the best initial assignment results in 6 conflicts. Of these only two are solvable (i.e. there are multiple enzymes that can be assigned to these reactions). The final assignment resolves these conflicts while reducing the score of the connected component only slightly (9.31 compared to 10.22).
| Component Number | Pathways | Number of Conflicts (solvable conflicts), Before | Number of Conflicts, After | Component Score, Before | Component Score, After |
| 1 | isoleucine biosynthesis I | 6(2) | 4 | 10.22 | 9.31 |
| 2 | aerobic glycerol degradation II | 5(3) | 2 | 7.58 | 7.22 |
| 3 | asparagine biosynthesis I | 4(0) | 4 | 6.82 | 6.82 |
| 4 | trehalose anabolism | 4(1) | 3 | 6.64 | 6.62 |
| 5 | pentose phosphate pathway, Mycoplasma pneumoniae | 5(2) | 3 | 5.60 | 4.78 |
| 6 | serine biosynthesis | 3(0) | 3 | 3.98 | 3.98 |
| 7 | glycine biosynthesis I | 3(2) | 1 | 3.41 | 3.41 |
| 8 | arginine biosynthesis, Bacillus subtilis | 0(0) | 0 | 2.08 | 2.08 |
| 9 | alanine degradation 3 | 1(1) | 0 | 0 | 0 |
| 10 | phenylalanine biosynthesis I | 2(1) | 1 | -1.31 | -1.33 |
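The before/after columns reflect a trade-off between score and conflicts within a connected component. One way to picture it is an exhaustive search over the joint choices of per-pathway assignments. The sketch below assumes a lexicographic objective (fewest conflicts first, then highest summed score), which is a simplification and may differ from the objective used in the paper; pathway names, reactions, genes, scores, and function names are hypothetical.

```python
from itertools import combinations, product

def count_conflicts(choice):
    """Count shared reactions that two pathways fill with the same gene."""
    conflicts = 0
    for (_, a), (_, b) in combinations(choice, 2):
        conflicts += sum(1 for r in a.keys() & b.keys() if a[r] == b[r])
    return conflicts

def resolve_component(options):
    """Pick one assignment per pathway in a connected component.

    options maps pathway -> list of (score, {reaction: gene}).
    Assumed objective: minimize conflicts, then maximize the summed score.
    Returns (number of conflicts, summed score, {pathway: (score, assignment)}).
    """
    names = sorted(options)
    best_key, best_choice = None, None
    for choice in product(*(options[n] for n in names)):
        key = (count_conflicts(choice), -sum(s for s, _ in choice))
        if best_key is None or key < best_key:
            best_key, best_choice = key, choice
    return best_key[0], -best_key[1], dict(zip(names, best_choice))

# Hypothetical component: both pathways contain reaction r2,
# which can be catalyzed by g2 or g3.
options = {
    "P1": [(10.0, {"r1": "g1", "r2": "g2"}), (9.0, {"r1": "g1", "r2": "g3"})],
    "P2": [(8.0, {"r2": "g2", "r3": "g4"}), (5.0, {"r2": "g3", "r3": "g4"})],
}

# Before: each pathway independently takes its top-scoring assignment.
before = tuple(max(options[n], key=lambda o: o[0]) for n in sorted(options))
print("before:", count_conflicts(before), sum(s for s, _ in before))

# After: the joint search removes the conflict at a small cost in score.
n_conflicts, total, chosen = resolve_component(options)
print("after:", n_conflicts, total)
```

In this toy component the conflict count drops from 1 to 0 while the summed score falls from 18 to 17, the same pattern as component 1 in the table. The search space is the product of the per-pathway assignment counts, so a brute-force search like this only suits small components.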
Genome wide analysis. Statistics of the final assignments. For each pathway we list the number of possible assignments, the maximum and minimum scores observed over these assignments, and the final score (note that the final score might not be the maximum score, due to conflicts that were resolved at the refinement stage). The last column gives the number of pairwise relations considered in each assignment, and the number of negative-scoring pairs in the final assignment (in parentheses). Negative scores indicate anti-correlation or no correlation. Pathways are sorted by the final assignment score. Note that most pathways are assigned a high positive score, and almost all pairs in the final assignments are positive pairs.
| Pathway | Number of Assignments | Max Score | Min Score | Final Score | Number of pairs (negative Pairs) |
| pentose phosphate pathway, Mycoplasma pneumoniae | 2 | 11.03 | -0.73 | 11.03 | 1(0) |
| sulfate assimilation 2 | 1 | 11.02 | 11.02 | 11.02 | 1(0) |
| methionine and S-adenosylmethionine synthesis | 2 | 10.45 | 7.34 | 10.45 | 1(0) |
| isoleucine biosynthesis I | 12 | 10.32 | 3.00 | 10.32 | 10(0) |
| valine biosynthesis | 4 | 10.14 | 4.99 | 10.14 | 6(0) |
| trehalose biosynthesis | 2 | 10.02 | 9.65 | 9.65 | 1(0) |
| glutamate degradation I | 1 | 9.64 | 9.64 | 9.64 | 3(0) |
| arginine biosynthesis I | 1 | 9.58 | 9.58 | 9.58 | 3(0) |
| chorismate biosynthesis | 2 | 9.24 | 8.19 | 9.24 | 21(0) |
| glycolysis | 180 | 8.98 | 3.16 | 8.87 | 28(0) |
| asparagine biosynthesis I | 4 | 8.80 | -4.88 | 8.80 | 1(0) |
| trehalose anabolism | 8 | 8.80 | 0.27 | 8.80 | 6(0) |
| proline biosynthesis I | 1 | 8.43 | 8.43 | 8.43 | 3(0) |
| galactose metabolism | 4 | 7.91 | 4.62 | 7.91 | 6(0) |
| glycine degradation III | 2 | 7.73 | 7.73 | 7.73 | 1(0) |
| methylglyoxal degradation | 2 | 7.59 | 0.98 | 7.59 | 1(0) |
| tRNA charging pathway | 49152 | 7.41 | 1.69 | 7.41 | 171 (2) |
| glyoxylate cycle | 72 | 7.37 | 0.27 | 7.37 | 10(0) |
| homoserine methionine biosynthesis | 1 | 7.33 | 7.33 | 7.33 | 1(0) |
| pyruvate dehydrogenase | 2 | 6.43 | 4.33 | 6.43 | 1(0) |
| removal of superoxide radicals | 4 | 5.93 | -0.45 | 5.93 | 1(0) |
| aerobic glycerol degradation II | 180 | 6.19 | 1.64 | 5.58 | 28(1) |
| aspartate biosynthesis II | 4 | 4.85 | 0.75 | 4.85 | 1(0) |
| non-oxidative branch of the pentose phosphate pathway | 8 | 4.82 | 0.84 | 4.82 | 10(1) |
| oxidative branch of the pentose phosphate pathway | 6 | 4.78 | 1.07 | 4.78 | 3(0) |
| arginine biosynthesis, Bacillus subtilis | 3 | 4.55 | 2.69 | 4.55 | 34(4) |
| leucine biosynthesis | 4 | 10.08 | 4.01 | 4.31 | 3(0) |
| UDP-N-acetylglucosamine biosynthesis | 1 | 4.22 | 4.22 | 4.22 | 1(0) |
| cysteine biosynthesis II | 2 | 4.19 | 0.94 | 4.19 | 6(0) |
| tryptophan biosynthesis | 2 | 4.13 | 3.99 | 4.13 | 10(1) |
| glutamate biosynthesis I | 2 | 4.08 | -4.88 | 4.08 | 1(0) |
| glutathione biosynthesis | 1 | 4.03 | 4.03 | 4.03 | 1(0) |
| arginine degradation I | 1 | 4.00 | 4.00 | 4.00 | 3(0) |
| arginine proline degradation | 1 | 3.84 | 3.84 | 3.84 | 3(0) |
| serine biosynthesis | 2 | 3.58 | -0.58 | 3.58 | 3(0) |
| folic acid biosynthesis | 48 | 3.49 | 0.23 | 3.49 | 55 (12) |
| histidine biosynthesis I | 1 | 2.48 | 2.48 | 2.48 | 12(2) |
| purine biosynthesis 2 | 16 | 2.43 | 2.01 | 2.43 | 90 (27) |
| homocysteine and cysteine interconversion | 2 | 2.35 | 2.01 | 2.35 | 3(1) |
| biotin biosynthesis I | 1 | 2.27 | 2.27 | 2.27 | 3(2) |
| homocysteine degradation I | 1 | 2.01 | 2.01 | 2.01 | 1(0) |
| glutamate degradation VIII | 1 | 1.86 | 1.86 | 1.86 | 8(2) |
| homoserine biosynthesis | 1 | 1.14 | 1.14 | 1.14 | 3(1) |
| threonine biosynthesis from homoserine | 1 | 0.87 | 0.87 | 0.87 | 1(0) |
| de novo biosynthesis of pyrimidine ribonucleotides | 12 | 0.14 | -0.69 | 0.14 | 43 (25) |
| ornithine spermine biosynthesis | 2 | -0.24 | -2.48 | -0.24 | 3(2) |
| tyrosine biosynthesis I | 2 | -0.53 | -0.58 | -0.58 | 3(1) |
| glycine biosynthesis I | 2 | -0.91 | -3.60 | -0.91 | 1(1) |
| UDP-glucose conversion | 4 | -1.32 | -2.04 | -1.32 | 3(2) |
| ribose degradation | 2 | 8.06 | -1.74 | -1.74 | 1(1) |
| phenylalanine biosynthesis I | 2 | -2.09 | -2.80 | -2.09 | 3(2) |
| tryptophan kynurenine degradation | 1 | -2.46 | -2.46 | -2.46 | 1(1) |
The correspondence of genes to Biozon NR identifiers. We refer to genes using their unique and stable Biozon NR identifiers, at [41]. To view an entry with identifier x follow the URL: x.
| Gene | NR Identifiers |
| ILV1 | 005760000068 |
| CHA1 | 003600000165 |
| YKL218C | 003260000219 |
| ILV2 | 003090000098 |
| ILV6 | 006870000019 |
| ILV5 | 003950000069 |
| ILV3 | 005850000040 |
| BAT1 | 003930000034 |
| BAT2 | 003760000122 |
| FOL2 | 002430000075 |
| FOL1 | 008640000008 |
| FOL3 | 004270000071 |
| DFR1 | 002110001504 |
| MIS1 | 009750000001 |
| ADE3 | 009460000003 |
| SHM1 | 005650000392 |
| SHM2 | 004690000046 |
| YKL132C | 004300000053 |
| MET7 | 005480000035 |
| AAT1 | 004510000006 004510000730 |
| AAT2 | 004320000601 004170000010 |
| ASN2 | 005720000349 005710000020 |
| ASN1 | 005720000348 005710000019 |