Pedro Romero, Jonathan Wagg, Michelle L Green, Dale Kaiser, Markus Krummenacker, Peter D Karp.
Abstract
BACKGROUND: We present a computational pathway analysis of the human genome that assigns enzymes encoded therein to predicted metabolic pathways. Pathway assignments place genes in their larger biological context, and are a necessary first step toward quantitative modeling of metabolism.
Year: 2004 PMID: 15642094 PMCID: PMC549063 DOI: 10.1186/gb-2004-6-1-r2
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
The number of human proteins that were assigned enzyme activities (which caused them to become connected to reaction objects within HumanCyc), according to the mechanism of reaction matching
| Type of match | Number of proteins |
| PathoLogic matched by EC number | 2,057 |
| PathoLogic matched by name | 314 |
| Ambiguous | 27 |
| Unmatched by PathoLogic | 27,185 |
| Probable enzymes | 1,320 |
| Manually matched | 625 |
Figure 1. A typical description of a gene product's function in Ensembl. This example shows exactly what information was obtained from Ensembl: multiple functions, synonyms and EC numbers, as well as a Swiss-Prot accession number, all in one line of text. A Perl script was developed to parse these descriptions and extract the relevant information.
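The parsing step described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the authors' Perl script. The layout of the description line (EC numbers and synonyms in parentheses, functions separated by " / ", a trailing `[Source:SWISSPROT;Acc:...]` tag), the example line, and the name `parse_description` are all assumptions made for illustration.

```python
import re

# Assumed patterns; the real Ensembl description layout may differ.
EC_PATTERN = re.compile(r"EC\s+(\d+(?:\.(?:\d+|-)){3})")
ACC_PATTERN = re.compile(r"\[Source:SWISSPROT;Acc:([A-Z0-9]+)\]")


def parse_description(description):
    """Split one description line into functions, synonyms, EC numbers, accession."""
    acc_match = ACC_PATTERN.search(description)
    accession = acc_match.group(1) if acc_match else None
    # Drop the bracketed source tag, then the sentence-final period.
    text = ACC_PATTERN.sub("", description).strip().rstrip(".")
    ec_numbers = EC_PATTERN.findall(text)
    # Remove the parenthesised EC annotations first, so that any remaining
    # parenthesised phrases can be treated as synonyms (an assumption).
    text = re.sub(r"\(\s*EC\s+[^)]*\)", "", text)
    synonyms = [s.strip() for s in re.findall(r"\(([^)]*)\)", text)]
    text = re.sub(r"\([^)]*\)", "", text)
    # Assume multiple functions are separated by a slash surrounded by spaces.
    functions = [f.strip() for f in re.split(r"\s+/\s+", text) if f.strip()]
    return {
        "functions": functions,
        "synonyms": synonyms,
        "ec_numbers": ec_numbers,
        "accession": accession,
    }


# Hypothetical description line, not taken from Ensembl.
example = (
    "HYPOTHETICAL KINASE A (EC 2.7.1.1) (HKA) / "
    "HYPOTHETICAL PHOSPHATASE B (EC 3.1.3.2). [Source:SWISSPROT;Acc:P00000]"
)
parsed = parse_description(example)
```

For the hypothetical line above, `parsed["ec_numbers"]` holds both EC numbers and `parsed["accession"]` holds the accession from the source tag. A real parser would need to handle whatever variations occur in the actual data.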
HumanCyc statistics
| PGDB objects | Quantity |
| Replicons | 76 |
| Genes | 28,783 |
| Protein genes | 28,583 |
| Enzyme genes | 2,742 |
| RNA genes | 200 |
| tRNAs | 50 |
| Compounds | 661 |
| Polypeptides | 28,602 |
| Protein complexes | 22 |
| Enzymes | 2,709 |
| Enzymatic Reactions | 1,093 |
| With enzyme in HumanCyc | 896 |
| Pathways | 135 |
| Database links | 389,262 |
| Citations | 41,810 |
The entire set of pathways in HumanCyc, grouped by classes using the MetaCyc pathway classification hierarchy
| Class | Subclass | Pathway | EcoCyc | AraCyc |
| Biosynthesis | Polyamines | Betaine biosynthesis | * | * |
| Betaine biosynthesis II | ||||
| Spermine biosynthesis | * | |||
| Polyamine biosynthesis II | ||||
| Ornithine spermine biosynthesis | * | |||
| Polyamine biosynthesis | * | * | ||
| UDP- | * | |||
| UDP- | * | |||
| Nucleotides | * | |||
| Purine and pyrimidine metabolism | ||||
| Purine biosynthesis 2 | ||||
| * | ||||
| Salvage pathways of pyrimidine ribonucleotides | * | |||
| * | ||||
| Salvage pathways of pyrimidine deoxyribonucleotides | * | |||
| Fatty acids and lipids | Fatty acid elongation - saturated | * | * | |
| Fatty acid biosynthesis - initial steps | * | * | ||
| Phospholipid biosynthesis | * | * | ||
| Phospholipid biosynthesis II | ||||
| Mevalonate pathway | * | |||
| Triacylglycerol biosynthesis | * | |||
| Cofactors, prosthetic groups, electron carriers | Heme biosynthesis II | |||
| NAD biosynthesis II | ||||
| NAD biosynthesis III | ||||
| NAD phosphorylation and dephosphorylation | * | |||
| Pyridine nucleotide biosynthesis | * | * | ||
| Pyridine nucleotide cycling | * | |||
| Glutathione-glutaredoxin redox reactions | * | |||
| Glutathione biosynthesis | * | * | ||
| Thioredoxin pathway | * | * | ||
| Pantothenate and coenzyme A biosynthesis | * | * | ||
| Pyridoxal 5'-phosphate salvage pathway | * | * | ||
| FormylTHF biosynthesis | * | * | ||
| Polyisoprenoid biosynthesis | * | * | ||
| Methyl-donor molecule biosynthesis | * | |||
| Cell structures | Colanic acid building blocks biosynthesis | * | * | |
| GDP-mannose metabolism | * | * | ||
| Mannosyl-chito-dolichol biosynthesis | * | |||
| UDP- | * | |||
| Carbohydrates | GDP-D-rhamnose biosynthesis | |||
| Gluconeogenesis | * | * | ||
| Mannosyl-chito-dolichol biosynthesis | * | |||
| Trehalose degradation - low osmolarity | * | * | ||
| Aminoacyl-tRNAs | tRNA charging pathway | * | * | |
| Amino acid biosynthesis | Alanine biosynthesis II | * | ||
| Arginine biosynthesis 4 | * | |||
| Citrulline biosynthesis | ||||
| Asparagine biosynthesis I | ||||
| Aspartate biosynthesis II | ||||
| Cysteine biosynthesis II | ||||
| Glutamate biosynthesis II | * | |||
| Glutamine biosynthesis II | ||||
| Glycine cleavage | * | |||
| Glycine biosynthesis I | * | * | ||
| Methionine salvage pathway | ||||
| Proline biosynthesis I | * | * | ||
| Serine biosynthesis | * | * | ||
| Tyrosine biosynthesis II | ||||
| Degradation | Sugars and polysaccharides | Lactose degradation 4 | * | |
| Lactose degradation 2 | * | |||
| Sucrose degradation III | ||||
| Galactose metabolism | * | * | ||
| Glucose 1-phosphate metabolism | * | * | ||
| Glycogen degradation | * | * | ||
| Mannose degradation | * | |||
| Non-phosphorylated glucose degradation | * | |||
| UDP-glucose conversion | * | |||
| Ribose degradation | * | * | ||
| Trehalose degradation - low osmolarity | * | * | ||
| Sugar derivatives | Lactate oxidation | |||
| Mannitol degradation | * | |||
| Sorbitol degradation | * | |||
| Glucosamine catabolism | * | |||
| Other degradation | Removal of superoxide radicals | * | * | |
| Methylglyoxal degradation | ||||
| Nucleosides and nucleotides | (Deoxy)ribose phosphate metabolism | * | * | |
| Periplasmic NAD degradation | ||||
| Fatty acids | Fatty acid oxidation pathway | * | * | |
| Triacylglycerol degradation | * | |||
| Lipases pathway | * | |||
| Carboxylates, other | Propionate metabolism - methylmalonyl pathway | * | ||
| 2-Oxobutyrate degradation | ||||
| Acetate degradation | * | * | ||
| Pyruvate metabolism | ||||
| C1 compounds | Carbon monoxide dehydrogenase pathway | * | ||
| Serine-isocitrate lyase pathway | * | |||
| Amino acids, amines | Alanine degradation 3 | * | ||
| Arginine degradation III | ||||
| Arginase degradation pathway | ||||
| Arginine proline degradation | * | |||
| Asparagine degradation 1 | * | |||
| Aspartate degradation 1 | ||||
| Malate/aspartate shuttle pathway | ||||
| L-cysteine degradation IV | * | |||
| L-cysteine degradation VI | ||||
| Cysteine degradation I | ||||
| Glutamate degradation I | * | |||
| Glutamate degradation IV | ||||
| Glutamate degradation VII | * | |||
| Glutamine degradation 1 | ||||
| Glutamine degradation II | ||||
| Glycine degradation II | ||||
| Glycine degradation I | ||||
| Histidine degradation III | ||||
| Histidine degradation I | ||||
| Homocysteine degradation I | ||||
| Isoleucine degradation I | * | |||
| Isoleucine degradation III | ||||
| Leucine degradation II | ||||
| Leucine degradation I | * | |||
| Lysine degradation I | * | |||
| Methionine degradation 1 | * | |||
| 4-Hydroxyproline degradation | * | |||
| Phenylalanine degradation I | ||||
| Proline degradation III | ||||
| Proline degradation II | ||||
| L-serine degradation | * | * | ||
| Threonine degradation 2 | ||||
| Tryptophan degradation I | ||||
| Tryptophan degradation III | * | |||
| Tryptophan kynurenine degradation | ||||
| Tyrosine degradation | ||||
| Valine degradation I | * | |||
| Alcohols | Aerobic glycerol degradation II | * | ||
| Glycerol metabolism | * | * | ||
| Glycerol degradation I | * | |||
| Ethanol degradation | * | |||
| Amines and polyamines, other | Citrulline degradation | |||
| * | * | |||
| Glucosamine catabolism | * | |||
| Energy metabolism | Glycolysis 3 | * | ||
| Glycolysis | * | * | ||
| Glycolysis 2 | ||||
| Glyceraldehyde 3-phosphate degradation | * | |||
| Non-oxidative branch of the pentose phosphate pathway | * | * | ||
| Oxidative branch of the pentose phosphate pathway | * | * | ||
| Aerobic respiration - electron donors reaction list | * | |||
| Pyruvate dehydrogenase | * | * | ||
| TCA cycle - aerobic respiration | * | * | ||
| Entner-Doudoroff pathway | * |
More detailed subclasses were not included for brevity. An asterisk in one of the last two columns means that the pathway is also present in the EcoCyc (E. coli) and/or AraCyc (A. thaliana) databases, respectively. Note that pathway names are derived from the MetaCyc database, which explains why HumanCyc contains a pathway called 'Heme Biosynthesis II' but not 'Heme Biosynthesis I.'
Figure 2. Predicted HumanCyc pathway for arginine degradation. The computer icon in the upper-right corner indicates that this pathway was predicted computationally. No enzyme or gene names are drawn next to the first three reactions of this pathway, which indicates that these steps are pathway holes: no enzyme has been identified for them in the human genome. The graphic at the bottom indicates the positions of the genes within this pathway on the human chromosomes. Moving the mouse over a gene in the webpage for this diagram will identify the gene and the chromosome.
Figure 3. Curated HumanCyc pathway for oxidative ethanol degradation. This pathway was not predicted by PathoLogic, but was entered into HumanCyc as part of our subsequent literature curation effort. The flask icon in the upper-right corner indicates that this pathway is supported by experimental evidence. The complete comment for this pathway is available at [38].
A comparison of candidates for three missing enzymes
| | Candidate | P (has-function) | Number of hits | Best E-value | Average rank | Percentage of query aligned |
| Reaction hole: imidazolonepropionase | ||||||
| A | ENSG00000139344-MONOMER Functional annotation: UNKNOWN | 0.98 | 28 | 7.0e-69 | 1.0 | 91.9 |
| B | ENSG00000119125-MONOMER Functional annotation: Guanine deaminase | 0.00018 | 6 | 3.0e-6 | 3.5 | 37.9 |
| Reaction hole: | ||||||
| C | ENSG00000162066-MONOMER Functional annotation: CGI-14 protein | 0.998 | 9 | 1e-110 | 1.0 | 94.6 |
| D | ENSG00000119125-MONOMER Functional annotation: Guanine deaminase | 1.0e-5 | 4 | 0.85 | 4.0 | 19.9 |
| Reaction hole: aldose 1-epimerase | ||||||
| E | ENSG00000143891-MONOMER Functional annotation: AMBIGUOUS | 0.98 | 19 | 3e-74 | 1.58 | 81.9 |
| F | ENSG00000117308-MONOMER Functional annotation: UDP-glucose 4-epimerase | 0.93 | 4 | 1e-100 | 1.0 | 58.3 |
Comparison of known essential human nutrients with corresponding biosynthetic pathways in MetaCyc and in HumanCyc
| Essential nutrient in humans | Biosynthetic pathway in MetaCyc? | Biosynthetic pathway inferred in humans? |
| Amino acids | ||
| Arginine | Y | N |
| Histidine | Y | N |
| Isoleucine | Y | N |
| Leucine | Y | N |
| Lysine | Y | N |
| Methionine | Y | N |
| Phenylalanine | Y | N |
| Threonine | Y | N |
| Valine | Y | N |
| Vitamins | ||
| Ascorbic acid (Vitamin C) | Y | N |
| Biotin (Vitamin H) | Y | N |
| Folic acid (Vitamin M) | Y | N |
| Niacin (Vitamin B3) | N | N |
| Pantothenic acid | Y | Y |
| Pyridoxine (Vitamin B6) | N | N |
| Riboflavin (Vitamin B2) | Y | N |
| Thiamine (Vitamin B1) | Y | N |
| Cobalamin (Vitamin B12) | Y | N |
| Retinol (Vitamin A) | N | N |
| Vitamin D | N | N |
| Tocopherol (Vitamin E) | N | N |
| Vitamin K | N | N |
Note that a pathway cannot be predicted in HumanCyc if it does not exist in MetaCyc.
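The comparison in this table can be tallied with a short Python sketch. The rows are transcribed from the table above; the function name `summarize` and the tuple layout are illustrative choices, not part of HumanCyc or MetaCyc.

```python
# (nutrient, biosynthetic pathway in MetaCyc?, pathway inferred in humans?)
# transcribed from the table above.
NUTRIENTS = [
    ("Arginine", True, False),
    ("Histidine", True, False),
    ("Isoleucine", True, False),
    ("Leucine", True, False),
    ("Lysine", True, False),
    ("Methionine", True, False),
    ("Phenylalanine", True, False),
    ("Threonine", True, False),
    ("Valine", True, False),
    ("Ascorbic acid (Vitamin C)", True, False),
    ("Biotin (Vitamin H)", True, False),
    ("Folic acid (Vitamin M)", True, False),
    ("Niacin (Vitamin B3)", False, False),
    ("Pantothenic acid", True, True),
    ("Pyridoxine (Vitamin B6)", False, False),
    ("Riboflavin (Vitamin B2)", True, False),
    ("Thiamine (Vitamin B1)", True, False),
    ("Cobalamin (Vitamin B12)", True, False),
    ("Retinol (Vitamin A)", False, False),
    ("Vitamin D", False, False),
    ("Tocopherol (Vitamin E)", False, False),
    ("Vitamin K", False, False),
]


def summarize(rows):
    """Count nutrients with a MetaCyc pathway and list those inferred in humans."""
    in_metacyc = [row for row in rows if row[1]]
    inferred = [name for name, _, in_human in in_metacyc if in_human]
    # A row inferred in humans without a MetaCyc pathway would contradict
    # the note above, so this list is expected to be empty.
    inconsistent = [name for name, in_meta, in_human in rows if in_human and not in_meta]
    return len(rows), len(in_metacyc), inferred, inconsistent


total, with_metacyc, inferred, inconsistent = summarize(NUTRIENTS)
```

Of the 22 nutrients listed, 16 have a biosynthetic pathway in MetaCyc, and pantothenic acid is the only one of those for which a pathway was inferred in humans.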
Pathways (including superpathways) that are common to human, bacteria and plant PGDBs
| Class | Subclass | Pathway |
| Biosynthesis | Polyamines | Betaine biosynthesis |
| Polyamine biosynthesis | ||
| Fatty acids and lipids | Phospholipid biosynthesis | |
| Fatty acid biosynthesis - initial steps | ||
| Fatty acid elongation - saturated | ||
| Cofactors, prosthetic groups, electron carriers | Pyridine nucleotide biosynthesis | |
| Thioredoxin pathway | ||
| Glutathione biosynthesis | ||
| Pantothenate and coenzyme A biosynthesis | ||
| Pyridoxal 5'-phosphate salvage pathway | ||
| Polyisoprenoid biosynthesis | ||
| FormylTHF biosynthesis | ||
| Cell structures | Colanic acid building blocks biosynthesis | |
| GDP-mannose metabolism | ||
| Carbohydrates | Gluconeogenesis | |
| Trehalose degradation - low osmolarity | ||
| Aminoacyl-tRNAs | tRNA charging pathway | |
| Amino acid biosynthesis | Proline biosynthesis I | |
| Glycine biosynthesis I | ||
| Serine biosynthesis | ||
| Degradation | Sugars and polysaccharides | Glucose 1-phosphate metabolism |
| Galactose metabolism | ||
| Trehalose degradation - low osmolarity | ||
| Glycogen degradation | ||
| Ribose degradation | ||
| Other degradation | Removal of superoxide radicals | |
| Nucleosides and nucleotides | (Deoxy)ribose phosphate metabolism | |
| Fatty acids | Fatty acid oxidation pathway | |
| Carboxylates, other | Acetate degradation | |
| Amino acids, amines | L-serine degradation | |
| Alcohols | Glycerol metabolism | |
| Energy metabolism | Pyruvate dehydrogenase | |
| TCA cycle - aerobic respiration | ||
| Glycolysis | ||
| Oxidative branch of the pentose phosphate pathway | ||
| Nonoxidative branch of the pentose phosphate pathway |
HumanCyc, H. sapiens; EcoCyc, E. coli; AraCyc, A. thaliana. The pathways in the table are included in all three PGDBs.
Figure 4. Numbers of pathways, including superpathways, shared by the three PGDBs HumanCyc (H. sapiens), EcoCyc (E. coli), and AraCyc (A. thaliana). The numbers outside the circles represent the total number of pathways in the corresponding PGDB. The numbers inside the intersecting areas represent the number of pathways that fall into each area. For example, there are 55 pathways in common between HumanCyc and EcoCyc (20 + 35). AraCyc contains 177 total pathways: 76 that are unique to A. thaliana, and 101 that are shared with other organisms.
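The way the region counts in such a three-way diagram combine can be sketched in Python with sets. The sets below are toy placeholders, not the real pathway lists, and the function name `venn_regions` is hypothetical.

```python
def venn_regions(human, eco, ara):
    """Return the size of each of the seven regions of a three-set Venn diagram."""
    return {
        "human_only": len(human - eco - ara),
        "eco_only": len(eco - human - ara),
        "ara_only": len(ara - human - eco),
        "human_eco_only": len((human & eco) - ara),
        "human_ara_only": len((human & ara) - eco),
        "eco_ara_only": len((eco & ara) - human),
        "all_three": len(human & eco & ara),
    }


# Toy placeholder sets; the letters stand in for pathway identifiers.
human = {"a", "b", "c", "d"}
eco = {"b", "c", "e"}
ara = {"c", "d", "f", "g"}

regions = venn_regions(human, eco, ara)

# The overlap of two databases is the sum of the region shared by only
# those two and the region shared by all three.
human_eco_shared = regions["human_eco_only"] + regions["all_three"]

# A database's total is its unique region plus everything it shares.
ara_shared = regions["human_ara_only"] + regions["eco_ara_only"] + regions["all_three"]
ara_total = regions["ara_only"] + ara_shared
```

The same bookkeeping underlies the caption's figures: the HumanCyc and EcoCyc overlap is the sum of two regions (20 + 35 = 55), and the AraCyc total is its unique region plus its shared regions (76 + 101 = 177).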
Information extracted from different data sources
| Data source (version) | Information extracted (for each gene or locus) | Number of genes obtained | Number of genes nonredundant |
| Ensembl (Build 31) | Gene name, chromosome or contig, start and end positions, strand (transcription direction), exons, gene-product (including function name(s) or description(s), synonyms and EC number(s)), cross references (IDs) to other databases (SwissProt, HUGO, PDB, GO, RefSeq, OMIM, Entrez, SPTREMBL, EMBL, LocusLink). | 24,847 | |
| LocusLink (03/29/2003) | Gene name, chromosome, gene product (function name or description), function synonyms, EC number(s), gene and protein comments, cross references (IDs) to other databases (Entrez, UCSC Genome, RefSeq, GO, OMIM, UniGene, PubMed) | 18,880 | 3,936 |
| GenBank NC_001807 (mitochondrion) | Gene name, start and end positions, transcription direction, gene product (function name or description) | 35 | |
Functional information in Ensembl had to be extensively parsed to extract multiple functions, EC numbers, and/or synonyms. The 'nonredundant' column shows the number of genes from LocusLink that had no corresponding gene in the other two data sources (Ensembl and GenBank).
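The idea behind the 'nonredundant' column can be sketched in Python as a set comparison. The source does not state how corresponding genes were identified across data sources, so matching on shared cross-reference IDs is an assumption here, and the record layout, the IDs, and the name `nonredundant` are hypothetical.

```python
def nonredundant(locuslink, ensembl, genbank):
    """Return LocusLink records that share no cross-reference with the other sources."""
    seen = set()
    for record in list(ensembl) + list(genbank):
        seen.update(record["xrefs"])
    return [rec for rec in locuslink if not seen.intersection(rec["xrefs"])]


# Hypothetical records; the IDs are made up for illustration.
ensembl_genes = [
    {"name": "geneA", "xrefs": {"REF_0001", "OMIM_1"}},
    {"name": "geneB", "xrefs": {"REF_0002"}},
]
genbank_genes = [
    {"name": "mitoC", "xrefs": {"REF_0003"}},
]
locuslink_genes = [
    {"name": "geneA", "xrefs": {"REF_0001"}},  # already covered by Ensembl
    {"name": "mitoC", "xrefs": {"REF_0003"}},  # already covered by GenBank
    {"name": "geneD", "xrefs": {"REF_0004"}},  # found only in LocusLink
]

only_in_locuslink = nonredundant(locuslink_genes, ensembl_genes, genbank_genes)
```

In this toy input only `geneD` is kept. In the table above, the analogous count for the real data is 3,936 of the 18,880 genes obtained from LocusLink.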