Peter D Karp, Christos A Ouzounis, Caroline Moore-Kochlacs, Leon Goldovsky, Pallavi Kaipa, Dag Ahrén, Sophia Tsoka, Nikos Darzentas, Victor Kunin, Núria López-Bigas.
Abstract
The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely global and comparative analyses of genomes and metabolic networks, and a supplement to the BioCyc resource of curated PGDBs. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range of the number of pathways present in these organisms and the most frequently observed pathways. We seek scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists can we hope to produce biological databases that accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.
Year: 2005 PMID: 16246909 PMCID: PMC1266070 DOI: 10.1093/nar/gki892
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
A list of the 16 species not available in CMR
| Genus and species name | Strain name | Coverage (%) | Status |
|---|---|---|---|
|  | PCC 7120 | >90 | UniProt |
| Anopheles gambiae | PEST | 39 | Excluded |
| Ashbya gossypii | na | >90 | Deleted |
| Caenorhabditis briggsae | na | 0 | Excluded |
| Caenorhabditis elegans | na | 73 | Excluded |
| Cyanidioschyzon merolae | 10D | 0 | Excluded |
| Drosophila melanogaster | na | >90 | UniProt |
| Encephalitozoon cuniculi | na | >90 | UniProt |
| Leptospira interrogans | L1-130 | >90 | UniProt |
| Listeria monocytogenes | F2365 | >90 | UniProt |
| L.monocytogenes | F6854 | 40 | Excluded |
| L.monocytogenes | H7858 | 48 | Excluded |
| Mus musculus | na | 59 | Excluded |
| Nanoarchaeum equitans | Kin4-M | 78 | Excluded |
| Neurospora crassa | na | >90 | UniProt |
| Schizosaccharomyces pombe | na | >90 | UniProt |
Columns: genus and species name, strain name, coverage in UniProt (%) and status in BioCyc (Tier 3 or exclusion). na (for strain name): not applicable. Species with >90% coverage in UniProt were included (7 cases); species with ≤90% coverage in UniProt were excluded (8 cases).
A. gossypii was deleted because it is poor in annotations despite high coverage in UniProt (1 case).
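The status rule described in the table notes can be sketched in a few lines of Python. This is only an illustration: the rows are copied from the table, while the function names, the parsing of the `>90` notation and the explicit exception set for A. gossypii are assumptions made for this sketch, not the authors' implementation.

```python
from collections import Counter

# Rows copied from the table: (species, strain, UniProt coverage, status).
# The genus/species cell of the first row is missing in the source and is left empty.
ROWS = [
    ("", "PCC 7120", ">90", "UniProt"),
    ("Anopheles gambiae", "PEST", "39", "Excluded"),
    ("Ashbya gossypii", "na", ">90", "Deleted"),
    ("Caenorhabditis briggsae", "na", "0", "Excluded"),
    ("Caenorhabditis elegans", "na", "73", "Excluded"),
    ("Cyanidioschyzon merolae", "10D", "0", "Excluded"),
    ("Drosophila melanogaster", "na", ">90", "UniProt"),
    ("Encephalitozoon cuniculi", "na", ">90", "UniProt"),
    ("Leptospira interrogans", "L1-130", ">90", "UniProt"),
    ("Listeria monocytogenes", "F2365", ">90", "UniProt"),
    ("L.monocytogenes", "F6854", "40", "Excluded"),
    ("L.monocytogenes", "H7858", "48", "Excluded"),
    ("Mus musculus", "na", "59", "Excluded"),
    ("Nanoarchaeum equitans", "Kin4-M", "78", "Excluded"),
    ("Neurospora crassa", "na", ">90", "UniProt"),
    ("Schizosaccharomyces pombe", "na", ">90", "UniProt"),
]

# Assumption: the single manual exception named in the table notes.
POORLY_ANNOTATED = {"Ashbya gossypii"}


def exceeds_threshold(coverage, threshold=90):
    """Return True when a coverage cell denotes a value strictly above threshold."""
    if coverage.startswith(">"):
        # ">90" means "more than 90", so it passes a threshold of 90 or lower.
        return int(coverage[1:]) >= threshold
    return int(coverage) > threshold


def predicted_status(species, coverage):
    """Apply the rule from the notes: exception first, then the coverage cut-off."""
    if species in POORLY_ANNOTATED:
        return "Deleted"
    return "UniProt" if exceeds_threshold(coverage) else "Excluded"


predicted = [predicted_status(species, cov) for species, _, cov, _ in ROWS]
status_counts = Counter(predicted)
```

Applied to the 16 rows, the rule reproduces the status column and the case counts given in the notes (7 included, 8 excluded, 1 deleted).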
Figure 1. Distribution of BioCyc pathways across species. (a) Frequency analysis: the x-axis shows the number of detected pathways and the y-axis the number of species containing those pathways. (b) Completeness analysis: the x-axis shows the percentage of pathway completeness and the y-axis the frequency of pathways with the corresponding degree of completeness. More than 60% of pathways are more than 50% complete in the BioCyc collection of PGDBs.
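A completeness analysis of this kind can be sketched as follows. The definition used here (the percentage of a pathway's reactions that have an assigned enzyme) is an assumption, since the caption does not define completeness; the pathway names, reaction identifiers and enzyme assignments are made-up placeholders, not BioCyc data.

```python
def pathway_completeness(reactions, reactions_with_enzyme):
    """Percentage of a pathway's reactions that have an assigned enzyme (assumed definition)."""
    reactions = set(reactions)
    if not reactions:
        raise ValueError("pathway has no reactions")
    return 100.0 * len(reactions & set(reactions_with_enzyme)) / len(reactions)


def share_above(values, cutoff=50.0):
    """Percentage of values that are strictly greater than cutoff."""
    values = list(values)
    return 100.0 * sum(v > cutoff for v in values) / len(values)


# Hypothetical pathways and enzyme assignments, for illustration only.
PATHWAYS = {
    "PWY-A": ["r1", "r2", "r3", "r4"],
    "PWY-B": ["r5", "r6"],
    "PWY-C": ["r7", "r8", "r9"],
}
ENZYME_ASSIGNED = {"r1", "r2", "r3", "r5", "r9"}

scores = {
    name: pathway_completeness(reactions, ENZYME_ASSIGNED)
    for name, reactions in PATHWAYS.items()
}
share_more_than_half = share_above(scores.values(), cutoff=50.0)
```

Binning the `scores` values would give a histogram in the style of panel (b), and `share_above` corresponds to the kind of summary quoted in the caption.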
The 30 pathways that occur most frequently across the BioCyc databases, and their frequency (f) of occurrence
| PATHWAY-UNIQUE ID | Pathway description | f |
|---|---|---|
| PWY0-162 |  | 153 |
| GLYCOLYSIS | Glycolysis I | 152 |
| TRNA-CHARGING-PWY | tRNA charging pathway | 152 |
| DENOVOPURINE2-PWY | Purine nucleotides | 152 |
| PWY0-166 |  | 151 |
| PHOSLIPSYN-PWY | Phospholipid biosynthesis I | 150 |
| GLYSYN-PWY | Glycine biosynthesis I | 149 |
| HEMESYN2-PWY | Biosynthesis of proto- and sirohaeme | 146 |
| P1-PWY | Salvage pathways of purine and pyrimidine nucleotides | 144 |
| FASYN-INITIAL-PWY | Fatty acid biosynthesis—initial steps | 144 |
| 1CMET2-PWY | formylTHF biosynthesis | 144 |
| PWY0-163 | Salvage pathways of pyrimidine ribonucleotides | 142 |
| P106-PWY | Serine-isocitrate lyase pathway | 142 |
| THIOREDOX-PWY | Thioredoxin pathway | 140 |
| ARO-PWY | Chorismate biosynthesis | 139 |
| P124-PWY | Glucose fermentation to lactate II | 139 |
| CALVIN-PWY | Calvin cycle | 136 |
| FOLSYN-PWY | Tetrahydrofolate biosynthesis | 136 |
| PWY0-662 | PRPP biosynthesis I | 136 |
| RIBOKIN-PWY | Ribose degradation | 133 |
| TRPSYN-PWY | Tryptophan biosynthesis | 133 |
| PEPTIDOGLYCANSYN-PWY | Peptidoglycan biosynthesis | 133 |
| PWY0-901 | Selenocysteine biosynthesis | 132 |
| FASYN-ELONG-PWY | Fatty acid elongation—saturated | 131 |
| PWY-841 | Purine nucleotides | 131 |
| RIBOSYN2-PWY | Riboflavin and FMN and FAD biosynthesis | 130 |
| P61-PWY | UDP-glucose conversion | 130 |
| DAPLYSINESYN-PWY | Lysine biosynthesis I | 130 |
| ILEUSYN-PWY | Isoleucine biosynthesis I | 130 |
| ACETATEUTIL-PWY | Acetate utilization | 129 |
Figure 2. Relationship between the number of pathways (x-axis) and the number of genes (y-axis) for all species in the BioCyc collection. Bacterial species are shown in light grey, archaeal species as open grey squares and eukaryotes in black. The fitted line (a linear regression) refers to Bacteria only; most Archaea exhibit a similar relationship. The two outlier bacterial species with fewer than 25 pathways can be seen on the left part of the graph: Mycobacterium avium paratuberculosis and Ralstonia solanacearum GMI1000. The three largest eukaryotic genomes, each with >10 000 genes, show a significant underrepresentation of pathways for their genome size.
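The fit described in the caption, a line estimated from bacterial species only and then compared against the other domains, can be sketched with ordinary least squares. The records below are made-up placeholders chosen so the arithmetic is easy to follow; they are not values from the BioCyc collection, and the caption does not say which fitting procedure the authors used.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical records: (domain, number of pathways, number of genes).
ORGANISMS = [
    ("Bacteria", 100, 2000),
    ("Bacteria", 150, 3000),
    ("Bacteria", 200, 4000),
    ("Bacteria", 250, 5000),
    ("Archaea", 120, 2300),
    ("Eukaryota", 180, 14000),
]

# Fit on Bacteria only, as in the figure (x = pathways, y = genes).
bacteria = [(pathways, genes) for domain, pathways, genes in ORGANISMS if domain == "Bacteria"]
slope, intercept = fit_line(bacteria)

# Residual = observed genes minus genes predicted from the pathway count.
# A large positive residual means far more genes than the bacterial trend
# predicts, i.e. few pathways relative to genome size.
residuals = [
    (domain, genes - (slope * pathways + intercept))
    for domain, pathways, genes in ORGANISMS
    if domain != "Bacteria"
]
```

With these placeholder numbers the hypothetical eukaryote lies far above the bacterial line, which is the pattern the caption describes for the largest eukaryotic genomes.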