
MetaCyc: a multiorganism database of metabolic pathways and enzymes.

Ron Caspi, Hartmut Foerster, Carol A Fulcher, Rebecca Hopkinson, John Ingraham, Pallavi Kaipa, Markus Krummenacker, Suzanne Paley, John Pick, Seung Y Rhee, Christophe Tissier, Peifen Zhang, Peter D Karp.

Abstract

MetaCyc is a database of metabolic pathways and enzymes located at http://MetaCyc.org/. Its goal is to serve as a metabolic encyclopedia, containing a collection of non-redundant pathways central to small molecule metabolism, which have been reported in the experimental literature. Most of the pathways in MetaCyc occur in microorganisms and plants, although animal pathways are also represented. MetaCyc contains metabolic pathways, enzymatic reactions, enzymes, chemical compounds, genes and review-level comments. Enzyme information includes substrate specificity, kinetic properties, activators, inhibitors, cofactor requirements and links to sequence and structure databases. Data are curated from the primary literature by curators with expertise in biochemistry and molecular biology. MetaCyc serves as a readily accessible comprehensive resource on microbial and plant pathways for genome analysis, basic research, education, metabolic engineering and systems biology. Querying, visualization and curation of the database are supported by SRI's Pathway Tools software. The PathoLogic component of Pathway Tools is used in conjunction with MetaCyc to predict the metabolic network of an organism from its annotated genome. SRI and the European Bioinformatics Institute employed this tool to create pathway/genome databases (PGDBs) for 165 organisms, available at the BioCyc.org website. These PGDBs also include predicted operons and pathway hole fillers.

Year:  2006        PMID: 16381923      PMCID: PMC1347490          DOI: 10.1093/nar/gkj128

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

MetaCyc is a reference database of small molecule metabolism that contains experimentally verified pathway and enzyme information curated from the scientific literature (1). A metabolic pathway in MetaCyc consists of reactions, enzymes, metabolites, information on feedback regulation and genes that encode the enzymes for each species (Figure 1). The current version of MetaCyc (9.5) contains 621 pathways from >500 species (Tables 1 and 2) ranging from microbes to plants and humans, with >90% of the information curated from >7300 research articles. MetaCyc can be searched and browsed using a web browser. Pathways are dynamically generated from the database and graphically displayed with hyperlinks to various pages detailing reactions, enzymes, genes and compounds from MetaCyc, as well as external databases such as Swiss-Prot and PubMed. It, therefore, serves as a readily accessible source of up-to-date, literature-curated information on metabolic pathways and enzymes to researchers for use in basic research and genome analysis, and to students and teachers for educational purposes. In addition, MetaCyc, in conjunction with the Pathway Tools software (2), can be used to predict metabolic networks from a list of annotated sequences resulting from genome or transcript sequencing (3–5). Those predicted networks can provide a knowledge framework onto which reaction flux models can be built.

Figure 1. A representative example of a pathway in MetaCyc. Pathways can be displayed at varying levels of detail. This pathway display depicts an intermediate level of detail including enzymes, EC numbers, genes and chemical structures of the main compounds. Notice the brown arrows that provide hyperlinks to related upstream and downstream pathways.

Table 1. List of species that have five or more experimentally elucidated pathways represented in MetaCyc

| Bacteria | No. of pathways | Eukarya | No. of pathways | Archaea | No. of pathways |
| --- | --- | --- | --- | --- | --- |
| Escherichia coli | 179 | Arabidopsis thaliana | 116 | Sulfolobus solfataricus | 15 |
| Pseudomonas putida | 35 | Homo sapiens | 42 | Methanocaldococcus jannaschii | 5 |
| Bacillus subtilis | 24 | Glycine max | 36 | | |
| Pseudomonas aeruginosa | 19 | Saccharomyces cerevisiae | 31 | | |
| Mycobacterium tuberculosis | 17 | Pisum sativum | 20 | | |
| Salmonella typhimurium | 14 | Zea mays | 19 | | |
| Haemophilus influenzae | 13 | Solanum tuberosum | 12 | | |
| Deinococcus radiodurans | 12 | Cicer arietinum | 12 | | |
| Mycoplasma pneumoniae | 8 | Oryza sativa | 12 | | |
| Mycobacterium smegmatis | 8 | Spinacia oleracea | 11 | | |
| Sinorhizobium meliloti | 8 | Rattus norvegicus | 10 | | |
| Thauera aromatica | 7 | Hordeum vulgare | 8 | | |
| Klebsiella pneumoniae | 7 | Nicotiana tabacum | 8 | | |
| Lactococcus lactis | 6 | Lycopersicon esculentum | 7 | | |
| Corynebacterium glutamicum | 6 | Medicago sativa | 7 | | |
| Methylobacterium extorquens AM1 | 6 | Brassica napus | 6 | | |
| Pseudomonas fluorescens | 6 | Glycyrrhiza echinata | 6 | | |
| Thermotoga maritima | 6 | Triticum aestivum | 6 | | |
| Paracoccus denitrificans | 5 | Cucumis sativus | 5 | | |
| Arthrobacter globiformis | 5 | Ricinus communis | 5 | | |
| Mycoplasma capricolum | 5 | Pueraria montana | 5 | | |
| Bradyrhizobium japonicum | 5 | | | | |
| Bacillus cereus | 5 | | | | |

The species are grouped by taxonomic domain and are ordered within each domain based on the number of pathways to which the given species was assigned. Some pathways may be labeled with a higher-level taxon, such as genus, if all the species within that genus are thought to have the given pathway. However, such higher-level taxa are not included in this table.

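Table 2. The distribution of pathways in MetaCyc based on the taxonomic classification of associated species

| Bacteria | No. of pathways | Eukarya | No. of pathways | Archaea | No. of pathways |
| --- | --- | --- | --- | --- | --- |
| Proteobacteria | 598 | Viridiplantae | 396 | Euryarchaeota | 47 |
| Firmicutes | 169 | Metazoa | 61 | Crenarchaeota | 26 |
| Actinobacteria | 94 | Fungi | 58 | | |
| Deinococcus-Thermus | 18 | Euglenozoa | 4 | | |
| Cyanobacteria | 13 | | | | |
| Thermotogae | 8 | | | | |
| Planctomycetes | 6 | | | | |
| Bacteroidetes/Chlorobi | 6 | | | | |
| Nitrospirae | 5 | | | | |
| Aquificae | 2 | | | | |
| Chloroflexi | 2 | | | | |
| Spirochaetes | 1 | | | | |
| Chlamydiae | 1 | | | | |
| Chrysiogenetes | 1 | | | | |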

The taxonomic groups (phyla for Bacteria and Archaea, kingdoms for Eukarya) are grouped by domain and are ordered within each domain based on the number of pathways associated with the taxon. Euglenozoa are listed separately as this group does not belong to any of the other eukaryotic kingdoms. A pathway may be associated with multiple organisms.

A number of major improvements made in the last 2 years are described in this article. There has been a significant increase in the content of the database, covering both primary metabolism and less common pathways, such as microbial degradation of environmental pollutants and plant secondary metabolism. Other improvements include reorganization and expansion of both the pathway and cellular component ontologies, and enhancement of the Web pages providing background information about MetaCyc, such as the user's guide at URL . For a thorough discussion of the major differences between MetaCyc and other pathway databases please see .

DATA CONTENT

As demonstrated in Table 3, there has been a significant increase in the number of database objects since the last Nucleic Acids Research publication 2 years ago (1). The number of metabolic pathways has increased by 26% from 491 to 621, while the number of enzymes, genes and citations has grown considerably more, by 75, 71 and 140%, respectively, owing to the fact that many existing pathways have been extensively edited and updated with comments, enzymes, genes and citations. There has also been a 128% increase in the number of organisms represented (currently at 506), reflecting the breadth of MetaCyc (Tables 1 and 2), and a 57% increase in the number of chemical compounds (currently 4620).
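
Table 3. The size of MetaCyc as a function of time from its first release in 1999 to the latest release in 2005 (version 9.5)

| Database objects | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Metabolic pathways | 296 | 366 | 445 | 460 | 491 | 528 | 621 |
| Metabolic pathways with comments | 39 | 83 | 160 | 180 | 232 | 280 | 412 |
| Enzymatic reactions | 3779 | 4002 | 4218 | 4294 | 4817 | 4955 | 5428 |
| Enzymes | 82 | 344 | 1115 | 1267 | 1543 | 1940 | 2698 |
| Enzymes with comments | 75 | 234 | 1054 | 1123 | 1389 | 1716 | 2376 |
| Genes | 0 | 0 | 0 | 600 | 1554 | 1821 | 2662 |
| Compounds | 1949 | 2180 | 2335 | 2404 | 2951 | 3551 | 4620 |
| Literature citations | 184 | 604 | 2381 | 2718 | 3070 | 5050 | 7368 |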

Each row depicts the number of different database objects in MetaCyc during the final release for that year.
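
The growth figures quoted in the text can be checked against the 2003 and 2005 columns of Table 3. The following is a minimal sketch of that arithmetic; the dictionary keys and the function name are illustrative choices, not part of MetaCyc or Pathway Tools.

```python
# Object counts for the 2003 and 2005 releases, as read from Table 3.
# The keys are illustrative labels, not MetaCyc identifiers.
COUNTS = {
    "pathways": (491, 621),
    "enzymes": (1543, 2698),
    "genes": (1554, 2662),
    "citations": (3070, 7368),
    "compounds": (2951, 4620),
}


def percent_increase(old: int, new: int) -> int:
    """Return the growth from old to new as a percentage, rounded to the nearest integer."""
    return round(100 * (new - old) / old)


increases = {name: percent_increase(old, new) for name, (old, new) in COUNTS.items()}

for name, pct in increases.items():
    print(f"{name}: {pct}%")
```

Under these inputs the computed values agree with the 26, 75, 71, 140 and 57% increases reported in the text.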

Data in MetaCyc are curated from the experimental literature, which is read and summarized by PhD-level curators. Curators at SRI cover microbial and animal pathways, while curators at Carnegie Institution cover pathways from higher plants. MetaCyc contains the full complement of EcoCyc metabolic pathways (6) and thus has most of the basic pathways of central and intermediary metabolism typical of enteric bacteria. Similarly, most of the pathways of central metabolism in higher plants are present in the database. Our current curation strategy focuses on both depth and breadth, by updating pathways that lack complete information, and adding new pathways outside of central metabolism. The MetaCyc Curator's Guide for Pathway/Genome Databases, located at URL , was developed to ensure the consistency of curation procedures. It documents the type of information that should be captured for each pathway, reaction, enzyme, gene and chemical compound. It also describes stylistic conventions. We recently revised it to explicitly define the organization of metabolic pathways in MetaCyc, and to provide detailed guidelines for defining pathway boundaries (the compounds with which a pathway should begin and end) and for defining links between pathways. To ensure the accuracy and coverage of new and existing pathways, we are currently in the process of inviting outside experts to support the database as editors and/or curators in their fields of expertise.

Pathways, enzymes, reactions and compounds

Most of the pathways in MetaCyc occur in the microorganism and plant kingdoms, a manifestation of their metabolic diversity. Nevertheless, animal pathways are also represented. Since November 2003 we have added >140 new pathways in the areas of small molecule intermediary metabolism, and the biosynthesis and degradation of natural environmental compounds, environmental pollutants, xenobiotics and compounds involved in general cellular processes and secondary metabolism. Of these, 63 are plant pathways, many of which concern the production of compounds involved in cellular regulation processes (phytohormones) and defense mechanisms (phytoalexins and phytoanticipins), and plant secondary (or specialized) metabolites. To accommodate these new types of pathways, we expanded the pathway ontology to include categories for plant secondary metabolism, comprising 8 main classes and 26 subclasses (see ). Selection criteria for the curation of these plant pathways include generality of occurrence across taxa, investigation in a model species like Arabidopsis and agronomic or medicinal importance. Arabidopsis pathways are exported to AraCyc, the Arabidopsis metabolism database (5) that was computationally predicted using MetaCyc as the reference database. In mammalian metabolism we added multiple new pathways, including those describing human neurotransmitter biosynthesis (in collaboration with experimentalists in this field), drug metabolism, cholesterol biosynthesis, arsenate detoxification and glutathione metabolism. These pathways were either curated within MetaCyc, or propagated from the HumanCyc database (3). In addition, several existing pathways of intermediary metabolism were curated with rat enzymes and genes. In parallel with curating new pathways, we extensively edited previously existing pathways. Approximately 60 microbial and 7 plant pathways have been updated and enhanced since 2003.
One of our highest priorities is the curation of existing pathways that are in need of updating with enzymes, EC numbers, genes and comments with literature citations. While adding new pathways or revising existing pathways, we are also expanding coverage of pathway variants found in different organisms. In addition to curation within MetaCyc, we continue to import pathways from other databases. At each quarterly release we propagate newly curated pathways from EcoCyc (6) and HumanCyc (3) into MetaCyc. We encourage outside curators of our BioCyc family of PGDBs to submit curated pathways to us for possible inclusion in MetaCyc. For example, we incorporated several new yeast pathways in collaboration with curators from the Saccharomyces Genome Database (SGD) (4). We minimize redundancy by associating several representative species with a pathway that is shared among them. We are in the process of refining the database by deleting pathways that are deemed redundant, dividing large pathways that contain overlapping sections into separate, smaller pathways, and assembling small, related pathways into superpathways, to give an overview of metabolic interrelationships. A new pathway evidence code, EV-EXP-TAS (evidence-experimental-traceable author statement) was created to allow curators to cite review articles containing direct references to the primary literature in support of the pathways. We found this type of reference to be the most useful for large, complex pathways. We are increasingly enhancing the quality of enzyme information in MetaCyc by adding more kinetic data, including Km values and optimal pH and temperature, and listing enzyme regulators, including activators, inhibitors, cofactors and alternative substrates. We have revised and extended the categories for enzyme regulators based on their kinetics. 
Activators are now classified as allosteric, nonallosteric or of unknown mechanism, and inhibitors are classified as competitive, noncompetitive, uncompetitive, allosteric, irreversible, none of the preceding or of unknown mechanism. Definitions of these types of regulators are provided in Appendix I of (7). It should be noted that while kinetic data are present for most enzymes curated within the last 2 years, there are still many enzymes in the database which lack kinetic data, either because they were curated earlier and have not yet been updated, or because no such data are available in the literature. We introduced hyperlinks within pathway and enzyme comments, which can link commentary text to any data in the database. This dynamic linking greatly improves the ability of users to navigate between related database objects. Our chemical library has grown significantly in the past 2 years, from 2951 to 4620 compounds. Currently, over 93% of our compounds have structures. In addition, last year we began adding stereochemistry representations to the structures.
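
The regulator categories listed above map naturally onto two enumerations. The sketch below is only an illustration of that classification: the class names, the `Regulator` record and the placeholder compound name are assumptions made for this example and do not reflect the actual MetaCyc schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class ActivatorType(Enum):
    """Activator categories named in the text."""
    ALLOSTERIC = "allosteric"
    NONALLOSTERIC = "nonallosteric"
    UNKNOWN = "unknown mechanism"


class InhibitorType(Enum):
    """Inhibitor categories named in the text."""
    COMPETITIVE = "competitive"
    NONCOMPETITIVE = "noncompetitive"
    UNCOMPETITIVE = "uncompetitive"
    ALLOSTERIC = "allosteric"
    IRREVERSIBLE = "irreversible"
    OTHER = "none of the preceding"
    UNKNOWN = "unknown mechanism"


@dataclass(frozen=True)
class Regulator:
    """Hypothetical record pairing a compound with its regulatory mode."""
    compound: str
    mode: Union[ActivatorType, InhibitorType]

    @property
    def is_inhibitor(self) -> bool:
        return isinstance(self.mode, InhibitorType)


# "compound A" is a placeholder, not a curated MetaCyc entry.
example = Regulator("compound A", InhibitorType.COMPETITIVE)
print(example.compound, example.mode.value, example.is_inhibitor)
```

Keeping activators and inhibitors in separate enumerations makes it impossible to label an activator as, say, competitive, which mirrors the distinction drawn in the text.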

Taxonomies

We significantly enhanced the cellular component ontology to annotate the subcellular locations of enzymes and to encode the cellular compartments involved in transport reactions. The ontology, which has been described recently (8), currently comprises 160 terms. More about the enhanced cell component ontology can be found at . MetaCyc is routinely updated with the latest data from the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB), which includes new and modified EC numbers. The last supplement to have been incorporated is supplement 10.

Links to other databases

While MetaCyc (unlike organism-specific PGDBs) does not contain sequence information, we use extensive linking to external amino acid and nucleotide sequence databases. Whenever possible, enzyme and gene entries include links to Swiss-Prot (9) and the Entrez Nucleotide and Gene databases. Arabidopsis genes are linked to TAIR (The Arabidopsis Information Resource). Enzymes are often linked to protein structure databases such as PDB (10) when applicable. In addition, whenever possible, literature references are linked to PubMed.

ENHANCEMENTS TO THE PATHWAY TOOLS SOFTWARE

The Pathway Tools software provides query and visualization services to users and editing functions to curators (2,11). Recent enhancements to the software that are relevant to MetaCyc users include the following:

Improved displays: The pathway display algorithms have been modified to produce more compact pathway diagrams that are more likely to fit within a single page.

Enzyme/gene naming: Protein and gene names within a pathway display are now labeled with the initials of an organism's genus and species name (e.g. an Escherichia coli enzyme and gene are written as ‘acetylornithine decarboxylase (Ec)’ and ‘Ec-argD’, respectively). This notation aids in identifying individual proteins and genes from multiple organisms that are assigned to the same pathway.

Chemical drawing tools: The Pathway Tools software now includes interfaces for both the Marvin and JME chemical drawing editors, permitting the user to enter or modify chemical structures within MetaCyc or other PGDBs.

A new suite of comparative genomics tools is available in Pathway Tools in conjunction with the preceding expansion of the BioCyc database collection (see below). These tools include comparisons of the full pathway, reaction and metabolite sets present in a specified group of organisms; comparisons of genes associated with a single pathway or a single reaction across a specified group of organisms, including the operon distributions of those genes; and a comparative genome browser for visualizing chromosomal regions around a specified set of orthologous genes.
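
The enzyme/gene labeling convention described in this section can be sketched in a few lines. The function names are hypothetical, and the sketch assumes the label is built from the first letter of the genus and of the species; how Pathway Tools resolves organisms that share the same initials is not stated here.

```python
def organism_prefix(genus: str, species: str) -> str:
    """Build a two-letter tag from the genus and species initials, e.g. 'Ec'."""
    return genus[0].upper() + species[0].lower()


def label_enzyme(enzyme_name: str, genus: str, species: str) -> str:
    """Append the organism tag in parentheses after the enzyme name."""
    return f"{enzyme_name} ({organism_prefix(genus, species)})"


def label_gene(gene_name: str, genus: str, species: str) -> str:
    """Prefix the gene name with the organism tag and a hyphen."""
    return f"{organism_prefix(genus, species)}-{gene_name}"


print(label_enzyme("acetylornithine decarboxylase", "Escherichia", "coli"))
print(label_gene("argD", "Escherichia", "coli"))
```

For the Escherichia coli example given in the text, these helpers reproduce the two labels quoted there.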

Expansion of BioCyc

A major application of MetaCyc is its use for predicting the metabolic pathways of an organism from its sequenced genome (12–14), using the PathoLogic program (15). We recently automated PathoLogic, allowing it to be applied to large numbers of genomes. Jointly with the European Bioinformatics Institute, we used this feature to expand our BioCyc collection of PGDBs to >200 organism-specific databases, each of which contains the predicted metabolic pathways of the organism, based on its annotated genome (16). These PGDBs are available for adoption and ongoing curation by the scientific community. Each PGDB contains the genome of the organism, which is accessible using the new Pathway Tools genome browser (), predicted operons (only for bacteria) (17) and predicted metabolic pathways (15). In addition, since predicted pathways often contain pathway holes (reactions in a predicted metabolic pathway for which no enzymes have been identified in the sequenced genome), we applied our pathway-hole filler algorithm to all the BioCyc PGDBs. This algorithm searches the sequenced genome and identifies candidate genes for these missing enzymes (18). All the PGDBs in the BioCyc collection can be used to analyze gene and protein expression data using the Omics Viewer, a Pathway Tools feature that allows expression data and metabolomics data to be painted onto the full metabolic network of an organism.
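
The definition of a pathway hole given above (a reaction in a predicted pathway with no enzyme identified in the genome) can be illustrated with a short sketch. This is not the PathoLogic or pathway-hole filler algorithm, which are described in the cited references; the function name, reaction identifiers and gene names below are all hypothetical.

```python
from typing import Dict, List, Set


def find_pathway_holes(
    pathway_reactions: List[str],
    enzymes_by_reaction: Dict[str, Set[str]],
) -> List[str]:
    """Return the reactions of a predicted pathway that have no assigned enzyme.

    A reaction counts as a hole when it is absent from the mapping or when
    its set of assigned genes is empty.
    """
    return [rxn for rxn in pathway_reactions if not enzymes_by_reaction.get(rxn)]


# Hypothetical pathway of four reactions; two have annotated genes.
pathway = ["RXN-A", "RXN-B", "RXN-C", "RXN-D"]
annotation = {
    "RXN-A": {"geneA"},
    "RXN-B": set(),          # listed, but no gene assigned
    "RXN-D": {"geneD1", "geneD2"},
}

holes = find_pathway_holes(pathway, annotation)
print(holes)
```

A hole filler would then take each reported reaction and search the genome for candidate genes, a step that this sketch deliberately leaves out.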

DATABASE AND SOFTWARE AVAILABILITY

MetaCyc is freely available via the Web at (updated four times a year). It is also available for download, free of charge to non-profit organizations or for a fee to commercial institutions, as a stand-alone application program for Linux, Windows and Solaris workstations (updated two times a year). A set of flat data files that is updated four times a year is also available online at .

REFERENCES

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Pathway databases: a case study in computational symbolic theories.

Authors:  P D Karp
Journal:  Science       Date:  2001-09-14       Impact factor: 47.728

3.  The genome of the natural genetic engineer Agrobacterium tumefaciens C58.

Authors:  D W Wood; J C Setubal; R Kaul; D E Monks; J P Kitajima; V K Okura; Y Zhou; L Chen; G E Wood; N F Almeida; L Woo; Y Chen; I T Paulsen; J A Eisen; P D Karp; D Bovee; P Chapman; J Clendenning; G Deatherage; W Gillet; C Grant; T Kutyavin; R Levy; M J Li; E McClelland; A Palmieri; C Raymond; G Rouse; C Saenphimmachak; Z Wu; P Romero; D Gordon; S Zhang; H Yoo; Y Tao; P Biddle; M Jung; W Krespan; M Perry; B Gordon-Kamm; L Liao; S Kim; C Hendrick; Z Y Zhao; M Dolan; F Chumley; S V Tingey; J F Tomb; M P Gordon; M V Olson; E W Nester
Journal:  Science       Date:  2001-12-14       Impact factor: 47.728

4.  The Pathway Tools software.

Authors:  Peter D Karp; Suzanne Paley; Pedro Romero
Journal:  Bioinformatics       Date:  2002       Impact factor: 6.937

5.  MetaCyc: a multiorganism database of metabolic pathways and enzymes.

Authors:  Cynthia J Krieger; Peifen Zhang; Lukas A Mueller; Alfred Wang; Suzanne Paley; Martha Arnaud; John Pick; Seung Y Rhee; Peter D Karp
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

6.  The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003.

Authors:  Brigitte Boeckmann; Amos Bairoch; Rolf Apweiler; Marie-Claude Blatter; Anne Estreicher; Elisabeth Gasteiger; Maria J Martin; Karine Michoud; Claire O'Donovan; Isabelle Phan; Sandrine Pilbout; Michel Schneider
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

7.  Evaluation of computational metabolic-pathway predictions for Helicobacter pylori.

Authors:  Suzanne M Paley; Peter D Karp
Journal:  Bioinformatics       Date:  2002-05       Impact factor: 6.937

8.  Saccharomyces Genome Database (SGD) provides biochemical and structural information for budding yeast proteins.

Authors:  Shuai Weng; Qing Dong; Rama Balakrishnan; Karen Christie; Maria Costanzo; Kara Dolinski; Selina S Dwight; Stacia Engel; Dianna G Fisk; Eurie Hong; Laurie Issel-Tarver; Anand Sethuraman; Chandra Theesfeld; Rey Andrada; Gail Binkley; Christopher Lane; Mark Schroeder; David Botstein; J Michael Cherry
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

9.  AraCyc: a biochemical pathway database for Arabidopsis.

Authors:  Lukas A Mueller; Peifen Zhang; Seung Y Rhee
Journal:  Plant Physiol       Date:  2003-06       Impact factor: 8.340

10.  Expansion of the BioCyc collection of pathway/genome databases to 160 genomes.

Authors:  Peter D Karp; Christos A Ouzounis; Caroline Moore-Kochlacs; Leon Goldovsky; Pallavi Kaipa; Dag Ahrén; Sophia Tsoka; Nikos Darzentas; Victor Kunin; Núria López-Bigas
Journal:  Nucleic Acids Res       Date:  2005-10-24       Impact factor: 16.971
