MetaCrop: a detailed database of crop plant metabolism.

Eva Grafahrend-Belau, Stephan Weise, Dirk Koschützki, Uwe Scholz, Björn H Junker, Falk Schreiber.

Abstract

MetaCrop is a manually curated repository of high-quality information concerning the metabolism of crop plants. This includes pathway diagrams, reactions, locations, transport processes, reaction kinetics, taxonomy and literature. MetaCrop provides detailed information on six major crop plants of high agronomic importance and initial information about several other plants. The web interface supports easy exploration of the information, from overview pathways down to single reactions, and thereby helps users to understand the metabolism of crop plants. It also allows model creation and automatic data export for detailed models of metabolic pathways, thereby supporting systems biology approaches. The MetaCrop database is accessible at http://metacrop.ipk-gatersleben.de.

Year: 2007 PMID: 17933764 PMCID: PMC2238923 DOI: 10.1093/nar/gkm835

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Crop plants are the major source of human nutrition and important contributors to chemical feedstocks and renewable fuels (1–3). An in-depth understanding of plant metabolism is helpful for improving growth and yield (4,5). Data requirements in metabolic research are quite diverse: while some experts are interested in a qualitative global view of metabolism, others need detailed information about single reactions. Additionally, researchers investigating metabolism often have to rely on databases of unclear data quality, because their content results from genome-based metabolic network predictions. The situation in crop plant research is further complicated by the fact that only one crop plant (Oryza sativa, rice) has been sequenced so far (6,7). An example that requires detailed metabolic information is the generation of models to quantitatively simulate complex biochemical networks, an area of increasing interest in systems biology. While repositories for such models exist, collecting the information necessary for model creation remains a time-consuming manual task, and only very few models for crop plants exist at all. Here we present MetaCrop, a database that contains manually curated, highly detailed information about metabolic pathways in crop plants, including location information, transport processes and reaction kinetics. The web interface supports the exploration of the information from overview pathways to single reactions, data export and the creation of detailed models of metabolic pathways. With these features MetaCrop supports crop plant research in several ways. It improves the understanding of metabolism, especially when both a general overview and specific details for selected pathways are needed. It allows crop plant-specific information to be used in other tools, for example, to investigate experimental data in the network context. Finally, it helps in creating models of metabolic processes for simulation approaches and in silico experiments.

DATABASE DESCRIPTION

Content

MetaCrop contains hand-curated information on about 40 major metabolic pathways in various crop plants, with special emphasis on the metabolism of agronomically important organs such as seed and tuber. Species of both monocotyledons and dicotyledons are represented. Reactions incorporate information about involved enzymes (e.g. EC and CAS number), metabolites (e.g. CAS number, molecular weight and chemical formula), stoichiometry and detailed location (species, organ, tissue, compartment and developmental stage). Furthermore, kinetic data are available for the reactions of central metabolism (sucrose breakdown, glycolysis, TCA cycle). References and relevant PubMed IDs are given. To provide a controlled vocabulary that allows the comparison of data from different sources, ontology terms were used (8,9). Currently the database focuses on the monocotyledon species Hordeum vulgare (barley), Triticum aestivum (wheat), Oryza sativa (rice), Zea mays (maize) and the dicotyledon species Solanum tuberosum (potato) and Brassica napus (canola). Additional data on other crop and non-crop plants are currently being added to the database. In total, about 400 enzymatic reactions, about 60 transport processes, 5 compartments and about 740 references are represented in MetaCrop (see Table 1, content as of July 2007). To enable the export of detailed metabolic networks for systems biology approaches, most of the data contained in the database are biochemical data (e.g. taxon-specific enzymatic information). Where biochemical information is missing, proteomic or genetic information is given for the enzymatic reaction or transport process instead.

Table 1.

Information contained in MetaCrop

	Hordeum vulgare	Triticum aestivum	Oryza sativa	Zea mays	Solanum tuberosum	Brassica napus	Total^a
Pathways	36	33	34	34	34	26	38
Enzymatic reactions	291	271	278	273	207	168	392
Transport processes	7	6	9	27	14	7	59
Compartments	4	4	4	3	3	3	5
References	382	347	340	346	252	204	734

^a Including other plants; pathways, reactions and other information occurring in more than one plant are only listed once.

Web interface

The web interface of the database is accessible at http://metacrop.ipk-gatersleben.de. It allows detailed browsing and searching of data, user feedback and data export. Figure 1 shows screenshots of the MetaCrop web interface, ranging from a complete pathway (sucrose breakdown in dicotyledon species including compartmentalization, transporters and isoenzymes) to detailed information about reaction kinetics. In addition to searchable data tables, the user is guided by clickable image maps of the pathways. Entire pathways containing all available information on the respective reactions and metabolites can be downloaded in the standardized systems biology exchange format Systems Biology Markup Language (SBML) (10), which can be imported into modelling tools such as COPASI (11,12).
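As a rough illustration of what can be done with an exported pathway file, the sketch below reads species and reactions from an SBML document using only the Python standard library. The embedded SBML fragment is a hand-written, hypothetical example (the SBML Level 2 namespace is assumed), not an actual MetaCrop export, and identifiers such as `G6P_c` and `PGI_c` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written SBML fragment (NOT a real MetaCrop export).
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_pathway">
    <listOfCompartments>
      <compartment id="cytosol"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="G6P_c" name="glucose 6-phosphate" compartment="cytosol"/>
      <species id="F6P_c" name="fructose 6-phosphate" compartment="cytosol"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="PGI_c" name="phosphoglucose isomerase" reversible="true">
        <listOfReactants>
          <speciesReference species="G6P_c" stoichiometry="1"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="F6P_c" stoichiometry="1"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def _qualified(ns, tag):
    """Return the tag in ElementTree's '{namespace}tag' notation."""
    return f"{{{ns}}}{tag}" if ns else tag


def _participants(reaction, ns, list_tag):
    """Map species id -> stoichiometry for one side of a reaction."""
    parent = reaction.find(_qualified(ns, list_tag))
    if parent is None:
        return {}
    return {
        ref.get("species"): float(ref.get("stoichiometry", "1"))
        for ref in parent.findall(_qualified(ns, "speciesReference"))
    }


def read_sbml(text):
    """Parse an SBML string into (species, reactions) dictionaries."""
    root = ET.fromstring(text)
    # Derive the namespace from the root tag instead of hard-coding it.
    ns = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""
    species = {
        s.get("id"): s.get("compartment")
        for s in root.iter(_qualified(ns, "species"))
    }
    reactions = {}
    for r in root.iter(_qualified(ns, "reaction")):
        reactions[r.get("id")] = {
            "reactants": _participants(r, ns, "listOfReactants"),
            "products": _participants(r, ns, "listOfProducts"),
            # SBML Level 2 treats a missing 'reversible' attribute as true.
            "reversible": r.get("reversible", "true") == "true",
        }
    return species, reactions


species, reactions = read_sbml(SBML_TEXT)
print(species)
print(reactions["PGI_c"])
```

For real exports, which may carry kinetic laws and annotations, a dedicated SBML library would likely be a better choice than hand-rolled XML parsing.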

Figure 1.

Screenshots of the web interface of MetaCrop. (a) A pathway (sucrose breakdown in dicotyledon species, which shows compartmentalization, transporter and isoenzymes); (b) Information connected to pathways: conversion details (cytosolic phosphoglucose isomerase): stoichiometry, catalyst, metabolites, conversion location, subset of taxon-specific kinetic parameters (vmax, km) given for cytosolic phosphoglucose isomerase.

The functionality of the web interface is documented in a tutorial available on the website. It is also possible to edit entries, extend the content of MetaCrop and create user-specific models. To ensure data quality, such changes cannot be done anonymously. Users interested in these functionalities are invited to obtain an editing account for MetaCrop. Changes performed by all accounts are logged and checked by curators to guarantee consistency and quality of the inserted data. The web interface is based on the Oracle Application Express technology.

Database implementation

MetaCrop uses the information system Meta-All (13) and is based on the database management system Oracle. The database schema comprises 51 relational tables and can be divided into several parts. The main parts are conversions, substances, pathways, locations, references and versioning. Conversions and substances are the central parts of the schema. A conversion is a reaction or a translocation, which is either active or passive. Substances comprise transporters, enzymes, metabolites and macromolecules. They take part in conversions and play certain roles, such as reactant or product, modulator, catalyst, etc. All necessary information, e.g. name, formula or kinetic data, can be stored together with conversions and substances. To distinguish data originating from different publications, each record can be enriched with reference information. The term location describes a combination of taxonomy, developmental stage and cytology of plants, which specifies where and when conversions take place. For this purpose, a controlled vocabulary is used. Additionally, the database schema supports parallel versioning of data records, e.g. in case of differing opinions among experimentalists. Finally, pathways are combinations of conversions taking place at a certain location. The complete information represented in MetaCrop is also available as a dump of the database, i.e. the data is available for bulk download. The dump can easily be imported into a user's instance of the open source information system Meta-All (13), thereby enabling users to run their own local version of the database.
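One way to picture the schema parts described above is as a handful of record types. The sketch below is a simplified in-memory analogue in Python; all class and field names are illustrative assumptions and do not reproduce the actual 51-table Meta-All schema, and the example values (including "unspecified" placeholders) are not taken from the database.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Roles a substance can play in a conversion."""
    REACTANT = "reactant"
    PRODUCT = "product"
    MODULATOR = "modulator"
    CATALYST = "catalyst"


@dataclass(frozen=True)
class Location:
    """Where and when a conversion takes place (hypothetical field names)."""
    species: str
    organ: str
    tissue: str
    compartment: str
    developmental_stage: str


@dataclass(frozen=True)
class Substance:
    name: str
    kind: str  # "metabolite", "enzyme", "transporter" or "macromolecule"


@dataclass
class Participation:
    """Links a substance to a conversion in a given role."""
    substance: Substance
    role: Role
    stoichiometry: float = 1.0


@dataclass
class Conversion:
    name: str
    kind: str  # "reaction" or "translocation"
    location: Location
    participants: list = field(default_factory=list)
    references: list = field(default_factory=list)  # e.g. PubMed IDs

    def with_role(self, role):
        """Names of all substances that play the given role."""
        return [p.substance.name for p in self.participants if p.role is role]


@dataclass
class Pathway:
    """A combination of conversions taking place at certain locations."""
    name: str
    conversions: list = field(default_factory=list)


# Illustrative entry; location details are placeholders, not curated data.
pgi = Conversion(
    name="phosphoglucose isomerase reaction",
    kind="reaction",
    location=Location("Solanum tuberosum", "tuber", "unspecified",
                      "cytosol", "unspecified"),
    participants=[
        Participation(Substance("glucose 6-phosphate", "metabolite"), Role.REACTANT),
        Participation(Substance("fructose 6-phosphate", "metabolite"), Role.PRODUCT),
        Participation(Substance("phosphoglucose isomerase", "enzyme"), Role.CATALYST),
    ],
)
pathway = Pathway("sucrose breakdown (toy excerpt)", [pgi])
print(pgi.with_role(Role.CATALYST))
```

Keeping the role on the link between substance and conversion, rather than on the substance itself, mirrors the idea that the same substance can be a reactant in one conversion and a modulator in another.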

CURATION, QUALITY ASSURANCE, COMPLETENESS AND CONTINUATION

All information was extracted manually through an extensive survey of primary literature and online databases. Literature-based information was derived from about 800 papers in plant biochemical and physiological journals as well as from relevant textbooks (e.g. (14,15)). Furthermore, some of the information was manually extracted from online databases providing pathway-related information: KEGG PATHWAY ((16), http://www.genome.jp/kegg/pathway.html), EGENES ((17), http://www.genome.jp/kegg-bin/create_kegg_menu?category=plants_egenes), AraCyc ((18), http://www.arabidopsis.org/biocyc/index.jsp), MetaCyc ((19), http://metacyc.org/), RiceCyc (http://www.gramene.org/pathway/), Reactome ((20), http://www.reactome.org/); enzyme-related information: BRENDA ((21), http://www.brenda-enzymes.info/), ExPASy-ENZYME ((22), http://expasy.org/enzyme/); protein-related information: Swiss-Prot/TrEMBL ((23), http://www.expasy.org/sprot/); metabolite-related information: PubChem (http://pubchem.ncbi), KEGG LIGAND ((16), http://www.genome.jp/kegg/ligand.html); transporter-related information: ARAMEMNON ((24), http://aramemnon.botanik.uni-koeln.de/); kinetic information: BRENDA ((21), http://www.brenda-enzymes.info/). For quality assurance, information taken from databases has been checked against the literature. To allow information to be traced back and to support further reading, references and corresponding PubMed IDs are given where available. Controlled vocabulary (e.g. ontology terms from Plant Ontology (8) and Gene Ontology (9)) was used to ensure consistency and to allow the comparison of data from different sources. Currently MetaCrop contains most of the pathways of central metabolism in higher plants (e.g. metabolism of carbohydrates, amino acids, lipids, energy, cofactors and nucleotides). With respect to crop plant metabolism, special emphasis is laid on pathways of seed and tuber metabolism such as the sucrose breakdown pathway. While our current focus is on updating pathways with incomplete information, we plan to extend the information stored in MetaCrop to pathways of plant secondary metabolism. The extension of MetaCrop is primarily done in-house; however, registered users can edit entries and extend the content of MetaCrop and may therefore also contribute to the extension of the database in the future.

APPLICATION OF THE METACROP DATABASE

MetaCrop can be used for a wide variety of applications in crop plant research. It helps in understanding the metabolism at different levels of detail, it allows the use of crop plant specific information in other tools for further investigations, and it supports the creation of models of metabolism for simulation approaches. Two example applications are as follows:

Mathematical analysis of metabolic pathways

The in-depth mathematical analysis of a pathway of interest will generally consist of two main steps, which are (i) investigation of the structural properties and capabilities of the pathway with tools such as CellNetAnalyzer (25) and (ii) detailed analysis of the kinetic characteristics of the system with modelling and simulation tools such as COPASI (11). MetaCrop supports these processes at various steps. It contains all necessary information for structural pathway analysis, and for central metabolism also detailed kinetic data for kinetic pathway analysis. Furthermore, the above-mentioned tools are able to read the files exported from MetaCrop in the standardized SBML format (10). Once imported into these tools, the pathways can serve as a starting point for structural or kinetic metabolic models.
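Structural analysis typically starts from the stoichiometric matrix of a pathway. The sketch below builds such a matrix in plain Python and checks which metabolites are balanced for a given flux vector. The three-reaction network is a simplified, hypothetical excerpt of upper glycolysis with invented identifiers, not data exported from MetaCrop, and the code does not reflect how CellNetAnalyzer or COPASI work internally.

```python
# Hypothetical toy network: reaction id -> (reactants, products),
# each side mapping metabolite id -> stoichiometric coefficient.
REACTIONS = {
    "HK":  ({"GLC": 1, "ATP": 1}, {"G6P": 1, "ADP": 1}),
    "PGI": ({"G6P": 1}, {"F6P": 1}),
    "PFK": ({"F6P": 1, "ATP": 1}, {"F16BP": 1, "ADP": 1}),
}


def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_ids, S) with S[i][j] the net
    coefficient of metabolite i in reaction j (negative = consumed)."""
    rxn_ids = list(reactions)
    metabolites = sorted(
        {m for subs, prods in reactions.values() for m in (*subs, *prods)}
    )
    row = {m: i for i, m in enumerate(metabolites)}
    matrix = [[0.0] * len(rxn_ids) for _ in metabolites]
    for j, rid in enumerate(rxn_ids):
        subs, prods = reactions[rid]
        for m, n in subs.items():
            matrix[row[m]][j] -= n
        for m, n in prods.items():
            matrix[row[m]][j] += n
    return metabolites, rxn_ids, matrix


def net_production(matrix, fluxes):
    """Compute S * v: the net production rate of every metabolite."""
    return [sum(s * v for s, v in zip(r, fluxes)) for r in matrix]


metabolites, rxn_ids, S = stoichiometric_matrix(REACTIONS)
rates = dict(zip(metabolites, net_production(S, [1.0, 1.0, 1.0])))

# Metabolites with zero net production are balanced at this flux distribution.
balanced = [m for m, rate in rates.items() if rate == 0]
print(rxn_ids)
for m, r in zip(metabolites, S):
    print(f"{m:6s}", r)
print("balanced:", balanced)
```

With equal flux through all three reactions, only the intermediates G6P and F6P are balanced, while the boundary metabolites and the cofactors accumulate or deplete, which is the kind of observation a structural analysis makes before any kinetic data are used.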

Investigation of -omics data in the context of metabolic networks

Network-related analysis of high-throughput data involves the mapping of experimental data onto related pathways and the investigation of this integrated data. Such functionality is provided by tools such as VANTED (26), a system for the visualization and analysis of networks with related experimental data. Data from large-scale biochemical experiments can be uploaded into the software and then mapped onto a network that is either drawn with the tool itself or imported, for example, from an SBML file. VANTED enables users to present and analyse transcript, enzyme, proteomics and metabolite data in the context of underlying networks such as metabolic pathways from MetaCrop. Several analysis methods implemented in such software systems help in further investigation of the data.
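The mapping step can be pictured as a join between measured values and the nodes of a pathway. The following Python sketch is a simplified stand-in written for illustration only; it is not VANTED's implementation or API, and the function name, node names, synonym table and measurement values are all invented.

```python
def map_measurements(pathway_nodes, measurements, synonyms=None):
    """Attach measured values to pathway nodes by (case-insensitive) name.

    pathway_nodes: iterable of node names from a pathway
    measurements:  iterable of (substance_name, value) pairs
    synonyms:      optional dict mapping lower-case aliases to node names
    Returns (mapped, unmatched): a dict node -> list of values, and the
    list of measured names that could not be placed on the network.
    """
    synonyms = synonyms or {}
    by_name = {n.lower(): n for n in pathway_nodes}
    mapped = {n: [] for n in pathway_nodes}
    unmatched = []
    for name, value in measurements:
        # Resolve aliases first, then look the name up among the nodes.
        key = synonyms.get(name.lower(), name).lower()
        node = by_name.get(key)
        if node is None:
            unmatched.append(name)
        else:
            mapped[node].append(value)
    return mapped, unmatched


# Invented example data, for illustration only.
nodes = ["sucrose", "glucose", "fructose"]
data = [("Sucrose", 12.5), ("Glc", 3.1), ("glucose", 2.9), ("trehalose", 0.4)]
mapped, unmatched = map_measurements(nodes, data, synonyms={"glc": "glucose"})
print(mapped)
print(unmatched)
```

Reporting unmatched names explicitly is a deliberate choice in this sketch: measurements that silently fail to map are easy to overlook when the data are later interpreted in the network context.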

DISCUSSION

MetaCrop contains comprehensive, original, high-quality data about crop plant metabolism. While most of the existing metabolic pathway databases do not contain any plant-specific information, there are a few multi-organism databases such as MetaCyc (19), BRENDA (21) and KEGG (16) that comprise information about plant metabolism. The transcriptome-based database EGENES (17) is a multi-species plant database, which currently covers 25 plant species (release 41.0, January 2007). The database integrates plant genomic information (EST contigs) and pathway information (pathway maps derived from KEGG reference pathways), thus offering an overview of fundamental biological processes in plants. In addition to these multi-species databases, there are a few species-specific crop plant databases such as the pathway/genome databases RiceCyc (http://www.gramene.org/pathway/) and SolCyc (http://www.sgn.cornell.edu/tools/solcyc/). However, most of these single- and multi-species databases contain little or no hand-curated information because they rely on genome- or EST-based pathway predictions, or they do not support model creation and model export in SBML. Furthermore, highly specific information such as kinetic data, compartment-specific information or transport processes is often lacking, and most of the databases are limited to read-only access, which does not allow user-specific interaction, editing and extension. Similar to MetaCrop, the pathway databases AraCyc ((18), http://www.arabidopsis.org/biocyc/index.jsp) and MetNetDB ((27), http://www.metnetdb.org) contain detailed information about plant metabolism. AraCyc is a pathway/genome database that contains enzymes and pathways found in the model plant Arabidopsis (Arabidopsis thaliana). The Metabolic Networking Data Base (MetNetDB) contains information on metabolic and regulatory networks in Arabidopsis, derived from a combination of online databases and input from biologists in their area of expertise. Both databases are under continued curation and contain highly specific information such as compartment-specific information or transport processes. However, both databases currently only contain information about the model plant Arabidopsis.

CONCLUSION

MetaCrop is an ongoing project and currently consists largely of a collection of manually curated data about six major crop plants, methods for interactive exploration via the web interface and export functionalities. We plan to develop the database in two directions: the further curation of information and the improvement of the web interface. We plan to extend the information stored in MetaCrop to secondary pathways and to include other important crop plants such as Glycine max (soybean), Solanum lycopersicum (tomato), Helianthus annuus (sunflower) and Secale cereale (rye). For the web interface, work is underway to implement methods that take advantage of the taxonomy and localization information in MetaCrop such that, for example, if information is not available for a specific species it can be derived from information on closely related species.
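The planned fallback to closely related species could look roughly like the sketch below. It is speculative: the text does not describe an algorithm, so the relatedness measure (number of shared leading ranks in a simplified family, subfamily, tribe lineage), the function name `lookup_with_fallback` and the placeholder records are all assumptions made for illustration.

```python
# Simplified lineages (family, subfamily, tribe); an assumption of this
# sketch, not the taxonomy representation used in MetaCrop.
LINEAGE = {
    "Hordeum vulgare":   ("Poaceae", "Pooideae", "Triticeae"),
    "Triticum aestivum": ("Poaceae", "Pooideae", "Triticeae"),
    "Oryza sativa":      ("Poaceae", "Oryzoideae", "Oryzeae"),
    "Solanum tuberosum": ("Solanaceae", "Solanoideae", "Solaneae"),
}


def shared_rank_depth(a, b):
    """Count how many leading taxonomic ranks two species share."""
    depth = 0
    for x, y in zip(LINEAGE[a], LINEAGE[b]):
        if x != y:
            break
        depth += 1
    return depth


def lookup_with_fallback(records, species):
    """Return (source_species, record) for the species itself or, if it
    has no record, for the most closely related species that has one.
    Returns None when no related species with a record exists."""
    if species in records:
        return species, records[species]
    candidates = [s for s in records if shared_rank_depth(species, s) > 0]
    if not candidates:
        return None
    best = max(candidates, key=lambda s: shared_rank_depth(species, s))
    return best, records[best]


# Placeholder records; no real curated data is implied.
records = {
    "Triticum aestivum": {"note": "placeholder record for wheat"},
    "Oryza sativa": {"note": "placeholder record for rice"},
}

print(lookup_with_fallback(records, "Hordeum vulgare"))
print(lookup_with_fallback(records, "Solanum tuberosum"))
```

Returning the source species alongside the record keeps derived information distinguishable from information curated for the queried species itself.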

24 in total

1. The ENZYME database in 2000.

Authors: A Bairoch
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Transgenesis and yield: what are our targets?

Authors: Helen L Jenner
Journal: Trends Biotechnol Date: 2003-05 Impact factor: 19.536

3. Tools for kinetic modeling of biochemical networks.

Authors: Rui Alves; Fernando Antunes; Armindo Salvador
Journal: Nat Biotechnol Date: 2006-06 Impact factor: 54.908

4. Carbon-negative biofuels from low-input high-diversity grassland biomass.

Authors: David Tilman; Jason Hill; Clarence Lehman
Journal: Science Date: 2006-12-08 Impact factor: 47.728

5. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica).

Authors: Stephen A Goff; Darrell Ricke; Tien-Hung Lan; Gernot Presting; Ronglin Wang; Molly Dunn; Jane Glazebrook; Allen Sessions; Paul Oeller; Hemant Varma; David Hadley; Don Hutchison; Chris Martin; Fumiaki Katagiri; B Markus Lange; Todd Moughamer; Yu Xia; Paul Budworth; Jingping Zhong; Trini Miguel; Uta Paszkowski; Shiping Zhang; Michelle Colbert; Wei-lin Sun; Lili Chen; Bret Cooper; Sylvia Park; Todd Charles Wood; Long Mao; Peter Quail; Rod Wing; Ralph Dean; Yeisoo Yu; Andrey Zharkikh; Richard Shen; Sudhir Sahasrabudhe; Alun Thomas; Rob Cannings; Alexander Gutin; Dmitry Pruss; Julia Reid; Sean Tavtigian; Jeff Mitchell; Glenn Eldredge; Terri Scholl; Rose Mary Miller; Satish Bhatnagar; Nils Adey; Todd Rubano; Nadeem Tusneem; Rosann Robinson; Jane Feldhaus; Teresita Macalma; Arnold Oliphant; Steven Briggs
Journal: Science Date: 2002-04-05 Impact factor: 47.728

6. ARAMEMNON, a novel database for Arabidopsis integral membrane proteins.

Authors: Rainer Schwacke; Anja Schneider; Eric van der Graaff; Karsten Fischer; Elisabetta Catoni; Marcelo Desimone; Wolf B Frommer; Ulf-Ingo Flügge; Reinhard Kunze
Journal: Plant Physiol Date: 2003-01 Impact factor: 8.340

7. From genomics to chemical genomics: new developments in KEGG.

Authors: Minoru Kanehisa; Susumu Goto; Masahiro Hattori; Kiyoko F Aoki-Kinoshita; Masumi Itoh; Shuichi Kawashima; Toshiaki Katayama; Michihiro Araki; Mika Hirakawa
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

8. Meta-All: a system for managing metabolic pathway information.

Authors: Stephan Weise; Ivo Grosse; Christian Klukas; Dirk Koschützki; Uwe Scholz; Falk Schreiber; Björn H Junker
Journal: BMC Bioinformatics Date: 2006-10-23 Impact factor: 3.169

9. VANTED: a system for advanced data analysis and visualization in the context of biological networks.

Authors: Björn H Junker; Christian Klukas; Falk Schreiber
Journal: BMC Bioinformatics Date: 2006-03-06 Impact factor: 3.169

10. Reactome: a knowledge base of biologic pathways and processes.

Authors: Imre Vastrik; Peter D'Eustachio; Esther Schmidt; Geeta Joshi-Tope; Gopal Gopinath; David Croft; Bernard de Bono; Marc Gillespie; Bijay Jassal; Suzanna Lewis; Lisa Matthews; Guanming Wu; Ewan Birney; Lincoln Stein
Journal: Genome Biol Date: 2007 Impact factor: 13.583
