KEGGtranslator: visualizing and converting the KEGG PATHWAY database to various formats.

Clemens Wrzodek, Andreas Dräger, Andreas Zell.

Abstract

SUMMARY: The KEGG PATHWAY database provides a widely used service for metabolic and nonmetabolic pathways. It contains manually drawn pathway maps with information about the genes, reactions and relations contained therein. To store these pathways, KEGG uses KGML, a proprietary XML-format. Parsers and translators are needed to process the pathway maps for use in other applications and algorithms. We have developed KEGGtranslator, an easy-to-use stand-alone application that can visualize KGML-formatted XML-files and convert them into multiple output formats. Unlike other translators, KEGGtranslator supports a wide range of output formats, can augment the information in translated documents (e.g. with MIRIAM annotations) beyond the scope of the KGML document, and adds missing components to fragmentary reactions within the pathway so that these can be simulated.
AVAILABILITY: KEGGtranslator is freely available as a Java(™) Web Start application and for download at http://www.cogsys.cs.uni-tuebingen.de/software/KEGGtranslator/. KGML files can be downloaded from within the application.
CONTACT: clemens.wrzodek@uni-tuebingen.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2011        PMID: 21700675      PMCID: PMC3150042          DOI: 10.1093/bioinformatics/btr377

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Many academic researchers who want to use pathway-based information rely on the KEGG PATHWAY database (Kanehisa and Goto, 2000). The database, established in 1995, contains manually created maps for various pathways. These maps are visualized on the web and can be downloaded free of charge (for academics) as XML-files in the KEGG Markup Language (KGML). The elements in a pathway XML-file (such as reactions or genes) are usually identified by a KEGG identifier only. Thus, KEGG PATHWAY is closely tied to other KEGG databases that resolve and further describe the identifiers. The content of the KGML-formatted XML-files themselves, however, is limited: gene names are often encoded as barely readable abbreviations, and elements are annotated by only a single KEGG identifier. By improving the annotation and translating the KGML-files to other file formats, researchers could use the KEGG database for many applications: individual pathway pictures could be created; pathway simulation and modeling applications could be executed; graph operations on the pathways or stoichiometric analyses could be performed (e.g. Heinrich and Schuster, 2006, chapter 3); or the KEGG pathway database could be used for gene set enrichment analyses. For these purposes, only a few converters are available: KEGGconverter (Moutselos ) and KEGG2SBML (Funahashi ) offer command-line or web-based conversion of KGML-files to SBML-files, and KEGGgraph (Zhang and Wiemann, 2009) converts KGML-files to R-based graph structures. None of these tools has a graphical user interface, can validate and autocomplete KEGG reactions, adds standard identifiers (such as MIRIAM URNs) to pathway elements, or can write KGML files in multiple output formats. In parallel with this work, the command-line toolbox SuBliMinaL (N.Swainston et al., submitted for publication) overcomes some of these limitations.
Here we present KEGGtranslator, which reads and completes the content of an XML-file by retrieving online annotation for all genes and reactions via the KEGG API. KGML-files can be converted to many output formats. Minor deficiencies are corrected (e.g. the name of a gene), new information is added (e.g. multiple MIRIAM identifiers for each gene and reaction (Novère ), or SBO terms describing the function) and some crucial deficiencies (like missing reactants) are addressed.

2 TRANSLATION OF KGML-FILES

In the first step of a translation, KEGGtranslator reads a given XML-file and puts all contained elements into an internal data structure. To obtain further information and annotation, the KEGG database is queried via the KEGG API for each element in the document (pathway, entries, reactions, relations, substrates, products, etc.). This completes the sparse XML-document with comprehensive information. For example, multiple synonyms and identifiers from many external databases (Ensembl, EntrezGene, UniProt, ChEBI, Gene Ontology, DrugBank, PDBeChem and many more) are assigned to genes and other elements. After this initial step, various preprocessing operations are performed on the pathway. The user may choose to let KEGGtranslator correct the following deficiencies automatically:
Remove white nodes: KEGG uses colors in the visualization of a pathway to annotate organism-specific orthologous genes. Nodes in green represent biological entities that occur in the current organism. Nodes in white represent biological entities corresponding to genes that occur in this pathway in other species, but not in the current one. Translating all those nodes into new models without regard to the node color would lead to a model that contains invalid genes in the pathway.
Remove orphans: isolated nodes without any reactions or relations are usually unnecessary for further simulations.
Autocomplete reactions: another major deficiency is incomplete reactions. The XML-files contain only those components of a reaction that are needed for the graphical representation of the pathway. Reactants that are not necessary for the visualization are usually omitted from the KGML format. Thus, the given chemical equation is sometimes incomplete (see Fig. 1). KEGGtranslator is able to look up each reaction and add the missing components. This leads to more complete and functionally correct pathway models, which is very important, e.g. for stoichiometric simulations.
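The two simpler preprocessing operations can be illustrated with a minimal Python sketch. It is not KEGGtranslator's implementation (which is written in Java); the KGML fragment is hand-written sample data rather than a real KEGG pathway, the function names are hypothetical, and the assumption that white nodes carry the background color #FFFFFF in the graphics element is ours.

```python
import xml.etree.ElementTree as ET

# Hand-written KGML-like sample; identifiers and names are placeholders.
KGML = """<pathway name="path:hsa00000" org="hsa" number="00000">
  <entry id="1" name="hsa:1111" type="gene">
    <graphics name="GENEA" bgcolor="#BFFFBF"/>
  </entry>
  <entry id="2" name="ko:K00002" type="ortholog">
    <graphics name="K00002" bgcolor="#FFFFFF"/>
  </entry>
  <entry id="3" name="hsa:3333" type="gene">
    <graphics name="GENEC" bgcolor="#BFFFBF"/>
  </entry>
  <entry id="4" name="hsa:4444" type="gene">
    <graphics name="GENED" bgcolor="#BFFFBF"/>
  </entry>
  <relation entry1="1" entry2="3" type="PPrel"/>
  <relation entry1="1" entry2="2" type="PPrel"/>
</pathway>"""

WHITE = "#FFFFFF"  # assumption: color used for nodes absent from the organism


def remove_white_nodes(pathway):
    """Drop entries drawn in white, plus relations that point at them."""
    removed = set()
    for entry in list(pathway.findall("entry")):
        graphics = entry.find("graphics")
        if graphics is not None and graphics.get("bgcolor", "").upper() == WHITE:
            removed.add(entry.get("id"))
            pathway.remove(entry)
    for relation in list(pathway.findall("relation")):
        if relation.get("entry1") in removed or relation.get("entry2") in removed:
            pathway.remove(relation)
    return removed


def remove_orphans(pathway):
    """Drop entries that take part in no relation and no reaction."""
    connected = set()
    for relation in pathway.findall("relation"):
        connected.update((relation.get("entry1"), relation.get("entry2")))
    for reaction in pathway.findall("reaction"):
        for part in reaction:
            if part.tag in ("substrate", "product"):
                connected.add(part.get("id"))
    orphans = []
    for entry in list(pathway.findall("entry")):
        # Entries that name a reaction (e.g. enzymes) are kept as well.
        if entry.get("id") not in connected and entry.get("reaction") is None:
            orphans.append(entry.get("id"))
            pathway.remove(entry)
    return orphans


pathway = ET.fromstring(KGML)
white = remove_white_nodes(pathway)
orphans = remove_orphans(pathway)
remaining = [entry.get("id") for entry in pathway.findall("entry")]
print(white, orphans, remaining)
```

In this sample, entry 2 is removed as a white node together with the relation pointing at it, and entry 4 is removed as an orphan. Running the white-node filter first matters: a node connected only to a white node becomes an orphan once that relation is gone.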
After these preprocessing steps, KEGGtranslator branches between two different conversion modes for the actual translation: a functional translation (SBML) and a graphical translation (e.g. GraphML, GML). Depending on the chosen output format, KEGGtranslator determines how to continue with the conversion.
Fig. 1.

(A) Screenshot of a translated GraphML pathway in KEGGtranslator. (B) The need for autocompleting reactions: the upper half shows the KGML-file with only one substrate and one product. The lower half shows the complete reaction equation. One substrate and one product are missing from the XML-document.
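The situation in panel B can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's own code: the function names are hypothetical, the equation string is hard-coded sample data written in the style of a KEGG reaction equation (in practice it would be retrieved from the KEGG database), and stoichiometric coefficients are ignored.

```python
import re

COMPOUND_ID = re.compile(r"C\d{5}")  # KEGG compound identifiers such as C00031


def parse_equation(equation):
    """Split 'A + B <=> C + D' into (substrates, products) lists of compound IDs."""
    left, right = equation.split("<=>")
    return COMPOUND_ID.findall(left), COMPOUND_ID.findall(right)


def autocomplete(kgml_substrates, kgml_products, equation):
    """Return substrate and product lists extended by the missing reactants."""
    full_substrates, full_products = parse_equation(equation)
    # Defensive assumption: the KGML entry may list the reaction in the
    # opposite direction to the equation, so flip the equation if needed.
    listed = set(kgml_substrates)
    if listed & set(full_products) and not listed & set(full_substrates):
        full_substrates, full_products = full_products, full_substrates
    missing_substrates = [c for c in full_substrates if c not in kgml_substrates]
    missing_products = [c for c in full_products if c not in kgml_products]
    return kgml_substrates + missing_substrates, kgml_products + missing_products


# The KGML reaction lists one substrate and one product only (sample data).
substrates, products = autocomplete(
    ["C00031"], ["C00092"], "C00002 + C00031 <=> C00008 + C00092"
)
print(substrates, products)
```

The components already present in the KGML document keep their position, and the reactants found only in the full equation are appended.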

The functional translation is performed by converting the KGML document to a JSBML data structure (Dräger ). The focus lies on generating valid and specification-conformant SBML (Level 2 Version 4) code that facilitates, e.g. a dynamic simulation of the pathway. Multiple MIRIAM URNs and an SBO term that best describes the function of the element are assigned to each entry of the pathway (pathway references, genes, compounds, enzymes, reactions, reaction-modifiers, etc.). Additionally, notes are assigned to each element with human-readable names and synonyms, a description of the element, and links to pictures and further information. The user may also choose to include graphical information by adding CellDesigner annotations to the model. However, the focus in functional translation lies on the reactions in KGML documents, whereas graphical representations concentrate on relations between pathway elements. Besides the already mentioned completion of reactions, each enzymatic modifier is correctly assigned to its reaction and the reversibility of the reaction is annotated. As a final step, the SBML2LaTeX (Dräger ) tool has been integrated into KEGGtranslator, which allows users to automatically generate a LaTeX or PDF report documenting the SBML code of the translated pathway. Furthermore, the user may add kinetics to the pathway by using the SBMLsqueezer (Dräger ) tool after the translation.
In graphical translations, results can be saved as GraphML, GML or YGF and finally as images of type JPG, GIF or TGF. In this mode, the KGML data structure is converted to a yFiles (Wiese ) data structure. The focus here lies on the visualization of the pathway. Relations are translated by inserting arrows with the appropriate style, which is given in the KGML document. For example, dashed arrows without heads represent bindings or associations, and a dotted arrow with a simple, filled head illustrates an indirect effect. Please see the KGML specification for a complete list.
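To make the annotation step of the functional translation concrete, the following Python sketch builds a single SBML Level 2 Version 4 species element that carries an SBO term and a MIRIAM URN. KEGGtranslator itself uses JSBML for this; the sketch uses only the Python standard library, the function name annotated_species is hypothetical, and the chosen identifier, SBO term and compartment name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL_NS = "http://biomodels.net/biology-qualifiers/"

ET.register_namespace("", SBML_NS)
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bqbiol", BQBIOL_NS)


def annotated_species(species_id, name, sbo_term, miriam_urns):
    """Build a species element with an sboTerm and an RDF block of MIRIAM URNs."""
    meta_id = f"meta_{species_id}"
    species = ET.Element(f"{{{SBML_NS}}}species", {
        "metaid": meta_id,
        "id": species_id,
        "name": name,
        "compartment": "default",  # assumed compartment id
        "sboTerm": sbo_term,
    })
    annotation = ET.SubElement(species, f"{{{SBML_NS}}}annotation")
    rdf = ET.SubElement(annotation, f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": f"#{meta_id}"}
    )
    # bqbiol:is states that the species is the entity the URNs refer to.
    qualifier = ET.SubElement(description, f"{{{BQBIOL_NS}}}is")
    bag = ET.SubElement(qualifier, f"{{{RDF_NS}}}Bag")
    for urn in miriam_urns:
        ET.SubElement(bag, f"{{{RDF_NS}}}li", {f"{{{RDF_NS}}}resource": urn})
    return species


species = annotated_species(
    "C00031",
    "D-Glucose",
    "SBO:0000247",  # SBO term for a simple chemical
    ["urn:miriam:kegg.compound:C00031"],
)
xml_text = ET.tostring(species, encoding="unicode")
print(xml_text)
```

A complete model would wrap such elements in sbml, model and listOfSpecies elements and would normally be written with an SBML library, which also validates the result.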
As in the functional translation, elements can be annotated: GraphML allows custom annotation elements to be defined. KEGGtranslator makes use of these by attaching several identifiers (e.g. EntrezGene or Ensembl) and descriptions to the individual nodes. The shape of each node, as well as its colors and labels, is translated from the KGML document. Links to descriptive HTML pages are set up and hierarchical group nodes are created for defined compounds. All these features lead to a graphical representation of the pathway that provides as much information about the elements as possible.
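The custom annotation mechanism of GraphML mentioned above works through key declarations and data elements. The following Python sketch shows the idea with the standard library only; it does not reproduce the keys or the yFiles extensions that KEGGtranslator writes, and the attribute names, node identifiers and gene identifiers are placeholders.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", GRAPHML_NS)


def q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def build_graphml(nodes, edges, attr_names):
    """Create a GraphML tree with one declared string key per custom attribute."""
    root = ET.Element(q("graphml"))
    for name in attr_names:
        ET.SubElement(root, q("key"), {
            "id": name, "for": "node", "attr.name": name, "attr.type": "string",
        })
    graph = ET.SubElement(root, q("graph"), {"id": "pathway", "edgedefault": "directed"})
    for node_id, data in nodes.items():
        node = ET.SubElement(graph, q("node"), {"id": node_id})
        for name in attr_names:
            if name in data:
                ET.SubElement(node, q("data"), {"key": name}).text = data[name]
    for index, (source, target) in enumerate(edges):
        ET.SubElement(graph, q("edge"), {
            "id": f"e{index}", "source": source, "target": target,
        })
    return root


# Placeholder nodes; the identifiers do not refer to real genes.
nodes = {
    "n1": {"label": "GENEA", "entrezGene": "1111", "description": "placeholder gene A"},
    "n2": {"label": "GENEB", "entrezGene": "2222"},
}
root = build_graphml(nodes, [("n1", "n2")], ["label", "entrezGene", "description"])
graphml_text = ET.tostring(root, encoding="unicode")
print(graphml_text)
```

Because every custom attribute is declared once as a key, tools that do not know an attribute can still read the graph and ignore the extra data elements.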

3 DISCUSSION

KEGGtranslator is a stand-alone application with a graphical user interface that runs on every operating system for which a Java™ virtual machine is available. There are other tools for converting KGML to SBML and for converting KGML to graph structures in R. However, to our knowledge, no other KEGG converter can translate KGML-formatted files to such a variety of output formats while offering important functionality such as the autocompletion of reactions or the annotation of each element in the translated file with various identifiers. Furthermore, KEGGtranslator is simple, easy to use and comes with both a command-line and a graphical user interface. The variety of output formats, combined with the translation options and the comprehensive, standards-compliant annotation of the pathway elements, allows quick and easy use of files from the KEGG PATHWAY database in a wide range of other applications.
REFERENCES

1. KEGG: Kyoto Encyclopedia of Genes and Genomes.

Authors: M Kanehisa; S Goto
Journal: Nucleic Acids Res. Date: 2000-01-01

2. Minimum information requested in the annotation of biochemical models (MIRIAM).

Authors: Nicolas Le Novère; Andrew Finney; Michael Hucka; Upinder S Bhalla; Fabien Campagne; Julio Collado-Vides; Edmund J Crampin; Matt Halstead; Edda Klipp; Pedro Mendes; Poul Nielsen; Herbert Sauro; Bruce Shapiro; Jacky L Snoep; Hugh D Spence; Barry L Wanner
Journal: Nat Biotechnol. Date: 2005-12

3. JSBML: a flexible Java library for working with SBML.

Authors: Andreas Dräger; Nicolas Rodriguez; Marine Dumousseau; Alexander Dörr; Clemens Wrzodek; Nicolas Le Novère; Andreas Zell; Michael Hucka
Journal: Bioinformatics. Date: 2011-06-22

4. KEGGconverter: a tool for the in-silico modelling of metabolic networks of the KEGG Pathways database.

Authors: Konstantinos Moutselos; Ioannis Kanaris; Aristotelis Chatziioannou; Ilias Maglogiannis; Fragiskos N Kolisis
Journal: BMC Bioinformatics. Date: 2009-10-08

5. KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor.

Authors: Jitao David Zhang; Stefan Wiemann
Journal: Bioinformatics. Date: 2009-03-23

6. SBML2LaTeX: conversion of SBML files into human-readable reports.

Authors: Andreas Dräger; Hannes Planatscher; Dieudonné Motsou Wouamba; Adrian Schröder; Michael Hucka; Lukas Endler; Martin Golebiewski; Wolfgang Müller; Andreas Zell
Journal: Bioinformatics. Date: 2009-03-23

7. SBMLsqueezer: a CellDesigner plug-in to generate kinetic rate equations for biochemical networks.

Authors: Andreas Dräger; Nadine Hassis; Jochen Supper; Adrian Schröder; Andreas Zell
Journal: BMC Syst Biol. Date: 2008-04-30