
KEGGgraph: a graph approach to KEGG PATHWAY in R and bioconductor.

Jitao David Zhang, Stefan Wiemann.

Abstract

MOTIVATION: KEGG PATHWAY is a service of the Kyoto Encyclopedia of Genes and Genomes (KEGG) that constructs manually curated pathway maps representing current knowledge on biological networks in graph models. While valuable graph tools have been implemented in R/Bioconductor, to our knowledge there is currently no software package to parse and analyze KEGG pathways with graph theory.
RESULTS: We introduce the software package KEGGgraph in R and Bioconductor, an interface between KEGG pathways and graph models as well as a collection of tools for these graphs. Superior to existing approaches, KEGGgraph captures the pathway topology and allows further analysis or dissection of pathway graphs. We demonstrate the use of the package with a case study analyzing the human pancreatic cancer pathway.
AVAILABILITY: KEGGgraph is freely available at the Bioconductor web site (http://www.bioconductor.org). KGML files can be downloaded from the KEGG FTP site (ftp://ftp.genome.jp/pub/kegg/xml).


Year:  2009        PMID: 19307239      PMCID: PMC2682514          DOI: 10.1093/bioinformatics/btp167

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Since its introduction in 1995, KEGG PATHWAY has been widely used as a reference knowledge base for understanding biological pathways and functions of cellular processes. In recent years, KEGG PATHWAY has been significantly expanded with new pathways related to signal transduction, cellular processes and disease (Kanehisa et al., 2008), adding to the popularity it first gained from its traditional metabolic pathways. Pathways are stored and presented as graphs on the KEGG server side, where nodes are mainly molecules (protein, compound, etc.) and edges represent relation types between the nodes, e.g. activation or phosphorylation. The graph nature of pathways raised our interest in investigating them with powerful tools implemented in R and Bioconductor (Gentleman et al., 2004), e.g. graph, RBGL and Rgraphviz (Carey et al., 2003). While it is barely possible to query the graph characteristics by manual parsing, a native and straightforward client-side tool is currently missing. Packages like KEGG.db and keggorth use information from KEGG; however, none of them makes use of the graph information, precluding the option to study pathways from the graph theory perspective (see Section 4 for more details). To address this problem, we developed the open source software package KEGGgraph, an interface between KEGG pathways and graph-theoretical models as well as a collection of tools to analyze, dissect and visualize these graphs.

2 SOFTWARE FEATURES

KEGGgraph offers the following functionalities:

Parsing: the package parses the regularly updated KGML (KEGG XML) files into graph models while maintaining pathway attributes. It should be noted that one ‘node’ in a KEGG pathway does not necessarily map to a single gene product; for example, the node ERK in the human TGF-β signaling pathway contains two homologs, MAPK1 and MAPK3. Therefore, among several parsing options, users can decide whether to expand these nodes topologically. Beyond facilitating the interpretation of pathways in a gene-oriented manner, this approach also assigns unique identifiers to nodes, which enables merging graphs from different pathways.

Graph operations: two common operations on graphs are subset and merge (union). Subsetting returns a subgraph of the selected nodes and the edges between them, while merging produces a new graph that contains the nodes and edges of the individual graphs. Both are implemented in KEGGgraph.

Visualization: KEGGgraph provides functions to visualize KEGG graphs with custom style. Users are not restricted to these functions, however; they are free to render the graph with other tools, such as those in Rgraphviz.

Besides the functionalities described above, KEGGgraph also provides tools for remote KGML file retrieval, graph feature study and other related tasks. We refer interested readers to the vignettes released with the package.
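The parsing option and the two graph operations described in this section can be illustrated with a minimal Python sketch. It is not KEGGgraph's implementation: the two XML fragments are hypothetical toy data that only imitate the `entry` and `relation` elements of KGML, and the helper names `parse_kgml`, `merge_graphs` and `subgraph` are assumptions made for this example. A graph is represented as a pair of a node set and an edge dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy fragments that imitate KGML; not real KEGG files.
KGML_A = """
<pathway name="path:toyA">
  <entry id="1" name="hsa:5594 hsa:5595" type="gene"/>
  <entry id="2" name="hsa:2002" type="gene"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="phosphorylation"/>
  </relation>
</pathway>
"""
KGML_B = """
<pathway name="path:toyB">
  <entry id="1" name="hsa:2002" type="gene"/>
  <entry id="2" name="hsa:2353" type="gene"/>
  <relation entry1="1" entry2="2" type="GErel">
    <subtype name="expression"/>
  </relation>
</pathway>
"""


def parse_kgml(xml_text, expand_genes=True):
    """Return (nodes, edges); edges maps (source, target) to a set of subtype names."""
    root = ET.fromstring(xml_text.strip())
    members = {}  # pathway-local entry id -> list of node names
    for entry in root.iter("entry"):
        if expand_genes:
            # One node per gene identifier listed in the entry.
            members[entry.get("id")] = entry.get("name").split()
        else:
            # Keep the entry as a single node with a pathway-local name.
            members[entry.get("id")] = [root.get("name") + ":" + entry.get("id")]
    nodes = {name for names in members.values() for name in names}
    edges = {}
    for rel in root.iter("relation"):
        subtypes = {sub.get("name") for sub in rel.iter("subtype")}
        # An edge between entries becomes edges between all their members.
        for src in members[rel.get("entry1")]:
            for dst in members[rel.get("entry2")]:
                edges.setdefault((src, dst), set()).update(subtypes)
    return nodes, edges


def merge_graphs(graphs):
    """Union of nodes and edges; nodes with the same identifier are shared."""
    nodes, edges = set(), {}
    for g_nodes, g_edges in graphs:
        nodes |= g_nodes
        for key, subtypes in g_edges.items():
            edges.setdefault(key, set()).update(subtypes)
    return nodes, edges


def subgraph(graph, keep):
    """Induced subgraph: the selected nodes and the edges between them."""
    nodes, edges = graph
    keep = set(keep) & nodes
    kept_edges = {k: v for k, v in edges.items() if k[0] in keep and k[1] in keep}
    return keep, kept_edges


graph_a = parse_kgml(KGML_A)
graph_b = parse_kgml(KGML_B)
merged = merge_graphs([graph_a, graph_b])
sub = subgraph(merged, ["hsa:5594", "hsa:2002"])
```

In this sketch, expanding the two-gene entry yields two nodes that each inherit the entry's edge, and the merge works only because expanded nodes carry pathway-independent identifiers. With `expand_genes=False` the node names are local to each pathway and the two toy graphs would share no node.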

3 EXAMPLE

Software usage is demonstrated by exploring the graph characteristics of the pancreatic cancer pathway (http://www.genome.jp/dbget-bin/show_pathway?hsa05212), since KEGG also provides pathways of human diseases. The human pancreatic cancer pathway is linked to eight other pathways, as indicated in the KEGG pathway map. To investigate the global network, we merge them into one graph consisting of 714 nodes and 3196 edges (see Supplementary Material for the complete source code). Our aim is to computationally identify the most important nodes. To this end we turn to relative betweenness centrality, one of the measures reflecting the importance of a node in a graph relative to other nodes (Aittokallio and Schwikowski, 2006). For a graph G≔(V, E) with n vertices, the relative betweenness centrality C′(v) is defined by:

$$C'(v) = \frac{2}{n^2 - 3n + 2} \sum_{s \neq v \neq t \in V,\ s \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the number of shortest paths from s to t, and $\sigma_{st}(v)$ is the number of shortest paths from s to t that pass through a vertex v (Freeman, 1977). With the function implemented in the RBGL package (Brandes, 2001), we identified the most important nodes (Fig. 1) as judged by relative betweenness centrality: TP53 (tumor protein p53), GRB2 (growth factor receptor-bound protein 2) and EGFR (epidermal growth factor receptor). While the oncological roles of TP53 and EGFR are long established in pancreatic carcinoma (Garcea et al., 2005), it has only very recently been suggested that the binding of GRB2 to TβR-II is essential for mammary tumor growth and metastasis stimulated by TGF-β (Galliher-Beckley and Schiemann, 2007). We know of no evidence proving a direct relation between GRB2 and pancreatic cancer. Considering the importance of GRB2 in the network, we suggest studying its role in this cancer type as well.
Fig. 1.

Nodes with the highest relative betweenness centrality (in orange) and their interacting partners (blue) in the pancreatic cancer pathway. Relative betweenness centrality estimates the relative importance or role in global network organization.
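To make the centrality measure used in this example concrete, the following Python sketch computes relative betweenness centrality on a small toy graph with Brandes' algorithm. It is an illustration under stated assumptions, not the RBGL function used in the paper: the graph is treated as undirected and unweighted, the normalization factor 2/(n² − 3n + 2) is the standard one for undirected graphs with at least three vertices, and the function name `relative_betweenness` and the toy node labels are hypothetical.

```python
from collections import deque


def relative_betweenness(adjacency):
    """Relative betweenness centrality for an undirected, unweighted graph.

    adjacency maps each node to an iterable of its neighbours; every edge
    must be listed in both directions. Requires at least three nodes.
    """
    nodes = list(adjacency)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s: count shortest paths (sigma) and
        # record, for each node, its predecessors on shortest paths.
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:  # w is reached for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    norm = 2.0 / (n * n - 3 * n + 2)
    # Each unordered pair (s, t) was visited twice, once from each end.
    return {v: norm * score[v] / 2.0 for v in nodes}


# Toy star graph: every shortest path between two leaves passes the hub.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
star_scores = relative_betweenness(star)

# Toy path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
path_scores = relative_betweenness(path)
```

In the star graph the hub lies on the shortest path between every pair of leaves, so its relative betweenness reaches the maximum of 1 while the leaves score 0. Ranking the nodes of a merged pathway graph by this value is the idea behind the node selection shown in Fig. 1.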


4 DISCUSSION

Prior to the release of KEGGgraph, several R/Bioconductor packages had been introduced and had proved useful for understanding biological pathways with KEGG. However, KEGGgraph is the first package able to parse any KEGG pathway from KGML files into graphs. Existing tools either neglect the graph topology (KEGG.db), or do not parse pathway networks (keggorth), or are specialized for certain pathways (cMAP and pathRender). Tools have also been implemented on other platforms to use the knowledge of KEGG, e.g. MetaRoute (Blum and Kohlbacher, 2008), Gaggle (Shannon et al., 2006) and Cytoscape (Shannon et al., 2003). Unique among and complementary to these tools, KEGGgraph allows native statistical and computational analysis of any KEGG pathway based on graph theory in R. Thanks to the variety of Bioconductor packages, KEGGgraph can be built into analysis pipelines that address a wide range of biological questions. No active Internet connection is required once the KGML files have been downloaded, which avoids the waiting time and network overhead inherent in web-service-based approaches. In combination with tools like KGML-ED (Klukas and Schreiber, 2007), KEGGgraph even makes it possible to explore newly created or edited pathways via KGML files.

Funding: National Genome Research Network (grant number 01GS0864) of the German Federal Ministry of Education and Research (BMBF); International PhD program of the DKFZ (to J.D.Z.).

Conflict of Interest: none declared.
REFERENCES

1.  Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors:  Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal:  Genome Res       Date:  2003-11       Impact factor: 9.043

2.  Network structures and algorithms in Bioconductor.

Authors:  Vincent J Carey; Jeff Gentry; Elizabeth Whalen; Robert Gentleman
Journal:  Bioinformatics       Date:  2004-08-05       Impact factor: 6.937

3.  Graph-based methods for analysing networks in cell biology (review).

Authors:  Tero Aittokallio; Benno Schwikowski
Journal:  Brief Bioinform       Date:  2006-07-30       Impact factor: 11.622

4.  Dynamic exploration and editing of KEGG pathway diagrams.

Authors:  Christian Klukas; Falk Schreiber
Journal:  Bioinformatics       Date:  2006-12-01       Impact factor: 6.937

5.  Molecular prognostic markers in pancreatic cancer: a systematic review.

Authors:  G Garcea; C P Neal; C J Pattenden; W P Steward; D P Berry
Journal:  Eur J Cancer       Date:  2005-09-16       Impact factor: 9.162

6.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

7.  Grb2 binding to Tyr284 in TbetaR-II is essential for mammary tumor growth and metastasis stimulated by TGF-beta.

Authors:  Amy J Galliher-Beckley; William P Schiemann
Journal:  Carcinogenesis       Date:  2008-01-03       Impact factor: 4.944

8.  The Gaggle: an open-source software system for integrating bioinformatics software and data sources.

Authors:  Paul T Shannon; David J Reiss; Richard Bonneau; Nitin S Baliga
Journal:  BMC Bioinformatics       Date:  2006-03-28       Impact factor: 3.169

9.  KEGG for linking genomes to life and the environment.

Authors:  Minoru Kanehisa; Michihiro Araki; Susumu Goto; Masahiro Hattori; Mika Hirakawa; Masumi Itoh; Toshiaki Katayama; Shuichi Kawashima; Shujiro Okuda; Toshiaki Tokimatsu; Yoshihiro Yamanishi
Journal:  Nucleic Acids Res       Date:  2007-12-12       Impact factor: 16.971

10.  MetaRoute: fast search for relevant metabolic routes for interactive network navigation and visualization.

Authors:  Torsten Blum; Oliver Kohlbacher
Journal:  Bioinformatics       Date:  2008-07-16       Impact factor: 6.937


