PaxtoolsR: pathway analysis in R using Pathway Commons.

Augustin Luna, Özgün Babur, Bülent Arman Aksoy, Emek Demir, Chris Sander

Abstract

PURPOSE: The PaxtoolsR package gives users of the R language access to pathway data represented in the BioPAX format and made available through the Pathway Commons webservice, to aid in advanced pathway analyses. Features include the extraction, merging and validation of pathway data represented in the BioPAX format. The package also provides novel pathway datasets and advanced querying features for R users through the Pathway Commons webservice, allowing users to query, extract and retrieve data and to integrate these data with local BioPAX datasets.
AVAILABILITY AND IMPLEMENTATION: The PaxtoolsR package is compatible with R 3.1.1 and higher on Windows, Mac OS X and Linux using Bioconductor 3.0, and is available through the Bioconductor R package repository along with source code and a tutorial vignette describing common tasks, such as data visualization and gene set enrichment analysis. Source code and documentation are at http://www.bioconductor.org/packages/paxtoolsr. This package is free, open-source and licensed under the LGPL-3.
CONTACT: paxtools@cbio.mskcc.org or lunaa@cbio.mskcc.org.
© The Author 2015. Published by Oxford University Press.

Year:  2015        PMID: 26685306      PMCID: PMC4824129          DOI: 10.1093/bioinformatics/btv733

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The amount of biological pathway data in machine-readable databases and formats continues to increase. Pathway analysis allows researchers to gain new understanding of the functions of biological systems. A common task has been to aggregate pathway data across databases. This task has been simplified through the creation of standardized data representations, such as the Biological Pathway Exchange (BioPAX) format (Demir et al.). Pathway Commons is an ongoing effort to aggregate pathway data from a number of databases supporting the BioPAX notation and to provide webservices for accessing these data (Cerami et al.). The core component that facilitates the development of projects using data in the BioPAX format, such as Pathway Commons, has been Paxtools, a BioPAX application programming interface (API) written in Java (Demir et al.). Although the R programming language is widely used in computational biology, the pathway data available through R packages is limited. A recent review by Kramer et al. (2014) describes 12 R packages for working with pathway data. The majority of these packages, including KEGGgraph, PathView and ReactomePA, use and provide data from KEGG and Reactome. A number of the packages are generic parsers for a variety of formats, including the Systems Biology Markup Language (SBML), KEGG Markup Language (KGML) and BioPAX. Through the PaxtoolsR package, we extend the literature-curated pathway data available to R users, provide a number of Paxtools API functions and provide an interface to the Pathway Commons webservice. Through this interface, PaxtoolsR provides native support for the aggregated Pathway Commons database, including data imported from the NCI Pathway Interaction Database (PID), PantherDB, HumanCyc, Reactome, PhosphoSitePlus and HPRD.

2 Implementation and functionality

PaxtoolsR is implemented using the rJava R package (http://www.rforge.net/rJava/), which allows R code to call Java methods. Although R users could use rJava to call methods in the Paxtools library directly, these methods tend not to follow typical R language conventions; PaxtoolsR therefore simplifies the usage of Paxtools in R. PaxtoolsR implements two main sets of features: (i) functions available through the Paxtools console application and (ii) functions provided through the Pathway Commons webservice. Below, we first describe the main data formats used by the PaxtoolsR package and then describe the functions provided by PaxtoolsR. Additionally, PaxtoolsR provides a vignette (found on the project website) that guides users through the provided functionality, including the visualization of networks directly in R using existing R graph libraries, such as igraph (Csardi and Nepusz, 2006) and RCytoscape (Shannon et al.), and the combined analysis of gene expression microarrays and pathway data using gene set enrichment analysis (GSEA) (Subramanian et al.).

2.1 Data formats

PaxtoolsR uses three primary data formats: BioPAX, the simple interaction format (SIF) and the extensible markup language (XML). Here we describe the role of each of these formats in the PaxtoolsR package.

2.1.1 BioPAX format

The BioPAX format is an RDF/OWL-based language, described previously, that serves as the main input format for the functions provided via the Paxtools Java library (Demir et al., 2013). BioPAX representations of the databases aggregated by Pathway Commons can be downloaded from the project website (http://www.pathwaycommons.org). The currently aggregated databases include HPRD, HumanCyc, NCI PID, Panther, PhosphoSitePlus and Reactome, among others.
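To make the RDF/OWL structure concrete, the sketch below parses a tiny hand-written BioPAX Level 3 fragment serialized as RDF/XML and lists each element's class, URI and display name. It uses only Python's standard library, not Paxtools or PaxtoolsR; the fragment, its example.org URIs and the helper name `list_entities` are hypothetical, and real Pathway Commons files are far larger and more deeply nested.

```python
import xml.etree.ElementTree as ET

# Namespaces used by BioPAX Level 3 documents serialized as RDF/XML.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level3.owl#"

# Hypothetical, hand-written fragment; real BioPAX files carry many more properties.
BIOPAX_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:Protein rdf:about="http://example.org/Protein_AKT1">
    <bp:displayName>AKT1</bp:displayName>
  </bp:Protein>
  <bp:Protein rdf:about="http://example.org/Protein_MTOR">
    <bp:displayName>MTOR</bp:displayName>
  </bp:Protein>
  <bp:Pathway rdf:about="http://example.org/Pathway_1">
    <bp:displayName>Example pathway</bp:displayName>
  </bp:Pathway>
</rdf:RDF>"""


def list_entities(xml_text):
    """Return (BioPAX class, URI, displayName) for each top-level BioPAX element."""
    root = ET.fromstring(xml_text)
    prefix = "{" + BP + "}"
    entities = []
    for elem in root:
        if not elem.tag.startswith(prefix):
            continue  # skip elements outside the BioPAX namespace
        entities.append((
            elem.tag[len(prefix):],            # local class name, e.g. "Protein"
            elem.get("{" + RDF + "}about"),    # the element's URI
            elem.findtext(prefix + "displayName"),
        ))
    return entities


for cls, uri, name in list_entities(BIOPAX_XML):
    print(cls, name, uri)
```

ElementTree exposes namespaced tags in `{namespace}LocalName` form, which is why the class name is recovered by stripping the BioPAX namespace prefix.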

2.1.2 Simple Interaction Format (SIF)

The SIF format is a tab-delimited, plain-text network edge list that describes how two molecules are related in a binary fashion; it is generated from BioPAX datasets by searching for certain graphical patterns (Babur et al.). The SIF format is composed of three columns: PARTICIPANT A, INTERACTION TYPE and PARTICIPANT B. The available interaction types are described in the package vignette. The conversion from BioPAX to SIF is lossy, but SIF remains useful for applications that require binary interactions, including many existing network analysis tools.
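As a sketch of how a three-column SIF edge list can be consumed outside PaxtoolsR, the Python snippet below parses tab-delimited rows into (PARTICIPANT A, INTERACTION TYPE, PARTICIPANT B) tuples and builds a neighbor map. The rows use placeholder participants A, B and C; treating `in-complex-with` and `interacts-with` as symmetric and all other types as directed is an assumption of this sketch, and the helper names are hypothetical.

```python
from collections import defaultdict

# Placeholder SIF rows: PARTICIPANT A <tab> INTERACTION TYPE <tab> PARTICIPANT B
SIF_TEXT = (
    "A\tcontrols-state-change-of\tB\n"
    "B\tin-complex-with\tC\n"
    "\n"
)

# Assumption: these relation types are treated as undirected.
SYMMETRIC_TYPES = {"in-complex-with", "interacts-with"}


def parse_sif(text):
    """Parse a tab-delimited, three-column SIF edge list into tuples."""
    edges = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(fields)}")
        edges.append(tuple(fields))
    return edges


def neighbors(edges):
    """Map each participant to the set of (interaction type, partner) pairs."""
    adj = defaultdict(set)
    for a, kind, b in edges:
        adj[a].add((kind, b))
        if kind in SYMMETRIC_TYPES:
            adj[b].add((kind, a))
    return adj


edges = parse_sif(SIF_TEXT)
adj = neighbors(edges)
print(edges)
print(dict(adj))
```

Because the conversion is lossy, each tuple keeps only the relation type; details present in the BioPAX source, such as which residue is modified, are no longer available at this point.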

2.1.3 Extensible markup language

Results of BioPAX file validation and of Pathway Commons searches are returned as R XML objects (http://www.omegahat.org/RSXML/), from which further data can be extracted using XPath expressions in R.
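Python's standard-library ElementTree supports a subset of XPath, which is enough to illustrate the kind of extraction described here. The sketch below filters a search-result document by BioPAX class. The document layout (`searchResponse`, `searchHit`, `uri`, `name`, `biopaxClass`) is a simplified stand-in assumed for illustration, not a verified copy of the Pathway Commons response schema, and `hits_of_class` is a hypothetical helper; in R, the equivalent step would use the XPath support of the XML package.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified search-result layout (illustrative only).
RESULT_XML = """
<searchResponse numHits="2">
  <searchHit>
    <uri>http://example.org/hit1</uri>
    <name>Example pathway</name>
    <biopaxClass>Pathway</biopaxClass>
  </searchHit>
  <searchHit>
    <uri>http://example.org/hit2</uri>
    <name>Example protein</name>
    <biopaxClass>Protein</biopaxClass>
  </searchHit>
</searchResponse>
"""


def hits_of_class(xml_text, biopax_class):
    """Return (uri, name) for every hit whose biopaxClass child equals biopax_class."""
    root = ET.fromstring(xml_text.strip())
    # [tag='text'] is one of the XPath predicates ElementTree supports.
    path = f".//searchHit[biopaxClass='{biopax_class}']"
    return [(hit.findtext("uri"), hit.findtext("name")) for hit in root.findall(path)]


root = ET.fromstring(RESULT_XML.strip())
print(root.get("numHits"))                  # hit count stated by the document
print(hits_of_class(RESULT_XML, "Pathway"))
```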

2.2 Convert, merge and validate local BioPAX files

A number of BioPAX-related functions are available in PaxtoolsR. These functions can operate both on local BioPAX files and on files retrieved from Pathway Commons. PaxtoolsR provides a programming interface for the BioPAX format and for the functions provided through the Paxtools console application. These functions allow importing data into R through the SIF format and converting BioPAX files into a variety of formats, including the GSEA gene set format. Functions are also provided to extract subnetworks from BioPAX files and to merge multiple BioPAX files using a previously described method that merges equivalent elements (Demir, 2013; Demir et al.). Additionally, PaxtoolsR provides methods to summarize the usage of BioPAX classes and to validate BioPAX datasets (Rodchenkov et al.).
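The merge, summary and gene set export steps can be pictured with plain data structures. In the Python sketch below, a "model" is a hypothetical dictionary from URI to (BioPAX class, name). Two elements count as equivalent only when they share a URI, a deliberate simplification of the equivalence-based merging cited above. Gene sets are written in the tab-delimited GMT layout (set name, description, then member genes) that GSEA reads. All function names are illustrative, not PaxtoolsR's.

```python
from collections import Counter

# Hypothetical models: URI -> (BioPAX class, display name)
MODEL_1 = {
    "http://example.org/Protein_AKT1": ("Protein", "AKT1"),
    "http://example.org/Pathway_1": ("Pathway", "Example pathway"),
}
MODEL_2 = {
    "http://example.org/Protein_AKT1": ("Protein", "AKT1"),  # same element again
    "http://example.org/Protein_MTOR": ("Protein", "MTOR"),
}


def merge_models(*models):
    """Union of models; an element seen under the same URI is kept only once."""
    merged = {}
    for model in models:
        for uri, element in model.items():
            merged.setdefault(uri, element)  # the first occurrence wins
    return merged


def summarize_classes(model):
    """Count how many elements of each BioPAX class the model contains."""
    return Counter(cls for cls, _name in model.values())


def to_gmt(gene_sets, description="NA"):
    """Format {set name: genes} as GMT lines: name, description, genes (tab-separated)."""
    lines = []
    for name in sorted(gene_sets):
        genes = sorted(set(gene_sets[name]))  # drop duplicates, sorted order
        lines.append("\t".join([name, description, *genes]))
    return lines


merged = merge_models(MODEL_1, MODEL_2)
print(len(merged), summarize_classes(merged))
print(to_gmt({"Example pathway": ["MTOR", "AKT1", "AKT1"]}))
```

Keying on URIs keeps the merge a simple dictionary union; matching elements that describe the same molecule under different URIs would need additional evidence, such as shared cross-references, which this sketch does not attempt.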

2.3 Query and traverse data from Pathway Commons

PaxtoolsR provides a number of functions for interacting with the Pathway Commons webservice. PaxtoolsR allows users to query Pathway Commons data via two functions. The first involves searching for specific molecular species or pathways of interest, using the searchPc() function. The second is the graphPc() function, which allows users to query subnetworks of interest. Figure 1 shows the usage of the graphPc() command to extract a small subnetwork involving the kinases AKT1 and MTOR. This subnetwork is then converted to a binary SIF network and visualized using igraph in R; this showcases how Pathway Commons data can be easily visualized using existing R packages. The traverse() function allows the extraction of specific entries from BioPAX records, such as the phosphorylation site information from proteins described in a BioPAX dataset.
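To convey what a PATHSBETWEEN-style query returns, the Python sketch below keeps the edges of a SIF-like edge list that lie on a walk of at most `limit` steps between two different seed genes. It uses breadth-first distances from each seed and ignores edge direction. This is a simplified approximation for illustration only; it is not the algorithm Pathway Commons runs on its server, and the seed names, edges and helper names are placeholders.

```python
from collections import defaultdict, deque


def bfs_distances(adj, start, limit):
    """Shortest hop counts from start, exploring no further than `limit` hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue  # do not expand beyond the length limit
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def paths_between(edges, seeds, limit=1):
    """Keep (a, type, b) edges lying on a walk of <= limit steps between two distinct seeds."""
    adj = defaultdict(set)
    for a, _kind, b in edges:
        adj[a].add(b)  # direction is ignored in this sketch
        adj[b].add(a)
    dists = {s: bfs_distances(adj, s, limit) for s in seeds if s in adj}

    def usable(a, b):
        # True if some walk s ... a - b ... t with s != t fits within the limit.
        for s, ds in dists.items():
            for t, dt in dists.items():
                if s != t and a in ds and b in dt and ds[a] + 1 + dt[b] <= limit:
                    return True
        return False

    return [(a, k, b) for a, k, b in edges if usable(a, b) or usable(b, a)]


EDGES = [
    ("A", "interacts-with", "B"),
    ("B", "interacts-with", "C"),
    ("C", "interacts-with", "D"),
    ("X", "interacts-with", "Y"),
]
print(paths_between(EDGES, seeds=["A", "C"], limit=2))
```

Because the test `d(s, a) + 1 + d(b, t) <= limit` combines shortest hop counts, it can admit walks that revisit a node; a stricter version would enumerate simple paths instead.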
Fig. 1.

Pathway Commons graph query of interactions between AKT1 and MTOR using PaxtoolsR, visualized using igraph. Data for the figure were retrieved with the command: graphPc(source = c("AKT1", "IRS1", "MTOR", "IGF1R"), kind = "PATHSBETWEEN", format = "BINARY_SIF")

3 Conclusion

The PaxtoolsR package extends the biological pathway data available to researchers working primarily in an R environment. This package makes many of the features of the BioPAX Paxtools API and the Pathway Commons webservice available in R. The data and functionality provided here can be used for a wide range of biological pathway analysis studies and can be easily integrated with the rich ecosystem of existing R packages. Future development of this R package is expected as additions are made to the underlying Paxtools Java library and Pathway Commons webservice. Furthermore, we invite developers of network analysis R packages interested in the Pathway Commons data to work with us to make these data available to their methods.
References

1.  rBiopaxParser--an R package to parse, modify and visualize BioPAX data.

Authors:  Frank Kramer; Michaela Bayerlová; Florian Klemm; Annalen Bleckmann; Tim Beissbarth
Journal:  Bioinformatics       Date:  2012-12-28       Impact factor: 6.937

2.  Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors:  Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-30       Impact factor: 11.205

3.  Pathway Commons, a web resource for biological pathway data.

Authors:  Ethan G Cerami; Benjamin E Gross; Emek Demir; Igor Rodchenkov; Ozgün Babur; Nadia Anwar; Nikolaus Schultz; Gary D Bader; Chris Sander
Journal:  Nucleic Acids Res       Date:  2010-11-10       Impact factor: 16.971

4.  The BioPAX community standard for pathway data sharing.

Authors:  Emek Demir; Michael P Cary; Suzanne Paley; Ken Fukuda; Christian Lemer; Imre Vastrik; Guanming Wu; Peter D'Eustachio; Carl Schaefer; Joanne Luciano; Frank Schacherer; Irma Martinez-Flores; Zhenjun Hu; Veronica Jimenez-Jacinto; Geeta Joshi-Tope; Kumaran Kandasamy; Alejandra C Lopez-Fuentes; Huaiyu Mi; Elgar Pichler; Igor Rodchenkov; Andrea Splendiani; Sasha Tkachev; Jeremy Zucker; Gopal Gopinath; Harsha Rajasimha; Ranjani Ramakrishnan; Imran Shah; Mustafa Syed; Nadia Anwar; Ozgün Babur; Michael Blinov; Erik Brauner; Dan Corwin; Sylva Donaldson; Frank Gibbons; Robert Goldberg; Peter Hornbeck; Augustin Luna; Peter Murray-Rust; Eric Neumann; Oliver Ruebenacker; Oliver Reubenacker; Matthias Samwald; Martijn van Iersel; Sarala Wimalaratne; Keith Allen; Burk Braun; Michelle Whirl-Carrillo; Kei-Hoi Cheung; Kam Dahlquist; Andrew Finney; Marc Gillespie; Elizabeth Glass; Li Gong; Robin Haw; Michael Honig; Olivier Hubaut; David Kane; Shiva Krupa; Martina Kutmon; Julie Leonard; Debbie Marks; David Merberg; Victoria Petri; Alex Pico; Dean Ravenscroft; Liya Ren; Nigam Shah; Margot Sunshine; Rebecca Tang; Ryan Whaley; Stan Letovksy; Kenneth H Buetow; Andrey Rzhetsky; Vincent Schachter; Bruno S Sobral; Ugur Dogrusoz; Shannon McWeeney; Mirit Aladjem; Ewan Birney; Julio Collado-Vides; Susumu Goto; Michael Hucka; Nicolas Le Novère; Natalia Maltsev; Akhilesh Pandey; Paul Thomas; Edgar Wingender; Peter D Karp; Chris Sander; Gary D Bader
Journal:  Nat Biotechnol       Date:  2010-09-09       Impact factor: 54.908

5.  RCytoscape: tools for exploratory network analysis.

Authors:  Paul T Shannon; Mark Grimes; Burak Kutlu; Jan J Bot; David J Galas
Journal:  BMC Bioinformatics       Date:  2013-07-09       Impact factor: 3.169

6.  The BioPAX Validator.

Authors:  Igor Rodchenkov; Emek Demir; Chris Sander; Gary D Bader
Journal:  Bioinformatics       Date:  2013-08-05       Impact factor: 6.937

7.  Using biological pathway data with paxtools.

Authors:  Emek Demir; Ozgün Babur; Igor Rodchenkov; Bülent Arman Aksoy; Ken I Fukuda; Benjamin Gross; Onur Selçuk Sümer; Gary D Bader; Chris Sander
Journal:  PLoS Comput Biol       Date:  2013-09-19       Impact factor: 4.475

8.  Pattern search in BioPAX models.

Authors:  Özgün Babur; Bülent Arman Aksoy; Igor Rodchenkov; Selçuk Onur Sümer; Chris Sander; Emek Demir
Journal:  Bioinformatics       Date:  2013-09-16       Impact factor: 6.937

9.  R-based software for the integration of pathway data into bioinformatic algorithms.

Authors:  Frank Kramer; Michaela Bayerlová; Tim Beißbarth
Journal:  Biology (Basel)       Date:  2014-02-07