
PyBioPAX: biological pathway exchange in Python.

Benjamin M Gyori, Charles Tapley Hoyt.

Year:  2022        PMID: 36071952      PMCID: PMC9447860          DOI: 10.21105/joss.04136

Source DB:  PubMed          Journal:  J Open Source Softw        ISSN: 2475-9066


Statement of need

Understanding the complex molecular processes that govern how cells respond to external stimuli relies crucially on prior knowledge about signaling, regulatory, and metabolic pathways. Standardized representations are necessary to exchange such pathway knowledge and to allow interoperability between tools. BioPAX (Demir et al., 2010) is a widely used pathway exchange format that is formally defined in the BioPAX Language Specification. BioPAX is serialized in the Web Ontology Language (OWL) format, typically as RDF/XML. Software support for parsing, serializing, and finding patterns in BioPAX models is implemented in the Paxtools Java package (Demir et al., 2013). However, using Paxtools from Python is difficult, as it requires running a Java Virtual Machine through a cross-language framework such as pyjnius (Kivy, 2021). There is therefore a need for native Python support for BioPAX to facilitate integration with widely used systems biology tools (e.g., PySB (Lopez et al., 2013), Tellurium (Medley et al., 2018), and PyBEL (Hoyt et al., 2018)) and with pathway analysis workflows more generally.

State of the field

Support for the BioPAX language is implemented in the Paxtools Java package (Demir et al., 2013) and in PaxtoolsR, a wrapper around it that enables its use from R (Luna et al., 2016). ChiBE (Babur et al., 2010), a graphical tool for visualizing BioPAX, is also available as a Java package. There are also dedicated packages for pathway enrichment analysis, such as the BioPAX-Parser Java package (Agapito et al., 2020), and several tools for converting BioPAX representations into modeling formalisms, such as the BioASF Java package (Haydarlou et al., 2016). Overall, however, no Python library implements the BioPAX object model or enables the manipulation and analysis of BioPAX models.

Summary

We present PyBioPAX, a Python software package to process and manipulate BioPAX models. PyBioPAX implements the BioPAX Level 3 object model as a set of Python classes, along with a BioPAX OWL processor that deserializes BioPAX content from OWL files or strings into these objects. Once a BioPAX model and all its linked elements are deserialized into Python objects, they can be traversed and modified in memory. PyBioPAX also serializes BioPAX models into OWL/XML files compatible with other tools in the BioPAX ecosystem.

PyBioPAX implements the BioPAX OWL semantics under which object attributes can be subtyped (e.g., “display name” is a subtype of “name”) using Python property attributes with getter and setter functions. It also exposes “inverse links” between objects; for example, a BioPAX Xref object, which represents a cross-reference, exposes a list of xref_of links back to the objects of which it is a cross-reference. Python property attributes likewise guarantee that these links remain coherent across a BioPAX model. Inverse links make traversal of BioPAX models efficient, since they allow navigating from, e.g., one participant of a reaction to the reaction itself and to its other participants. To further facilitate model traversal, PyBioPAX provides a module that, starting from a given object, iterates over linked objects satisfying a path constraint specified as a string.

PyBioPAX also provides a client for the Pathway Commons web service (Rodchenkov et al., 2020) that exposes three graph query types (paths-from-to, paths-between, and neighborhood), which extract subsets of the knowledge that Pathway Commons aggregates from structured sources (e.g., Reactome (Jassal et al., 2020)) as BioPAX models. In addition, PyBioPAX provides web service clients for processing BioPAX content from other pathway databases, including NetPath (Kandasamy et al., 2010) and multiple members of the BioCyc database collection (Karp et al., 2019).
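The property-based design described above can be illustrated with a minimal, self-contained sketch. The classes `Xref` and `Entity` and the helper `follow_path` below are hypothetical toy stand-ins written for this illustration, not PyBioPAX's actual classes or its path module. They only assume the behavior the text describes: assigning cross-references to an entity automatically updates each cross-reference's `xref_of` inverse link, setting a display name also adds the value to the entity's names (because display name is a subtype of name), and a `/`-separated attribute path can be followed from a starting object.

```python
from __future__ import annotations


class Xref:
    """Toy cross-reference (hypothetical, not PyBioPAX's Xref class)."""

    def __init__(self, db: str, identifier: str):
        self.db = db
        self.identifier = identifier
        # Inverse link: the entities that list this Xref in their xref attribute.
        self.xref_of: list[Entity] = []


class Entity:
    """Toy entity whose properties keep inverse links and name subtypes coherent."""

    def __init__(self, uid: str):
        self.uid = uid
        self._xref: list[Xref] = []
        self._names: list[str] = []
        self._display_name: str | None = None

    @property
    def xref(self) -> list[Xref]:
        return list(self._xref)

    @xref.setter
    def xref(self, xrefs: list[Xref]) -> None:
        # Detach this entity from the inverse links of its previous xrefs...
        for old in self._xref:
            old.xref_of.remove(self)
        # ...then attach it to the inverse links of the new ones.
        self._xref = list(xrefs)
        for new in self._xref:
            new.xref_of.append(self)

    @property
    def name(self) -> list[str]:
        return list(self._names)

    @property
    def display_name(self) -> str | None:
        return self._display_name

    @display_name.setter
    def display_name(self, value: str) -> None:
        # display_name is a subtype of name, so the value also counts as a name.
        self._display_name = value
        if value not in self._names:
            self._names.append(value)


def follow_path(start, path: str) -> list:
    """Collect the objects reached by following '/'-separated attribute names."""
    current = [start]
    for attr in path.split("/"):
        reached = []
        for obj in current:
            value = getattr(obj, attr, None)
            if value is None:
                continue
            # Multi-valued attributes are lists; single values are wrapped.
            reached.extend(value if isinstance(value, list) else [value])
        current = reached
    return current


if __name__ == "__main__":
    uniprot = Xref("uniprot", "P04637")
    a, b = Entity("protein_a"), Entity("protein_b")
    a.xref = [uniprot]
    b.xref = [uniprot]
    a.display_name = "TP53"
    print([e.uid for e in follow_path(a, "xref/xref_of")])  # ['protein_a', 'protein_b']
    print(a.name)  # ['TP53']
```

Because the setter owns the bookkeeping, callers can never leave an `xref_of` list out of sync with the forward `xref` lists, and following `xref/xref_of` from one entity reaches every entity that shares a cross-reference with it.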

Case studies

In the following case studies, we demonstrate the role of PyBioPAX in qualitative and quantitative analyses driven by BioPAX models.

Traversing Pathway Commons

We demonstrate using PyBioPAX to process the Pathway Commons version 12 (PC12) “detailed” BioPAX OWL file, traverse the resulting model, and extract several biologically motivated motifs corresponding to the following questions: Which controllers of catalyses of biochemical reactions require a co-factor? Which controllers of catalyses of biochemical reactions are in a phosphorylated state? Which biochemical reactions constitute a simple phosphorylation event? Which complexes contain a protein bound to one or more small molecules? What are all the features (e.g., post-translational modifications, fragments) of a given protein? In the corresponding Jupyter notebook, we found that PC12 contains nearly 4 million objects, and our implementations of these queries identified 83 controllers that need co-factors, 1,283 controllers that are in a phosphorylated state, 15,332 simple phosphorylation reactions, 13,338 proteins bound to a single small molecule, and 184 proteins bound to two or more small molecules. PyBioPAX also enabled us to write queries that find superlative entities. For instance, the protein with the most modifications was NOTCH1, with 38 modifications. We further found that the RNA transcript of KTN1 had the most interactions (947), and that AR had the most interactions of any protein (106).
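To make the shape of these motif queries concrete, here is a minimal sketch over a toy, in-memory data model. It loosely mirrors BioPAX Level 3, in which a Catalysis has a controller and may list co-factors, but the `Protein` and `Catalysis` dataclasses, the string-valued features, and the rule that a feature containing "phospho" marks a phosphorylated state are simplifying assumptions for illustration. They are not PyBioPAX's classes, the entity names are hypothetical, and these are not the queries used in the case-study notebook.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Protein:
    """Toy protein; features are hypothetical plain-string labels."""

    name: str
    features: list[str] = field(default_factory=list)


@dataclass
class Catalysis:
    """Toy catalysis with one controller and optional co-factor names."""

    controller: Protein
    cofactors: list[str] = field(default_factory=list)


def controllers_needing_cofactor(catalyses: list[Catalysis]) -> list[Protein]:
    # A controller "needs a co-factor" if its catalysis lists any co-factor.
    return [c.controller for c in catalyses if c.cofactors]


def phosphorylated_controllers(catalyses: list[Catalysis]) -> list[Protein]:
    # Assumption: a phosphorylated state is marked by a feature containing "phospho".
    return [
        c.controller
        for c in catalyses
        if any("phospho" in f.lower() for f in c.controller.features)
    ]


def most_modified(proteins: list[Protein]) -> tuple[str, int]:
    # Superlative query: the protein with the largest number of features.
    # max() returns the first maximal element when there are ties.
    top = max(proteins, key=lambda p: len(p.features))
    return top.name, len(top.features)


if __name__ == "__main__":
    kinase = Protein("KINASE_A", ["phosphorylation at S1", "phosphorylation at T2"])
    enzyme = Protein("ENZYME_B")
    catalyses = [Catalysis(kinase), Catalysis(enzyme, cofactors=["Mg2+"])]
    print([p.name for p in controllers_needing_cofactor(catalyses)])  # ['ENZYME_B']
    print([p.name for p in phosphorylated_controllers(catalyses)])  # ['KINASE_A']
    print(most_modified([kinase, enzyme]))  # ('KINASE_A', 2)
```

Each question reduces to a filter or an aggregate over traversed objects; on a real model the same pattern would apply to entities reached by traversal, with attribute access adapted to the actual object model.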

Gene set enrichment on Reactome pathways

Expert-curated pathways have been used for dimensionality reduction and interpretation of transcriptomics data. However, most prior methods are limited to pre-defined pathway lists (e.g., the workflow of Emon et al. (2020) only includes KEGG pathways). Here, we demonstrate using PyBioPAX to implement a similar workflow that applies generally to any pathway definition originating from BioPAX content and represented as PyBioPAX models. First, we obtained all human pathways as PyBioPAX models through PyBioPAX’s API for the Reactome web service. We then traversed each model to identify physical entities representing proteins, aggregated their cross-references, and ultimately constructed a list of HGNC gene identifiers for each pathway. Second, we collected curated transcriptomics experiments from the CREEDS database (Wang et al., 2016) that list the differentially expressed (DE) genes resulting from selected drug perturbations, gene knockouts, gene overexpressions, and diseases. Finally, we used Fisher’s exact test in an all-by-all comparison of the list of DE genes for each perturbation experiment against the list of genes whose proteins are present in each pathway. From the resulting matrix, we identified anti-correlations between drug perturbation experiments and gene perturbation experiments using the Pearson correlation coefficient. For example, this highlighted a strong relationship between estradiol and GPER1, suggesting GPER1 activation as a mechanism of action for estradiol. The corresponding Jupyter notebook is available in the PyBioPAX repository.
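The statistical core of this workflow can be sketched with the Python standard library. The sketch makes assumptions that the text does not state: it uses a one-sided (over-representation) Fisher's exact test computed as the hypergeometric upper tail, fills each matrix row with -log10 p-values, and reads a negative Pearson correlation between a drug row and a gene-perturbation row as an anti-correlation. The function names and toy gene sets are hypothetical; the actual notebook may differ in test sidedness, matrix values, or libraries.

```python
from __future__ import annotations

import math


def fisher_enrichment_p(de_genes, pathway_genes, universe) -> float:
    """One-sided Fisher's exact test for over-representation.

    Equals the hypergeometric upper tail P(X >= overlap), where X counts pathway
    genes among len(de) genes drawn from the universe without replacement.
    """
    universe = set(universe)
    de = set(de_genes) & universe
    pathway = set(pathway_genes) & universe
    n_total, n_pathway, n_de = len(universe), len(pathway), len(de)
    overlap = len(de & pathway)
    tail = sum(
        math.comb(n_pathway, k) * math.comb(n_total - n_pathway, n_de - k)
        for k in range(overlap, min(n_pathway, n_de) + 1)
    )
    # Integer arithmetic keeps the ratio exact until the final division.
    return tail / math.comb(n_total, n_de)


def enrichment_row(de_genes, pathways: dict, universe) -> list[float]:
    """-log10 p-values of one experiment against every pathway (sorted by name)."""
    return [
        -math.log10(fisher_enrichment_p(de_genes, pathways[name], universe))
        for name in sorted(pathways)
    ]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; undefined if either row is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    universe = [f"G{i}" for i in range(20)]
    pathways = {"P1": universe[:5], "P2": universe[5:10], "P3": universe[10:15]}
    drug = enrichment_row(universe[:4], pathways, universe)
    knockout = enrichment_row(universe[5:9] + universe[10:13], pathways, universe)
    # Prints a negative value: the two toy profiles are anti-correlated.
    print(pearson(drug, knockout))
```

In this toy run the drug signature is enriched only in P1 and the knockout signature only in P2 and P3, so their pathway profiles point in opposite directions; the real analysis applies the same comparison across all CREEDS experiments and Reactome pathways.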

Availability and usage

PyBioPAX is available as a package on PyPI, with source code at https://github.com/indralab/pybiopax and documentation at https://pybiopax.readthedocs.io/. The repository also contains an interactive Jupyter notebook tutorial and notebooks for the two case studies described above. Beyond these case studies, PyBioPAX has been integrated into INDRA (Gyori et al., 2017), where it serves as the primary entry point for processing BioPAX content into INDRA Statements by traversing BioPAX models. It has also been used by Weber et al. (2021) to process BioPAX content from Reactome into a node-edge graph, which was used to train a machine-learning model for improving natural language processing.
References

1.  The BioCyc collection of microbial genomes and metabolic pathways.

Authors:  Peter D Karp; Richard Billington; Ron Caspi; Carol A Fulcher; Mario Latendresse; Anamika Kothari; Ingrid M Keseler; Markus Krummenacker; Peter E Midford; Quang Ong; Wai Kit Ong; Suzanne M Paley; Pallavi Subhraveti
Journal:  Brief Bioinform       Date:  2019-07-19       Impact factor: 11.622

2.  BioPAX-Parser: parsing and enrichment analysis of BioPAX pathways.

Authors:  Giuseppe Agapito; Chiara Pastrello; Pietro Hiram Guzzi; Igor Jurisica; Mario Cannataro
Journal:  Bioinformatics       Date:  2020-08-01       Impact factor: 6.937

3.  The BioPAX community standard for pathway data sharing.

Authors:  Emek Demir; Michael P Cary; Suzanne Paley; Ken Fukuda; Christian Lemer; Imre Vastrik; Guanming Wu; Peter D'Eustachio; Carl Schaefer; Joanne Luciano; Frank Schacherer; Irma Martinez-Flores; Zhenjun Hu; Veronica Jimenez-Jacinto; Geeta Joshi-Tope; Kumaran Kandasamy; Alejandra C Lopez-Fuentes; Huaiyu Mi; Elgar Pichler; Igor Rodchenkov; Andrea Splendiani; Sasha Tkachev; Jeremy Zucker; Gopal Gopinath; Harsha Rajasimha; Ranjani Ramakrishnan; Imran Shah; Mustafa Syed; Nadia Anwar; Ozgün Babur; Michael Blinov; Erik Brauner; Dan Corwin; Sylva Donaldson; Frank Gibbons; Robert Goldberg; Peter Hornbeck; Augustin Luna; Peter Murray-Rust; Eric Neumann; Oliver Ruebenacker; Oliver Reubenacker; Matthias Samwald; Martijn van Iersel; Sarala Wimalaratne; Keith Allen; Burk Braun; Michelle Whirl-Carrillo; Kei-Hoi Cheung; Kam Dahlquist; Andrew Finney; Marc Gillespie; Elizabeth Glass; Li Gong; Robin Haw; Michael Honig; Olivier Hubaut; David Kane; Shiva Krupa; Martina Kutmon; Julie Leonard; Debbie Marks; David Merberg; Victoria Petri; Alex Pico; Dean Ravenscroft; Liya Ren; Nigam Shah; Margot Sunshine; Rebecca Tang; Ryan Whaley; Stan Letovksy; Kenneth H Buetow; Andrey Rzhetsky; Vincent Schachter; Bruno S Sobral; Ugur Dogrusoz; Shannon McWeeney; Mirit Aladjem; Ewan Birney; Julio Collado-Vides; Susumu Goto; Michael Hucka; Nicolas Le Novère; Natalia Maltsev; Akhilesh Pandey; Paul Thomas; Edgar Wingender; Peter D Karp; Chris Sander; Gary D Bader
Journal:  Nat Biotechnol       Date:  2010-09-09       Impact factor: 54.908

4.  BioASF: a framework for automatically generating executable pathway models specified in BioPAX.

Authors:  Reza Haydarlou; Annika Jacobsen; Nicola Bonzanni; K Anton Feenstra; Sanne Abeln; Jaap Heringa
Journal:  Bioinformatics       Date:  2016-06-15       Impact factor: 6.937

5.  From word models to executable models of signaling networks using automated assembly.

Authors:  Benjamin M Gyori; John A Bachman; Kartik Subramanian; Jeremy L Muhlich; Lucian Galescu; Peter K Sorger
Journal:  Mol Syst Biol       Date:  2017-11-24       Impact factor: 11.429

6.  Pathway Commons 2019 Update: integration, analysis and exploration of pathway data.

Authors:  Igor Rodchenkov; Ozgun Babur; Augustin Luna; Bulent Arman Aksoy; Jeffrey V Wong; Dylan Fong; Max Franz; Metin Can Siper; Manfred Cheung; Michael Wrana; Harsh Mistry; Logan Mosier; Jonah Dlin; Qizhi Wen; Caitlin O'Callaghan; Wanxin Li; Geoffrey Elder; Peter T Smith; Christian Dallago; Ethan Cerami; Benjamin Gross; Ugur Dogrusoz; Emek Demir; Gary D Bader; Chris Sander
Journal:  Nucleic Acids Res       Date:  2020-01-08       Impact factor: 16.971

7.  The reactome pathway knowledgebase.

Authors:  Bijay Jassal; Lisa Matthews; Guilherme Viteri; Chuqiao Gong; Pascual Lorente; Antonio Fabregat; Konstantinos Sidiropoulos; Justin Cook; Marc Gillespie; Robin Haw; Fred Loney; Bruce May; Marija Milacic; Karen Rothfels; Cristoffer Sevilla; Veronica Shamovsky; Solomon Shorser; Thawfeek Varusai; Joel Weiser; Guanming Wu; Lincoln Stein; Henning Hermjakob; Peter D'Eustachio
Journal:  Nucleic Acids Res       Date:  2020-01-08       Impact factor: 16.971

8.  PS4DR: a multimodal workflow for identification and prioritization of drugs based on pathway signatures.

Authors:  Mohammad Asif Emon; Daniel Domingo-Fernández; Charles Tapley Hoyt; Martin Hofmann-Apitius
Journal:  BMC Bioinformatics       Date:  2020-06-05       Impact factor: 3.169

9.  Using biological pathway data with paxtools.

Authors:  Emek Demir; Ozgün Babur; Igor Rodchenkov; Bülent Arman Aksoy; Ken I Fukuda; Benjamin Gross; Onur Selçuk Sümer; Gary D Bader; Chris Sander
Journal:  PLoS Comput Biol       Date:  2013-09-19       Impact factor: 4.475

10.  Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd.

Authors:  Zichen Wang; Caroline D Monteiro; Kathleen M Jagodnik; Nicolas F Fernandez; Gregory W Gundersen; Andrew D Rouillard; Sherry L Jenkins; Axel S Feldmann; Kevin S Hu; Michael G McDermott; Qiaonan Duan; Neil R Clark; Matthew R Jones; Yan Kou; Troy Goff; Holly Woodland; Fabio M R Amaral; Gregory L Szeto; Oliver Fuchs; Sophia M Schüssler-Fiorenza Rose; Shvetank Sharma; Uwe Schwartz; Xabier Bengoetxea Bausela; Maciej Szymkiewicz; Vasileios Maroulis; Anton Salykin; Carolina M Barra; Candice D Kruth; Nicholas J Bongio; Vaibhav Mathur; Radmila D Todoric; Udi E Rubin; Apostolos Malatras; Carl T Fulp; John A Galindo; Ruta Motiejunaite; Christoph Jüschke; Philip C Dishuck; Katharina Lahl; Mohieddin Jafari; Sara Aibar; Apostolos Zaravinos; Linda H Steenhuizen; Lindsey R Allison; Pablo Gamallo; Fernando de Andres Segura; Tyler Dae Devlin; Vicente Pérez-García; Avi Ma'ayan
Journal:  Nat Commun       Date:  2016-09-26       Impact factor: 14.919
