
PAX2GRAPHML: a Python library for large-scale regulation network analysis using BIOPAX.

François Moreews1,2, Hugo Simon1, Anne Siegel1, Florence Gondret2, Emmanuelle Becker1.   

Abstract

SUMMARY: PAX2GRAPHML is an open source Python library that makes it easy to manipulate BioPAX source files as regulated reaction graphs described in .graphml format. It relies on the concept of regulated reactions, which connects the regulatory, signaling and metabolic levels. Biochemical reactions and regulatory interactions are homogeneously described by regulated reactions involving substrates, products, activators and inhibitors as elements. PAX2GRAPHML is highly flexible and can generate graphs of regulated reactions from a single BioPAX source or by combining and filtering BioPAX sources. Supported by the graph exchange format .graphml, the large-scale graphs produced from one or more data sources can be further analyzed with PAX2GRAPHML or with standard Python and R graph libraries.
AVAILABILITY AND IMPLEMENTATION: https://pax2graphml.genouest.org.
© The Author(s) (2021). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Year:  2021        PMID: 34128961      PMCID: PMC8665752          DOI: 10.1093/bioinformatics/btab441

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

BioPAX is a standard format encoding biological processes such as gene regulation, metabolic pathways or signaling events. It facilitates interoperability between data sources and network analysis tools. However, this rich knowledge-oriented data format, which finely captures the complexity of biological networks, cannot be easily handled without appropriate tools. Software tools have recently been proposed to design, visualize (Babur ; Shannon ), parse (Turei ), validate (Rodchenkov ), query (Babur ) and analyze BioPAX files. An important feature is still missing for the analysis of BioPAX data sources: the ability to interpret BioPAX files as graph structures that include the role of physical entities as substrate, product or regulator in the reactions. A suitable framework for representing the variety and complexity of biological reactions is the concept of regulated reactions connecting regulatory, signaling and metabolic levels (Blavy ). In this conceptual framework, both biochemical reactions and regulatory interactions are described homogeneously as regulated reactions involving substrates, products, activators, inhibitors and modulators as key elements. In the reaction graph generated from regulated reactions, the molecules and the reactions are represented as typed nodes, as shown in Figure 1.
Fig. 1. Example of reaction graph manipulated by PAX2GRAPHML showing reactions and entities as nodes

Thus, we propose to extend the BioPAX toolbox with a Python library able to interpret BioPAX files as graphs of regulated reactions. With PAX2GRAPHML, the graphs are represented in the .graphml format, allowing the manipulation of node and edge properties. The PAX2GRAPHML tool also enables extracting sub-graphs, by filtering the original files according to specific properties of the nodes (genes or proteins) or by merging different graphs. It also implements basic methods to explore the graphs. Thanks to the .graphml exchange format support, generated graphs can be further analyzed with existing graph libraries in Python or R.
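The two sub-graph operations mentioned here, merging graphs and extracting the neighbourhood of selected entities, can be illustrated with a small standalone sketch. It does not use the PAX2GRAPHML API: the `Graph` class, the `merge` and `extract_around` helpers, and the node identifiers are all hypothetical, and the sketch assumes that nodes sharing an identifier in two graphs denote the same object.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Tiny directed graph; a hypothetical stand-in, not the PAX2GRAPHML data model."""
    nodes: dict = field(default_factory=dict)  # node id -> {"type": ..., "name": ...}
    edges: set = field(default_factory=set)    # (source id, target id, role)


def merge(a, b):
    """Union of two graphs; nodes sharing an id are treated as the same node."""
    nodes = dict(a.nodes)
    for node_id, props in b.nodes.items():
        nodes[node_id] = {**nodes.get(node_id, {}), **props}
    return Graph(nodes, a.edges | b.edges)


def extract_around(graph, names):
    """Keep every reaction involving an entity named in `names`, with all its participants."""
    hits = {n for n, p in graph.nodes.items() if p.get("name") in names}
    reactions = {n for n, p in graph.nodes.items() if p.get("type") == "reaction"}
    # Reactions that share an edge with at least one selected entity.
    selected = {n for s, t, _ in graph.edges if s in hits or t in hits
                for n in (s, t) if n in reactions}
    edges = {e for e in graph.edges if e[0] in selected or e[1] in selected}
    nodes = {n: graph.nodes[n] for e in edges for n in e[:2]}
    return Graph(nodes, edges)


def entity(name):
    return {"type": "entity", "name": name}


# Two toy graphs that share the entity "B".
first = Graph(
    {"R1": {"type": "reaction", "name": "R1"}, "A": entity("A"), "B": entity("B")},
    {("A", "R1", "substrate"), ("R1", "B", "product")},
)
second = Graph(
    {"R2": {"type": "reaction", "name": "R2"}, "B": entity("B"), "C": entity("C")},
    {("B", "R2", "substrate"), ("R2", "C", "product")},
)

merged = merge(first, second)
around_a = extract_around(merged, {"A"})
print(sorted(around_a.nodes))  # ['A', 'B', 'R1']
```

Selecting whole reactions rather than individual nodes keeps each extracted pattern complete: a reaction is never returned without its substrates, products and modulators.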

2 Format and package description

PAX2GRAPHML is able to process all BioPAX files to generate regulated reaction graphs, which can be further interpreted as positive and negative oriented influences. It is available on PyPI and as a Docker image. In PAX2GRAPHML, PaxTools (Babur ) is used internally to extract sub-classes of patterns and further interpret them as regulated reactions. These extracted patterns form the building elements of a regulated reaction graph (Blavy ). Each regulated reaction graph pattern is centered on a reaction node linked to one or several substrate nodes and product nodes. The reaction node can also be linked to modulator nodes (activators or inhibitors). Substrates and modulators are inputs of the reaction node, whereas products are outputs of the reaction node. All nodes (reaction, substrate, product or modulator) are associated with their own metadata in the graph.

PAX2GRAPHML is composed of four sub-packages.
(i) The sub-package pax_import is dedicated to the global or parametrized import of BioPAX files from Pathway Commons (PC), to be further interpreted as regulated reaction graphs.
(ii) The sub-package properties allows manipulating the node and edge properties of the generated graphs. All aliases contained in BioPAX have been incorporated in the .graphml format as node properties to represent genes, proteins and compounds. Additional annotations can also be directly imported from specific files.
(iii) The sub-package extract allows modifying either the generated reaction graph or the influence graph, including sub-graph selection and graph merging.
(iv) The sub-package graph_explore includes IO functions and analysis of the generated graphs. It also includes classical graph metrics (degree, betweenness, closeness, connected components) as preliminary steps. More sophisticated analyses can be further performed with graph-tool or other advanced libraries (Csardi and Nepusz, 2006).
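The regulated reaction pattern described in this section (one reaction node, with substrates and modulators as inputs and products as outputs) can be sketched with the Python standard library alone. This is an illustration, not the PAX2GRAPHML API: the node identifiers, the role labels, the key ids `d0` and `d1`, and the `to_graphml` helper are assumptions made for the example. Only the GraphML element names (`graphml`, `key`, `graph`, `node`, `edge`, `data`) come from the GraphML format itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# One regulated reaction: inputs point to the reaction node, the reaction
# node points to its product. All identifiers are hypothetical.
nodes = {
    "r1": "reaction",
    "substrate_1": "entity",
    "product_1": "entity",
    "activator_1": "entity",
    "inhibitor_1": "entity",
}
edges = [
    ("substrate_1", "r1", "substrate of"),
    ("r1", "product_1", "product of"),
    ("activator_1", "r1", "activator of"),
    ("inhibitor_1", "r1", "inhibitor of"),
]


def to_graphml(nodes, edges):
    """Serialize typed nodes and labelled edges as a GraphML string."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare one string property for nodes and one for edges.
    ET.SubElement(root, "key", {"id": "d0", "for": "node",
                                "attr.name": "type", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "d1", "for": "edge",
                                "attr.name": "role", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for node_id, node_type in nodes.items():
        node = ET.SubElement(graph, "node", id=node_id)
        ET.SubElement(node, "data", key="d0").text = node_type
    for source, target, role in edges:
        edge = ET.SubElement(graph, "edge", source=source, target=target)
        ET.SubElement(edge, "data", key="d1").text = role
    return ET.tostring(root, encoding="unicode")


graphml_text = to_graphml(nodes, edges)

# A first, very simple metric: in- and out-degree of each node.
in_degree = Counter(target for _, target, _ in edges)
out_degree = Counter(source for source, _, _ in edges)
print(in_degree["r1"], out_degree["r1"])  # 3 1
```

A file written this way can be loaded by graph libraries that read GraphML, which is the interoperability argument made in the text.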
The PAX2GRAPHML website provides complete documentation and the pre-processed database resources. Regulated reaction graphs and influence graphs produced from 16 data sources of PC can be downloaded as ready-to-use data for further analyses with PAX2GRAPHML. Files are automatically updated using a databank synchronization and processing software (Filangi ).

3 Application

PAX2GRAPHML was first applied to the complete PC databank. The regulated reaction graph produced in .graphml format has a size of 363 MB (13% of the initial BioPAX file size). PAX2GRAPHML was also applied to each data source of PC considered independently. As shown in Table 1, the regulated reaction concept used to unify the different BioPAX reaction types facilitates the comparison of the content of each resource. Notably, this revealed that Mirtarbase and CTD are the main contributors to PC in terms of nodes, edges, and especially inhibition reactions.
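Table 1. BioPAX files transformation of datasources available in PC into regulated reaction graphs with PAX2GRAPHML

Data sources                              |   Nodes | Reaction nodes | Entity nodes |   Edges | Substrate of | Product of | Activator of | Inhibitor of
------------------------------------------+---------+----------------+--------------+---------+--------------+------------+--------------+-------------
PC* all sources                           | 175 262 |         85 750 |       89 512 | 639 945 |       84 496 |     95 743 |       52 009 |      407 697
CTD                                       |  44 639 |         19 814 |       24 825 |  98 993 |       18 538 |     19 073 |       35 077 |       26 305
HumanCyc                                  |    5733 |           1778 |         3955 |  11 890 |         3875 |       4455 |         3560 |            0
INOH                                      |    4315 |           2188 |         2127 |    7409 |         4247 |       3162 |            0 |            0
Intact complex                            |    2187 |            563 |         1624 |    2869 |         2306 |        563 |            0 |            0
KEGG**                                    |    3133 |           1560 |         1573 |    7041 |         3488 |       3553 |            0 |            0
Mirtarbase                                |  32 727 |         15 064 |       17 663 | 395 703 |            0 |     15 064 |            0 |      380 639
Panther                                   |    3766 |           1662 |         2104 |    5263 |         2914 |       2165 |          142 |           42
PID                                       |    9403 |           4495 |         4908 |  14 827 |         6613 |       4544 |         3233 |          437
Reactome                                  |  31 718 |         11 404 |       20 314 |  46 541 |       27 210 |     15 358 |         3717 |          256
Reconx                                    |    6956 |           2821 |         4135 |  20 485 |         7722 |       7689 |         5074 |            0
PC* all sources except CTD and Mirtarbase | 119 285 |         55 184 |       64 101 | 167 515 |       83 782 |     66 056 |       16 939 |          738
PID and HumanCyc                          |  12 561 |           4518 |         8043 |  16 085 |         3575 |       5859 |         6221 |          430
PID and HumanCyc and KEGG                 |  15 564 |           6079 |         9485 |  19 651 |         5332 |       7652 |         6237 |          430
PID and HumanCyc and KEGG and Reactome    |  38 585 |         10 398 |       28 187 |  43 899 |       16 087 |     17 371 |         9756 |          685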

Note: Transformations of single data sources were performed with the sub-package pax_import. Combinations of several data sources were obtained by filtering the PC* all sources graphml file with the sub-package extract. Nodes are either reactions or entities (proteins, small molecules, etc.). The numbers of regulated reactions computed by PAX2GRAPHML, together with the numbers of substrates, products, activators and inhibitors, are indicated. All these graphs can be directly downloaded from the PAX2GRAPHML website.

* PC version 12, September 2019.

** KEGG, July 2011 (only human, hsa* files).
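The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below copies two rows of the table (all PC sources, and all sources except CTD and Mirtarbase), verifies that the node count is the sum of reaction and entity nodes, and computes the share of nodes and edges removed by the filtering. The variable and function names are illustrative only.

```python
# (nodes, reaction nodes, entity nodes, edges), copied from Table 1.
all_sources = (175_262, 85_750, 89_512, 639_945)
without_ctd_mirtarbase = (119_285, 55_184, 64_101, 167_515)

# Every node is either a reaction or an entity, so the counts must add up.
for nodes, reaction_nodes, entity_nodes, _ in (all_sources, without_ctd_mirtarbase):
    assert nodes == reaction_nodes + entity_nodes


def percent_removed(before, after):
    """Share of items removed by the filtering, rounded to the nearest percent."""
    return round(100 * (before - after) / before)


labels = ("nodes", "reaction nodes", "entity nodes", "edges")
removed = {
    label: percent_removed(before, after)
    for label, before, after in zip(labels, all_sources, without_ctd_mirtarbase)
}
print(removed)
# {'nodes': 32, 'reaction nodes': 36, 'entity nodes': 28, 'edges': 74}
```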

Generating the regulated reaction graph from 16 BioPAX data sources with PAX2GRAPHML took 7 days on a virtual machine with 48 GB of RAM. Conveniently, the generated files can be downloaded from the PAX2GRAPHML website as ready-to-use data resources, which are automatically updated.

Customized graphs can be produced for any subset of the databases. To achieve this, users can either filter the overall regulated reaction graph or merge the regulated reaction graphs produced from two or more databases selected according to their specific interest. Both functionalities (filtering and merging) are available within the PAX2GRAPHML package. As an illustration, Table 1 shows that filtering out CTD and Mirtarbase from PC eliminates 32% of the nodes (36% of reaction nodes and 28% of entity nodes) and 74% of the edges. Table 1 also illustrates that combining PID successively with HumanCyc, KEGG and Reactome improves the coverage of both reaction nodes (from 4495 to 10 398) and entities (from 4908 to 28 187).

By managing BioPAX data extraction into regulated graphs, PAX2GRAPHML simplifies the implementation of many methods for regulation network analysis and for understanding the controlling steps of biological pathways.

Financial Support: none declared.

Conflict of Interest: none declared.

1.  Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors:  Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal:  Genome Res       Date:  2003-11       Impact factor: 9.043

2.  ChiBE: interactive visualization and manipulation of BioPAX pathway models.

Authors:  Ozgun Babur; Ugur Dogrusoz; Emek Demir; Chris Sander
Journal:  Bioinformatics       Date:  2009-12-09       Impact factor: 6.937

3.  OmniPath: guidelines and gateway for literature-curated signaling pathway resources.

Authors:  Dénes Türei; Tamás Korcsmáros; Julio Saez-Rodriguez
Journal:  Nat Methods       Date:  2016-11-29       Impact factor: 28.547

4.  Using a large-scale knowledge database on reactions and regulations to propose key upstream regulators of various sets of molecules participating in cell metabolism.

Authors:  Pierre Blavy; Florence Gondret; Sandrine Lagarrigue; Jaap van Milgen; Anne Siegel
Journal:  BMC Syst Biol       Date:  2014-03-17

5.  BioMAJ: a flexible framework for databanks synchronization and processing.

Authors:  Olivier Filangi; Yoann Beausse; Anthony Assi; Ludovic Legrand; Jean-Marc Larré; Véronique Martin; Olivier Collin; Christophe Caron; Hugues Leroy; David Allouche
Journal:  Bioinformatics       Date:  2008-06-30       Impact factor: 6.937

6.  The BioPAX Validator.

Authors:  Igor Rodchenkov; Emek Demir; Chris Sander; Gary D Bader
Journal:  Bioinformatics       Date:  2013-08-05       Impact factor: 6.937

7.  Pattern search in BioPAX models.

Authors:  Özgün Babur; Bülent Arman Aksoy; Igor Rodchenkov; Selçuk Onur Sümer; Chris Sander; Emek Demir
Journal:  Bioinformatics       Date:  2013-09-16       Impact factor: 6.937
