
PyBEL: a computational framework for Biological Expression Language.

Charles Tapley Hoyt, Andrej Konotopez, Christian Ebeling, Jonathan Wren.

Abstract

Summary: Biological Expression Language (BEL) assembles knowledge networks from biological relations across multiple modes and scales. Here, we present PyBEL, a software package for parsing, validating, converting, storing, querying, and visualizing networks encoded in BEL. Availability and implementation: PyBEL is implemented in platform-independent, universal Python code. Its source is distributed under the Apache 2.0 License at https://github.com/pybel. Contact: charles.hoyt@scai.fraunhofer.de. Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2017. Published by Oxford University Press.

Year:  2018        PMID: 29048466      PMCID: PMC5860616          DOI: 10.1093/bioinformatics/btx660

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The most popular modeling and data exchange languages in systems biology currently include the Biological Pathways Exchange (BioPAX), the Systems Biology Markup Language (SBML) and the Biological Expression Language (BEL). BioPAX captures metabolic, signaling, molecular, gene-regulatory, and genetic interaction networks (Demir et al., 2010); SBML accommodates mathematical models of biochemical networks, cellular signaling, and metabolic pathways (Hucka et al., 2003); and BEL assembles qualitative causal and correlative relations between biological entities across multiple modes and scales, with full provenance information including namespace references, relation provenance (citation and evidence), and biological context-specific relation metadata (anatomy, cell, disease etc.) (Slater, 2014). Although several software packages exist for BioPAX and SBML, the ecosystem of open-source software for BEL is much more limited. An assessment of previous software (see Supplementary Table S3) shows an unmet need for easily installable, stable, easy-to-use software that parses modern BEL and provides programmatic access to a data container that enables the resulting network to be extended, queried, manipulated, analyzed, and visualized. Furthermore, a converter between common data formats is needed to enable re-usability and interoperability between general and BEL-specific software for network analysis and visualization. Here, we present PyBEL, a software package designed to fulfill each of these needs.

2 Software architecture

The PyBEL software package consists of five main components: (i) network data container, (ii) parser and validator, (iii) network database manager, (iv) data converter and (v) network visualizer.

Although a graph refers to an abstraction for a set of objects (i.e. nodes) and their relations (i.e. edges), its instantiation in a real-world application is often called a network. We provide an implementation of a directed multigraph (i.e. a graph whose edges have directionality and in which any given pair of nodes may have multiple edges) that maps the biological entities and concepts in the subjects and objects of BEL relations to nodes in a network, and their relations, with corresponding metadata, to edges. We extended the MultiDiGraph class from NetworkX (http://networkx.github.io) to give users direct access to its suite of network algorithms and static visualizations, and to support their further development into biologically meaningful analyses.

The parser performs tokenization, lexical analysis, parsing, and validation on each of the three sections of BEL documents (see Supplementary Figs S1 and S2). Callbacks are used to attach the entries in the document metadata section to a network instance, download and store the resources referenced in the definitions section, maintain a list of current annotations from SET statements, and parse BEL relations to populate a network instance with the corresponding nodes, edges, and their metadata from the current internal state. The syntax of relations is validated implicitly during parsing, whereas the semantics of their subjects' and objects' identifiers are validated against the references from the definitions section. Finally, feedback is provided to users to support thoughtful re-curation, which could lead to more robust knowledge assemblies and enable more reproducible science.

Namespaces and networks are cached with a relational database to improve the speed of validation and access to data.
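The data container and identifier validation described in this section can be sketched in a few lines. This is a plain-Python stand-in rather than PyBEL's API (PyBEL itself extends NetworkX's MultiDiGraph): the class name `TinyBELGraph`, the `(namespace, name)` node tuples, the `DEMO` namespace, and the placeholder citations are all assumptions made for illustration.

```python
from collections import defaultdict


class TinyBELGraph:
    """Hypothetical directed multigraph; not PyBEL's actual data container."""

    def __init__(self, namespaces):
        # namespaces: keyword -> set of valid names
        # (stands in for the resources named in a definitions section)
        self.namespaces = namespaces
        self.nodes = set()
        # adjacency[subject][object] is a list, so parallel edges are allowed
        self.adjacency = defaultdict(lambda: defaultdict(list))
        self.warnings = []

    def _is_valid(self, node):
        namespace, name = node
        return name in self.namespaces.get(namespace, set())

    def add_relation(self, subject, relation, obj, **metadata):
        """Add one directed edge if both identifiers pass semantic validation."""
        for node in (subject, obj):
            if not self._is_valid(node):
                # Collect feedback for re-curation instead of failing hard
                self.warnings.append(f"unknown identifier {node[0]}:{node[1]}")
                return False
        self.nodes.update((subject, obj))
        self.adjacency[subject][obj].append({"relation": relation, **metadata})
        return True

    def edges_between(self, subject, obj):
        """Return every edge (with metadata) from subject to obj."""
        return self.adjacency.get(subject, {}).get(obj, [])


graph = TinyBELGraph({"DEMO": {"GeneA", "ProcessB"}})
gene_a, process_b = ("DEMO", "GeneA"), ("DEMO", "ProcessB")

# Two relations between the same pair of nodes become two parallel edges
graph.add_relation(gene_a, "increases", process_b, citation="placeholder citation 1")
graph.add_relation(gene_a, "association", process_b, citation="placeholder citation 2")

# An identifier missing from the namespace is rejected and reported
graph.add_relation(gene_a, "increases", ("DEMO", "Unknown"))
```

Keeping a list of edges per node pair is what distinguishes a multigraph from a simple directed graph: each curated statement keeps its own citation and evidence even when several statements connect the same two entities.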
Although relational databases lack the faculty for applying network algorithms, they provide indexing functionality that enables complicated queries and filters over the nodes, edges, and metadata of increasingly large collections of networks. For example, this could help identify intersections and potential cross-talk between disease-specific networks.

We implemented lossless converters for common file formats including Node-Link JSON, JGIF, CX, and binary as well as for database formats including SQL, Neo4J, and NDEx. We also provide lossy exporters to Excel, CSV, SIF, XGMML, and GSEA to facilitate usage in other programs. Notably, we have deferred implementing an RDF (Resource Description Framework) converter until improvements are made to the existing BEL to RDF mapping and its documentation (https://wiki.openbel.org). Future work will also include converters for BioPAX and SBML. See Supplementary Tables S1 and S2 for more detailed descriptions of each format.

Networks can be exported for visualization in Cytoscape or uploaded to NDEx (Pratt et al., 2015) to take advantage of its viewer and simple query interface. Alternatively, we provide an interactive network explorer tailored to BEL networks (appropriate node coloring, metadata pop-ups etc.) that can be directly embedded as HTML in email, Jupyter Notebook, or a web application. It has already been used to produce visualizations in the NeuroMMSig Web Service (Domingo-Fernández et al., 2017). Supplementary Figures S3–S5 present these visualizations side-by-side.

In addition to their programmatic interfaces, the parser, storage, conversion, and visualization features are exposed via a command line tool.
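The difference between a lossless converter and a lossy exporter can be illustrated with a small sketch. It does not use PyBEL's converters: the dictionary layout only imitates the general shape of Node-Link JSON (a list of nodes and a list of links), and the function names, node identifiers, and placeholder metadata are assumptions for illustration.

```python
import json

# Hypothetical network: nodes plus one edge that carries provenance metadata
network = {
    "nodes": [{"id": "DEMO:GeneA"}, {"id": "DEMO:ProcessB"}],
    "links": [
        {
            "source": "DEMO:GeneA",
            "target": "DEMO:ProcessB",
            "relation": "increases",
            "citation": "placeholder citation",
            "evidence": "placeholder evidence text",
        }
    ],
}


def to_node_link_json(net):
    """Lossless: every node, edge, and metadata field is serialized."""
    return json.dumps(net, sort_keys=True)


def from_node_link_json(text):
    return json.loads(text)


def to_sif(net):
    """Lossy: each SIF line keeps only source, relation, and target."""
    return "\n".join(
        f"{link['source']}\t{link['relation']}\t{link['target']}"
        for link in net["links"]
    )


# Round trip through JSON reproduces the network exactly
round_tripped = from_node_link_json(to_node_link_json(network))

# The SIF export drops citation and evidence, so it cannot be reversed
sif_text = to_sif(network)
```

A lossy export of this kind is still useful for tools that only need the topology, but the original network must be kept in a lossless format if the provenance is to survive.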

3 Case study

The PyBEL suite includes functions for querying and mutating networks, on top of which it implements state-of-the-art algorithms for over-representation analysis, functional class scoring, and pathway topological analysis of BEL networks such as Reverse Causal Reasoning (Catlett et al., 2013). Figure 1 presents a case study in which a novel heat diffusion workflow was used to assess the observed impact on biological processes from differential gene expression in Alzheimer’s disease (AD). Technical documentation is included in the Supplementary Material.
Fig. 1

Plotted is the distribution of the final heat on biological processes from the NeuroMMSig AD Knowledge Assembly (Domingo-Fernández et al., 2017) following heat diffusion analysis with a differential gene expression experiment from the brains of AD patients (E-GEOD-5281, Liang et al., 2006). The significant down-regulation of biological processes related to inflammatory response (heat = 69) and up-regulation of cellular death (heat = −13) and beta-amyloid formation (heat = −9) match common clinical observations and serve as a validation for this approach.
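The heat diffusion workflow itself is documented in the Supplementary Material and is not reproduced here. As a deliberately simplified stand-in, the sketch below shows one way that gene-level scores can be pushed along signed causal edges until they accumulate on biological processes. It is a signed score propagation over an acyclic network, not the authors' algorithm; the node names, edge signs, and scores are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical signed causal edges: (source, target, sign)
# +1 stands for an "increases" relation, -1 for "decreases"
edges = [
    ("GeneA", "ProcessX", +1),
    ("GeneB", "ProcessX", -1),
    ("GeneB", "GeneC", +1),
    ("GeneC", "ProcessY", +1),
]

# Hypothetical log fold changes; unmeasured nodes start with zero heat
initial_heat = {"GeneA": 2.0, "GeneB": -1.5}


def propagate_heat(edges, initial_heat):
    """Push heat downstream along signed edges of an acyclic network."""
    predecessors = {}
    for source, target, sign in edges:
        predecessors.setdefault(source, [])
        predecessors.setdefault(target, []).append((source, sign))

    # Visit every node after all of its upstream nodes.
    # TopologicalSorter raises CycleError if the network contains a cycle.
    sorter = TopologicalSorter(
        {node: {source for source, _ in preds} for node, preds in predecessors.items()}
    )

    heat = {}
    for node in sorter.static_order():
        incoming = sum(sign * heat[source] for source, sign in predecessors[node])
        heat[node] = initial_heat.get(node, 0.0) + incoming
    return heat


final_heat = propagate_heat(edges, initial_heat)
```

In this toy network, `ProcessX` receives heat from `GeneA` directly and from `GeneB` with its sign flipped, so the two contributions add up, while `ProcessY` inherits the negative score of `GeneB` through `GeneC`.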

4 Discussion

Even after its v2.0 update, BEL does not yet explicitly specify many concepts in molecular biology such as epigenetic information (Irin et al., 2015). The inevitability of language evolution prompted us to develop the parser in modules so that new syntax could be proposed and implemented quickly. As a proof of concept, a syntax extension for gene modifications is included in the package by default. Historically, BEL has used a custom namespace file format, but the creation and maintenance of biological terminologies has tended towards using OWL (Web Ontology Language). Furthermore, many domains (e.g. SNPs) are growing too large to enumerate during semantic integration and validation. The modular architecture of the parser enables easy implementation of new definition file formats, external validation services, or even alternative namespace definition schemes to address these issues. Although BEL is often used to formalize knowledge curated from unstructured sources, our software also enables the integration of knowledge from structured sources. For example, existing solutions for resolving equivalences across namespaces rely on the creation and hosting of extensive lookup tables. Alternatively, the parser could be extended with a dedicated syntax and draw equivalences directly from OWL. Finally, we plan to present this software as a web service to enable a wider audience of researchers across disciplines to validate, explore, and analyze their BEL networks.
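One way to picture a parser built from modules is a registry in which each term function has its own handler, so that a syntax extension is added by registering one more handler. The sketch below is an assumption-laden illustration and not PyBEL's parser: the `register` decorator, the handler names, the output dictionaries, and the `DEMO` namespace are hypothetical, and `gmod(...)` is used only as an example spelling for a gene modification term.

```python
import re

TERM = re.compile(r"^(?P<func>\w+)\((?P<args>.*)\)$")
HANDLERS = {}  # function name -> handler; extensions add entries here


def register(func_name):
    """Decorator that plugs a handler into the parser."""
    def decorator(handler):
        HANDLERS[func_name] = handler
        return handler
    return decorator


def split_args(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def parse_term(text):
    """Dispatch a term such as 'g(NS:Name, ...)' to its registered handler."""
    match = TERM.match(text.strip())
    if match is None:
        raise ValueError(f"not a term: {text!r}")
    func = match["func"]
    if func not in HANDLERS:
        raise ValueError(f"unsupported function: {func}")
    return HANDLERS[func](split_args(match["args"]))


@register("g")
def parse_gene(args):
    namespace, name = args[0].split(":", 1)
    node = {"function": "Gene", "namespace": namespace, "name": name}
    # Any remaining arguments are nested modifier terms
    modifiers = [parse_term(arg) for arg in args[1:]]
    if modifiers:
        node["modifiers"] = modifiers
    return node


# A later extension: registered without changing parse_term or parse_gene
@register("gmod")
def parse_gene_modification(args):
    return {"function": "GeneModification", "kind": args[0]}


parsed = parse_term("g(DEMO:GeneA, gmod(Me))")
```

Because dispatch goes through the registry, an unregistered function name fails with a clear error, and proposing new syntax amounts to writing and registering a single handler.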
References

1.  The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models.

Authors:  M Hucka; A Finney; H M Sauro; H Bolouri; J C Doyle; H Kitano; A P Arkin; B J Bornstein; D Bray; A Cornish-Bowden; A A Cuellar; S Dronov; E D Gilles; M Ginkel; V Gor; I I Goryanin; W J Hedley; T C Hodgman; J-H Hofmeyr; P J Hunter; N S Juty; J L Kasberger; A Kremling; U Kummer; N Le Novère; L M Loew; D Lucio; P Mendes; E Minch; E D Mjolsness; Y Nakayama; M R Nelson; P F Nielsen; T Sakurada; J C Schaff; B E Shapiro; T S Shimizu; H D Spence; J Stelling; K Takahashi; M Tomita; J Wagner; J Wang
Journal:  Bioinformatics       Date:  2003-03-01       Impact factor: 6.937

2.  Recent advances in modeling languages for pathway maps and computable biological networks.

Authors:  Ted Slater
Journal:  Drug Discov Today       Date:  2014-01-17       Impact factor: 7.851

3.  Gene expression profiles in anatomically and functionally distinct regions of the normal aged human brain.

Authors:  Winnie S Liang; Travis Dunckley; Thomas G Beach; Andrew Grover; Diego Mastroeni; Douglas G Walker; Richard J Caselli; Walter A Kukull; Daniel McKeel; John C Morris; Christine Hulette; Donald Schmechel; Gene E Alexander; Eric M Reiman; Joseph Rogers; Dietrich A Stephan
Journal:  Physiol Genomics       Date:  2006-10-31       Impact factor: 3.107

4.  NDEx, the Network Data Exchange.

Authors:  Dexter Pratt; Jing Chen; David Welker; Ricardo Rivas; Rudolf Pillich; Vladimir Rynkov; Keiichiro Ono; Carol Miello; Lyndon Hicks; Sandor Szalma; Aleksandar Stojmirovic; Radu Dobrin; Michael Braxenthaler; Jan Kuentzer; Barry Demchak; Trey Ideker
Journal:  Cell Syst       Date:  2015-10-28       Impact factor: 10.304

5.  The BioPAX community standard for pathway data sharing.

Authors:  Emek Demir; Michael P Cary; Suzanne Paley; Ken Fukuda; Christian Lemer; Imre Vastrik; Guanming Wu; Peter D'Eustachio; Carl Schaefer; Joanne Luciano; Frank Schacherer; Irma Martinez-Flores; Zhenjun Hu; Veronica Jimenez-Jacinto; Geeta Joshi-Tope; Kumaran Kandasamy; Alejandra C Lopez-Fuentes; Huaiyu Mi; Elgar Pichler; Igor Rodchenkov; Andrea Splendiani; Sasha Tkachev; Jeremy Zucker; Gopal Gopinath; Harsha Rajasimha; Ranjani Ramakrishnan; Imran Shah; Mustafa Syed; Nadia Anwar; Ozgün Babur; Michael Blinov; Erik Brauner; Dan Corwin; Sylva Donaldson; Frank Gibbons; Robert Goldberg; Peter Hornbeck; Augustin Luna; Peter Murray-Rust; Eric Neumann; Oliver Ruebenacker; Oliver Reubenacker; Matthias Samwald; Martijn van Iersel; Sarala Wimalaratne; Keith Allen; Burk Braun; Michelle Whirl-Carrillo; Kei-Hoi Cheung; Kam Dahlquist; Andrew Finney; Marc Gillespie; Elizabeth Glass; Li Gong; Robin Haw; Michael Honig; Olivier Hubaut; David Kane; Shiva Krupa; Martina Kutmon; Julie Leonard; Debbie Marks; David Merberg; Victoria Petri; Alex Pico; Dean Ravenscroft; Liya Ren; Nigam Shah; Margot Sunshine; Rebecca Tang; Ryan Whaley; Stan Letovksy; Kenneth H Buetow; Andrey Rzhetsky; Vincent Schachter; Bruno S Sobral; Ugur Dogrusoz; Shannon McWeeney; Mirit Aladjem; Ewan Birney; Julio Collado-Vides; Susumu Goto; Michael Hucka; Nicolas Le Novère; Natalia Maltsev; Akhilesh Pandey; Paul Thomas; Edgar Wingender; Peter D Karp; Chris Sander; Gary D Bader
Journal:  Nat Biotechnol       Date:  2010-09-09       Impact factor: 54.908

6.  Reverse causal reasoning: applying qualitative causal knowledge to the interpretation of high-throughput data.

Authors:  Natalie L Catlett; Anthony J Bargnesi; Stephen Ungerer; Toby Seagaran; William Ladd; Keith O Elliston; Dexter Pratt
Journal:  BMC Bioinformatics       Date:  2013-11-23       Impact factor: 3.169

7.  Computational Modelling Approaches on Epigenetic Factors in Neurodegenerative and Autoimmune Diseases and Their Mechanistic Analysis.

Authors:  Afroza Khanam Irin; Alpha Tom Kodamullil; Michaela Gündel; Martin Hofmann-Apitius
Journal:  J Immunol Res       Date:  2015-11-09       Impact factor: 4.818

8.  Multimodal mechanistic signatures for neurodegenerative diseases (NeuroMMSig): a web server for mechanism enrichment.

Authors:  Daniel Domingo-Fernández; Alpha Tom Kodamullil; Anandhi Iyappan; Mufassra Naz; Mohammad Asif Emon; Tamara Raschka; Reagon Karki; Stephan Springstubbe; Christian Ebeling; Martin Hofmann-Apitius
Journal:  Bioinformatics       Date:  2017-11-15       Impact factor: 6.937
