Recon2Neo4j: applying graph database technologies for managing comprehensive genome-scale networks.

Irina Balaur, Alexander Mazein, Mansoor Saqi, Artem Lysenko, Christopher J Rawlings, Charles Auffray.

Abstract

Summary: The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology. This paper describes key features such as efficient management of the network data, examples of network querying for addressing particular tasks, and the conversion of query results back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in the JSON format) into the SBML and SIF formats in order to facilitate further exploration, enhancement or sharing of the resulting networks.

Availability and Implementation: The Neo4j-based metabolic framework is freely available from https://diseaseknowledgebase.etriks.org/metabolic/browser/. The Java code files developed for this work are available from https://github.com/ibalaur/MetabolicFramework.

Contact: ibalaur@eisbm.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2017        PMID: 27993779      PMCID: PMC5408918          DOI: 10.1093/bioinformatics/btw731

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Genome-scale consensus models are essential for further advances in Systems Biology and Systems Medicine. Recon2 (Thiele et al., 2013) is the most up-to-date comprehensive community-driven reconstruction of the human metabolic network, with 7440 reactions, 2626 unique metabolites and 1789 proteins included. The Recon2 resource is structured in the Systems Biology Markup Language (SBML) standard format (Hucka et al., 2003) and is publicly available (Virtual Metabolic Human, https://vmh.uni.lu/). However, advanced exploration involving associations between multiple concepts (e.g. the network neighborhood of particular metabolites, or the shortest pathways between specific metabolites, proteins and complexes) is challenging for models of the size and complexity of this extensive, high-quality reconstruction. This study demonstrates that advanced exploration of genome-scale metabolic reconstructions can benefit from an integrated graph representation of the model and associated data.

2 Methods

The Recon2 human metabolic reconstruction (in SBML format) was integrated into the Neo4j framework (https://neo4j.com/), which uses a graph database approach. The major concepts involved in the metabolic reactions (metabolites, proteins, complexes and the metabolic reactions themselves) were represented as nodes in the graph database, and the relationships among them (e.g. consumption, production, catalysis) were represented as connecting edges. In addition, the relationships between the compounds (nodes) and the complexes were represented by ‘part of’ edges. Information on the name, the SBO Term identifier and additional details (such as initial concentration, charge, metadata) was stored as attributes (properties) of the nodes. An SBML species was classified as a metabolite, a protein or a biological complex node based on its SBO Term identifier in the Recon2 input file. For the protein and biological complex nodes, the UniProt identifier was also stored as a node attribute. When available, data related to biological compartments (including compartment name, meta id, SBO Term id, size, spatial dimensions) were also stored as attributes of every species node. For the metabolic reactions, information such as name, identifier, metadata, notes and the reversibility property was stored as attributes of the Reaction nodes; for the consumption and production relationships, the stoichiometry was also captured as an edge property.

The Neo4j-based metabolic representation of Recon2 is composed of i) nodes: 5063 metabolites, 3567 proteins, 7440 metabolic reactions and 1168 complexes (with 590 protein compounds); and ii) relationships (edges): 15677 consumption, 15863 production, 9982 catalysis and 590 part-of relationships between complexes and their compounds. The data graph model of the Neo4j-based metabolic framework is given in Supplementary Figure S1 (Supplementary file S1).
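The mapping described above can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: the SBO identifiers, the edge type names, the edge directions and the record layout are all assumptions made for illustration, and the records are toy data rather than Recon2 content.

```python
# Assumed SBO terms; the text does not list which identifiers map to which node type.
SBO_TO_LABEL = {
    "SBO:0000247": "Metabolite",  # simple chemical
    "SBO:0000252": "Protein",     # polypeptide chain
    "SBO:0000297": "Complex",     # protein complex
}


def classify_species(sbo_term):
    """Return the node label for a species from its SBO term."""
    if sbo_term not in SBO_TO_LABEL:
        raise ValueError(f"unmapped SBO term: {sbo_term}")
    return SBO_TO_LABEL[sbo_term]


def build_graph(species, reactions):
    """Build (nodes, edges) from simplified species and reaction records."""
    nodes, edges = {}, []
    for sp in species:
        nodes[sp["id"]] = {
            "label": classify_species(sp["sbo"]),
            "name": sp["name"],
            "compartment": sp.get("compartment"),
        }
        # A compound that belongs to a complex gets a 'part of' edge.
        for complex_id in sp.get("part_of", []):
            edges.append((sp["id"], "PART_OF", complex_id, {}))
    for rx in reactions:
        nodes[rx["id"]] = {"label": "Reaction", "name": rx["name"],
                           "reversible": rx["reversible"]}
        # Stoichiometry is kept as an edge property (edge direction is an assumption).
        for sid, stoich in rx["reactants"].items():
            edges.append((sid, "CONSUMPTION", rx["id"], {"stoichiometry": stoich}))
        for sid, stoich in rx["products"].items():
            edges.append((rx["id"], "PRODUCTION", sid, {"stoichiometry": stoich}))
        for sid in rx["modifiers"]:
            edges.append((sid, "CATALYSIS", rx["id"], {}))
    return nodes, edges


# Toy records (hypothetical, not taken from Recon2).
species = [
    {"id": "M_a", "name": "metabolite A", "sbo": "SBO:0000247", "compartment": "c"},
    {"id": "M_b", "name": "metabolite B", "sbo": "SBO:0000247", "compartment": "c"},
    {"id": "P_1", "name": "protein 1", "sbo": "SBO:0000252", "part_of": ["C_1"]},
    {"id": "C_1", "name": "complex 1", "sbo": "SBO:0000297"},
]
reactions = [
    {"id": "R_1", "name": "A to B", "reversible": False,
     "reactants": {"M_a": 1.0}, "products": {"M_b": 2.0}, "modifiers": ["C_1"]},
]
nodes, edges = build_graph(species, reactions)
for edge in edges:
    print(edge)
```

Modelling each reaction as its own node, rather than as an edge between metabolites, is what allows stoichiometry and catalysis to be attached to a single reaction in this sketch.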
A parser component was developed to convert the query results from the Neo4j-based metabolic framework, which are returned in the JavaScript Object Notation (JSON) format, into the SBML standard format and the SIF format. Both formats are compatible with well-established environments for biological data management (e.g. Cytoscape; Smoot et al., 2011) and network sharing (e.g. NDEx; Pratt et al., 2015). Both the Neo4j-based metabolic framework and the parser component were developed mainly in Java using: the JSBML 1.0 library (Dräger et al., 2011) for managing the SBML files (reading and writing data, checking the consistency of the SBML output), the Neo4j Java API to build the Neo4j-based resource, and the JSON-simple 1.1.1 library to read information from the JSON files.
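The JSON-to-SIF step can be sketched in a few lines of Python. This is an illustration only, not the Java parser: the JSON layout (a `results` / `data` / `graph` nesting with `nodes` and `relationships` lists) is an assumed shape for a graph-style Neo4j result, and the node names and relationship types in the sample are hypothetical.

```python
import json


def json_to_sif(json_text):
    """Convert an assumed Neo4j graph-style JSON result into sorted SIF lines."""
    payload = json.loads(json_text)
    names, relationships = {}, []
    for result in payload.get("results", []):
        for row in result.get("data", []):
            graph = row.get("graph", {})
            for node in graph.get("nodes", []):
                # Prefer the 'name' property; fall back to the internal id.
                names[node["id"]] = node.get("properties", {}).get("name", node["id"])
            relationships.extend(graph.get("relationships", []))
    # A set removes relationships that are repeated across result rows.
    triples = {
        (names.get(r["startNode"], r["startNode"]),
         r["type"],
         names.get(r["endNode"], r["endNode"]))
        for r in relationships
    }
    # One SIF line per edge: source <TAB> interaction type <TAB> target.
    return ["\t".join(t) for t in sorted(triples)]


# Hypothetical sample result with two edges.
sample = json.dumps({"results": [{"data": [{"graph": {
    "nodes": [
        {"id": "1", "labels": ["Metabolite"], "properties": {"name": "metabolite A"}},
        {"id": "2", "labels": ["Reaction"], "properties": {"name": "reaction 1"}},
        {"id": "3", "labels": ["Metabolite"], "properties": {"name": "metabolite B"}},
    ],
    "relationships": [
        {"id": "10", "type": "CONSUMPTION", "startNode": "1", "endNode": "2", "properties": {}},
        {"id": "11", "type": "PRODUCTION", "startNode": "2", "endNode": "3", "properties": {}},
    ],
}}]}]})
sif_lines = json_to_sif(sample)
print("\n".join(sif_lines))
```

Tab-delimited SIF is used here so that node names containing spaces stay intact when the file is loaded into a network viewer.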

3 Results

The developments presented here focus on two major components: i) a Neo4j graph database for the human metabolism data and ii) a Java-based parser for translating the JSON representation of the Neo4j networks into the SBML and SIF formats. The major steps of the overall workflow are illustrated in Supplementary Figure S2 (Supplementary file S2) and are described briefly below.

First, the Neo4j-based metabolic framework facilitates exploration and visualization of the human metabolic network. As an example of exploring the newly developed resource using the Neo4j Cypher declarative query language, a use case was developed to identify pathways and subnetworks useful for understanding the metabolism of arachidonic acid, a metabolite that plays a crucial role in inflammatory processes. The metabolic network shown in Supplementary Figure S3 (Supplementary file S3) identifies metabolites and proteins three metabolic reaction steps away from arachidonic acid. In terms of nodes in the graph, the figure illustrates the 6-step neighborhood of the arachidonic acid node, because each reaction step passes through a Reaction node and therefore spans two edges. The network in Supplementary Figure S3 excludes paths through highly connected promiscuous nodes (such as those representing ‘proton’, ‘H2O’ and ‘Sodium’), to avoid having all nodes interconnected. A list of example Cypher queries for the metabolic framework (including the query for Supplementary Fig. S3) is given in Supplementary file S4.

Second, the user can import the Neo4j output file (in the JSON format), which contains data on the metabolic subnetwork identified using a Cypher query (e.g. the network in Supplementary Fig. S3), into the parser component and choose to export the information in the SBML or SIF format. The SBML output file can be visualized and managed using other tools (e.g. CellDesigner; Funahashi et al., 2008) or used for further mathematical modeling development.
The metabolic subnetwork obtained (the SIF file) can also be explored in Cytoscape or shared with the community through the NDEx platform (Pratt et al., 2015). A visualization of the arachidonate subnetwork in CellDesigner, after manual intervention to improve readability, is shown in Figure 1. The output SBML file corresponding to the network in Supplementary Figure S3 is given as Supplementary file S5.
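The neighborhood exploration described above (a 6-step neighborhood that avoids highly connected nodes) can be approximated outside the database by a bounded breadth-first search. The sketch below is an illustration in Python, not one of the Cypher queries from Supplementary file S4; the toy graph and all node names other than arachidonate and H2O are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edge_list):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def neighborhood(adjacency, start, max_hops, excluded=frozenset()):
    """Return {node: hop distance} for nodes reachable within max_hops
    of start without ever stepping onto an excluded node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor in excluded or neighbor in distances:
                continue
            distances[neighbor] = distances[node] + 1
            queue.append(neighbor)
    return distances


# Toy bipartite chain of metabolites (M*) and reactions (R*), plus a hub node.
toy_edges = [
    ("arachidonate", "R1"), ("R1", "M2"), ("M2", "R2"), ("R2", "M3"),
    ("M3", "R3"), ("R3", "M4"), ("M4", "R4"), ("R4", "M5"),
    ("R1", "H2O"), ("H2O", "R9"), ("R9", "unrelated"),
]
adjacency = build_adjacency(toy_edges)
hits = neighborhood(adjacency, "arachidonate", max_hops=6, excluded={"H2O"})
print(sorted(hits, key=hits.get))
```

In this toy graph, excluding the hub keeps the node reachable only through H2O out of the result, which mirrors the reason given above for dropping promiscuous nodes.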
Fig. 1. Visualization using CellDesigner (Funahashi et al., 2008) for the arachidonic acid metabolic network identified based on Cypher query 1 (Supplementary file S4)

In summary, the developments reported here enable efficient exploration of a human metabolic model by viewing particular metabolites together with their network neighborhood. A powerful feature of the Recon2Neo4j framework is that it facilitates querying and exploration of integrated metabolic data (via the Cypher language, as discussed above), which adds to the functionality provided by other systems biology software, such as cySBML (König et al., 2012). Recon2Neo4j can be easily extended to process other input files available in the SBML standard format, because it relies on the JSBML library to manage the SBML files. It can also integrate new data types as these become available, because the graph database approach is schema-free. (More detailed discussion of using the Neo4j environment for the management of biological and biomedical data can be found in, e.g., Lysenko et al., 2016.) As possible future development steps, it would be useful to add more information to the metabolic network, such as synonyms for metabolite names and tissue expression levels for proteins (e.g. from the Human Protein Atlas; Uhlen et al., 2010). Further work is being undertaken to use this newly developed Neo4j-based data integration framework to identify functional modules in disease-specific network reconstructions (e.g. the Parkinson's disease map and cancer-specific disease maps).
References

1.  The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models.

Authors:  M Hucka; A Finney; H M Sauro; H Bolouri; J C Doyle; H Kitano; A P Arkin; B J Bornstein; D Bray; A Cornish-Bowden; A A Cuellar; S Dronov; E D Gilles; M Ginkel; V Gor; I I Goryanin; W J Hedley; T C Hodgman; J-H Hofmeyr; P J Hunter; N S Juty; J L Kasberger; A Kremling; U Kummer; N Le Novère; L M Loew; D Lucio; P Mendes; E Minch; E D Mjolsness; Y Nakayama; M R Nelson; P F Nielsen; T Sakurada; J C Schaff; B E Shapiro; T S Shimizu; H D Spence; J Stelling; K Takahashi; M Tomita; J Wagner; J Wang
Journal:  Bioinformatics       Date:  2003-03-01       Impact factor: 6.937

2.  CySBML: a Cytoscape plugin for SBML.

Authors:  Matthias König; Andreas Dräger; Hermann-Georg Holzhütter
Journal:  Bioinformatics       Date:  2012-07-05       Impact factor: 6.937

3.  Towards a knowledge-based Human Protein Atlas.

Authors:  Mathias Uhlen; Per Oksvold; Linn Fagerberg; Emma Lundberg; Kalle Jonasson; Mattias Forsberg; Martin Zwahlen; Caroline Kampf; Kenneth Wester; Sophia Hober; Henrik Wernerus; Lisa Björling; Fredrik Ponten
Journal:  Nat Biotechnol       Date:  2010-12       Impact factor: 54.908

4.  A community-driven global reconstruction of human metabolism.

Authors:  Ines Thiele; Neil Swainston; Ronan M T Fleming; Andreas Hoppe; Swagatika Sahoo; Maike K Aurich; Hulda Haraldsdottir; Monica L Mo; Ottar Rolfsson; Miranda D Stobbe; Stefan G Thorleifsson; Rasmus Agren; Christian Bölling; Sergio Bordel; Arvind K Chavali; Paul Dobson; Warwick B Dunn; Lukas Endler; David Hala; Michael Hucka; Duncan Hull; Daniel Jameson; Neema Jamshidi; Jon J Jonsson; Nick Juty; Sarah Keating; Intawat Nookaew; Nicolas Le Novère; Naglis Malys; Alexander Mazein; Jason A Papin; Nathan D Price; Evgeni Selkov; Martin I Sigurdsson; Evangelos Simeonidis; Nikolaus Sonnenschein; Kieran Smallbone; Anatoly Sorokin; Johannes H G M van Beek; Dieter Weichart; Igor Goryanin; Jens Nielsen; Hans V Westerhoff; Douglas B Kell; Pedro Mendes; Bernhard Ø Palsson
Journal:  Nat Biotechnol       Date:  2013-03-03       Impact factor: 54.908

5.  NDEx, the Network Data Exchange.

Authors:  Dexter Pratt; Jing Chen; David Welker; Ricardo Rivas; Rudolf Pillich; Vladimir Rynkov; Keiichiro Ono; Carol Miello; Lyndon Hicks; Sandor Szalma; Aleksandar Stojmirovic; Radu Dobrin; Michael Braxenthaler; Jan Kuentzer; Barry Demchak; Trey Ideker
Journal:  Cell Syst       Date:  2015-10-28       Impact factor: 10.304

6.  Cytoscape 2.8: new features for data integration and network visualization.

Authors:  Michael E Smoot; Keiichiro Ono; Johannes Ruscheinski; Peng-Liang Wang; Trey Ideker
Journal:  Bioinformatics       Date:  2010-12-12       Impact factor: 6.937

7.  JSBML: a flexible Java library for working with SBML.

Authors:  Andreas Dräger; Nicolas Rodriguez; Marine Dumousseau; Alexander Dörr; Clemens Wrzodek; Nicolas Le Novère; Andreas Zell; Michael Hucka
Journal:  Bioinformatics       Date:  2011-06-22       Impact factor: 6.937

8.  Representing and querying disease networks using graph databases.

Authors:  Artem Lysenko; Irina A Roznovăţ; Mansoor Saqi; Alexander Mazein; Christopher J Rawlings; Charles Auffray
Journal:  BioData Min       Date:  2016-07-25       Impact factor: 2.522
