Use of Graph Database for the Integration of Heterogeneous Biological Data.

Byoung-Ha Yoon, Seon-Kyu Kim, Seon-Young Kim.

Abstract

Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships through multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used in the computer science community and the IT industry. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance of MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j responded very quickly to various queries, MySQL exhibited delayed or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
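The contrast between multiple-join queries and native relationship traversal can be illustrated with a minimal sketch. This is not the authors' benchmark and uses neither MySQL nor Neo4j: it assumes Python's built-in `sqlite3` module as a stand-in relational store and a plain dictionary as a stand-in graph. The edge list, the relation names (`TARGETS`, `INTERACTS_WITH`, `ASSOCIATED_WITH`), and both function names are hypothetical and chosen only for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy edges (NOT data from the paper): (source, relation, target)
EDGES = [
    ("drugA", "TARGETS", "gene1"),
    ("gene1", "INTERACTS_WITH", "gene2"),
    ("gene2", "ASSOCIATED_WITH", "diseaseX"),
    ("drugB", "TARGETS", "gene3"),
    ("gene3", "ASSOCIATED_WITH", "diseaseY"),
]


def diseases_via_joins(drug):
    """Relational style: each hop along the path costs one self-join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edge (src TEXT, rel TEXT, dst TEXT)")
    con.executemany("INSERT INTO edge VALUES (?, ?, ?)", EDGES)
    rows = con.execute(
        """
        SELECT e3.dst
        FROM edge AS e1
        JOIN edge AS e2 ON e2.src = e1.dst
        JOIN edge AS e3 ON e3.src = e2.dst
        WHERE e1.src = ?
          AND e1.rel = 'TARGETS'
          AND e2.rel = 'INTERACTS_WITH'
          AND e3.rel = 'ASSOCIATED_WITH'
        """,
        (drug,),
    ).fetchall()
    con.close()
    return sorted(row[0] for row in rows)


def diseases_via_traversal(drug):
    """Graph style: follow stored adjacency from node to node."""
    adjacency = defaultdict(list)
    for src, rel, dst in EDGES:
        adjacency[(src, rel)].append(dst)
    found = set()
    for gene in adjacency[(drug, "TARGETS")]:
        for partner in adjacency[(gene, "INTERACTS_WITH")]:
            found.update(adjacency[(partner, "ASSOCIATED_WITH")])
    return sorted(found)


if __name__ == "__main__":
    print(diseases_via_joins("drugA"))
    print(diseases_via_traversal("drugA"))
```

Both functions answer the same three-hop question (drug, target gene, interacting gene, disease). The relational version needs one additional join for every additional hop, whereas the traversal version only follows links that are already stored with each node. This toy example says nothing about real performance; it only shows why path-shaped questions map naturally onto a graph model.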

Keywords:  Neo4j; biological network; data mining; graph database; heterogeneous biological data; query performance

Year: 2017; PMID: 28416946; PMCID: PMC5389944; DOI: 10.5808/GI.2017.15.1.19

Source DB: PubMed; Journal: Genomics Inform; ISSN: 1598-866X
