Tyler Cowman (1), Mustafa Coşkun (2), Ananth Grama (3), Mehmet Koyutürk (1, 4). Affiliations: (1) Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA; (2) Department of Computer Engineering, Abdullah Gül University, Kayseri 38080, Turkey; (3) Department of Computer Science, Purdue University, West Lafayette, IN 47906, USA; (4) Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA.
Abstract
MOTIVATION: Biomolecular data stored in public databases is increasingly specialized to organisms, context/pathology and tissue type, potentially resulting in significant overhead for analyses. The resulting context-specific networks are often specializations of generic interaction sets, which presents opportunities for reducing storage and computational cost. It is therefore desirable to develop effective compression and storage techniques, along with efficient algorithms and a flexible query interface capable of operating on compressed data structures. Current graph databases offer varying levels of support for network integration, but they do not provide efficient methods for storing and querying versioned networks.
RESULTS: We present VerTIoN, a framework consisting of novel data structures and associated query mechanisms for integrated querying of versioned context-specific biological networks. As a use case for our framework, we study network proximity queries in which the user can select and compose a combination of tissue-specific and generic networks. Using our compressed version tree data structure, in conjunction with state-of-the-art numerical techniques, we demonstrate real-time querying of large network databases.
CONCLUSION: Our results show that it is possible to support flexible queries defined on heterogeneous networks composed at query time while drastically reducing response time for multiple simultaneous queries. The flexibility offered by VerTIoN in composing integrated network versions opens significant new avenues for using the ever-increasing volume of context-specific network data in a broad range of biomedical applications.
AVAILABILITY AND IMPLEMENTATION: VerTIoN is implemented as a C++ library and is available at http://compbio.case.edu/omics/software/vertion and https://github.com/tjcowman/vertion.
CONTACT: tyler.cowman@case.edu.
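To make the idea of versioned networks composed at query time concrete, here is a minimal sketch in Python. It is not VerTIoN's implementation or API (VerTIoN is a C++ library, and the abstract does not describe its internals). The sketch assumes that each context-specific version is stored as edge additions and removals relative to a parent version, that selected versions are integrated by taking the union of their edges, and that proximity is computed by a plain random walk with restart. The names `VersionTree`, `compose` and `random_walk_with_restart`, the toy gene names and the restart probability are all hypothetical.

```python
from collections import defaultdict


class VersionTree:
    """Hypothetical delta-encoded store: each version keeps only its edge changes."""

    def __init__(self):
        # version name -> (parent name or None, added edges, removed edges)
        self._nodes = {}

    @staticmethod
    def _normalize(edges):
        # Undirected edges: store each pair in sorted order so (u, v) == (v, u).
        return {tuple(sorted(edge)) for edge in edges}

    def add_version(self, name, parent=None, added=(), removed=()):
        if parent is not None and parent not in self._nodes:
            raise KeyError(f"unknown parent version: {parent}")
        self._nodes[name] = (parent, self._normalize(added), self._normalize(removed))

    def edges(self, name):
        """Materialize one version by replaying deltas from the root down."""
        chain = []
        while name is not None:
            chain.append(name)
            name = self._nodes[name][0]
        result = set()
        for version in reversed(chain):
            _, added, removed = self._nodes[version]
            result = (result - removed) | added
        return result

    def compose(self, names):
        """Integrate several versions at query time as the union of their edges."""
        combined = set()
        for name in names:
            combined |= self.edges(name)
        return combined


def random_walk_with_restart(edges, seed, restart=0.15, iterations=100):
    """Score every node's proximity to `seed` by power iteration (undirected graph)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    if seed not in neighbors:
        raise KeyError(f"seed node not in network: {seed}")
    scores = {node: 0.0 for node in neighbors}
    scores[seed] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in neighbors}
        for node, score in scores.items():
            # Spread the non-restarting mass evenly over the node's neighbors.
            share = (1.0 - restart) * score / len(neighbors[node])
            for neighbor in neighbors[node]:
                updated[neighbor] += share
        updated[seed] += restart  # the restarting mass returns to the seed
        scores = updated
    return scores


# Toy data: one generic network and two tissue-specific versions derived from it.
tree = VersionTree()
tree.add_version("generic", added=[("A", "B"), ("B", "C"), ("C", "D")])
tree.add_version("liver", parent="generic", added=[("A", "D")], removed=[("B", "C")])
tree.add_version("brain", parent="generic", added=[("B", "D")])

network = tree.compose(["liver", "brain"])
scores = random_walk_with_restart(network, seed="A")
for node in sorted(scores):
    print(node, round(scores[node], 4))
```

Storing only deltas keeps each tissue-specific version small when it differs little from the generic network, at the cost of replaying the chain of deltas whenever a version is materialized. A production system would need a more compact representation and a faster solver than this pure-Python loop.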