Christopher Harrison1,2, Sündüz Keleş1, Rebecca Hudson1, Sunyoung Shin1, Inês Dutra2.
Abstract
We explore the feasibility of a database storage engine housing up to 307 billion genetic single nucleotide polymorphisms (SNPs) for online access. We evaluate database storage engines and implement a solution based on factors such as dataset size, information gain, cost, and hardware constraints. Our solution provides a full-featured functional model for scalable storage and querying for researchers exploring the SNPs in the human genome. We address the scalability problem by building physical infrastructure and comparing final costs to those of a major cloud provider.
Keywords: Big Data; Billion Records; Cassandra; Data Reduction; Distributed Computing; Economical Computing; Edge Computing; Elasticsearch; Genomics; MySQL; NoSQL; PWM; SNP
Year: 2018 PMID: 30349760 PMCID: PMC6195815 DOI: 10.1109/IPDPSW.2018.00086
Source DB: PubMed Journal: IEEE Int Symp Parallel Distrib Process Workshops Phd Forum ISSN: 2164-7062