atSNPInfrastructure, a case study for searching billions of records while providing significant cost savings over cloud providers.

Christopher Harrison, Sündüz Keleş, Rebecca Hudson, Sunyoung Shin, Inês Dutra.

Abstract

We explore the feasibility of a database storage engine housing up to 307 billion genetic single nucleotide polymorphisms (SNPs) for online access. We evaluate database storage engines and implement a solution chosen on the basis of factors such as dataset size, information gain, cost, and hardware constraints. Our solution provides a full-featured functional model of scalable storage and querying for researchers exploring the SNPs in the human genome. We address the scalability problem by building physical infrastructure and comparing its final costs with those of a major cloud provider.

Keywords:  Big Data; Billion Records; Cassandra; Data Reduction; Distributed Computing; Economical Computing; Edge Computing; Elasticsearch; Genomics; MySQL; NoSQL; PWM; SNP

Year:  2018        PMID: 30349760      PMCID: PMC6195815          DOI: 10.1109/IPDPSW.2018.00086

Source DB:  PubMed          Journal:  IEEE Int Symp Parallel Distrib Process Workshops Phd Forum        ISSN: 2164-7062


  11 in total

1.  An SNP map of the human genome generated by reduced representation shotgun sequencing.

Authors:  D Altshuler; V J Pollara; C R Cowles; W J Van Etten; J Baldwin; L Linton; E S Lander
Journal:  Nature       Date:  2000-09-28       Impact factor: 49.962

Review 2.  Single nucleotide polymorphisms as tools in human genetics.

Authors:  I C Gray; D A Campbell; N K Spurr
Journal:  Hum Mol Genet       Date:  2000-10       Impact factor: 6.150

3.  D³: Data-Driven Documents.

Authors:  Michael Bostock; Vadim Ogievetsky; Jeffrey Heer
Journal:  IEEE Trans Vis Comput Graph       Date:  2011-12       Impact factor: 4.579

4.  The NCBI dbGaP database of genotypes and phenotypes.

Authors:  Matthew D Mailman; Michael Feolo; Yumi Jin; Masato Kimura; Kimberly Tryka; Rinat Bagoutdinov; Luning Hao; Anne Kiang; Justin Paschall; Lon Phan; Natalia Popova; Stephanie Pretel; Lora Ziyabari; Moira Lee; Yu Shao; Zhen Y Wang; Karl Sirotkin; Minghong Ward; Michael Kholodov; Kerry Zbicz; Jeffrey Beck; Michael Kimelman; Sergey Shevelev; Don Preuss; Eugene Yaschenko; Alan Graeff; James Ostell; Stephen T Sherry
Journal:  Nat Genet       Date:  2007-10       Impact factor: 38.330

5.  atSNP: transcription factor binding affinity testing for regulatory SNP detection.

Authors:  Chandler Zuo; Sunyoung Shin; Sündüz Keleş
Journal:  Bioinformatics       Date:  2015-06-18       Impact factor: 6.937

6.  Use of the 'Perceptron' algorithm to distinguish translational initiation sites in E. coli.

Authors:  G D Stormo; T D Schneider; L Gold; A Ehrenfeucht
Journal:  Nucleic Acids Res       Date:  1982-05-11       Impact factor: 16.971

7.  Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster.

Authors:  M Kreitman
Journal:  Nature       Date:  1983 Aug 4-10       Impact factor: 49.962

8.  An integrated encyclopedia of DNA elements in the human genome.

Authors:  ENCODE Project Consortium
Journal:  Nature       Date:  2012-09-06       Impact factor: 49.962

9.  Genome Variation Map: a data repository of genome variations in BIG Data Center.

Authors:  Shuhui Song; Dongmei Tian; Cuiping Li; Bixia Tang; Lili Dong; Jingfa Xiao; Yiming Bao; Wenming Zhao; Hang He; Zhang Zhang
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

10.  JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles.

Authors:  Anthony Mathelier; Xiaobei Zhao; Allen W Zhang; François Parcy; Rebecca Worsley-Hunt; David J Arenillas; Sorana Buchman; Chih-yu Chen; Alice Chou; Hans Ienasescu; Jonathan Lim; Casper Shyr; Ge Tan; Michelle Zhou; Boris Lenhard; Albin Sandelin; Wyeth W Wasserman
Journal:  Nucleic Acids Res       Date:  2013-11-04       Impact factor: 16.971

  1 in total

1.  RIViT-seq enables systematic identification of regulons of transcriptional machineries.

Authors:  Hiroshi Otani; Nigel J Mouncey
Journal:  Nat Commun       Date:  2022-06-17       Impact factor: 17.694
