
Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark.

Max Klein, Rati Sharma, Chris H Bohrer, Cameron M Avelis, Elijah Roberts.

Abstract

Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology.
AVAILABILITY AND IMPLEMENTATION: Source code is licensed under the Apache 2.0 open source license and is available at the project website: https://www.assembla.com/spaces/roberts-lab-public/wiki/Biospark
CONTACT: eroberts@jhu.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2016        PMID: 27663493      PMCID: PMC6276899          DOI: 10.1093/bioinformatics/btw614

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  4 in total

1.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

2.  'Big data', Hadoop and cloud computing in genomics.

Authors:  Aisling O'Driscoll; Jurate Daugelaite; Roy D Sleator
Journal:  J Biomed Inform       Date:  2013-07-18       Impact factor: 6.317

3.  BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

Authors:  Henrik Nordberg; Karan Bhatia; Kai Wang; Zhong Wang
Journal:  Bioinformatics       Date:  2013-09-10       Impact factor: 6.937

4.  Hadoop-BAM: directly manipulating next generation sequencing data in the cloud.

Authors:  Matti Niemenmaa; Aleksi Kallio; André Schumacher; Petri Klemelä; Eija Korpelainen; Keijo Heljanko
Journal:  Bioinformatics       Date:  2012-02-02       Impact factor: 6.937

  5 in total

1.  MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure.

Authors:  Weina Li; Jiadong Ren
Journal:  PLoS One       Date:  2018-04-23       Impact factor: 3.240

2.  DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.

Authors:  Michael D Linderman; Davin Chia; Forrest Wallace; Frank A Nothaft
Journal:  BMC Bioinformatics       Date:  2019-10-11       Impact factor: 3.169

3.  Neural network control of focal position during time-lapse microscopy of cells.

Authors:  Ling Wei; Elijah Roberts
Journal:  Sci Rep       Date:  2018-05-09       Impact factor: 4.379

4.  Bioinformatics applications on Apache Spark. (Review)

Authors:  Runxin Guo; Yi Zhao; Quan Zou; Xiaodong Fang; Shaoliang Peng
Journal:  Gigascience       Date:  2018-08-01       Impact factor: 6.524

5.  Modeling binary and graded cone cell fate patterning in the mouse retina.

Authors:  Kiara C Eldred; Cameron Avelis; Robert J Johnston; Elijah Roberts
Journal:  PLoS Comput Biol       Date:  2020-03-09       Impact factor: 4.475

