Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark.

Max Klein, Rati Sharma, Chris H Bohrer, Cameron M Avelis, Elijah Roberts.

Abstract

Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology.
AVAILABILITY AND IMPLEMENTATION: Source code is licensed under the Apache 2.0 open source license and is available at the project website: https://www.assembla.com/spaces/roberts-lab-public/wiki/Biospark
CONTACT: eroberts@jhu.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2016 PMID: 27663493 PMCID: PMC6276899 DOI: 10.1093/bioinformatics/btw614

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

4 in total

1. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors: Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal: Genome Res Date: 2010-07-19 Impact factor: 9.043

2. 'Big data', Hadoop and cloud computing in genomics.

Authors: Aisling O'Driscoll; Jurate Daugelaite; Roy D Sleator
Journal: J Biomed Inform Date: 2013-07-18 Impact factor: 6.317

3. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

Authors: Henrik Nordberg; Karan Bhatia; Kai Wang; Zhong Wang
Journal: Bioinformatics Date: 2013-09-10 Impact factor: 6.937

4. Hadoop-BAM: directly manipulating next generation sequencing data in the cloud.

Authors: Matti Niemenmaa; Aleksi Kallio; André Schumacher; Petri Klemelä; Eija Korpelainen; Keijo Heljanko
Journal: Bioinformatics Date: 2012-02-02 Impact factor: 6.937

5 in total

1. MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure.

2. DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.

3. Neural network control of focal position during time-lapse microscopy of cells.

4. Bioinformatics applications on Apache Spark. (Review)

5. Modeling binary and graded cone cell fate patterning in the mouse retina.