
BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

Henrik Nordberg, Karan Bhatia, Kai Wang, Zhong Wang.

Abstract

MOTIVATION: The recent revolution in sequencing technologies has led to exponential growth in sequence data. As a result, most current bioinformatics tools are becoming obsolete because they fail to scale with the data. To tackle this 'data deluge', we introduce the BioPig sequence analysis toolkit as one solution that scales with both data and computation.
RESULTS: We built BioPig on Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb of sequences demonstrates that it scales automatically with the size of the data; and finally, BioPig can be ported without modification to many Hadoop infrastructures, as tested with the Magellan system at the National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel programming framework with the potential to greatly accelerate data-intensive bioinformatics analysis.

Year:  2013        PMID: 24021384     DOI: 10.1093/bioinformatics/btt528

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  22 in total

1.  Big data and biomedical informatics: a challenging opportunity.

Authors:  R Bellazzi
Journal:  Yearb Med Inform       Date:  2014-05-22

2.  Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark.

Authors:  Max Klein; Rati Sharma; Chris H Bohrer; Cameron M Avelis; Elijah Roberts
Journal:  Bioinformatics       Date:  2016-09-22       Impact factor: 6.937

3.  Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons.

Authors:  Illyoung Choi; Alise J Ponsero; Matthew Bomhoff; Ken Youens-Clark; John H Hartman; Bonnie L Hurwitz
Journal:  Gigascience       Date:  2019-02-01       Impact factor: 6.524

Review 4.  Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives.

Authors:  Ivan Merelli; Horacio Pérez-Sánchez; Sandra Gesing; Daniele D'Agostino
Journal:  Biomed Res Int       Date:  2014-09-01       Impact factor: 3.411

Review 5.  Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends.

Authors:  Emad A Mohammed; Behrouz H Far; Christopher Naugler
Journal:  BioData Min       Date:  2014-10-29       Impact factor: 2.522

6.  A hadoop-based method to predict potential effective drug combination.

Authors:  Yifan Sun; Yi Xiong; Qian Xu; Dongqing Wei
Journal:  Biomed Res Int       Date:  2014-07-23       Impact factor: 3.411

7.  A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.

Authors:  Alexey Siretskiy; Tore Sundqvist; Mikhail Voznesenskiy; Ola Spjuth
Journal:  Gigascience       Date:  2015-06-04       Impact factor: 6.524

8.  SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop.

Authors:  André Schumacher; Luca Pireddu; Matti Niemenmaa; Aleksi Kallio; Eija Korpelainen; Gianluigi Zanetti; Keijo Heljanko
Journal:  Bioinformatics       Date:  2013-10-22       Impact factor: 6.937

9.  CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.

Authors:  Wei-Chun Chung; Chien-Chih Chen; Jan-Ming Ho; Chung-Yen Lin; Wen-Lian Hsu; Yu-Chun Wang; D T Lee; Feipei Lai; Chih-Wei Huang; Yu-Jung Chang
Journal:  PLoS One       Date:  2014-06-04       Impact factor: 3.240

10.  Towards sub-quadratic time and space complexity solutions for the dated tree reconciliation problem.

Authors:  Benjamin Drinkwater; Michael A Charleston
Journal:  Algorithms Mol Biol       Date:  2016-05-21       Impact factor: 1.405
