SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision.

Marek S Wiewiórka¹, Antonio Messina¹, Alicja Pacholewska², Sergio Maffioletti¹, Piotr Gawrysiak¹, Michał J Okoniewski¹.

Abstract

Many time-consuming analyses of next-generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics because of their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running the analyses of sequencing datasets. Tests of SparkSeq also prove that the use of cache and HDFS block size can be tuned for the optimal performance on multiple worker nodes.
AVAILABILITY AND IMPLEMENTATION: Available under open source Apache 2.0 license: https://bitbucket.org/mwiewiorka/sparkseq/.

Substances: Nucleotides

Year: 2014 PMID: 24845651 DOI: 10.1093/bioinformatics/btu343

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by

18 in total

1. Synonymous variants that disrupt messenger RNA structure are significantly constrained in the human population.

Authors: Jeffrey B S Gaither; Grant E Lammi; James L Li; David M Gordon; Harkness C Kuck; Benjamin J Kelly; James R Fitch; Peter White
Journal: Gigascience Date: 2021-04-05 Impact factor: 6.524

2. Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files.

Authors: Xiaobo Sun; Jingjing Gao; Peng Jin; Celeste Eng; Esteban G Burchard; Terri H Beaty; Ingo Ruczinski; Rasika A Mathias; Kathleen Barnes; Fusheng Wang; Zhaohui S Qin
Journal: Gigascience Date: 2018-06-01 Impact factor: 6.524

3. Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application.

Authors: Gaye Lightbody; Valeriia Haberland; Fiona Browne; Laura Taggart; Huiru Zheng; Eileen Parkes; Jaine K Blayney
Journal: Brief Bioinform Date: 2019-09-27 Impact factor: 11.622

4. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

5. VariantSpark: population scale clustering of genotype information.

6. A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.

7. Single-cell Transcriptome Study as Big Data. (Review)

8. START: a system for flexible analysis of hundreds of genomic signal tracks in few lines of SQL-like queries.

9. Benchmarking distributed data warehouse solutions for storing genomic variant information.

10. Big Data Application in Biomedical Research and Health Care: A Literature Review. (Review)