MetaSpark: a spark-based distributed processing tool to recruit metagenomic reads to reference genomes.

Wei Zhou¹, Ruilin Li²,³, Shuo Yuan¹, ChangChun Liu¹, Shaowen Yao¹, Jing Luo⁴, Beifang Niu²,³.

Abstract

Summary: With the advent of next-generation sequencing, traditional bioinformatics tools are challenged by massive raw metagenomic datasets. One bottleneck in metagenomic studies is the lack of data analysis tools that are large-scale and suited to cloud computing. In this paper, we propose a Spark-based tool, called MetaSpark, to recruit metagenomic reads to reference genomes. MetaSpark builds on Spark's resilient distributed dataset (RDD), which lets it cache data in memory across cluster nodes and scale well with dataset size. MetaSpark recruited significantly more reads than earlier metagenomic recruitment programs such as SOAP2, BWA and LAST, and increased recruited reads by ∼4% compared with FR-HIT on a test with 1 million reads and 0.75 GB of references. Different test cases demonstrate MetaSpark's scalability and overall high performance.

Availability: https://github.com/zhouweiyg/metaspark

Contact: bniu@sccas.cn, jingluo@ynu.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

Year: 2017; PMID: 28065898; DOI: 10.1093/bioinformatics/btw750

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

Cited by

6 in total

1. Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files.

2. Analyzing large scale genomic data on the cloud with Sparkhit.

3. Large scale microbiome profiling in the cloud.

4. Computational Strategies for Scalable Genomics Analysis.

5. Bioinformatics applications on Apache Spark. (Review)

6. BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data.
Authors: Jinxiang Chen; Fuyi Li; Miao Wang; Junlong Li; Tatiana T Marquez-Lago; André Leier; Jerico Revote; Shuqin Li; Quanzhong Liu; Jiangning Song
Journal: Front Big Data; Date: 2022-01-18