MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud.

Roberto R Expósito¹, Jorge Veiga¹, Jorge González-Domínguez¹, Juan Touriño¹.

Abstract

SUMMARY: This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16-node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state-of-the-art tool.
AVAILABILITY AND IMPLEMENTATION: Source code in Java and Hadoop as well as a user's guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es.
CONTACT: rreye@udc.es.
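
The abstract does not describe MarDRe's algorithm, so the following is only a minimal sketch of how duplicate removal can be phrased as map, shuffle, and reduce steps in general. It runs in a single process in plain Python, does not use the Hadoop API, and handles exact duplicates only. Near-duplicate detection and paired-end reads are not covered. All function names are hypothetical, and the choice to key on the upper-cased sequence is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# A read is modeled as (identifier, base sequence); this layout is an assumption.
Read = Tuple[str, str]


def map_phase(reads: Iterable[Read]) -> Iterator[Tuple[str, Read]]:
    """Map: emit (key, read), using the upper-cased sequence as the key."""
    for read_id, sequence in reads:
        yield sequence.upper(), (read_id, sequence)


def shuffle_phase(pairs: Iterable[Tuple[str, Read]]) -> Dict[str, List[Read]]:
    """Shuffle: group reads by key, as a MapReduce framework does between phases."""
    groups: Dict[str, List[Read]] = defaultdict(list)
    for key, read in pairs:
        groups[key].append(read)
    return groups


def reduce_phase(groups: Dict[str, List[Read]]) -> List[Read]:
    """Reduce: keep the first read seen in each group and drop the others."""
    return [group[0] for group in groups.values()]


def remove_exact_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Chain the three phases over an in-memory collection of reads."""
    return reduce_phase(shuffle_phase(map_phase(reads)))
```

For example, given the reads `("r1", "ACGT")`, `("r2", "TTGA")`, `("r3", "acgt")` and `("r4", "ACGT")`, the function keeps only `r1` and `r2`, because `r3` and `r4` share a key with `r1`. In a distributed setting, the grouping step is what lets each reducer inspect all candidate duplicates for a key independently of the other reducers.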

Year: 2017 PMID: 28475668 DOI: 10.1093/bioinformatics/btx307

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

4 in total

1. Fast-HBR: Fast hash based duplicate read remover.

2. HSRA: Hadoop-based spliced read aligner for RNA sequencing data.

3. GPrimer: a fast GPU-based pipeline for primer design for qPCR experiments.

4. BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data.

Authors: Jinxiang Chen; Fuyi Li; Miao Wang; Junlong Li; Tatiana T Marquez-Lago; André Leier; Jerico Revote; Shuqin Li; Quanzhong Liu; Jiangning Song
Journal: Front Big Data Date: 2022-01-18