
MapReduce for accurate error correction of next-generation sequencing data.

Liang Zhao (1,2), Qingfeng Chen (1), Wencui Li (2), Peng Jiang (1), Limsoon Wong (3), Jinyan Li (4).

Abstract

MOTIVATION: Next-generation sequencing platforms have produced huge amounts of sequence data, revolutionizing every aspect of genetic and genomic research. However, these datasets contain many machine-induced errors; for example, substitution error rates can be as high as 2.5%. Existing error-correction methods are still far from perfect: some of them, especially the prevalent k-mer based methods, sometimes introduce more errors than they correct. Existing methods have also made limited use of on-demand cloud computing.
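Whether a corrector introduces more errors than it fixes is usually quantified with the correction gain, which error-correction benchmarks commonly define as (TP - FP) / (TP + FN). Here TP counts erroneous bases that were corrected, FP counts correct bases that were wrongly changed, and FN counts erroneous bases left untouched. The abstract does not spell out MEC's exact formula, so the minimal sketch below assumes this conventional form; the function name and the counts are illustrative, not results from the paper.

```python
def correction_gain(tp: int, fp: int, fn: int) -> float:
    """Return (TP - FP) / (TP + FN), the net fraction of true errors removed.

    tp: erroneous bases that were corrected
    fp: correct bases that were wrongly changed (newly introduced errors)
    fn: erroneous bases left uncorrected
    """
    total_errors = tp + fn
    if total_errors == 0:
        raise ValueError("gain is undefined when the input has no errors")
    return (tp - fp) / total_errors


# Illustrative counts only: 100 true errors, 60 fixed, 80 new errors introduced.
print(correction_gain(tp=60, fp=80, fn=40))  # prints -0.2
```

A negative gain, as in this example, means the corrector left the data worse than it found it; a gain of 1 means every error was fixed and none was introduced.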
RESULTS: We introduce an error-correction method named MEC, which uses a two-layered MapReduce technique to achieve high correction performance. In the first layer, all input sequences are mapped to groups so that candidate erroneous bases can be identified in parallel. In the second layer, the candidate erroneous bases at the same position are linked together across all groups so that statistically reliable corrections can be made. Experiments on real and simulated datasets show that our method substantially outperforms existing methods: its per-position error rate is consistently the lowest, and its correction gain is always the highest.
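The two-layer idea can be illustrated with a small in-process sketch. This is not MEC's implementation; it assumes that reads are grouped by shared k-mer seeds, that reads in a group are aligned by seed offset, and that each confidently covered base receives its column's consensus as a vote (layer 1). Layer 2 then collects all votes for the same (read, position) key across groups and changes a base only when a clear majority disagrees with it. The names `map_reduce`, `K`, `MIN_DEPTH` and `MIN_FRACTION` are hypothetical, and `map_reduce` only simulates a MapReduce round (map, shuffle by key, reduce) in one process.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, Tuple

# Hypothetical parameters for this sketch (not taken from MEC).
K = 5                # seed k-mer length used to group reads
MIN_DEPTH = 3        # minimum column depth before a consensus is trusted
MIN_FRACTION = 0.75  # minimum share of the top base (and of votes in layer 2)


def map_reduce(records, mapper, reducer) -> list:
    """Simulate one MapReduce round in-process: map, shuffle by key, reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    results = []
    for key, values in shuffled.items():
        results.extend(reducer(key, values))
    return results


def seed_mapper(record: Tuple[str, str]) -> Iterator:
    """Layer 1 map: emit each distinct k-mer of a read with its first offset."""
    read_id, seq = record
    seen = set()
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer not in seen:
            seen.add(kmer)
            yield kmer, (read_id, seq, i)


def group_reducer(kmer: str, members: list) -> Iterator:
    """Layer 1 reduce: align a group on its shared seed and vote per column."""
    columns = defaultdict(Counter)
    for _, seq, offset in members:
        for p, base in enumerate(seq):
            columns[p - offset][base] += 1
    for read_id, seq, offset in members:
        for p, base in enumerate(seq):
            column = columns[p - offset]
            depth = sum(column.values())
            top, top_count = column.most_common(1)[0]
            if depth >= MIN_DEPTH and top_count / depth >= MIN_FRACTION:
                # Key by (read, position) so layer 2 links votes across groups.
                yield (read_id, p), (base, top)


def position_reducer(key: Tuple[str, int], votes: list) -> Iterator:
    """Layer 2 reduce: correct a base only if a clear majority disagrees."""
    read_id, pos = key
    original = votes[0][0]
    winner, count = Counter(v for _, v in votes).most_common(1)[0]
    if winner != original and count / len(votes) >= MIN_FRACTION:
        yield read_id, pos, winner


def correct_reads(reads: Dict[str, str]) -> Dict[str, str]:
    """Run both layers and apply the accepted corrections."""
    votes = map_reduce(reads.items(), seed_mapper, group_reducer)
    fixes = map_reduce(votes, lambda kv: [kv], position_reducer)
    corrected = {rid: list(seq) for rid, seq in reads.items()}
    for read_id, pos, base in fixes:
        corrected[read_id][pos] = base
    return {rid: "".join(chars) for rid, chars in corrected.items()}


if __name__ == "__main__":
    truth = "ACGTTGCAAGT"
    reads = {f"r{i}": truth for i in range(4)}
    reads["bad"] = "ACGTTGAAAGT"  # substitution error at position 6 (C -> A)
    print(correct_reads(reads)["bad"])  # prints ACGTTGCAAGT
```

Because every seed group in layer 1 and every (read, position) key in layer 2 is reduced independently, both layers parallelize naturally on a real MapReduce cluster. A practical corrector would also have to handle quality scores, reverse complements and repetitive seeds, which this sketch ignores.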
AVAILABILITY AND IMPLEMENTATION: The source code is available at bioinformatics.gxu.edu.cn/ngs/mec. CONTACTS: wongls@comp.nus.edu.sg or jinyan.li@uts.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com


Year:  2017        PMID: 28205674     DOI: 10.1093/bioinformatics/btx089

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  Systematic evaluation of error rates and causes in short samples in next-generation sequencing.

Authors:  Franziska Pfeiffer; Carsten Gröber; Michael Blank; Kristian Händler; Marc Beyer; Joachim L Schultze; Günter Mayer
Journal:  Sci Rep       Date:  2018-07-19       Impact factor: 4.379

2.  Efficient Mining of Variants From Trios for Ventricular Septal Defect Association Study.

Authors:  Peng Jiang; Yaofei Hu; Yiqi Wang; Jin Zhang; Qinghong Zhu; Lin Bai; Qiang Tong; Tao Li; Liang Zhao
Journal:  Front Genet       Date:  2019-08-08       Impact factor: 4.599

Review 3.  Sequencing-Based Measurable Residual Disease Testing in Acute Myeloid Leukemia.

Authors:  Jennifer M Yoest; Cara Lunn Shirai; Eric J Duncavage
Journal:  Front Cell Dev Biol       Date:  2020-05-08

4.  GPrimer: a fast GPU-based pipeline for primer design for qPCR experiments.

Authors:  Jeongmin Bae; Hajin Jeon; Min-Soo Kim
Journal:  BMC Bioinformatics       Date:  2021-04-29       Impact factor: 3.169

5.  BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data.

Authors:  Jinxiang Chen; Fuyi Li; Miao Wang; Junlong Li; Tatiana T Marquez-Lago; André Leier; Jerico Revote; Shuqin Li; Quanzhong Liu; Jiangning Song
Journal:  Front Big Data       Date:  2022-01-18