A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware.

Haixiang Shi, Bertil Schmidt, Weiguo Liu, Wolfgang Müller-Wittig.

Abstract

Emerging DNA sequencing technologies open up exciting new opportunities for genome sequencing by generating read data with a massive throughput. However, the reads produced are significantly shorter and more error-prone than those of the traditional Sanger shotgun sequencing method. This poses challenges for de novo DNA fragment assembly algorithms in terms of both accuracy (to deal with short, error-prone reads) and scalability (to deal with very large input data sets). In this article, we present a scalable parallel algorithm for correcting sequencing errors in high-throughput short-read data so that error-free reads can be available before DNA fragment assembly, which is of high importance to many graph-based short-read assembly tools. The algorithm is based on spectral alignment and uses the Compute Unified Device Architecture (CUDA) programming model. To gain efficiency, we take advantage of CUDA texture memory, using a space-efficient Bloom filter data structure for spectrum membership queries. We have tested the runtime and accuracy of our algorithm using real and simulated Illumina data for different read lengths, error rates, input sizes, and algorithmic parameters. Using a CUDA-enabled mass-produced GPU (available for less than US$400 at any local computer outlet), this results in speedups of 12-84 times for the parallelized error correction, and speedups of 3-63 times for both sequential preprocessing and parallelized error correction compared to the publicly available Euler-SR program. Our implementation is freely available for download from http://cuda-ec.sourceforge.net.
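
The idea of answering spectrum membership queries with a Bloom filter, as described in the abstract, can be illustrated with a minimal CPU-only sketch in Python. It is not the authors' CUDA implementation: the names BloomFilter, build_spectrum and correct_read, the SHA-256 double-hashing scheme, the filter size, and the single-substitution search are all assumptions made for illustration. Each read is handled independently of the others, which is presumably the property a GPU version exploits.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Fixed-size bit array: no false negatives, a small rate of false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing (an illustrative choice): derive every probe
        # position from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def kmers(read, k):
    """Return all overlapping substrings of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_spectrum(reads, k, min_count, num_bits=1 << 20, num_hashes=4):
    """Store every k-mer seen at least min_count times ("solid" k-mers)."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    spectrum = BloomFilter(num_bits, num_hashes)
    for km, count in counts.items():
        if count >= min_count:
            spectrum.add(km)
    return spectrum


def correct_read(read, spectrum, k):
    """Return the first single-base variant whose k-mers are all solid."""
    if all(km in spectrum for km in kmers(read, k)):
        return read  # already consistent with the spectrum
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if all(km in spectrum for km in kmers(candidate, k)):
                return candidate
    return read  # no single substitution helps; leave the read unchanged
```

For instance, given three copies of a hypothetical read "ACGTACGTTGCA" and one copy with a single substitution, build_spectrum(reads, k=5, min_count=2) followed by correct_read on the erroneous copy should restore the original read, barring a Bloom filter false positive, which is very unlikely at this filter size.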

Year:  2010        PMID: 20426693     DOI: 10.1089/cmb.2009.0062

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  18 in total

Review 1.  Next-generation transcriptome assembly.

Authors:  Jeffrey A Martin; Zhong Wang
Journal:  Nat Rev Genet       Date:  2011-09-07       Impact factor: 53.242

Review 2.  From next-generation resequencing reads to a high-quality variant data set.

Authors:  S P Pfeifer
Journal:  Heredity (Edinb)       Date:  2016-10-19       Impact factor: 3.821

3.  Scaling metagenome sequence assembly with probabilistic de Bruijn graphs.

Authors:  Jason Pell; Arend Hintze; Rosangela Canino-Koning; Adina Howe; James M Tiedje; C Titus Brown
Journal:  Proc Natl Acad Sci U S A       Date:  2012-07-30       Impact factor: 11.205

4.  Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application.

Authors:  Gaye Lightbody; Valeriia Haberland; Fiona Browne; Laura Taggart; Huiru Zheng; Eileen Parkes; Jaine K Blayney
Journal:  Brief Bioinform       Date:  2019-09-27       Impact factor: 11.622

5.  DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI.

Authors:  Yongchao Liu; Bertil Schmidt; Douglas L Maskell
Journal:  BMC Bioinformatics       Date:  2011-03-29       Impact factor: 3.169

6.  A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies.

Authors:  Wenyu Zhang; Jiajia Chen; Yang Yang; Yifei Tang; Jing Shang; Bairong Shen
Journal:  PLoS One       Date:  2011-03-14       Impact factor: 3.240

7.  Efficient counting of k-mers in DNA sequences using a bloom filter.

Authors:  Páll Melsted; Jonathan K Pritchard
Journal:  BMC Bioinformatics       Date:  2011-08-10       Impact factor: 3.169

8.  Quake: quality-aware detection and correction of sequencing errors.

Authors:  David R Kelley; Michael C Schatz; Steven L Salzberg
Journal:  Genome Biol       Date:  2010-11-29       Impact factor: 13.583

9.  Parallelized short read assembly of large genomes using de Bruijn graphs.

Authors:  Yongchao Liu; Bertil Schmidt; Douglas L Maskell
Journal:  BMC Bioinformatics       Date:  2011-08-25       Impact factor: 3.169

10.  Error correction of high-throughput sequencing datasets with non-uniform coverage.

Authors:  Paul Medvedev; Eric Scott; Boyko Kakaradov; Pavel Pevzner
Journal:  Bioinformatics       Date:  2011-07-01       Impact factor: 6.937
