A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware.

Haixiang Shi, Bertil Schmidt, Weiguo Liu, Wolfgang Müller-Wittig

Abstract

Emerging DNA sequencing technologies open up exciting new opportunities for genome sequencing by generating read data with a massive throughput. However, the reads they produce are significantly shorter and more error-prone than those of the traditional Sanger shotgun sequencing method. This poses challenges for de novo DNA fragment assembly algorithms in terms of both accuracy (to deal with short, error-prone reads) and scalability (to deal with very large input data sets). In this article, we present a scalable parallel algorithm for correcting sequencing errors in high-throughput short-read data so that error-free reads are available before DNA fragment assembly, which is of high importance to many graph-based short-read assembly tools. The algorithm is based on spectral alignment and uses the Compute Unified Device Architecture (CUDA) programming model. To gain efficiency, we take advantage of CUDA texture memory, using a space-efficient Bloom filter data structure for spectrum membership queries. We have tested the runtime and accuracy of our algorithm using real and simulated Illumina data for different read lengths, error rates, input sizes, and algorithmic parameters. On a CUDA-enabled mass-produced GPU (available for less than US$400 at any local computer outlet), our implementation achieves speedups of 12-84 times for the parallelized error correction alone, and speedups of 3-63 times for sequential preprocessing and parallelized error correction combined, compared to the publicly available Euler-SR program. Our implementation is freely available for download from http://cuda-ec.sourceforge.net.
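
The core idea in the abstract, a Bloom filter answering k-mer spectrum membership queries during spectral-alignment correction, can be illustrated with a minimal CPU-side sketch. This is not the authors' CUDA implementation. The names `BloomFilter`, `build_spectrum`, `is_solid_read` and `correct_single_error`, the SHA-256 double hashing, the filter size, and the exhaustive single-substitution search are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Bit-array set: no false negatives, small false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing (an assumption of this sketch): derive all bit
        # positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def build_spectrum(reads, k, min_count, num_bits=1 << 20, num_hashes=4):
    """Store every k-mer seen at least min_count times ("solid" k-mers)."""
    counts = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
    spectrum = BloomFilter(num_bits, num_hashes)
    for kmer, count in counts.items():
        if count >= min_count:
            spectrum.add(kmer)
    return spectrum


def is_solid_read(read, k, spectrum):
    """A read is solid when all of its k-mers are in the spectrum."""
    return all(read[i:i + k] in spectrum for i in range(len(read) - k + 1))


def correct_single_error(read, k, spectrum):
    """Try each single-base substitution; apply it only if it is unique."""
    if is_solid_read(read, k, spectrum):
        return read
    candidates = []
    for pos in range(len(read)):
        for base in "ACGT":
            if base != read[pos]:
                candidate = read[:pos] + base + read[pos + 1:]
                if is_solid_read(candidate, k, spectrum):
                    candidates.append(candidate)
    # Ambiguous or uncorrectable reads are returned unchanged.
    return candidates[0] if len(candidates) == 1 else read


if __name__ == "__main__":
    reads = ["ACGTACGGTCA"] * 3 + ["ACGTACTGTCA"]  # last read has one error
    spectrum = build_spectrum(reads, k=4, min_count=2)
    print(correct_single_error("ACGTACTGTCA", 4, spectrum))
```

Each read is corrected independently of the others, which is why the correction step parallelizes naturally across reads. The Bloom filter trades a small false-positive rate for a compact, fixed-size table that is only read, never written, during correction.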

Year: 2010 PMID: 20426693 DOI: 10.1089/cmb.2009.0062

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479

Cited by: 18 in total

Review 1. Next-generation transcriptome assembly.

Authors: Jeffrey A Martin; Zhong Wang
Journal: Nat Rev Genet Date: 2011-09-07 Impact factor: 53.242

Review 2. From next-generation resequencing reads to a high-quality variant data set.

Authors: S P Pfeifer
Journal: Heredity (Edinb) Date: 2016-10-19 Impact factor: 3.821

3. Scaling metagenome sequence assembly with probabilistic de Bruijn graphs.

Authors: Jason Pell; Arend Hintze; Rosangela Canino-Koning; Adina Howe; James M Tiedje; C Titus Brown
Journal: Proc Natl Acad Sci U S A Date: 2012-07-30 Impact factor: 11.205

4. Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application.

Authors: Gaye Lightbody; Valeriia Haberland; Fiona Browne; Laura Taggart; Huiru Zheng; Eileen Parkes; Jaine K Blayney
Journal: Brief Bioinform Date: 2019-09-27 Impact factor: 11.622

5. DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI.

6. A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies.

7. Efficient counting of k-mers in DNA sequences using a bloom filter.

8. Quake: quality-aware detection and correction of sequencing errors.

9. Parallelized short read assembly of large genomes using de Bruijn graphs.

10. Error correction of high-throughput sequencing datasets with non-uniform coverage.