Dhaivat Joshi¹, Shunfu Mao², Sreeram Kannan², Suhas Diggavi¹.
¹Electrical & Computer Engineering, University of California, Los Angeles, CA 90095, USA. ²Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA.
Abstract
MOTIVATION: Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology, and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate of nanopore reads makes accurate and efficient alignment difficult. Properly exploiting the noise and error characteristics inherent in the sequencing process can play a vital role in constructing a robust aligner. In this article, we design QAlign, a pre-processor that can be used with any long-read aligner to align long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running them through a sequence aligner. RESULTS: We show that QAlign improves alignment rates from around 80% to up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2%, 2.5% and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and from 82.6% to 90% in two real datasets. AVAILABILITY AND IMPLEMENTATION: https://github.com/joshidhaivat/QAlign.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
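To make the key idea concrete, the following is a minimal sketch of converting a nucleotide read into a string of discretized current levels. It is not QAlign's implementation. The per-base values in `TOY_BASE_CURRENT`, the position weighting, the uniform binning, and the function names `toy_kmer_current_table` and `quantize_read` are all hypothetical placeholders. A real pre-processor would use a measured nanopore k-mer current model instead. As we read the abstract, the intuition is that k-mers the sequencer tends to confuse produce similar currents, so they collapse onto the same quantized symbol before alignment.

```python
from itertools import product

# HYPOTHETICAL per-base contributions used only to build a toy k-mer current
# table; a real pre-processor would use a measured nanopore k-mer model.
TOY_BASE_CURRENT = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def toy_kmer_current_table(k):
    """Assign every k-mer over ACGT a toy 'current' value (hypothetical)."""
    table = {}
    for kmer in product("ACGT", repeat=k):
        # Weight central bases more heavily, loosely mimicking that the bases
        # inside the pore dominate the signal (an illustrative assumption).
        value = sum(
            TOY_BASE_CURRENT[b] * (1 + min(i, k - 1 - i))
            for i, b in enumerate(kmer)
        )
        table["".join(kmer)] = value
    return table


def quantize_read(read, table, k, levels):
    """Map a read (ACGT only) to a string of discretized current levels.

    Each overlapping k-mer is looked up in the current table, and its value
    is binned uniformly into `levels` bins, emitted as digits '0'..'levels-1'.
    """
    lo, hi = min(table.values()), max(table.values())
    width = (hi - lo) / levels
    out = []
    for i in range(len(read) - k + 1):
        value = table[read[i:i + k]]
        # Clamp so the maximum value lands in the top bin instead of overflowing.
        out.append(str(min(int((value - lo) / width), levels - 1)))
    return "".join(out)
```

For example, with `k = 3` and three levels, `quantize_read("AAATTT", toy_kmer_current_table(3), 3, 3)` returns `"0022"`. Under this sketch, both the reads and the reference would be quantized in the same way, and the resulting strings would then be handed to an off-the-shelf long-read aligner.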