Santiago Marco-Sola (1,2), Juan Carlos Moure (2), Miquel Moreto (1,3), Antonio Espinosa (2).
1. Department of Computer Sciences, Barcelona Supercomputing Center, Barcelona 08034, Spain.
2. Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona 08193, Spain.
3. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona 08034, Spain.
Abstract
MOTIVATION: Pairwise alignment of sequences is a fundamental method in modern molecular biology, implemented within multiple bioinformatics tools and libraries. Current advances in sequencing technologies press for the development of faster pairwise alignment algorithms that can scale with increasing read lengths and production yields.

RESULTS: In this article, we present the wavefront alignment algorithm (WFA), an exact gap-affine algorithm that takes advantage of homologous regions between the sequences to accelerate the alignment process. As opposed to traditional dynamic programming algorithms that run in quadratic time, the WFA runs in time O(ns), proportional to the read length n and the alignment score s, using O(s^2) memory. Furthermore, our algorithm exhibits simple data dependencies that can be easily vectorized, even by the automatic features of modern compilers, for different architectures, without the need to adapt the code. We evaluate the performance of our algorithm, together with other state-of-the-art implementations. As a result, we demonstrate that the WFA runs 20-300× faster than other methods aligning short Illumina-like sequences, and 10-100× faster using long noisy reads like those produced by Oxford Nanopore Technologies.

AVAILABILITY AND IMPLEMENTATION: The WFA algorithm is implemented within the wavefront-aligner library, and it is publicly available at https://github.com/smarco/WFA.
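To make the O(ns) time and O(s^2) memory claims more concrete, the following is a minimal, score-only sketch of a gap-affine wavefront computation in Python. It is an illustration written for this summary, not the wavefront-aligner library's code or API: the function name `wfa_affine_score`, the default penalties (mismatch `x=4`, gap open `o=6`, gap extend `e=2`) and the dictionary-based storage are all assumptions. For each score s it keeps, per diagonal, the furthest text offset reachable, then extends that offset for free along exact matches, which is where homologous regions pay off.

```python
NEG = float("-inf")  # marker for "no offset reachable on this diagonal"


def wfa_affine_score(pattern, text, x=4, o=6, e=2):
    """Score-only gap-affine wavefront sketch (hypothetical helper, not the library API).

    Diagonal k = h - v, where h indexes `text` and v indexes `pattern`.
    M[s][k], I[s][k], D[s][k] hold the furthest h reached with score s
    ending in a match/mismatch, an insertion, or a deletion.
    """
    n, m = len(pattern), len(text)
    final_k = m - n  # diagonal that contains the end cell (v = n, h = m)

    def extend(wf):
        # Slide each offset along its diagonal while characters match (cost 0).
        for k, h in list(wf.items()):
            v = h - k
            while v < n and h < m and pattern[v] == text[h]:
                v, h = v + 1, h + 1
            wf[k] = h

    def src(wfs, score, k):
        # Offset stored at (score, k), or NEG if that wavefront cell does not exist.
        return wfs[score].get(k, NEG) if score >= 0 else NEG

    def clip(k, h):
        # Discard offsets that fall outside the alignment matrix.
        return h if (0 <= h <= m and 0 <= h - k <= n) else NEG

    M, I, D = [{0: 0}], [{}], [{}]
    extend(M[0])
    s = 0
    while M[s].get(final_k) != m:
        s += 1
        # Candidate diagonals: neighbours of every diagonal in a source wavefront.
        ks = set()
        for wfs, score in ((M, s - x), (M, s - o - e), (I, s - e), (D, s - e)):
            if score >= 0:
                for k in wfs[score]:
                    ks.update((k - 1, k, k + 1))
        Ms, Is, Ds = {}, {}, {}
        for k in ks:
            # Insertion: open from M or extend I on diagonal k-1, consuming one text char.
            ins = clip(k, max(src(M, s - o - e, k - 1), src(I, s - e, k - 1)) + 1)
            # Deletion: open from M or extend D on diagonal k+1, consuming one pattern char.
            dele = clip(k, max(src(M, s - o - e, k + 1), src(D, s - e, k + 1)))
            # Mismatch: stay on diagonal k, consuming one char of each sequence.
            mis = clip(k, src(M, s - x, k) + 1)
            if ins != NEG:
                Is[k] = ins
            if dele != NEG:
                Ds[k] = dele
            best = max(mis, ins, dele)
            if best != NEG:
                Ms[k] = best
        extend(Ms)
        M.append(Ms)
        I.append(Is)
        D.append(Ds)
    return s


if __name__ == "__main__":
    print(wfa_affine_score("GATTACA", "GATCACA"))  # one mismatch
    print(wfa_affine_score("GATTACA", "GATACA"))   # one single-base gap
```

Under these assumed penalties, the score for two identical sequences is 0 and the loop never runs, which reflects why similar sequences align quickly: work grows with the score s rather than with the full n × m matrix. The sketch stores every wavefront, so its memory grows with s^2, and it omits the backtrace needed to recover the alignment itself.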