Heng Li. Department of Medical Population Genetics Program, Broad Institute, Cambridge, MA, USA.
Abstract
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb on average, full-length mRNA or cDNA reads at high throughput, and genomic contigs over 100 Mb in length. Existing alignment programs either cannot process such data at scale or do so inefficiently, which calls for the development of new alignment algorithms.
Results: Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, genomic reads of ≥1 kb at an error rate of ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes hundreds of megabases in length. Minimap2 performs split-read alignment, employs a concave gap cost for long insertions and deletions, and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.
Availability and implementation: https://github.com/lh3/minimap2.
Supplementary information: Supplementary data are available at Bioinformatics online.
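To illustrate why a concave gap cost helps with long insertions and deletions, the sketch below compares a plain affine cost with a two-piece cost defined as the cheaper of two affine penalties. This is one common way to build a concave gap cost; the abstract does not state the exact form or parameters used by Minimap2, so the function names and the values `q`, `e`, `q2` and `e2` here are illustrative assumptions only.

```python
def affine_gap_cost(length, q=4, e=2):
    """Standard affine cost: open penalty q plus e per gap base."""
    if length <= 0:
        return 0
    return q + length * e


def concave_gap_cost(length, q=4, e=2, q2=24, e2=1):
    """Two-piece cost: the minimum of two affine penalties.

    The first piece (q, e) is cheap to open but expensive to extend;
    the second (q2, e2) is expensive to open but cheap to extend.
    Taking the minimum gives a concave, piecewise-linear function.
    All parameter values are hypothetical, chosen for illustration.
    """
    if length <= 0:
        return 0
    return min(q + length * e, q2 + length * e2)


if __name__ == "__main__":
    for gap in (1, 20, 100):
        print(gap, affine_gap_cost(gap), concave_gap_cost(gap))
```

With these assumed values the two costs agree for short gaps, while a 100-base gap is penalized noticeably less under the two-piece cost, so a long indel is less likely to break an alignment apart.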