A Fast Approximate Algorithm for Mapping Long Reads to Large Reference Databases.

Chirag Jain, Alexander Dilthey, Sergey Koren, Srinivas Aluru, Adam M Phillippy.

Abstract

Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long-read mapping algorithms. Alignment-based seed-and-extend methods are accurate but scale poorly, while faster alignment-free methods typically trade precision for efficiency. In this article, we combine a fast approximate read mapping algorithm based on minimizers with a novel MinHash identity estimation technique to achieve both scalability and precision. In contrast to prior methods, we develop a mathematical framework that defines the types of mapping targets we uncover, establish probabilistic estimates of p-value and sensitivity, and demonstrate tolerance for alignment error rates up to 20%. With this framework, our algorithm automatically adapts to different minimum length and identity requirements and provides both positional and identity estimates for each mapping reported. For mapping human PacBio reads to the hg38 reference, our method is 290× faster than Burrows-Wheeler Aligner-MEM (BWA-MEM), with a lower memory footprint and a recall rate of 96%. We further demonstrate the scalability of our method by mapping noisy PacBio reads (each ≥5 kbp in length) to the complete NCBI RefSeq database, which contains 838 Gbp of sequence and >60,000 genomes.
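To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch of minimizer sampling followed by a MinHash-style Jaccard estimate and its conversion to an identity estimate. It is an illustration under stated assumptions, not the authors' implementation: the function names, the BLAKE2b hash, and the parameters `k`, `w`, and `s` are all hypothetical choices, strand handling is ignored, and the Jaccard-to-identity conversion uses the Mash-style relation, error rate = -(1/k) ln(2J / (1 + J)), which is assumed here to stand in for the paper's estimator.

```python
import hashlib
import math


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq, k, w):
    """Return the set of minimum k-mer hashes over every window of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    if len(hashes) < w:
        # Sequence too short for a full window: fall back to a single minimum.
        return {min(hashes)} if hashes else set()
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def minhash_jaccard(a, b, s):
    """Estimate Jaccard similarity from the s smallest hashes of the union of a and b."""
    union_sketch = sorted(a | b)[:s]
    if not union_sketch:
        return 0.0
    shared = sum(1 for h in union_sketch if h in a and h in b)
    return shared / len(union_sketch)


def identity_from_jaccard(j, k):
    """Convert a Jaccard estimate to an identity estimate (assumed Mash-style relation)."""
    if j <= 0.0:
        return 0.0
    error_rate = -math.log(2.0 * j / (1.0 + j)) / k
    return max(0.0, 1.0 - error_rate)


def estimate_identity(read, region, k=16, w=10, s=200):
    """Estimate identity between a read and a candidate reference region."""
    j = minhash_jaccard(minimizers(read, k, w), minimizers(region, k, w), s)
    return identity_from_jaccard(j, k)
```

Calling `estimate_identity(read, region)` on a read and a same-length reference window gives a rough identity score without computing an alignment. A real mapper would also need an index from minimizers to reference positions in order to find candidate regions, which this sketch omits.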

Keywords:  Jaccard; MinHash; long-read mapping; minimizers; sketching; winnowing

Year:  2018        PMID: 29708767      PMCID: PMC6067103          DOI: 10.1089/cmb.2018.0036

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


