MONI: A Pangenomic Index for Finding Maximal Exact Matches.

Massimiliano Rossi, Marco Oliva, Ben Langmead, Travis Gagie, Christina Boucher.

Abstract

Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store thousands of human genomes on a commodity computer. Kuhnle et al. then showed how to build the r-index efficiently via a technique called prefix-free parsing (PFP) and demonstrated its effectiveness for exact pattern matching. Exact pattern matching can be leveraged to support approximate pattern matching, but the r-index by itself cannot efficiently support popular and important queries such as finding maximal exact matches (MEMs). To address this shortcoming, Bannai et al. introduced the concept of thresholds and showed that storing them together with the r-index enables efficient MEM finding, but they did not say how to find those thresholds. We present a novel algorithm that applies PFP to build the r-index and find the thresholds simultaneously, in time and space linear in the size of the prefix-free parse. Our implementation, called MONI, can rapidly find MEMs between reads and large collections of highly repetitive sequences. Compared with other read aligners (PuffAligner, Bowtie2, BWA-MEM, and CHIC), MONI used 2 to 11 times less memory and was 2 to 32 times faster for index construction. Moreover, the MONI index was less than one thousandth the size of competing indexes for large collections of human chromosomes. Thus, MONI represents a major advance in our ability to perform MEM finding against very large collections of related references.
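
To make the query concrete, here is a minimal brute-force sketch of what a MEM between a read and a reference is. It is not MONI's algorithm: MONI relies on the r-index and thresholds, while this illustration uses plain substring search and only suits tiny inputs. The function names `matching_statistics` and `find_mems` and the `min_len` parameter are hypothetical and chosen for this example. A match starting at read position `i` is taken to be a MEM when it cannot be extended to the right (guaranteed by the matching statistic) or to the left (the match starting at `i - 1` is not longer by the extra character).

```python
from typing import List, Tuple


def matching_statistics(text: str, pattern: str) -> List[int]:
    """For each position i of pattern, the length of the longest prefix of
    pattern[i:] that occurs somewhere in text (brute force)."""
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Grow the candidate match one character at a time while it still occurs.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


def find_mems(text: str, pattern: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start_in_pattern, length) for every maximal exact match."""
    ms = matching_statistics(text, pattern)
    mems = []
    for i, length in enumerate(ms):
        # Left-maximal: the match starting one position earlier does not
        # cover this one, i.e. pattern[i-1 : i+length] does not occur in text.
        left_maximal = i == 0 or ms[i - 1] <= length
        if length >= min_len and left_maximal:
            mems.append((i, length))
    return mems


if __name__ == "__main__":
    reference = "GATTACAGATTACC"  # toy reference, assumed for illustration
    read = "TTACAGG"              # toy read, assumed for illustration
    for start, length in find_mems(reference, read, min_len=2):
        print(start, length, read[start:start + length])
```

The quadratic substring checks here are exactly the cost that an index avoids. According to the abstract, storing thresholds alongside the r-index is what makes this computation efficient on large, repetitive collections.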

Keywords:  MEM-finding; r-index; run-length-encoded Burrows-Wheeler transform; thresholds

Year:  2022        PMID: 35041495      PMCID: PMC8892979          DOI: 10.1089/cmb.2021.0290

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


