Syncmers are more sensitive than minimizers for selecting conserved k‑mers in biological sequences.

Robert Edgar.

Abstract

Minimizers are widely used to select subsets of fixed-length substrings (k-mers) from biological sequences in applications ranging from read mapping to taxonomy prediction and indexing of large datasets. The minimizer of a string of w consecutive k-mers is the k-mer with smallest value according to an ordering of all k-mers. Syncmers are defined here as a family of alternative methods which select k-mers by inspecting the position of the smallest-valued substring of length s < k within the k-mer. For example, a closed syncmer is selected if its smallest s-mer is at the start or end of the k-mer. At least one closed syncmer must be found in every window of length (k - s) k-mers. Unlike a minimizer, a syncmer is identified by its sequence alone, and is therefore synchronized in the following sense: if a given k-mer is selected from one sequence, it will also be selected from any other sequence. Also, minimizers can be deleted by mutations in flanking sequence, which cannot happen with syncmers. Experiments on minimizers with parameters used in the minimap2 read mapper and Kraken taxonomy prediction algorithm respectively show that syncmers can simultaneously achieve both lower density and higher conservation compared to minimizers.
© 2021 Edgar.
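
The two selection rules described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names `closed_syncmers` and `minimizers` are hypothetical, and plain lexicographic string order stands in for the ordering of k-mers and s-mers, which the abstract does not specify. Ties are handled by treating a k-mer as a closed syncmer whenever its first or last s-mer equals the smallest s-mer, and by taking the leftmost smallest k-mer in a minimizer window.

```python
def closed_syncmers(seq, k, s):
    """Return start positions of closed syncmers in seq.

    A k-mer is selected when its smallest s-mer (s < k) lies at the
    start or the end of the k-mer. Lexicographic order is an assumption.
    """
    selected = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # All s-mers inside this k-mer, in left-to-right order.
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        smallest = min(smers)
        # The decision depends only on the k-mer itself, not on its flanks.
        if smers[0] == smallest or smers[-1] == smallest:
            selected.append(i)
    return selected


def minimizers(seq, k, w):
    """Return start positions of minimizers in seq.

    For every window of w consecutive k-mers, the smallest k-mer is
    selected (leftmost on ties). Lexicographic order is an assumption.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min over indices keeps the leftmost position when k-mers tie.
        offset = min(range(w), key=lambda j: window[j])
        selected.add(start + offset)
    return sorted(selected)


if __name__ == "__main__":
    seq = "ACGTTGCATGCCGATAGCTAGGCTAACGT"
    print("closed syncmers:", closed_syncmers(seq, k=5, s=2))
    print("minimizers:     ", minimizers(seq, k=5, w=3))
```

Because `closed_syncmers` inspects only the k-mer's own sequence, the same k-mer is selected wherever it occurs, whereas the result of `minimizers` for a given k-mer depends on the neighbouring k-mers in its window.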

Keywords:  Alignment-free methods; Minimizers; Sequence analysis; String index; k-mers

Year: 2021; PMID: 33604186; PMCID: PMC7869670; DOI: 10.7717/peerj.10805

Source DB: PubMed; Journal: PeerJ; ISSN: 2167-8359; Impact factor: 2.984


  7 in total

1.  The minimizer Jaccard estimator is biased and inconsistent.

Authors:  Mahdi Belbasi; Antonio Blanca; Robert S Harris; David Koslicki; Paul Medvedev
Journal:  Bioinformatics       Date:  2022-06-24       Impact factor: 6.931

2.  Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer.

Authors:  Barış Ekim; Bonnie Berger; Rayan Chikhi
Journal:  Cell Syst       Date:  2021-09-14       Impact factor: 10.304

3.  Effective sequence similarity detection with strobemers.

Authors:  Kristoffer Sahlin
Journal:  Genome Res       Date:  2021-10-19       Impact factor: 9.043

Review 4.  Multiple genome alignment in the telomere-to-telomere assembly era.

Authors:  Bryce Kille; Advait Balaji; Fritz J Sedlazeck; Michael Nute; Todd J Treangen
Journal:  Genome Biol       Date:  2022-08-29       Impact factor: 17.906

Review 5.  From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures.

Authors:  Mohammed Alser; Joel Lindegger; Can Firtina; Nour Almadhoun; Haiyu Mao; Gagandeep Singh; Juan Gomez-Luna; Onur Mutlu
Journal:  Comput Struct Biotechnol J       Date:  2022-08-18       Impact factor: 6.155

6.  Theory of local k-mer selection with applications to long-read alignment.

Authors:  Jim Shaw; Yun William Yu
Journal:  Bioinformatics       Date:  2022-10-14       Impact factor: 6.931

7.  Sequence-specific minimizers via polar sets.

Authors:  Hongyu Zheng; Carl Kingsford; Guillaume Marçais
Journal:  Bioinformatics       Date:  2021-07-12       Impact factor: 6.937
