
Squeakr: an exact and approximate k-mer counting system.

Prashant Pandey, Michael A Bender, Rob Johnson, Rob Patro, Bonnie Berger.

Abstract

Motivation: k-mer-based algorithms have become increasingly popular in the processing of high-throughput sequencing data. These algorithms span the gamut of the analysis pipeline from k-mer counting (e.g. for estimating assembly parameters), to error correction, genome and transcriptome assembly, and even transcript quantification. Yet, these tasks often use very different k-mer representations and data structures. In this article, we show how to build a k-mer-counting and multiset-representation system using the counting quotient filter, a feature-rich approximate membership query data structure. We introduce the k-mer-counting/querying system Squeakr (Simple Quotient filter-based Exact and Approximate Kmer Representation), which is based on the counting quotient filter. This off-the-shelf data structure turns out to be an efficient (approximate or exact) representation for sets or multisets of k-mers.
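To make the counting task in the Motivation concrete, the following is a minimal sketch of exact k-mer counting. It uses a plain Python `Counter`, not the counting quotient filter, and it is not Squeakr's implementation. The function names `canonical` and `count_kmers`, the choice to merge each k-mer with its reverse complement, and the choice to skip windows containing non-ACGT symbols are all assumptions made for illustration.

```python
from collections import Counter

# Translation table for complementing DNA bases (assumes upper-case ACGT input).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers over an iterable of read strings (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Assumption: windows with N or other symbols are skipped.
            if set(kmer) <= _BASES:
                counts[canonical(kmer)] += 1
    return counts
```

Calling `count_kmers(["ACGTA"], 3)` produces a multiset in which `ACG` has count 2, because `CGT` is its reverse complement, and `GTA` has count 1. A hash-table multiset like this is the kind of representation that the abstract proposes replacing with a more compact filter.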
Results: Squeakr takes 2×-4.3× less time than the state-of-the-art to count and perform a random-point-query workload. Squeakr is memory-efficient, consuming 1.5×-4.3× less memory than the state-of-the-art. It offers competitive counting performance. In fact, it is faster for larger k-mers, and answers point queries (i.e. queries for the abundance of a particular k-mer) over an order-of-magnitude faster than other systems. The Squeakr representation of the k-mer multiset turns out to be immediately useful for downstream processing (e.g. de Bruijn graph traversal) because it supports fast queries and dynamic k-mer insertion, deletion, and modification.
Availability and implementation: https://github.com/splatlab/squeakr available under BSD 3-Clause License.
Contact: ppandey@cs.stonybrook.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
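The abstract's distinction between exact and approximate representations, and its mention of point queries with dynamic insertion and deletion, can be illustrated with a toy multiset keyed by truncated hash fingerprints. This is a sketch under stated assumptions, not the counting quotient filter and not Squeakr's code: the class name `FingerprintCounter`, its methods, and the use of `hashlib.blake2b` as the hash are all hypothetical choices. The only property it aims to show is the one-sided error typical of approximate membership query structures: with short fingerprints, distinct k-mers may share a slot, so a reported count can exceed the true count but never fall below it.

```python
import hashlib
from collections import Counter


class FingerprintCounter:
    """Toy multiset keyed by p-bit hash fingerprints (NOT a counting quotient filter)."""

    def __init__(self, fingerprint_bits: int):
        # Keep only the low `fingerprint_bits` bits of a 64-bit hash.
        self.mask = (1 << fingerprint_bits) - 1
        self.table = Counter()

    def _fingerprint(self, kmer: str) -> int:
        digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
        return int.from_bytes(digest, "big") & self.mask

    def insert(self, kmer: str, count: int = 1) -> None:
        self.table[self._fingerprint(kmer)] += count

    def remove(self, kmer: str, count: int = 1) -> None:
        fp = self._fingerprint(kmer)
        remaining = self.table[fp] - count
        if remaining > 0:
            self.table[fp] = remaining
        else:
            # Drop the slot entirely once its count reaches zero.
            self.table.pop(fp, None)

    def query(self, kmer: str) -> int:
        """Point query: abundance recorded for this k-mer's fingerprint."""
        return self.table.get(self._fingerprint(kmer), 0)
```

With a wide fingerprint (for example 64 bits) collisions are vanishingly unlikely and the counts behave as exact; with a narrow one (for example 2 bits, giving only four slots) inserting more than four distinct k-mers forces some of them to share a slot, and those queries overcount. Trading fingerprint width against memory in this way is the general idea behind offering both modes from one structure.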

Year:  2018        PMID: 29444235     DOI: 10.1093/bioinformatics/btx636

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  15 in total

1.  Multimodal Long Noncoding RNA Interaction Networks: Control Panels for Cell Fate Specification. (Review)

Authors:  Keriayn N Smith; Sarah C Miller; Gabriele Varani; J Mauro Calabrese; Terry Magnuson
Journal:  Genetics       Date:  2019-12       Impact factor: 4.562

2.  Classification of Long Noncoding RNAs by k-mer Content.

Authors:  Jessime M Kirk; Daniel Sprague; J Mauro Calabrese
Journal:  Methods Mol Biol       Date:  2021

3.  SPRISS: Approximating Frequent K-mers by Sampling Reads, and Applications.

Authors:  Diego Santoro; Leonardo Pellegrina; Matteo Comin; Fabio Vandin
Journal:  Bioinformatics       Date:  2022-05-18       Impact factor: 6.931

4.  An Efficient, Scalable, and Exact Representation of High-Dimensional Color Information Enabled Using de Bruijn Graph Search.

Authors:  Fatemeh Almodaresi; Prashant Pandey; Michael Ferdman; Rob Johnson; Rob Patro
Journal:  J Comput Biol       Date:  2020-03-16       Impact factor: 1.479

5.  Representation of k-Mer Sets Using Spectrum-Preserving String Sets.

Authors:  Amatur Rahman; Paul Medvedev
Journal:  J Comput Biol       Date:  2020-12-07       Impact factor: 1.479

6.  An Incrementally Updatable and Scalable System for Large-Scale Sequence Search using the Bentley-Saxe Transformation.

Authors:  Fatemeh Almodaresi; Jamshed Khan; Sergey Madaminov; Michael Ferdman; Rob Johnson; Prashant Pandey; Rob Patro
Journal:  Bioinformatics       Date:  2022-03-23       Impact factor: 6.931

7.  When the levee breaks: a practical guide to sketching algorithms for processing the flood of genomic data. (Review)

Authors:  Will P M Rowe
Journal:  Genome Biol       Date:  2019-09-13       Impact factor: 13.583

8.  Disk compression of k-mer sets.

Authors:  Amatur Rahman; Rayan Chikhi; Paul Medvedev
Journal:  Algorithms Mol Biol       Date:  2021-06-21       Impact factor: 1.405

9.  deBGR: an efficient and near-exact representation of the weighted de Bruijn graph.

Authors:  Prashant Pandey; Michael A Bender; Rob Johnson; Rob Patro
Journal:  Bioinformatics       Date:  2017-07-15       Impact factor: 6.937

10.  A benchmark study of k-mer counting methods for high-throughput sequencing.

Authors:  Swati C Manekar; Shailesh R Sathe
Journal:  Gigascience       Date:  2018-12-01       Impact factor: 6.524
