DSK: k-mer counting with very low memory usage.

Guillaume Rizk, Dominique Lavenier, Rayan Chikhi.

Abstract

SUMMARY: Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state-of-the-art k-mer counting methods require that a large data structure reside in memory. Such a structure typically grows with the number of distinct k-mers to count. We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which requires only a fixed, user-defined amount of memory and disk space. This approach realizes a trade-off between memory, time and disk usage. The multi-set of all k-mers present in the reads is partitioned, and the partitions are saved to disk. Then, each partition is separately loaded in memory into a temporary hash table. The k-mer counts are returned by traversing each hash table. Low-abundance k-mers are optionally filtered. DSK is the first approach that is able to count all the 27-mers of a human genome dataset using only 4.0 GB of memory and moderate disk space (160 GB), in 17.9 h. DSK can replace a popular k-mer counting software (Jellyfish) on small-memory servers.

AVAILABILITY: http://minia.genouest.org/dsk
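
The partition-then-count scheme described in the summary can be illustrated with a minimal Python sketch. This is not the DSK implementation: the function names (`iter_kmer_counts`, `stable_hash`), the text-file partition format and the polynomial hash are assumptions made for illustration, and details that the abstract does not cover (for example, how the number of partitions is derived from the memory and disk budgets) are left out. The sketch only shows the idea that k-mers are routed to on-disk partitions by a hash value, and that each partition is then counted on its own in a temporary hash table.

```python
import os
import tempfile
from collections import Counter


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def stable_hash(kmer):
    """Deterministic polynomial hash (hypothetical choice).

    Python's built-in hash() for strings is salted per process, so a
    fixed hash is used to make the partition assignment reproducible.
    """
    h = 0
    for ch in kmer:
        h = (h * 31 + ord(ch)) % (2**61 - 1)
    return h


def iter_kmer_counts(reads, k, n_partitions=4, min_abundance=1):
    """Yield (k-mer, count) pairs, holding one partition in memory at a time."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part_{p}.txt") for p in range(n_partitions)]

        # Pass 1: stream the reads and write each k-mer to the partition
        # selected by its hash. Identical k-mers always share a partition.
        files = [open(path, "w") for path in paths]
        try:
            for read in reads:
                for km in kmers(read, k):
                    files[stable_hash(km) % n_partitions].write(km + "\n")
        finally:
            for f in files:
                f.close()

        # Pass 2: load one partition at a time into a temporary hash table,
        # report its counts, then discard the table before the next one.
        for path in paths:
            with open(path) as f:
                table = Counter(line.strip() for line in f)
            for km, count in table.items():
                if count >= min_abundance:  # optional low-abundance filter
                    yield km, count


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTA"]
    for km, count in sorted(iter_kmer_counts(reads, k=3, min_abundance=2)):
        print(km, count)
```

Because every occurrence of a given k-mer lands in the same partition, the per-partition counts are already final and never need to be merged. In this sketch, peak memory is bounded by the largest partition rather than by the total number of distinct k-mers, which is the trade-off against disk space and time that the summary describes.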

Year:  2013        PMID: 23325618     DOI: 10.1093/bioinformatics/btt020

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  78 in total

1.  Whole-genome re-sequencing of non-model organisms: lessons from unmapped reads.

Authors:  A Gouin; F Legeai; P Nouhaud; A Whibley; J-C Simon; C Lemaitre
Journal:  Heredity (Edinb)       Date:  2014-10-01       Impact factor: 3.821

2.  Phenetic Comparison of Prokaryotic Genomes Using k-mers.

Authors:  Maxime Déraspe; Frédéric Raymond; Sébastien Boisvert; Alexander Culley; Paul H Roy; François Laviolette; Jacques Corbeil
Journal:  Mol Biol Evol       Date:  2017-10-01       Impact factor: 16.240

3.  An efficient classification algorithm for NGS data based on text similarity.

Authors:  Xiangyu Liao; Xingyu Liao; Wufei Zhu; Lu Fang; Xing Chen
Journal:  Genet Res (Camb)       Date:  2018-09-17       Impact factor: 1.588

4.  KCMBT: a k-mer Counter based on Multiple Burst Trees.

Authors:  Abdullah-Al Mamun; Soumitra Pal; Sanguthevar Rajasekaran
Journal:  Bioinformatics       Date:  2016-06-09       Impact factor: 6.937

5.  Portable nanopore analytics: are we there yet?

Authors:  Marco Oliva; Franco Milicchio; Kaden King; Grace Benson; Christina Boucher; Mattia Prosperi
Journal:  Bioinformatics       Date:  2020-08-15       Impact factor: 6.937

6.  Nebula: ultra-efficient mapping-free structural variant genotyper.

Authors:  Parsoa Khorsand; Fereydoun Hormozdiari
Journal:  Nucleic Acids Res       Date:  2021-05-07       Impact factor: 16.971

7.  Full Molecular Typing of Neisseria meningitidis Directly from Clinical Specimens for Outbreak Investigation.

Authors:  Mark Itsko; Adam C Retchless; Sandeep J Joseph; Abigail Norris Turner; Jose A Bazan; Adodo Yao Sadji; Rasmata Ouédraogo-Traoré; Xin Wang
Journal:  J Clin Microbiol       Date:  2020-11-18       Impact factor: 5.948

8.  RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly.

Authors:  Samarth Rangavittal; Robert S Harris; Monika Cechova; Marta Tomaszkiewicz; Rayan Chikhi; Kateryna D Makova; Paul Medvedev
Journal:  Bioinformatics       Date:  2018-04-01       Impact factor: 6.937

9.  KAnalyze: a fast versatile pipelined k-mer toolkit.

Authors:  Peter Audano; Fredrik Vannberg
Journal:  Bioinformatics       Date:  2014-03-18       Impact factor: 6.937

10.  swga: a primer design toolkit for selective whole genome amplification.

Authors:  Erik L Clarke; Sesh A Sundararaman; Stephanie N Seifert; Frederic D Bushman; Beatrice H Hahn; Dustin Brisson
Journal:  Bioinformatics       Date:  2017-07-15       Impact factor: 6.937
