DSK: k-mer counting with very low memory usage.

Guillaume Rizk¹, Dominique Lavenier, Rayan Chikhi.
1. Algorizk, 75013 Paris, France.

Abstract

SUMMARY: Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state-of-the-art k-mer counting methods require that a large data structure reside in memory, and this structure typically grows with the number of distinct k-mers to count. We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which requires only a fixed, user-defined amount of memory and disk space. This approach realizes a trade-off between memory, time and disk usage. The multi-set of all k-mers present in the reads is partitioned, and the partitions are saved to disk. Then, each partition is separately loaded into memory in a temporary hash table. The k-mer counts are returned by traversing each hash table. Low-abundance k-mers are optionally filtered out. DSK is the first approach able to count all the 27-mers of a human genome dataset using only 4.0 GB of memory and moderate disk space (160 GB), in 17.9 h. DSK can replace a popular k-mer counting tool (Jellyfish) on small-memory servers. AVAILABILITY: http://minia.genouest.org/dsk
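The partition-then-count scheme summarized above can be illustrated with a minimal Python sketch. It is not DSK's implementation. Assumptions: the function names (`kmers`, `partition_of`, `count_kmers_on_disk`) are hypothetical; reads are streamed in a single pass; partitions are stored as plain-text files; CRC32 is used as the partition hash; and reverse complements are not merged. A real tool would also encode k-mers compactly and bound disk usage. The key property shown is that every occurrence of a given k-mer lands in the same partition, so counting each partition independently in a temporary hash table yields complete counts while only one partition's table is held in memory at a time.

```python
import os
import tempfile
import zlib
from collections import Counter
from typing import Iterable, Iterator, Tuple


def kmers(read: str, k: int) -> Iterator[str]:
    """Yield every substring of length k in a read (no reverse-complement merging)."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def partition_of(kmer: str, num_partitions: int) -> int:
    """Deterministic hash (CRC32, an assumption) so a k-mer always maps to one partition."""
    return zlib.crc32(kmer.encode("ascii")) % num_partitions


def count_kmers_on_disk(reads: Iterable[str], k: int, num_partitions: int,
                        min_abundance: int = 1) -> Iterator[Tuple[str, int]]:
    """Hypothetical sketch: partition k-mers to disk, then count one partition at a time."""
    if k < 1 or num_partitions < 1:
        raise ValueError("k and num_partitions must be >= 1")
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part_{p}.txt") for p in range(num_partitions)]
        # Phase 1: stream the reads once, appending each k-mer to its partition file.
        handles = [open(path, "w") for path in paths]
        try:
            for read in reads:
                for km in kmers(read, k):
                    handles[partition_of(km, num_partitions)].write(km + "\n")
        finally:
            for h in handles:
                h.close()
        # Phase 2: load each partition separately into a temporary hash table.
        for path in paths:
            with open(path) as f:
                table = Counter(line.rstrip("\n") for line in f)
            # Traverse the table, optionally filtering low-abundance k-mers.
            for km, count in table.items():
                if count >= min_abundance:
                    yield km, count
            del table  # free this partition's table before loading the next one
```

For example, `dict(count_kmers_on_disk(["ACGTACGT", "CGTACG"], k=3, num_partitions=4))` gives `{"ACG": 3, "CGT": 3, "GTA": 2, "TAC": 2}`, and passing `min_abundance=3` keeps only `ACG` and `CGT`. Raising `num_partitions` shrinks the peak size of each in-memory table at the cost of more files on disk.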

Year: 2013 PMID: 23325618 DOI: 10.1093/bioinformatics/btt020

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by (78 in total):

1. Whole-genome re-sequencing of non-model organisms: lessons from unmapped reads.

2. Phenetic Comparison of Prokaryotic Genomes Using k-mers.

3. An efficient classification algorithm for NGS data based on text similarity.

4. KCMBT: a k-mer Counter based on Multiple Burst Trees.

5. Portable nanopore analytics: are we there yet?

6. Nebula: ultra-efficient mapping-free structural variant genotyper.

7. Full Molecular Typing of Neisseria meningitidis Directly from Clinical Specimens for Outbreak Investigation.

8. RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly.

9. KAnalyze: a fast versatile pipelined k-mer toolkit.

10. swga: a primer design toolkit for selective whole genome amplification.