
KCMBT: a k-mer Counter based on Multiple Burst Trees.

Abdullah-Al Mamun, Soumitra Pal, Sanguthevar Rajasekaran.

Abstract

MOTIVATION: Many bioinformatics applications require counting k-length substrings (k-mers) in long, genetically important strings. A k-mer counter reports the frequency of each k-length substring in a set of genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. To be useful in such applications, k-mer counting algorithms must be fast and efficient on large data sets.
RESULTS: We propose a novel trie-based algorithm for the k-mer counting problem. We compare our algorithm, k-mer Counter based on Multiple Burst Trees (KCMBT), with all available well-known algorithms. Our experimental results show that KCMBT is around 30% faster than the previous best-performing algorithm, KMC2, on the human genome dataset. As another example, our algorithm is around six times faster than Jellyfish2. Overall, KCMBT is 20-30% faster than KMC2 on five benchmark data sets when both algorithms were run using multiple threads.
AVAILABILITY AND IMPLEMENTATION: KCMBT is freely available on GitHub (https://github.com/abdullah009/kcmbt_mt).
CONTACT: rajasek@engr.uconn.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
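
To make the k-mer counting task in the abstract concrete, here is a minimal Python sketch. It is not KCMBT's implementation: the abstract gives no details of the burst-tree design, so the first function is a hash-table baseline and the second uses a plain nested-dictionary trie as a stand-in for "trie-based" counting. All function names, and the "#" key used to hold counts, are hypothetical choices made for illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Baseline: count every k-length substring with a hash table."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kmers_trie(sequence, k):
    """Count k-mers in a plain nested-dict trie (an assumption, not a burst trie).

    Each k-mer is a root-to-leaf path of length k. The count is stored at the
    leaf under the hypothetical key "#", which must not occur in the sequence.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    root = {}
    for i in range(len(sequence) - k + 1):
        node = root
        for base in sequence[i:i + k]:
            # Descend one level per base, creating child nodes on demand.
            node = node.setdefault(base, {})
        node["#"] = node.get("#", 0) + 1
    return root


def trie_to_counts(node, prefix=""):
    """Flatten the trie back into a {k-mer: count} dictionary."""
    counts = {}
    for key, child in node.items():
        if key == "#":
            counts[prefix] = child
        else:
            counts.update(trie_to_counts(child, prefix + key))
    return counts
```

For the toy input "ACGTACGT" with k = 3, both approaches report ACG and CGT twice and GTA and TAC once. A trie shares common prefixes between k-mers, which is one reason tree-based layouts are attractive for this problem; how KCMBT organizes its multiple burst trees is described in the paper itself.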


Year: 2016 PMID: 27283950 PMCID: PMC5939891 DOI: 10.1093/bioinformatics/btw345

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

