KCMBT: a k-mer Counter based on Multiple Burst Trees.

Abdullah-Al Mamun, Soumitra Pal, Sanguthevar Rajasekaran.

Abstract

MOTIVATION: Many bioinformatics applications require counting the k-length substrings (k-mers) of long, genetically important strings. A k-mer counter reports the frequency of each k-length substring in a set of genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. To be useful in such applications, k-mer counting algorithms must be very fast and efficient on large data sets.
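To make the counting task concrete, here is a minimal sliding-window sketch in Python. It is a naive hash-table baseline for illustration only, not the method proposed in this paper; the function name `count_kmers` and the example sequence are assumptions introduced here.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every k-length substring of `sequence` (naive baseline)."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    # Slide a window of width k across the sequence, one position at a time.
    for i in range(len(sequence) - k + 1):
        counts[sequence[i:i + k]] += 1
    return counts


# Hypothetical toy input; real inputs are sequencing reads or whole genomes.
counts = count_kmers("ACGTACGT", 3)
print(counts["ACG"], counts["CGT"], counts["GTA"], counts["TAC"])
```

A sequence of length n contains n - k + 1 overlapping k-mers, so memory use and hashing cost at genome scale are the reasons specialized counters exist.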
RESULTS: We propose a novel trie-based algorithm for the k-mer counting problem. We compare our algorithm, k-mer Counter based on Multiple Burst Trees (KCMBT), with all available well-known algorithms. Our experimental results show that KCMBT is around 30% faster than the previous best-performing algorithm, KMC2, on a human genome dataset. As another example, our algorithm is around six times faster than Jellyfish2. Overall, KCMBT is 20-30% faster than KMC2 on five benchmark data sets when both algorithms are run using multiple threads.
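The abstract does not describe KCMBT's internals, so the following is only a generic sketch of the burst-trie idea applied to k-mer counting: a leaf holds a small bucket of suffixes, and a bucket that grows past a capacity "bursts" into child nodes keyed by the next symbol. The class name, the `BUCKET_LIMIT` value and the single-tree, single-threaded design are all assumptions made for illustration, and the sketch relies on every inserted key having the same length k.

```python
from collections import Counter

BUCKET_LIMIT = 4  # hypothetical capacity; a real implementation would tune this


class BurstTrieNode:
    """A leaf bucket until it overflows, then an internal trie node."""

    def __init__(self):
        self.bucket = Counter()  # suffix -> count, used while this is a leaf
        self.children = None     # symbol -> BurstTrieNode, set after a burst

    def add(self, suffix, n=1):
        if self.children is not None:
            # Internal node: consume one symbol and descend.
            child = self.children.setdefault(suffix[0], BurstTrieNode())
            child.add(suffix[1:], n)
            return
        self.bucket[suffix] += n
        if len(self.bucket) > BUCKET_LIMIT:
            self._burst()

    def _burst(self):
        # Turn this leaf into an internal node and reinsert its entries.
        old, self.bucket, self.children = self.bucket, Counter(), {}
        for suffix, n in old.items():
            self.add(suffix, n)

    def items(self, prefix=""):
        # Rebuild full k-mers by concatenating path symbols and bucket suffixes.
        if self.children is None:
            for suffix, n in self.bucket.items():
                yield prefix + suffix, n
        else:
            for symbol, child in self.children.items():
                yield from child.items(prefix + symbol)


def count_kmers_burst_trie(sequence, k):
    """Count k-mers of `sequence` using the sketch above."""
    root = BurstTrieNode()
    for i in range(len(sequence) - k + 1):
        root.add(sequence[i:i + k])
    return dict(root.items())


result = count_kmers_burst_trie("ACGTACGTTGCAACGTGGGTTTACG", 4)
print(result["ACGT"])
```

Shared prefixes are stored once along the trie path, while small buckets avoid allocating a node per symbol for rarely seen suffixes; that trade-off is the usual motivation for burst tries.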
AVAILABILITY AND IMPLEMENTATION: KCMBT is freely available on GitHub (https://github.com/abdullah009/kcmbt_mt).
CONTACT: rajasek@engr.uconn.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2016        PMID: 27283950      PMCID: PMC5939891          DOI: 10.1093/bioinformatics/btw345

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


