Nelson Pérez, Miguel Gutierrez, Nelson Vera.
Abstract
This article assesses several k-mer counting tools, with the aim of giving bioinformatics researchers a reference framework for identifying the computational requirements, parallelization, advantages, disadvantages, and bottlenecks of the algorithm behind each tool. The k-mer counters evaluated were BFCounter, DSK, Jellyfish, KAnalyze, KHMer, KMC2, MSPKmerCounter, Tallymer, and Turtle. The measured parameters were RAM usage, processing time, parallelization, and disk read and write access. The dataset consisted of 36,504,800 reads from human chromosome 14. The assessment was performed for two k-mer lengths, 31 and 55. The results were as follows: tools based purely on Bloom filters and tools using disk-partitioning techniques used less RAM; the tools with the shortest execution times were those using disk partitioning; and the tools that achieved the greatest parallelization were those using disk partitioning, lock-free hash tables, or multiple hash tables.
Keywords: computational performance assessment; data structures; k-mer counters
Year: 2016 PMID: 26982880 DOI: 10.1089/cmb.2015.0199
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479
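
For readers unfamiliar with the task the abstract evaluates, the following is a minimal sketch of k-mer counting using a single in-memory hash table. It is not the algorithm of any of the tools named above; the function names (`count_kmers`, `reverse_complement`), the choice to skip k-mers containing non-ACGT characters, and the optional canonical counting (treating a k-mer and its reverse complement as one) are assumptions made for illustration only.

```python
from collections import Counter

# Translation table for complementing DNA bases (illustrative helper).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(reads, k, canonical=True):
    """Count k-mers across an iterable of reads with one in-memory hash table.

    Assumptions for this sketch: k-mers containing characters other than
    A, C, G, T are skipped, and when `canonical` is True a k-mer and its
    reverse complement are counted under the lexicographically smaller one.
    """
    valid = set("ACGT")
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of width k over the read; empty if the read is shorter than k.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer) <= valid:
                continue  # skip windows with ambiguous bases such as N
            if canonical:
                rc = reverse_complement(kmer)
                if rc < kmer:
                    kmer = rc
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy reads, not data from the study.
    toy_reads = ["ACGTAC", "ACNGT"]
    for kmer, n in sorted(count_kmers(toy_reads, k=3).items()):
        print(kmer, n)
```

A table like this holds every distinct k-mer in memory at once, so its RAM use grows with the number of distinct k-mers in the input; the Bloom filter and disk-partitioning approaches mentioned in the abstract are the strategies reported there as using less RAM.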