

KMC 2: fast and resource-frugal k-mer counting.

Sebastian Deorowicz¹, Marek Kokot¹, Szymon Grabowski¹, Agnieszka Debudaj-Grabysz¹.

Abstract

MOTIVATION: Building the histogram of occurrences of every k-symbol-long substring of nucleotide data, known as k-mer counting, is a standard step in many bioinformatics applications. Its applications include de Bruijn graph genome assemblers, fast multiple sequence alignment and repeat detection. The tremendous amounts of NGS data require fast algorithms for k-mer counting, preferably using moderate amounts of memory.
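As a minimal in-memory illustration of what k-mer counting computes (not of how KMC 2 computes it), the Python sketch below counts every length-k substring of a set of reads. The function name `count_kmers` is hypothetical, skipping windows that contain non-ACGT symbols such as N is an assumption, and reverse complements are not merged into canonical k-mers.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count occurrences of every length-k substring (k-mer) across all reads.

    Assumptions (illustrative only): windows containing symbols other than
    A, C, G, T (e.g. N) are skipped, and a k-mer and its reverse complement
    are counted separately (no canonical k-mers).
    """
    valid = set("ACGT")
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        # A read of length L has L - k + 1 k-mers (none if L < k).
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


if __name__ == "__main__":
    for kmer, n in sorted(count_kmers(["ACGTACGT", "CGTAC"], 3).items()):
        print(kmer, n)
    # Prints: ACG 2, CGT 3, GTA 2, TAC 2 (one pair per line)
```

Keeping every distinct k-mer in a single in-memory dictionary like this is what becomes impractical for large NGS datasets, one reason why counters aiming at moderate memory use move part of the work to disk.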
RESULTS: We present a novel method for k-mer counting that, on large datasets, is about twice as fast as the strongest competitors (Jellyfish 2, KMC 1) while using about 12 GB of RAM or less. Our disk-based method bears some resemblance to MSPKmerCounter, but replacing the original minimizers with signatures (a carefully selected subset of all minimizers) and using (k, x)-mers significantly reduces the I/O, and a highly parallel overall architecture achieves unprecedented processing speeds. For example, KMC 2 counts the 28-mers of a human read collection with 44-fold coverage (106 GB compressed) in about 20 min on a 6-core Intel i7 PC with a solid-state disk.
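MSPKmerCounter-style counters split each read into super-k-mers (maximal runs of consecutive k-mers that share the same minimizer) and route each super-k-mer to a bin chosen by its minimizer, so each bin can later be counted independently. The Python sketch below illustrates that general idea under simplifying assumptions: plain lexicographic minimizers rather than KMC 2's signatures, no (k, x)-mers, no reverse complements, uppercase ACGT input, and in-memory lists standing in for on-disk bins. All function names and the CRC32-based bin choice are hypothetical.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def minimizer(kmer: str, m: int) -> str:
    """Plain lexicographic minimizer: the smallest m-mer inside the k-mer (assumes m <= k).

    KMC 2 restricts candidates to 'signatures'; that filter is NOT modeled here.
    """
    return min(kmer[j:j + m] for j in range(len(kmer) - m + 1))


def split_into_super_kmers(read: str, k: int, m: int) -> List[Tuple[str, str]]:
    """Split a read into (minimizer, super_kmer) pairs.

    A run of r consecutive k-mers sharing a minimizer is stored as k + r - 1
    symbols instead of r * k, the basic I/O saving of minimizer-based binning.
    """
    if len(read) < k:
        return []
    pieces = []
    start = 0
    current = minimizer(read[0:k], m)
    for i in range(1, len(read) - k + 1):
        mini = minimizer(read[i:i + k], m)
        if mini != current:
            # The run covered k-mers start..i-1, i.e. symbols start..i-2+k.
            pieces.append((current, read[start:i - 1 + k]))
            start, current = i, mini
    pieces.append((current, read[start:]))
    return pieces


def bin_reads(reads: Iterable[str], k: int, m: int, n_bins: int) -> Dict[int, List[str]]:
    """Phase 1 (hypothetical layout): send each super-k-mer to a bin picked from its minimizer.

    A disk-based counter would write each bin to a file; here bins are lists.
    """
    bins: Dict[int, List[str]] = defaultdict(list)
    for read in reads:
        for mini, super_kmer in split_into_super_kmers(read, k, m):
            bins[zlib.crc32(mini.encode()) % n_bins].append(super_kmer)
    return bins


def count_binned(reads: Iterable[str], k: int, m: int, n_bins: int) -> Counter:
    """Phase 2: count each bin on its own, then combine the per-bin results.

    All occurrences of a k-mer have the same minimizer and thus land in the
    same bin, so the per-bin key sets are disjoint.
    """
    total: Counter = Counter()
    for super_kmers in bin_reads(reads, k, m, n_bins).values():
        local = Counter(sk[i:i + k] for sk in super_kmers for i in range(len(sk) - k + 1))
        total.update(local)
    return total


if __name__ == "__main__":
    for mini, super_kmer in split_into_super_kmers("AAACGTT", k=4, m=2):
        print(mini, super_kmer)
    # Prints: AA AAACG / AC ACGT / CG CGTT (one pair per line)
```

Because every bin holds complete k-mer occurrences, each bin can be loaded, counted and released separately, which is what keeps peak memory bounded; the abstract's signatures and (k, x)-mers are refinements on top of this scheme that are not reproduced here.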



Year: 2015 PMID: 25609798 DOI: 10.1093/bioinformatics/btv022

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by (59 in total)

1. Phenetic Comparison of Prokaryotic Genomes Using k-mers.

Authors: Maxime Déraspe; Frédéric Raymond; Sébastien Boisvert; Alexander Culley; Paul H Roy; François Laviolette; Jacques Corbeil
Journal: Mol Biol Evol Date: 2017-10-01 Impact factor: 16.240

2. BFC: correcting Illumina sequencing errors.

Authors: Heng Li
Journal: Bioinformatics Date: 2015-05-06 Impact factor: 6.937

3. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference.

Authors: Lasse Maretty; Jacob Malte Jensen; Bent Petersen; Jonas Andreas Sibbesen; Siyang Liu; Palle Villesen; Laurits Skov; Kirstine Belling; Christian Theil Have; Jose M G Izarzugaza; Marie Grosjean; Jette Bork-Jensen; Jakob Grove; Thomas D Als; Shujia Huang; Yuqi Chang; Ruiqi Xu; Weijian Ye; Junhua Rao; Xiaosen Guo; Jihua Sun; Hongzhi Cao; Chen Ye; Johan van Beusekom; Thomas Espeseth; Esben Flindt; Rune M Friborg; Anders E Halager; Stephanie Le Hellard; Christina M Hultman; Francesco Lescai; Shengting Li; Ole Lund; Peter Løngren; Thomas Mailund; Maria Luisa Matey-Hernandez; Ole Mors; Christian N S Pedersen; Thomas Sicheritz-Pontén; Patrick Sullivan; Ali Syed; David Westergaard; Rachita Yadav; Ning Li; Xun Xu; Torben Hansen; Anders Krogh; Lars Bolund; Thorkild I A Sørensen; Oluf Pedersen; Ramneek Gupta; Simon Rasmussen; Søren Besenbacher; Anders D Børglum; Jun Wang; Hans Eiberg; Karsten Kristiansen; Søren Brunak; Mikkel Heide Schierup
Journal: Nature Date: 2017-07-26 Impact factor: 49.962

4. Using Machine Learning To Predict Antimicrobial MICs and Associated Genomic Features for Nontyphoidal Salmonella.

Authors: Marcus Nguyen; S Wesley Long; Patrick F McDermott; Randall J Olsen; Robert Olson; Rick L Stevens; Gregory H Tyson; Shaohua Zhao; James J Davis
Journal: J Clin Microbiol Date: 2019-01-30 Impact factor: 5.948

5. An efficient classification algorithm for NGS data based on text similarity.

Authors: Xiangyu Liao; Xingyu Liao; Wufei Zhu; Lu Fang; Xing Chen
Journal: Genet Res (Camb) Date: 2018-09-17 Impact factor: 1.588

6. BLESS 2: accurate, memory-efficient and fast error correction method.

Authors: Yun Heo; Anand Ramachandran; Wen-Mei Hwu; Jian Ma; Deming Chen
Journal: Bioinformatics Date: 2016-03-24 Impact factor: 6.937

7. KCMBT: a k-mer Counter based on Multiple Burst Trees.

Authors: Abdullah-Al Mamun; Soumitra Pal; Sanguthevar Rajasekaran
Journal: Bioinformatics Date: 2016-06-09 Impact factor: 6.937

8. A concurrent subtractive assembly approach for identification of disease associated sub-metagenomes.

Authors: Wontack Han; Mingjie Wang; Yuzhen Ye
Journal: Res Comput Mol Biol Date: 2017-04-12

9. Pervasive epigenetic effects of Drosophila euchromatic transposable elements impact their evolution.

Authors: Yuh Chwen G Lee; Gary H Karpen
Journal: Elife Date: 2017-07-11 Impact factor: 8.140

10. Comparing fixed sampling with minimizer sampling when using k-mer indexes to find maximal exact matches.

Authors: Meznah Almutairy; Eric Torng
Journal: PLoS One Date: 2018-02-01 Impact factor: 3.240