KMC 2: fast and resource-frugal k-mer counting.

Sebastian Deorowicz, Marek Kokot, Szymon Grabowski, Agnieszka Debudaj-Grabysz

Abstract

MOTIVATION: Building the histogram of occurrences of every k-symbol long substring of nucleotide data is a standard step in many bioinformatics applications, known as k-mer counting. Its applications include developing de Bruijn graph genome assemblers, fast multiple sequence alignment and repeat detection. The tremendous amounts of NGS data require fast algorithms for k-mer counting, preferably using moderate amounts of memory.
RESULTS: We present a novel method for k-mer counting that, on large datasets, is about twice as fast as the strongest competitors (Jellyfish 2, KMC 1) while using about 12 GB (or less) of RAM. Our disk-based method bears some resemblance to MSPKmerCounter, yet replacing the original minimizers with signatures (a carefully selected subset of all minimizers) and using (k, x)-mers significantly reduces the I/O, and a highly parallel overall architecture achieves unprecedented processing speeds. For example, KMC 2 counts the 28-mers of a human reads collection with 44-fold coverage (106 GB of compressed size) in about 20 min on a 6-core Intel i7 PC with a solid-state disk.
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
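
The idea of partitioning k-mers by minimizer before counting can be illustrated with a small in-memory sketch. This is not the KMC 2 implementation: it uses plain lexicographic minimizers rather than the paper's signatures, keeps the bins in a dictionary instead of on disk, ignores reverse complements and (k, x)-mers, and runs single-threaded. All function names and the parameter m (minimizer length) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Baseline: count every k-symbol substring of every read directly."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def minimizer(kmer, m):
    """Lexicographically smallest m-symbol substring of a k-mer (assumed ordering)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def split_into_super_kmers(read, k, m):
    """Yield (minimizer, substring) pairs.

    Each substring covers a maximal run of consecutive k-mers that share the
    same minimizer, so the run is stored once instead of as separate k-mers.
    """
    n = len(read) - k + 1  # number of k-mers in the read
    if n <= 0:
        return
    start = 0
    current = minimizer(read[0:k], m)
    for i in range(1, n):
        mz = minimizer(read[i:i + k], m)
        if mz != current:
            # k-mers start..i-1 share `current`; the last one ends at i-1+k
            yield current, read[start:i - 1 + k]
            start, current = i, mz
    yield current, read[start:n - 1 + k]


def count_kmers_binned(reads, k, m):
    """Distribute runs of k-mers into bins by minimizer, then count bin by bin."""
    bins = defaultdict(list)  # stand-in for the on-disk bins of a disk-based method
    for read in reads:
        for mz, chunk in split_into_super_kmers(read, k, m):
            bins[mz].append(chunk)

    counts = Counter()
    # Identical k-mers have identical minimizers, so each bin is independent.
    for chunks in bins.values():
        counts.update(count_kmers(chunks, k))
    return counts


def occurrence_histogram(counts):
    """Map an occurrence count to the number of distinct k-mers having it."""
    return Counter(counts.values())


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "TTACGTAA"]
    binned = count_kmers_binned(reads, k=4, m=2)
    assert binned == count_kmers(reads, 4)
    print(sorted(occurrence_histogram(binned).items()))
```

Because equal k-mers always fall into the same bin, the bins can be processed one at a time (or in parallel) with memory bounded by the largest bin, which is the property a disk-based counter relies on.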

Year:  2015        PMID: 25609798     DOI: 10.1093/bioinformatics/btv022

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  59 in total

1.  Phenetic Comparison of Prokaryotic Genomes Using k-mers.

Authors:  Maxime Déraspe; Frédéric Raymond; Sébastien Boisvert; Alexander Culley; Paul H Roy; François Laviolette; Jacques Corbeil
Journal:  Mol Biol Evol       Date:  2017-10-01       Impact factor: 16.240

2.  BFC: correcting Illumina sequencing errors.

Authors:  Heng Li
Journal:  Bioinformatics       Date:  2015-05-06       Impact factor: 6.937

3.  Sequencing and de novo assembly of 150 genomes from Denmark as a population reference.

Authors:  Lasse Maretty; Jacob Malte Jensen; Bent Petersen; Jonas Andreas Sibbesen; Siyang Liu; Palle Villesen; Laurits Skov; Kirstine Belling; Christian Theil Have; Jose M G Izarzugaza; Marie Grosjean; Jette Bork-Jensen; Jakob Grove; Thomas D Als; Shujia Huang; Yuqi Chang; Ruiqi Xu; Weijian Ye; Junhua Rao; Xiaosen Guo; Jihua Sun; Hongzhi Cao; Chen Ye; Johan van Beusekom; Thomas Espeseth; Esben Flindt; Rune M Friborg; Anders E Halager; Stephanie Le Hellard; Christina M Hultman; Francesco Lescai; Shengting Li; Ole Lund; Peter Løngren; Thomas Mailund; Maria Luisa Matey-Hernandez; Ole Mors; Christian N S Pedersen; Thomas Sicheritz-Pontén; Patrick Sullivan; Ali Syed; David Westergaard; Rachita Yadav; Ning Li; Xun Xu; Torben Hansen; Anders Krogh; Lars Bolund; Thorkild I A Sørensen; Oluf Pedersen; Ramneek Gupta; Simon Rasmussen; Søren Besenbacher; Anders D Børglum; Jun Wang; Hans Eiberg; Karsten Kristiansen; Søren Brunak; Mikkel Heide Schierup
Journal:  Nature       Date:  2017-07-26       Impact factor: 49.962

4.  Using Machine Learning To Predict Antimicrobial MICs and Associated Genomic Features for Nontyphoidal Salmonella.

Authors:  Marcus Nguyen; S Wesley Long; Patrick F McDermott; Randall J Olsen; Robert Olson; Rick L Stevens; Gregory H Tyson; Shaohua Zhao; James J Davis
Journal:  J Clin Microbiol       Date:  2019-01-30       Impact factor: 5.948

5.  An efficient classification algorithm for NGS data based on text similarity.

Authors:  Xiangyu Liao; Xingyu Liao; Wufei Zhu; Lu Fang; Xing Chen
Journal:  Genet Res (Camb)       Date:  2018-09-17       Impact factor: 1.588

6.  BLESS 2: accurate, memory-efficient and fast error correction method.

Authors:  Yun Heo; Anand Ramachandran; Wen-Mei Hwu; Jian Ma; Deming Chen
Journal:  Bioinformatics       Date:  2016-03-24       Impact factor: 6.937

7.  KCMBT: a k-mer Counter based on Multiple Burst Trees.

Authors:  Abdullah-Al Mamun; Soumitra Pal; Sanguthevar Rajasekaran
Journal:  Bioinformatics       Date:  2016-06-09       Impact factor: 6.937

8.  A concurrent subtractive assembly approach for identification of disease associated sub-metagenomes.

Authors:  Wontack Han; Mingjie Wang; Yuzhen Ye
Journal:  Res Comput Mol Biol       Date:  2017-04-12

9.  Pervasive epigenetic effects of Drosophila euchromatic transposable elements impact their evolution.

Authors:  Yuh Chwen G Lee; Gary H Karpen
Journal:  Elife       Date:  2017-07-11       Impact factor: 8.140

10.  Comparing fixed sampling with minimizer sampling when using k-mer indexes to find maximal exact matches.

Authors:  Meznah Almutairy; Eric Torng
Journal:  PLoS One       Date:  2018-02-01       Impact factor: 3.240
