Peter Audano¹, Fredrik Vannberg¹
¹ School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Abstract
MOTIVATION: Converting nucleotide sequences into short, overlapping fragments of uniform length (k-mers) is a common step in many bioinformatics applications. Existing software packages count k-mers, but few are optimized for speed, offer an application programming interface (API) or a graphical interface, or contain features that make them extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. The project is implemented in Java 7, and its command line interface (CLI) is designed to integrate into pipelines written in any language.

RESULTS: As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces correct k-mer counts across multiple datasets and k-mer sizes.

AVAILABILITY AND IMPLEMENTATION: KAnalyze is available on SourceForge: https://sourceforge.net/projects/kanalyze/.
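To make the basic operation concrete, here is a minimal sketch of counting overlapping k-mers and writing them as sorted, tab-delimited "k-mer, count" lines. It is not KAnalyze's implementation and does not reflect its API. The class and method names (`KmerCounter`, `count`, `toTsv`), the two-column output layout, and the choice to skip any window containing a non-ACGT base (such as N) are all illustrative assumptions. The code sticks to Java 7 features to match the language version the abstract cites.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory k-mer counter; an illustrative sketch only. */
class KmerCounter {

    /** Counts every overlapping k-mer in seq. TreeMap keeps keys in lexicographic order. */
    static TreeMap<String, Integer> count(String seq, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        TreeMap<String, Integer> counts = new TreeMap<>();
        String s = seq.toUpperCase();
        // Slide a window of width k along the sequence, one base at a time.
        for (int i = 0; i + k <= s.length(); i++) {
            String kmer = s.substring(i, i + k);
            // Assumed policy: ignore windows that contain ambiguous bases such as N.
            if (!kmer.matches("[ACGT]+")) {
                continue;
            }
            Integer c = counts.get(kmer);
            counts.put(kmer, c == null ? 1 : c + 1);
        }
        return counts;
    }

    /** Formats counts as "kmer<TAB>count" lines in the map's iteration (sorted) order. */
    static String toTsv(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Windows of "ACGTACGTN" with k = 3: ACG, CGT, GTA, TAC, ACG, CGT, GTN (skipped).
        System.out.print(toTsv(count("ACGTACGTN", 3)));
    }
}
```

Running `main` prints `ACG 2`, `CGT 2`, `GTA 1` and `TAC 1` (tab-separated), one per line. A `TreeMap` yields sorted output for free, but it keeps every distinct k-mer in memory at once. A tool meant to handle large datasets within a fixed memory budget would need a different strategy.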