Rayan Chikhi (1), Paul Medvedev.
(1) Department of Computer Science and Engineering and Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.
Abstract
MOTIVATION: Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically estimate the best k to use and/or quickly generate histograms of k-mer abundances that would allow the user to make an informed decision.
RESULTS: We develop a fast and accurate sampling method that constructs approximate abundance histograms with several orders of magnitude performance improvement over traditional methods. We then present a fast heuristic that uses the generated abundance histograms for putative k values to estimate the best possible value of k. We test the effectiveness of our tool using diverse sequencing datasets and find that its choice of k leads to some of the best assemblies.
AVAILABILITY: Our tool KmerGenie is freely available at: http://kmergenie.bx.psu.edu/.
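The idea of building an approximate k-mer abundance histogram from a sample can be illustrated with a minimal Python sketch. This is not KmerGenie's implementation: the abstract does not specify the sampling scheme, so the sketch assumes a simple hash-based rule (keep a k-mer only if its hash is divisible by a hypothetical `rate` parameter) and scales the resulting histogram by `rate`. The function names `kmers`, `stable_hash`, and `sampled_histogram` are illustrative only.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def stable_hash(kmer):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sampled_histogram(reads, k, rate=1):
    """Approximate abundance histogram: {abundance: estimated number of distinct k-mers}.

    Assumption (not from the source): a k-mer is kept only when its hash is
    divisible by `rate`, so all occurrences of a kept k-mer are counted and
    its abundance is exact; the number of distinct k-mers per abundance is
    then scaled up by `rate`.
    """
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if stable_hash(km) % rate == 0:
                counts[km] += 1
    hist = Counter(counts.values())
    return {abundance: n * rate for abundance, n in sorted(hist.items())}
```

With `rate=1` the histogram is exact; larger values of `rate` trade accuracy for less counting work, since only about one in `rate` distinct k-mers is tracked. Repeating the call for several candidate values of k gives one histogram per k, which is the kind of input the abstract says the k-selection heuristic consumes.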