Daniel Langenkämper (1), Alexander Goesmann (2), Tim Wilhelm Nattkemper (3). (1) Biodata Mining, Bielefeld University, Universitätsstraße 15, Bielefeld, Germany. dlangenk@cebitec.uni-bielefeld.de. (2) Bioinformatik und Systembiologie, Justus Liebig University, Düsternbrooker Weg 20, Gießen, Germany. Alexander.Goesmann@computational.bio.uni-giessen.de. (3) Biodata Mining, Bielefeld University, Universitätsstraße 15, Bielefeld, Germany. tim.nattkemper@uni-bielefeld.de.
Abstract
BACKGROUND: The advent of low-cost, fast sequencing technologies has made metagenomic analyses possible. However, the large data volumes these techniques produce, and the unpredictable diversity they capture, remain a challenge for computational biology. RESULTS: In this paper we address the problem of rapid taxonomic assignment with small, adaptive data models (< 5 MB) and present the accelerated k-mer explorer (AKE). AKE accelerates taxonomic assignment through a special machine learning architecture that is well suited to modeling intrinsically hierarchical data collections. In a study on real-world data (Acid Mine Drainage, Cow Rumen), we observe reasonably good classification accuracy for ranks down to order. CONCLUSION: We show that the execution time of this approach is orders of magnitude shorter than that of competing approaches, with comparable accuracy. The tool is available to the public as a web application (URL: https://ani.cebitec.uni-bielefeld.de/ake/, username: bmc, password: bmcbioinfo).