
Evaluating the Quantitative Capabilities of Metagenomic Analysis Software.

Csaba Kerepesi, Vince Grolmusz.

Abstract

DNA sequencing technologies are applied widely and frequently today to describe metagenomes, i.e., microbial communities in environmental or clinical samples, without the need for culturing them. These technologies usually return short (100-300 base pairs long) DNA reads, and these reads are processed by metagenomic analysis software that assigns phylogenetic composition information to the dataset. Here we evaluate three metagenomic analysis tools (AmphoraNet, a webserver implementation of AMPHORA2; MG-RAST; and MEGAN5) for their capability to assign quantitative phylogenetic information to the data, that is, to describe how frequently microorganisms of the same taxon appear in the sample. The difficulty of the task arises from the fact that an organism with a longer genome produces more reads than one with a shorter genome, so some software assigns higher frequencies to species with longer genomes than to those with shorter ones. This phenomenon is called the "genome length bias." Dozens of complex artificial metagenome benchmarks can be found in the literature. Because of the complexity of those benchmarks, it is usually difficult to judge the resistance of a metagenomic software tool to this "genome length bias." Therefore, we have made a simple benchmark for evaluating "taxon-counting" in a metagenomic sample: we took the same number of copies of three full bacterial genomes of different lengths, broke them up randomly into short reads with an average length of 150 bp, and mixed the reads. Because of its simplicity, the benchmark is not intended to serve as a mock metagenome, but if a tool fails on that simple task, it will surely fail on most real metagenomes. We applied the three tools to the benchmark. The ideal quantitative solution would assign the same proportion to each of the three bacterial taxa. We have found that AMPHORA2/AmphoraNet gave the most accurate results, while the other two tools underperformed: they quite reliably assigned each short read to its respective taxon, and so reproduced the typical genome length bias. The benchmark dataset is available at http://pitgroup.org/static/3RandomGenome-100kavg150bps.fna.
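The genome length bias described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' benchmark generator and not the method of any of the evaluated tools. The genome lengths, the uniform 100-200 bp read-length distribution, and all function names are assumptions chosen only for illustration. The sketch cuts equal numbers of copies of three genomes into reads averaging about 150 bp, shows that raw read proportions follow genome length, and shows that dividing read counts by genome length recovers roughly equal proportions.

```python
import random


def count_reads(genome_length, rng, min_len=100, max_len=200):
    """Cut one genome copy into consecutive reads and return how many result.

    Read lengths are drawn uniformly from [min_len, max_len] (mean 150 bp).
    This distribution is an assumption; the last read may be shorter.
    """
    reads = 0
    pos = 0
    while pos < genome_length:
        pos += rng.randint(min_len, max_len)
        reads += 1
    return reads


def simulate_read_counts(genome_lengths, copies=1, seed=0):
    """Fragment the same number of copies of every genome; count reads per taxon."""
    rng = random.Random(seed)
    return {
        name: sum(count_reads(length, rng) for _ in range(copies))
        for name, length in genome_lengths.items()
    }


def proportions(values):
    """Scale a dict of non-negative numbers so that they sum to 1."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


def length_normalized(counts, genome_lengths):
    """Divide each read count by its genome length before taking proportions."""
    return proportions({name: counts[name] / genome_lengths[name] for name in counts})


if __name__ == "__main__":
    # Hypothetical genome lengths in bp, not those of the published benchmark.
    lengths = {"short": 1_000_000, "medium": 3_000_000, "long": 6_000_000}
    counts = simulate_read_counts(lengths, copies=2, seed=1)
    print("raw read proportions:     ", proportions(counts))
    print("length-normalized shares: ", length_normalized(counts, lengths))
```

With these assumed lengths, the raw proportions come out close to 0.1, 0.3, and 0.6, which is what a tool that simply counts reads per taxon would report, while the length-normalized shares are each close to one third, the ideal answer named in the abstract.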

Keywords:  Environmental microbiology; Genomics; Microbiology

Year:  2016        PMID: 26831696     DOI: 10.1007/s00284-016-0991-2

Source DB:  PubMed          Journal:  Curr Microbiol        ISSN: 0343-8651            Impact factor:   2.188


