Flexible taxonomic assignment of ambiguous sequencing reads.

José C Clemente, Jesper Jansson, Gabriel Valiente.

Abstract

BACKGROUND: To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. A read that cannot be reliably assigned to a unique leaf in the taxonomy (an ambiguous read) is typically assigned to the lowest common ancestor (LCA) of the set of species that match it. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it.
RESULTS: We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed.
CONCLUSIONS: The strategy for assigning sequencing reads introduced in this paper is a versatile and quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned, which is controlled by the value of the q parameter. Validation of our results on an artificial dataset confirms that a combination of values of q produces the most accurate results.
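
The assignment idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the penalty formula, so the form used here, q·FN/TP + (1 − q)·FP/TP over the leaves of each candidate subtree, is an assumption, as are the toy taxonomy and every function name. The sketch scans candidates naively and does not reproduce the linear-time, constant-time-LCA procedure the authors report.

```python
from collections import defaultdict

# Hypothetical toy taxonomy as child -> parent; all names are illustrative.
PARENT = {
    "genusA": "root", "genusB": "root",
    "sp1": "genusA", "sp2": "genusA", "sp3": "genusA",
    "sp4": "genusB", "sp5": "genusB", "sp6": "genusB", "sp7": "genusB",
}


def ancestors(node, parent):
    """Return the path from node up to the root, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def leaves_below(parent):
    """Map every node to the set of leaves in its subtree."""
    internal = set(parent.values())
    leaves = (set(parent) | internal) - internal
    below = defaultdict(set)
    for leaf in leaves:
        for node in ancestors(leaf, parent):
            below[node].add(leaf)
    return below


def penalty(matches, subtree_leaves, q):
    """Assumed penalty: q weights missed matches, (1 - q) weights extra leaves."""
    tp = len(matches & subtree_leaves)   # matching leaves inside the subtree
    fp = len(subtree_leaves - matches)   # non-matching leaves inside the subtree
    fn = len(matches - subtree_leaves)   # matching leaves outside the subtree
    return q * fn / tp + (1 - q) * fp / tp


def assign(matches, parent, q):
    """Pick the node with the lowest penalty among ancestors of the matches."""
    below = leaves_below(parent)
    # Every candidate contains at least one match, so tp >= 1.
    candidates = {n for m in matches for n in ancestors(m, parent)}
    # Sorting makes tie-breaking deterministic (alphabetical, an arbitrary choice).
    return min(sorted(candidates), key=lambda n: penalty(matches, below[n], q))


read_matches = {"sp1", "sp2", "sp4"}
print(assign(read_matches, PARENT, 1.0))  # root
print(assign(read_matches, PARENT, 0.5))  # genusA
```

In this toy example, q = 1 rewards only recall and places the read at the root, which is also the LCA of the three matching species and implicitly assigns four non-matching species to the read. With q = 0.5 the read moves down to genusA, trading one missed match for fewer false positives. For q near 0 the matching leaves tie, and the sketch breaks the tie alphabetically.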

Year:  2011        PMID: 21211059      PMCID: PMC3024944          DOI: 10.1186/1471-2105-12-8

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


