Phylogeny-aware identification and correction of taxonomically mislabeled sequences.

Alexey M Kozlov, Jiajie Zhang, Pelin Yilmaz, Frank Oliver Glöckner, Alexandros Stamatakis.

Abstract

Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences ('mislabels') using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
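
The sensitivity and precision figures quoted in the abstract can be illustrated with a minimal sketch. This is not SATIVA's code: the function name `identification_metrics` and the sequence IDs are hypothetical, and the sketch only assumes the standard definitions (sensitivity = detected true mislabels / all true mislabels; precision = detected true mislabels / all flagged sequences).

```python
def identification_metrics(true_mislabels, flagged):
    """Return (sensitivity, precision) for a set of flagged sequence IDs.

    true_mislabels: IDs known to be mislabeled (e.g. from a simulation).
    flagged: IDs that a detection method reported as mislabeled.
    """
    true_mislabels = set(true_mislabels)
    flagged = set(flagged)
    # True positives: sequences that are mislabeled AND were flagged.
    true_positives = len(true_mislabels & flagged)
    sensitivity = true_positives / len(true_mislabels) if true_mislabels else 0.0
    precision = true_positives / len(flagged) if flagged else 0.0
    return sensitivity, precision


# Hypothetical toy data, not taken from the paper.
truth = {"seq1", "seq2", "seq3", "seq4"}
flagged = {"seq1", "seq2", "seq3", "seq9"}
sens, prec = identification_metrics(truth, flagged)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

In this toy case three of the four true mislabels are flagged, and one flagged sequence is a false positive, so both values come out at 0.75.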

Year: 2016 | PMID: 27166378 | PMCID: PMC4914121 | DOI: 10.1093/nar/gkw396

Source DB: PubMed | Journal: Nucleic Acids Res | ISSN: 0305-1048 | Impact factor: 16.971
