APPLES: Scalable Distance-Based Phylogenetic Placement with or without Alignments.

Metin Balaban, Shahab Sarmashghi, Siavash Mirarab.

Abstract

Placing a new species on an existing phylogeny has increasing relevance to several applications. Placement can be used to update phylogenies in a scalable fashion and can help identify unknown query samples using (meta-)barcoding, skimming, or metagenomic data. Maximum likelihood (ML) methods of phylogenetic placement exist, but they do not scale to reference trees with many thousands of leaves, which limits their ability to benefit from the dense taxon sampling of modern reference libraries. They also rely on assembled sequences for the reference set and aligned sequences for the query. Thus, ML methods cannot analyze data sets where the reference consists of unassembled reads, a scenario relevant to emerging applications of genome skimming for sample identification. We introduce APPLES, a distance-based method for phylogenetic placement. Compared to ML, APPLES is an order of magnitude faster and more memory efficient, and unlike ML, it is able to place on large backbone trees (tested for up to 200,000 leaves). We show that using dense references improves accuracy substantially, so that APPLES on dense trees is more accurate than ML on the sparser trees where ML can run. Finally, APPLES can accurately identify samples without assembled references or aligned queries using k-mer-based distances, a scenario that ML cannot handle. APPLES is publicly available at github.com/balabanmetin/apples.
© The Author(s) 2019. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
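
To make the idea of an alignment-free, k-mer-based distance concrete, the following minimal sketch computes a distance between two unaligned sequences from the Jaccard index of their k-mer sets, using the Mash-style transform D = ln((1 + J) / (2J)) / k. This is an illustrative stand-in only: the abstract does not state which k-mer distance estimator APPLES uses, and the function names (`kmers`, `jaccard`, `kmer_distance`) and the default `k` are assumptions made for this example.

```python
import math


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def kmer_distance(seq1, seq2, k=5):
    """Mash-style distance estimate from the k-mer Jaccard index.

    Assumption for illustration: this is NOT necessarily the estimator
    used by APPLES; it only shows how a distance can be obtained
    without aligning the two sequences.
    """
    j = jaccard(kmers(seq1, k), kmers(seq2, k))
    if j == 0.0:
        # No shared k-mers: the transform diverges, so report infinity.
        return math.inf
    return math.log((1 + j) / (2 * j)) / k
```

Distances of this kind, computed between a query and every reference, are the input a distance-based placement method would work from; identical sequences give a distance of 0, and sequences sharing no k-mers give an infinite distance under this transform.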

Keywords:  Distance-based methods; genome skimming; phylogenetic placement

Year:  2020        PMID: 31545363      PMCID: PMC7164367          DOI: 10.1093/sysbio/syz063

Source DB:  PubMed          Journal:  Syst Biol        ISSN: 1063-5157            Impact factor:   15.683

