
Phylogeny Estimation Given Sequence Length Heterogeneity.

Vladimir Smirnov, Tandy Warnow.

Abstract

Phylogeny estimation is a major step in many biological studies and poses many well-known challenges. With the falling cost of sequencing technologies, biologists now have increasingly large datasets available for use in phylogeny estimation. Here we address the challenge of estimating a tree from large datasets that combine full-length and fragmentary sequences, a mixture that can arise for a variety of reasons, including sample collection, sequencing technologies, and analytical pipelines. We compare two basic approaches: (1) computing an alignment on the full dataset and then computing a maximum likelihood tree on the alignment, and (2) constructing an alignment and tree on the full-length sequences and then using phylogenetic placement to add the remaining sequences (which will generally be fragmentary) into the tree. We explore these two approaches on a range of simulated datasets, each with 1000 sequences and varying in rates of evolution, and on two biological datasets. Our study shows some striking performance differences between methods, especially when there is substantial sequence length heterogeneity and high rates of evolution. We find in particular that using UPP to align sequences and RAxML to compute a tree on the alignment provides the best accuracy, substantially outperforming trees computed using phylogenetic placement methods. We also find that FastTree has poor accuracy on alignments containing fragmentary sequences. Overall, our study provides insights into the literature comparing different methods and pipelines for phylogenetic estimation, and suggests directions for future method development. [Phylogeny estimation; sequence length heterogeneity; phylogenetic placement.]
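Approach (2) above presupposes a step that separates full-length sequences from fragmentary ones before the backbone alignment and tree are built. The following is a minimal Python sketch of such a partition. The function name `split_by_length` and the cutoff rule (a sequence counts as full-length if it reaches at least 75% of the median length) are assumptions made for illustration only; the abstract does not state how the study defines a fragment.

```python
from statistics import median


def split_by_length(seqs, min_fraction=0.75):
    """Partition unaligned sequences into full-length and fragmentary sets.

    seqs: dict mapping sequence name -> unaligned sequence string.
    min_fraction: hypothetical cutoff, expressed as a fraction of the
        median sequence length (an assumption, not taken from the study).
    Returns (full_length, fragmentary), both dicts.
    """
    if not seqs:
        return {}, {}
    cutoff = min_fraction * median(len(s) for s in seqs.values())
    full_length = {name: s for name, s in seqs.items() if len(s) >= cutoff}
    fragmentary = {name: s for name, s in seqs.items() if len(s) < cutoff}
    return full_length, fragmentary


if __name__ == "__main__":
    example = {
        "taxon_a": "ACGT" * 25,   # 100 sites
        "taxon_b": "ACGT" * 25,   # 100 sites
        "taxon_c": "ACGTAC" * 5,  # 30 sites, treated as a fragment
    }
    backbone, fragments = split_by_length(example)
    print(sorted(backbone), sorted(fragments))
```

Under approach (2), the `backbone` set would then be aligned and used to estimate the tree, and the `fragments` set would be added afterwards by a phylogenetic placement method. Under approach (1), no such split is needed because all sequences are aligned together.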
© The Author(s) 2020. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.


Year:  2021        PMID: 32692823      PMCID: PMC7875441          DOI: 10.1093/sysbio/syaa058

Source DB:  PubMed          Journal:  Syst Biol        ISSN: 1063-5157            Impact factor:   15.683


