Exploring the relationship between sequence similarity and accurate phylogenetic trees.

Brandi L Cantarel, Hilary G Morrison, William Pearson.

Abstract

We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10^-60, produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees.
For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not significantly decrease phylogenetic accuracy. In general, although less-divergent sequence families produce more accurate trees, the likelihood of estimating an accurate tree is most dependent on whether radiation in the family was ancient or recent. Accuracy can be improved by combining genes from the same organism when creating species trees or by selecting protein families with the best bootstrap values in comprehensive studies.
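
The abstract describes the topology score only as reflecting "the fraction of clades that were correctly clustered" and does not give its formula. As a minimal sketch of one way such a score could be computed, the Python below assumes rooted trees written as nested tuples of leaf names and counts how many clades (leaf sets under internal nodes) of the true tree also appear in the recovered tree. The names `clades` and `clade_recovery` and the tuple representation are illustrative assumptions, not the authors' implementation.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed representation: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set of every internal node, excluding the root."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by any tree on the same leaves
    return found


def clade_recovery(true_tree: Tree, inferred_tree: Tree) -> float:
    """Fraction of the true tree's clades that also occur in the inferred tree."""
    true_clades = clades(true_tree)
    if not true_clades:
        return 1.0  # nothing to recover in a tree with no non-root clades
    return len(true_clades & clades(inferred_tree)) / len(true_clades)


# Hypothetical five-taxon example: the inferred tree misplaces B and C.
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
print(clade_recovery(true_tree, inferred_tree))  # 2 of 3 true clades recovered
```

Under this sketch, a tree would count as "accurate" in the abstract's sense when the returned fraction exceeds a chosen threshold such as 0.8; the exact scoring used in the study may differ.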

Year: 2006    PMID: 16891377    DOI: 10.1093/molbev/msl080

Source DB: PubMed    Journal: Mol Biol Evol    ISSN: 0737-4038    Impact factor: 16.240

