
SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

Kevin Liu, Tandy J Warnow, Mark T Holder, Serita M Nelesen, Jiaye Yu, Alexandros P Stamatakis, C Randal Linder.

Abstract

Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted, but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results.
First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense: for all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is due not to the use of ML as an optimization criterion for choosing the best tree/alignment pair but rather to the particular divide-and-conquer realignment techniques employed.
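
The intuition behind the first line of evidence can be illustrated with a small calculation. The sketch below is not the paper's proof or code; it is a minimal Felsenstein pruning routine under the Jukes-Cantor model, written for this note, in which a gap is treated as missing data (all four states equally compatible). The tree encoding, the function names, the branch lengths, and the example columns are all assumptions chosen for illustration. It shows that a column containing a single nucleotide and gaps elsewhere has likelihood 1/4 on any topology, whereas a gap-free column distinguishes between topologies.

```python
import math

NUCS = "ACGT"

def jc_prob(t):
    """Return (p_same, p_diff) for branch length t under Jukes-Cantor."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def partial(node, column):
    """Conditional likelihoods of the data below `node` for each state.

    A leaf is a taxon name (str); an internal node is a tuple of
    (child, branch_length) pairs. This encoding is hypothetical.
    """
    if isinstance(node, str):
        ch = column[node]
        if ch == "-":
            return [1.0] * 4  # gap treated as missing data
        return [1.0 if n == ch else 0.0 for n in NUCS]
    result = [1.0] * 4
    for child, t in node:
        p_same, p_diff = jc_prob(t)
        cp = partial(child, column)
        total = sum(cp)
        for i in range(4):
            result[i] *= p_same * cp[i] + p_diff * (total - cp[i])
    return result

def site_likelihood(tree, column):
    """Likelihood of one column with uniform root frequencies."""
    return sum(0.25 * x for x in partial(tree, column))

# Two topologies on taxa a, b, c, d with the same (arbitrary) branch lengths.
tree_ab_cd = (((("a", 0.1), ("b", 0.2)), 0.05),
              ((("c", 0.3), ("d", 0.4)), 0.05))
tree_ac_bd = (((("a", 0.1), ("c", 0.3)), 0.05),
              ((("b", 0.2), ("d", 0.4)), 0.05))

# Column with one nucleotide and gaps elsewhere: topology is irrelevant.
gap_column = {"a": "A", "b": "-", "c": "-", "d": "-"}
gap_lik_ab_cd = site_likelihood(tree_ab_cd, gap_column)
gap_lik_ac_bd = site_likelihood(tree_ac_bd, gap_column)

# Gap-free column: the two topologies receive different likelihoods.
full_column = {"a": "A", "b": "A", "c": "C", "d": "C"}
full_lik_ab_cd = site_likelihood(tree_ab_cd, full_column)
full_lik_ac_bd = site_likelihood(tree_ac_bd, full_column)

print(gap_lik_ab_cd, gap_lik_ac_bd)
print(full_lik_ab_cd, full_lik_ac_bd)
```

When the alignment itself is a free variable, an optimizer can move toward columns of the first kind, whose likelihood carries no information about the tree. This gives a sense of why joint optimization of alignment and tree under this criterion can be uninformative.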

Year: 2011    PMID: 22139466    DOI: 10.1093/sysbio/syr095

Source DB: PubMed    Journal: Syst Biol    ISSN: 1063-5157    Impact factor: 15.683


  112 in total

1.  Multiple ITS haplotypes in the genome of the lichenized basidiomycete Cora inversa (Hygrophoraceae): fact or artifact?

Authors:  Robert Lücking; James D Lawrey; Patrick M Gillevet; Masoumeh Sikaroodi; Manuela Dal-Forno; Simon A Berger
Journal:  J Mol Evol       Date:  2013-12-17       Impact factor: 2.395

2.  Changes in Anthocyanin Production during Domestication of Citrus.

Authors:  Eugenio Butelli; Andrés Garcia-Lor; Concetta Licciardello; Giuseppina Las Casas; Lionel Hill; Giuseppe Reforgiato Recupero; Manjunath L Keremane; Chandrika Ramadugu; Robert Krueger; Qiang Xu; Xiuxin Deng; Anne-Laure Fanciullino; Yann Froelicher; Luis Navarro; Cathie Martin
Journal:  Plant Physiol       Date:  2017-02-14       Impact factor: 8.340

3.  An algorithm for Morphological Phylogenetic Analysis with Inapplicable Data.

Authors:  Martin D Brazeau; Thomas Guillerme; Martin R Smith
Journal:  Syst Biol       Date:  2019-07-01       Impact factor: 15.683

4.  Large-scale multiple sequence alignment and tree estimation using SATé.

Authors:  Kevin Liu; Tandy Warnow
Journal:  Methods Mol Biol       Date:  2014

5.  Phylotranscriptomic analysis of the origin and early diversification of land plants.

Authors:  Norman J Wickett; Siavash Mirarab; Nam Nguyen; Tandy Warnow; Eric Carpenter; Naim Matasci; Saravanaraj Ayyampalayam; Michael S Barker; J Gordon Burleigh; Matthew A Gitzendanner; Brad R Ruhfel; Eric Wafula; Joshua P Der; Sean W Graham; Sarah Mathews; Michael Melkonian; Douglas E Soltis; Pamela S Soltis; Nicholas W Miles; Carl J Rothfels; Lisa Pokorny; A Jonathan Shaw; Lisa DeGironimo; Dennis W Stevenson; Barbara Surek; Juan Carlos Villarreal; Béatrice Roure; Hervé Philippe; Claude W dePamphilis; Tao Chen; Michael K Deyholos; Regina S Baucom; Toni M Kutchan; Megan M Augustin; Jun Wang; Yong Zhang; Zhijian Tian; Zhixiang Yan; Xiaolei Wu; Xiao Sun; Gane Ka-Shu Wong; James Leebens-Mack
Journal:  Proc Natl Acad Sci U S A       Date:  2014-10-29       Impact factor: 11.205

6.  TIPP: taxonomic identification and phylogenetic profiling.

Authors:  Nam-Phuong Nguyen; Siavash Mirarab; Bo Liu; Mihai Pop; Tandy Warnow
Journal:  Bioinformatics       Date:  2014-10-29       Impact factor: 6.937

7.  PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

Authors:  Siavash Mirarab; Nam Nguyen; Sheng Guo; Li-San Wang; Junhyong Kim; Tandy Warnow
Journal:  J Comput Biol       Date:  2014-12-30       Impact factor: 1.479

8.  Multiple Sequence Alignment for Large Heterogeneous Datasets Using SATé, PASTA, and UPP.

Authors:  Tandy Warnow; Siavash Mirarab
Journal:  Methods Mol Biol       Date:  2021

9.  Revisiting Evaluation of Multiple Sequence Alignment Methods. (Review)

Authors:  Tandy Warnow
Journal:  Methods Mol Biol       Date:  2021

10.  Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle.

Authors:  Xiangjiang Zhan; Shengkai Pan; Junyi Wang; Andrew Dixon; Jing He; Margit G Muller; Peixiang Ni; Li Hu; Yuan Liu; Haolong Hou; Yuanping Chen; Jinquan Xia; Qiong Luo; Pengwei Xu; Ying Chen; Shengguang Liao; Changchang Cao; Shukun Gao; Zhaobao Wang; Zhen Yue; Guoqing Li; Ye Yin; Nick C Fox; Jun Wang; Michael W Bruford
Journal:  Nat Genet       Date:  2013-03-24       Impact factor: 38.330
