
Multiple sequence alignment: a major challenge to large-scale phylogenetics.

Kevin Liu, C Randal Linder, Tandy Warnow.

Abstract

Over the last decade, dramatic advances have been made in developing methods for large-scale phylogeny estimation, so that it is now feasible for investigators with moderate computational resources to obtain reasonable solutions to the maximum likelihood and maximum parsimony problems, even for datasets with a few thousand sequences. There has also been progress in developing methods for multiple sequence alignment, so that greater alignment accuracy (and a consequent improvement in phylogenetic accuracy) is now possible through automated methods. However, these methods have not been tested under conditions that reflect the properties of datasets confronted by large-scale phylogenetic estimation projects. In this paper we report on a study that compares several alignment methods on a benchmark collection of nucleotide sequence datasets of up to 78,132 sequences. We show that as the number of sequences increases, the number of alignment methods that can analyze the datasets decreases. Furthermore, the most accurate alignment methods are unable to analyze the very largest datasets we studied, so only moderately accurate alignment methods can be used on the largest datasets. As a result, alignments computed for large datasets have relatively high error rates, and maximum likelihood phylogenies computed on these alignments also have high error rates. Therefore, the estimation of highly accurate multiple sequence alignments is a major challenge for Tree of Life projects and, more generally, for large-scale systematics studies.


Year:  2010        PMID: 21113338      PMCID: PMC2989897.1          DOI: 10.1371/currents.RRN1198

Source DB:  PubMed          Journal:  PLoS Curr        ISSN: 2157-3999


Cited by:  29 articles in total

1.  Phylogenetic characterization of transport protein superfamilies: superiority of SuperfamilyTree programs over those based on multiple alignments. (Review)

Authors:  Jonathan S Chen; Vamsee Reddy; Joshua H Chen; Maksim A Shlykov; Wei Hao Zheng; Jaehoon Cho; Ming Ren Yen; Milton H Saier
Journal:  J Mol Microbiol Biotechnol       Date:  2012-01-31

2.  Large-scale multiple sequence alignment and tree estimation using SATé.

Authors:  Kevin Liu; Tandy Warnow
Journal:  Methods Mol Biol       Date:  2014

3.  Computational methods for Gene Orthology inference.

Authors:  David M Kristensen; Yuri I Wolf; Arcady R Mushegian; Eugene V Koonin
Journal:  Brief Bioinform       Date:  2011-06-19       Impact factor: 11.622

4.  Functional Evolution of Proteins.

Authors:  Jonathan Catazaro; Adam Caprez; David Swanson; Robert Powers
Journal:  Proteins       Date:  2019-02-19

5.  PosiGene: automated and easy-to-use pipeline for genome-wide detection of positively selected genes.

Authors:  Arne Sahm; Martin Bens; Matthias Platzer; Karol Szafranski
Journal:  Nucleic Acids Res       Date:  2017-06-20       Impact factor: 16.971

6.  ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation.

Authors:  David M Kristensen; Yuri I Wolf; Eugene V Koonin
Journal:  Nucleic Acids Res       Date:  2016-10-18       Impact factor: 16.971

7.  Identification and characterization of proteins of unknown function (PUFs) in Clostridium thermocellum DSM 1313 strains as potential genetic engineering targets.

Authors:  Suresh Poudel; Alexander L Cope; Kaela B O'Dell; Adam M Guss; Hyeongmin Seo; Cong T Trinh; Robert L Hettich
Journal:  Biotechnol Biofuels       Date:  2021-05-10       Impact factor: 6.040

8.  RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation.

Authors:  Kevin Liu; C Randal Linder; Tandy Warnow
Journal:  PLoS One       Date:  2011-11-21       Impact factor: 3.240

9.  Standard maximum likelihood analyses of alignments with gaps can be statistically inconsistent.

Authors:  Tandy Warnow
Journal:  PLoS Curr       Date:  2012-03-09

10.  Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource.

Authors:  Thomas J Sharpton; Guillaume Jospin; Dongying Wu; Morgan G I Langille; Katherine S Pollard; Jonathan A Eisen
Journal:  BMC Bioinformatics       Date:  2012-10-13       Impact factor: 3.169

