
Simple chained guide trees give high-quality protein multiple sequence alignments.

Kieran Boyce, Fabian Sievers, Desmond G Higgins.

Abstract

Guide trees determine the order in which sequences are aligned in the progressive multiple sequence alignment heuristic. Constructing these guide trees is often the limiting factor in making large alignments, and considerable effort has been expended over the years on building them quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. They are also computationally the fastest and simplest guide trees to construct. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages: once the number of sequences goes much above a few hundred, there is a marked increase in accuracy and a marked decrease in computational time. This holds even if the order of sequences in the guide tree is random.
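
As an illustration only, the following Python sketch builds a chained guide tree as a Newick string. It assumes that "chained" means a fully imbalanced (caterpillar) topology in which each sequence is joined, one at a time, to the growing alignment. The function name `chained_guide_tree`, its parameters, and the sequence labels are hypothetical and do not come from the article.

```python
import random


def chained_guide_tree(names, randomize=False, seed=None):
    """Return a Newick string for a fully chained (caterpillar) guide tree.

    Assumption: a chained tree joins the first two sequences, then adds
    every remaining sequence one at a time to the growing subtree.
    """
    if len(names) < 2:
        raise ValueError("need at least two sequence names")
    order = list(names)
    if randomize:
        # Optional random ordering; the seed makes the shuffle repeatable.
        random.Random(seed).shuffle(order)
    tree = f"({order[0]},{order[1]})"
    for name in order[2:]:
        # Each step nests the current subtree next to one new leaf.
        tree = f"({tree},{name})"
    return tree + ";"


print(chained_guide_tree(["seq1", "seq2", "seq3", "seq4"]))
# prints (((seq1,seq2),seq3),seq4);
```

Building the tree takes a single pass over the sequence names and needs no pairwise distance calculations, which is consistent with the abstract's remark that these are the fastest and simplest guide trees to construct. The resulting string could be saved to a file for an aligner that accepts a user-supplied guide tree.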

Keywords:  Clustal; Mafft; Muscle; PFAM

Year: 2014   PMID: 25002495   PMCID: PMC4115562   DOI: 10.1073/pnas.1405628111

Source DB: PubMed   Journal: Proc Natl Acad Sci U S A   ISSN: 0027-8424   Impact factor: 11.205


