
An alignment confidence score capturing robustness to guide tree uncertainty.

Osnat Penn, Eyal Privman, Giddy Landan, Dan Graur, Tal Pupko.

Abstract

Multiple sequence alignment (MSA) is the basis for a wide range of comparative sequence analyses from molecular phylogenetics to 3D structure prediction. Sophisticated algorithms have been developed for sequence alignment, but in practice, many errors can be expected and extensive portions of the MSA are unreliable. Hence, it is imperative to understand and characterize the various sources of errors in MSAs and to quantify site-specific alignment confidence. In this paper, we show that uncertainties in the guide tree used by progressive alignment methods are a major source of alignment uncertainty. We use this insight to develop a novel method for quantifying the robustness of each alignment column to guide tree uncertainty. We build on the widely used bootstrap method for perturbing the phylogenetic tree. Specifically, we generate a collection of trees and use each as a guide tree in the alignment algorithm, thus producing a set of MSAs. We next test the consistency of every column of the MSA obtained from the unperturbed guide tree with respect to the set of MSAs. We name this measure the "GUIDe tree based AligNment ConfidencE" (GUIDANCE) score. Using the Benchmark Alignment dataBASE (BAliBASE) benchmark as well as simulation studies, we show that GUIDANCE scores accurately identify errors in MSAs. Additionally, we compare our results with the previously published Heads-or-Tails score and show that the GUIDANCE score is a better predictor of unreliably aligned regions.
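
The column-consistency step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes one plausible measure, namely the fraction of residue pairs in a column of the base MSA that are also aligned together in the perturbed MSAs, averaged over all perturbed MSAs. The function names (`column_residue_indices`, `aligned_pairs`, `column_consistency`) are hypothetical, alignments are assumed to be lists of equal-length gapped strings with sequences in the same order in every MSA, and generating the bootstrap guide trees and running the aligner are not shown.

```python
from itertools import combinations


def column_residue_indices(msa):
    """For each column, list the residue index of every sequence (None for a gap)."""
    counters = [0] * len(msa)
    columns = []
    for col in range(len(msa[0])):
        entry = []
        for seq_id, row in enumerate(msa):
            if row[col] == "-":
                entry.append(None)
            else:
                entry.append(counters[seq_id])
                counters[seq_id] += 1
        columns.append(entry)
    return columns


def aligned_pairs(msa):
    """Return the set of residue pairs that share a column in the given MSA."""
    pairs = set()
    for entry in column_residue_indices(msa):
        residues = [(s, r) for s, r in enumerate(entry) if r is not None]
        pairs.update(combinations(residues, 2))
    return pairs


def column_consistency(base_msa, perturbed_msas):
    """Score each base column by how often its residue pairs recur (assumed measure).

    Columns with fewer than two residues get None, since they contain no pair.
    """
    perturbed_pairs = [aligned_pairs(m) for m in perturbed_msas]
    scores = []
    for entry in column_residue_indices(base_msa):
        residues = [(s, r) for s, r in enumerate(entry) if r is not None]
        pairs = list(combinations(residues, 2))
        if not pairs or not perturbed_pairs:
            scores.append(None)
            continue
        hits = sum(p in pp for p in pairs for pp in perturbed_pairs)
        scores.append(hits / (len(pairs) * len(perturbed_pairs)))
    return scores


# Toy example: a base MSA and two MSAs standing in for perturbed-guide-tree runs.
base = ["ACG", "A-G"]
perturbed = [["ACG", "A-G"], ["ACG", "AG-"]]
print(column_consistency(base, perturbed))
```

In this toy case the first column is reproduced in both perturbed alignments, the second column contains a single residue and so has no pair to score, and the third column is reproduced in only one of the two alignments.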

Year:  2010        PMID: 20207713      PMCID: PMC2908709          DOI: 10.1093/molbev/msq066

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


  27 in total

1.  Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Authors:  J Castresana
Journal:  Mol Biol Evol       Date:  2000-04       Impact factor: 16.240

2.  ROCR: visualizing classifier performance in R.

Authors:  Tobias Sing; Oliver Sander; Niko Beerenwinkel; Thomas Lengauer
Journal:  Bioinformatics       Date:  2005-08-11       Impact factor: 6.937

3.  Heads or tails: a simple reliability check for multiple sequence alignments.

Authors:  Giddy Landan; Dan Graur
Journal:  Mol Biol Evol       Date:  2007-03-25       Impact factor: 16.240

4.  Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments.

Authors:  Gerard Talavera; Jose Castresana
Journal:  Syst Biol       Date:  2007-08       Impact factor: 15.683

5.  Local reliability measures from sets of co-optimal multiple sequence alignments.

Authors:  Giddy Landan; Dan Graur
Journal:  Pac Symp Biocomput       Date:  2008

6.  The effect of the guide tree on multiple sequence alignments and subsequent phylogenetic analyses.

Authors:  S Nelesen; K Liu; D Zhao; C R Linder; T Warnow
Journal:  Pac Symp Biocomput       Date:  2008

7.  Characterization of pairwise and multiple sequence alignment errors.

Authors:  Giddy Landan; Dan Graur
Journal:  Gene       Date:  2008-06-03       Impact factor: 3.688

8.  Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis.

Authors:  Ari Löytynoja; Nick Goldman
Journal:  Science       Date:  2008-06-20       Impact factor: 47.728

9.  INDELible: a flexible simulator of biological sequence evolution.

Authors:  William Fletcher; Ziheng Yang
Journal:  Mol Biol Evol       Date:  2009-05-07       Impact factor: 16.240

10.  Fast statistical alignment.

Authors:  Robert K Bradley; Adam Roberts; Michael Smoot; Sudeep Juvekar; Jaeyoung Do; Colin Dewey; Ian Holmes; Lior Pachter
Journal:  PLoS Comput Biol       Date:  2009-05-29       Impact factor: 4.475

  109 in total

1.  Molecular evolutionary analysis of vertebrate transducins: a role for amino acid variation in photoreceptor deactivation.

Authors:  Yi G Lin; Cameron J Weadick; Francesco Santini; Belinda S W Chang
Journal:  J Mol Evol       Date:  2013-10-22       Impact factor: 2.395

2.  Phylogenomic analyses support the bifurcation of ciliates into two major clades that differ in properties of nuclear division.

Authors:  Feng Gao; Laura A Katz
Journal:  Mol Phylogenet Evol       Date:  2013-10-09       Impact factor: 4.286

3.  Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction.

Authors:  Haim Ashkenazy; Itamar Sela; Eli Levy Karin; Giddy Landan; Tal Pupko
Journal:  Syst Biol       Date:  2019-01-01       Impact factor: 15.683

4.  Negevirus: a proposed new taxon of insect-specific viruses with wide geographic distribution.

Authors:  Nikos Vasilakis; Naomi L Forrester; Gustavo Palacios; Farooq Nasar; Nazir Savji; Shannan L Rossi; Hilda Guzman; Thomas G Wood; Vsevolod Popov; Rodion Gorchakov; Ana Vázquez González; Andrew D Haddow; Douglas M Watts; Amelia P A Travassos da Rosa; Scott C Weaver; W Ian Lipkin; Robert B Tesh
Journal:  J Virol       Date:  2012-12-19       Impact factor: 5.103

5.  Accuracy estimation and parameter advising for protein multiple sequence alignment.

Authors:  John Kececioglu; Dan DeBlasio
Journal:  J Comput Biol       Date:  2013-03-14       Impact factor: 1.479

6.  Expression of a second open reading frame present in the genome of tick-borne encephalitis virus strain Neudoerfl is not detectable in infected cells.

Authors:  Jiří Černý; Martin Selinger; Martin Palus; Zuzana Vavrušková; Hana Tykalová; Lesley Bell-Sakyi; Ján Štěrba; Libor Grubhoffer; Daniel Růžek
Journal:  Virus Genes       Date:  2016-02-29       Impact factor: 2.332

7.  Limited utility of residue masking for positive-selection inference.

Authors:  Stephanie J Spielman; Eric T Dawson; Claus O Wilke
Journal:  Mol Biol Evol       Date:  2014-06-03       Impact factor: 16.240

8.  Erasing errors due to alignment ambiguity when estimating positive selection.

Authors:  Benjamin Redelings
Journal:  Mol Biol Evol       Date:  2014-05-27       Impact factor: 16.240

9.  Genome sequence of ground tit Pseudopodoces humilis and its adaptation to high altitude.

Authors:  Qingle Cai; Xiaoju Qian; Yongshan Lang; Yadan Luo; Jiaohui Xu; Shengkai Pan; Yuanyuan Hui; Caiyun Gou; Yue Cai; Meirong Hao; Jinyang Zhao; Songbo Wang; Zhaobao Wang; Xinming Zhang; Rongjun He; Jinchao Liu; Longhai Luo; Yingrui Li; Jun Wang
Journal:  Genome Biol       Date:  2013-03-28       Impact factor: 13.583

10.  Intermediate divergence levels maximize the strength of structure-sequence correlations in enzymes and viral proteins.

Authors:  Eleisha L Jackson; Amir Shahmoradi; Stephanie J Spielman; Benjamin R Jack; Claus O Wilke
Journal:  Protein Sci       Date:  2016-03-24       Impact factor: 6.725
