Erasing errors due to alignment ambiguity when estimating positive selection.

Benjamin Redelings

Abstract

Current estimates of diversifying positive selection rely on first having an accurate multiple sequence alignment. Simulation studies have shown that under biologically plausible conditions, relying on a single estimate of the alignment from commonly used alignment software can lead to unacceptably high false-positive rates in detecting diversifying positive selection. We present a novel statistical method that eliminates excess false positives resulting from alignment error by jointly estimating the degree of positive selection and the alignment under an evolutionary model. Our model treats both substitutions and insertions/deletions as sequence changes on a tree and allows site heterogeneity in the substitution process. We conduct inference starting from unaligned sequence data by integrating over all alignments. This approach naturally accounts for ambiguous alignments without requiring ambiguously aligned sites to be identified and removed prior to analysis. We take a Bayesian approach and conduct inference using Markov chain Monte Carlo to integrate over all alignments on a fixed evolutionary tree topology. We introduce a Bayesian version of the branch-site test and assess the evidence for positive selection using Bayes factors. We compare two models of differing dimensionality using a simple alternative to reversible-jump methods. We also describe a more accurate method of estimating the Bayes factor using Rao-Blackwellization. We then show using simulated data that jointly estimating the alignment and the presence of positive selection solves the problem with excessive false positives from erroneous alignments and has nearly the same power to detect positive selection as when the true alignment is known. We also show that samples taken from the posterior alignment distribution using the software BAli-Phy have substantially lower alignment error compared with MUSCLE, MAFFT, PRANK, and FSA alignments.
© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
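The abstract mentions estimating the Bayes factor more accurately with Rao-Blackwellization. The sketch below illustrates that general idea; it is not BAli-Phy's implementation. It assumes that the null and positive-selection branch-site models (M0, M1) are joined by a binary indicator I with prior probability p of M1, and that the extra M1 parameters are drawn from their prior when I = 0, so their density cancels. It also assumes that the sampler records the log-likelihood under both settings of I at each iteration. The naive estimator averages I itself. The Rao-Blackwellized estimator averages P(I = 1 | other parameters, data), which typically reduces Monte Carlo variance. In both cases the Bayes factor is the posterior odds divided by the prior odds. All function names and the sample format are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical sample format: (indicator I, log-likelihood with I = 0, log-likelihood with I = 1),
# both likelihoods evaluated at the same values of the shared parameters (alignment, branch lengths, ...).
Sample = Tuple[int, float, float]


def conditional_prob_m1(log_lik_m0: float, log_lik_m1: float, prior_m1: float) -> float:
    """Return P(I = 1 | other parameters, data) for a binary model indicator I.

    Assumes the extra M1 parameters are drawn from their prior when I = 0,
    so their prior density cancels and only p and the two likelihoods remain.
    Works on the log scale so that very small likelihoods do not underflow.
    """
    a = math.log(prior_m1) + log_lik_m1
    b = math.log1p(-prior_m1) + log_lik_m0
    m = max(a, b)
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))


def odds_to_bayes_factor(p1: float, prior_m1: float) -> float:
    """Convert a posterior probability of M1 into BF(M1 : M0) = posterior odds / prior odds."""
    if p1 >= 1.0:
        return math.inf
    prior_odds = prior_m1 / (1.0 - prior_m1)
    return (p1 / (1.0 - p1)) / prior_odds


def bayes_factor_estimates(samples: Sequence[Sample], prior_m1: float) -> Tuple[float, float]:
    """Return (naive, Rao-Blackwellized) estimates of the Bayes factor for M1 over M0."""
    n = len(samples)
    # Naive estimator: fraction of MCMC iterations in which the sampler was in M1.
    naive_p1 = sum(ind for ind, _, _ in samples) / n
    # Rao-Blackwellized estimator: average the conditional probability of M1 instead.
    rb_p1 = sum(conditional_prob_m1(l0, l1, prior_m1) for _, l0, l1 in samples) / n
    return odds_to_bayes_factor(naive_p1, prior_m1), odds_to_bayes_factor(rb_p1, prior_m1)


# Usage with made-up log-likelihoods (not data from the paper):
samples = [(1, -100.0, -98.0), (0, -100.0, -101.0), (1, -100.0, -99.5)]
naive_bf, rb_bf = bayes_factor_estimates(samples, prior_m1=0.5)
print(f"naive BF = {naive_bf:.3f}, Rao-Blackwellized BF = {rb_bf:.3f}")
```

Another practical difference: if the chain never leaves one model, the naive estimate is exactly 0 or infinite. The averaged conditional probabilities usually still give a finite, informative value.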

Keywords:  Bayes factor; codon models; false-positive rate; insertion/deletion; positive selection; sequence alignment

Year: 2014; PMID: 24866534; PMCID: PMC4155473; DOI: 10.1093/molbev/msu174

Source DB: PubMed; Journal: Mol Biol Evol; ISSN: 0737-4038; Impact factor: 16.240


