Phylogenetic estimation of context-dependent substitution rates by maximum likelihood.

Adam Siepel, David Haussler.

Abstract

Nucleotide substitution in both coding and noncoding regions is context-dependent, in the sense that substitution rates depend on the identity of neighboring bases. Context-dependent substitution has been modeled in the case of two sequences and an unrooted phylogenetic tree, but it has only been accommodated in limited ways with more general phylogenies. In this article, extensions are presented to standard phylogenetic models that allow for better handling of context-dependent substitution, yet still permit exact inference at reasonable computational cost. The new models improve goodness of fit substantially for both coding and noncoding data. Considering context dependence leads to much larger improvements than does using a richer substitution model or allowing for rate variation across sites, under the assumption of site independence. The observed improvements appear to derive from three separate properties of the models: their explicit characterization of context-dependent substitution within N-tuples of adjacent sites, their ability to accommodate overlapping N-tuples, and their rich parameterization of the substitution process. Parameter estimation is accomplished using an expectation maximization algorithm, with a quasi-Newton algorithm for the maximization step; this approach is shown to be preferable to ordinary Newton methods for parameter-rich models. Overlapping tuples are efficiently handled by assuming Markov dependence of the observed bases at each site on those at the N - 1 preceding sites, and the required conditional probabilities are computed with an extension of Felsenstein's algorithm. Estimated substitution rates based on a data set of about 160,000 noncoding sites in mammalian genomes indicate a pronounced CpG effect, but they also suggest a complex overall pattern of context-dependent substitution, comprising a variety of subtle effects. 
Estimates based on about 3 million sites in coding regions demonstrate that amino acid substitution rates can be learned at the nucleotide level, and suggest that context effects across codon boundaries are significant.
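The abstract's central computational idea can be illustrated with a small sketch. Each alignment column is treated as Markov-dependent on the N − 1 preceding columns, and tuple probabilities are computed with Felsenstein's pruning algorithm. The sketch makes several assumptions: N = 2 (dinucleotides), a fixed three-taxon tree with made-up branch lengths, and a hypothetical rate matrix (`rate_matrix`, with illustrative parameters `kappa` and `cpg`) that allows one base change at a time and boosts CpG transitions. None of these is the authors' parameterization, and no EM or quasi-Newton fitting is shown. The conditional probability of column i given column i − 1 is the ratio of the pair likelihood to the likelihood of column i − 1 alone. The denominator comes from the same pruning pass, with the last base of each tuple marked unknown.

```python
import itertools
import numpy as np

BASES = "ACGT"
# Dinucleotide (N = 2) state space: the 16 ordered pairs of adjacent bases.
STATES = ["".join(p) for p in itertools.product(BASES, repeat=2)]
INDEX = {s: i for i, s in enumerate(STATES)}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
# Hypothetical tree: a leaf is a name, an internal node is a list of (child, branch length).
TREE = [([("human", 0.1), ("chimp", 0.1)], 0.2), ("mouse", 0.4)]


def rate_matrix(kappa=2.0, cpg=10.0):
    """Hypothetical context-dependent rate matrix on dinucleotide states.

    Only one base may change at a time. Transitions are weighted by kappa, and
    C->T in a CpG (G->A on the opposite strand) is further multiplied by cpg.
    """
    Q = np.zeros((16, 16))
    for a, b in itertools.product(STATES, repeat=2):
        diff = [k for k in range(2) if a[k] != b[k]]
        if len(diff) != 1:
            continue  # disallow simultaneous changes at both positions
        k = diff[0]
        rate = kappa if (a[k], b[k]) in TRANSITIONS else 1.0
        if a == "CG" and b in ("TG", "CA"):
            rate *= cpg  # elevated CpG transition rate
        Q[INDEX[a], INDEX[b]] = rate
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a rate matrix sum to zero
    return Q


def expm(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, 1)
    s = max(int(np.ceil(np.log2(norm))) + 1, 0) if norm > 0 else 0
    X = A / 2.0**s
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ X / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result


def stationary(Q):
    """Equilibrium distribution pi solving pi Q = 0 with sum(pi) = 1."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]


def leaf_vector(obs):
    """Indicator over the 16 states consistent with obs; '?' marks an unknown base."""
    return np.array([float(all(o in ("?", c) for o, c in zip(obs, s))) for s in STATES])


def prune(node, leaf_partials, Q):
    """Felsenstein's pruning: L[s] = P(data below node | node in tuple state s)."""
    if isinstance(node, str):
        return leaf_partials[node]
    partial = np.ones(16)
    for child, t in node:
        partial *= expm(Q * t) @ prune(child, leaf_partials, Q)
    return partial


def tuple_likelihood(columns, tree, Q, pi):
    """columns maps each leaf name to its bases at the N adjacent sites."""
    partials = {leaf: leaf_vector(obs) for leaf, obs in columns.items()}
    return float(pi @ prune(tree, partials, Q))


def conditional(prev_col, col, tree, Q, pi):
    """P(column i | column i-1) = P(columns i-1, i) / P(column i-1)."""
    full = {leaf: prev_col[leaf] + col[leaf] for leaf in prev_col}
    prefix = {leaf: prev_col[leaf] + "?" for leaf in prev_col}  # marginalize site i
    return tuple_likelihood(full, tree, Q, pi) / tuple_likelihood(prefix, tree, Q, pi)


if __name__ == "__main__":
    Q = rate_matrix()
    pi = stationary(Q)
    prev = {"human": "C", "chimp": "C", "mouse": "C"}
    col = {"human": "G", "chimp": "G", "mouse": "A"}
    print(conditional(prev, col, TREE, Q, pi))
```

Pruning is linear in each leaf's partial-likelihood vector. As a result, summing the conditional over all 4^3 possible columns at site i returns 1, which makes a convenient sanity check. The generic `expm` helper avoids a SciPy dependency; `scipy.linalg.expm` would serve equally well.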

Year:  2003        PMID: 14660683     DOI: 10.1093/molbev/msh039

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


  159 articles in total

1.  Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution.

Authors:  Dick G Hwang; Phil Green
Journal:  Proc Natl Acad Sci U S A       Date:  2004-08-03       Impact factor: 11.205

2.  Estimating the frequency of events that cause multiple-nucleotide changes.

Authors:  Simon Whelan; Nick Goldman
Journal:  Genetics       Date:  2004-08       Impact factor: 4.562

3.  Aligning multiple genomic sequences with the threaded blockset aligner.

Authors:  Mathieu Blanchette; W James Kent; Cathy Riemer; Laura Elnitski; Arian F A Smit; Krishna M Roskin; Robert Baertsch; Kate Rosenbloom; Hiram Clawson; Eric D Green; David Haussler; Webb Miller
Journal:  Genome Res       Date:  2004-04       Impact factor: 9.043

4.  A comparative method for finding and folding RNA secondary structures within protein-coding regions.

Authors:  Jakob Skou Pedersen; Irmtraud Margret Meyer; Roald Forsberg; Peter Simmonds; Jotun Hein
Journal:  Nucleic Acids Res       Date:  2004-09-24       Impact factor: 16.971

5.  Amino acid coevolution induces an evolutionary Stokes shift.

Authors:  David D Pollock; Grant Thiltgen; Richard A Goldstein
Journal:  Proc Natl Acad Sci U S A       Date:  2012-04-30       Impact factor: 11.205

6.  [Review] Variation in the mutation rate across mammalian genomes.

Authors:  Alan Hodgkinson; Adam Eyre-Walker
Journal:  Nat Rev Genet       Date:  2011-10-04       Impact factor: 53.242

7.  Comparative assessment of methods for aligning multiple genome sequences.

Authors:  Xiaoyu Chen; Martin Tompa
Journal:  Nat Biotechnol       Date:  2010-05-23       Impact factor: 54.908

8.  Using non-reversible context-dependent evolutionary models to study substitution patterns in primate non-coding sequences.

Authors:  Guy Baele; Yves Van de Peer; Stijn Vansteelandt
Journal:  J Mol Evol       Date:  2010-07-11       Impact factor: 2.395

9.  Are Synonymous Sites in Primates and Rodents Functionally Constrained?

Authors:  Nicholas Price; Dan Graur
Journal:  J Mol Evol       Date:  2015-11-12       Impact factor: 2.395

10.  Parallelism and Epistasis in Skeletal Evolution Identified through Use of Phylogenomic Mapping Strategies.

Authors:  Jacob M Daane; Nicolas Rohner; Peter Konstantinidis; Sergej Djuranovic; Matthew P Harris
Journal:  Mol Biol Evol       Date:  2015-10-08       Impact factor: 16.240
