Ashley I Teufel, Andrew M Ritchie, Claus O Wilke, David A Liberles.
Abstract
When mutational pressure is weak, the generative process of protein evolution involves explicit probabilities of mutations of different types coupled to their conditional probabilities of fixation dependent on selection. Establishing this mechanistic modeling framework for the detection of selection has been a goal in the field of molecular evolution. Building on a mathematical framework proposed more than a decade ago, numerous methods have been introduced in an attempt to detect and measure selection on protein sequences. In this review, we discuss the structure of the original model, subsequent advances, and the series of assumptions that these models operate under.
Keywords: evolutionary modeling; mutation-selection models; protein evolution
Year: 2018 PMID: 30104502 PMCID: PMC6115872 DOI: 10.3390/genes9080409
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Illustration of the variety of nonstationary and non-time-homogeneous phylogenetic models. (A) The Barry-Hartigan model is the most general possible Markov substitution model. The substitution process is modeled by an arbitrary Markov transition matrix for each branch (P–P, indicated by branch colors) and a vector of initial frequencies at the root. (B) A homogeneous but nonstationary model. Transition matrices are derived from a reversible continuous-time Markov model. All branches share the same rate matrix R, but the stationary frequencies are permitted to vary across the tree. (C) A non-homogeneous model reflecting more recent methods whereby a small number of reversible Markov models with different rates and frequencies are assigned to multiple branches within the tree (shaded region). (D) A nonstationary ‘breakpoint’ model in which state frequencies may differ within as well as among lineages.
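As a rough illustration of panel (A), the sketch below pushes a vector of root frequencies through two branch-specific Markov transition matrices. The two-state alphabet, the numerical values, and the names `propagate`, `root_freqs`, `P_branch1`, and `P_branch2` are all hypothetical and chosen only to keep the arithmetic readable; they are not taken from the figure.

```python
def propagate(freqs, P):
    """Multiply a row vector of state frequencies by a transition matrix P,
    where P[i][j] is the probability of ending in state j given state i."""
    n = len(freqs)
    return [sum(freqs[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state example: initial frequencies at the root.
root_freqs = [0.7, 0.3]

# One arbitrary transition matrix per branch (each row sums to 1).
P_branch1 = [[0.9, 0.1],
             [0.2, 0.8]]
P_branch2 = [[0.6, 0.4],
             [0.5, 0.5]]

# State frequencies after the first branch, then after the second.
after_branch1 = propagate(root_freqs, P_branch1)
after_branch2 = propagate(after_branch1, P_branch2)
```

Because each branch carries its own matrix and the root frequencies are free parameters, the state frequencies change along the path, which is what makes such a model nonstationary.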
Figure 2. Schematic showing the construction of mutation-selection models in an equilibrium framework. Clockwise from top left: the model describes the DNA substitution process within a protein-coding sequence along a rooted phylogenetic tree. Each substitution is modeled by the product of a mutation rate and the rate of fixation within a population. The rate of mutation among codons is represented by the product of transition rates at each codon position. The fixation rate is represented by a population genetic model operating on a selection coefficient derived from a set of site-specific amino acid fitness values (AA values shown as single-letter codes). Fitness values may be treated as distinct parameters for each codon site, or probabilities may be calculated via a mixture over an assumed or unknown number of site categories (random effects). In a Bayesian framework, the prior distribution of an unknown set of site categories may be given by a Dirichlet Process (DP).
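The "mutation rate times fixation rate" construction in this caption can be sketched in a few lines. The caption does not specify the population genetic model, so the code assumes one commonly used weak-mutation form: fitness values are scaled by population size, the scaled selection coefficient is the difference in fitness between the target and source states, and the fixation rate relative to a neutral mutation is s / (1 - exp(-s)). The function names and example values are hypothetical.

```python
import math

def relative_fixation_rate(s):
    """Fixation rate relative to neutrality for a scaled selection
    coefficient s, assuming the form s / (1 - exp(-s)).
    The neutral limit (s = 0) is 1."""
    if s == 0.0:
        return 1.0
    # -expm1(-s) equals 1 - exp(-s) and stays accurate for small |s|.
    return s / -math.expm1(-s)

def substitution_rate(mutation_rate, fitness_from, fitness_to):
    """Substitution rate as the product of a mutation rate and a
    relative fixation rate (assumed model, see lead-in)."""
    s = fitness_to - fitness_from  # assumed: scaled selection coefficient
    return mutation_rate * relative_fixation_rate(s)

# Hypothetical site-specific scaled fitness values for two amino acids.
fitness = {"A": 0.0, "V": 1.5}
mu = 0.5  # hypothetical mutation rate between the two codons

rate_a_to_v = substitution_rate(mu, fitness["A"], fitness["V"])  # favored
rate_v_to_a = substitution_rate(mu, fitness["V"], fitness["A"])  # disfavored
rate_neutral = substitution_rate(mu, fitness["A"], fitness["A"])
```

Under this assumed form, a change toward higher fitness is substituted faster than the mutation rate alone and the reverse change more slowly. With symmetric mutation rates, the ratio of forward to backward rates equals exp(s).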