Literature DB >> 23884461

Population size dependence of fitness effect distribution and substitution rate probed by biophysical model of protein thermostability.

Abstract

The predicted effect of effective population size on the distribution of fitness effects and substitution rate is critically dependent on the relationship between sequence and fitness. This highlights the importance of using models that are informed by the molecular biology, biochemistry, and biophysics of the evolving systems. We describe a computational model based on fundamental aspects of biophysics, the requirement for (most) proteins to be thermodynamically stable. Using this model, we find that differences in population size have minimal impact on the distribution of population-scaled fitness effects, as well as on the rate of molecular evolution. This is because larger populations result in selection for more stable proteins that are less affected by mutations. This reduction in the magnitude of the fitness effects almost exactly cancels the greater selective pressure resulting from the larger population size. Conversely, changes in the population size in either direction cause transient increases in the substitution rate. As differences in population size often correspond to changes in population size, this makes comparisons of substitution rates in different lineages difficult to interpret.

Entities: Chemical Disease Species

Keywords: effective population size; epistasis; nearly neutral theory; population bottleneck; protein stability; substitution rate

Mesh：

Substances：
Proteins

Year: 2013 PMID： 23884461 PMCID： PMC3787666 DOI： 10.1093/gbe/evt110

Source DB: PubMed Journal: Genome Biol Evol ISSN： 1759-6653 Impact factor: 3.416

Introduction

Novel mutations that appear in a genome can be advantageous, increasing the resulting organism’s fitness, deleterious, reducing the fitness, or effectively neutral, having such a small effect on fitness that the fate of the mutation in the population is dominated by random drift. The relative fraction of these three different types of mutations, and the form of the overall distribution of fitness effects (, where and are the fitness of the wild type and mutant, respectively) caused by such mutations, has been a topic of interest and debate (Bustamante 2005; Eyre-Walker and Keightley 2007). Characterizing this distribution is essential for understanding the nature of genetic variation, including polymorphisms that may cause or influence diseases, as well as characterizing the evolutionary dynamics. Larger population sizes result in increased magnitude of the selective pressure acting on mutations of a given value of s. The fitness effect and the effective population size generally appear as a product in many evolutionary and genetic calculations, so often equations reference the population-scaled fitness effect ( for haploid organisms). For instance, , the probability of a new mutation with fitness effect s being fixed in an otherwise homogeneous diploid population, relative to the probability of fixation of a neutral mutation , is given by (Fisher 1930; Kimura 1957, 1962; Crow and Kimura 1970) where the approximation is valid for small s. The effective population size affects the substitution rate differently depending up the relative number of advantageous, deleterious, and neutral mutations (Gillespie 1999). If there are a substantial number of adaptive mutations, whose probability of fixation is less dependent on the population size, the substitution rate would be higher in larger populations reflecting the greater number of mutations that arise. Conversely, if the mutations are either neutral or so deleterious as to have negligible fixation probability, as suggested by the neutral theory of molecular evolution (Kimura 1968, 1983), then the substitution rate would be relatively independent of population size, with the greater number of mutations cancelling the decreased probability of fixation. The nearly neutral theory of evolution (Ohta 1973, 1992; Kimura 1983) emphasizes the role of slightly deleterious mutations, whose probability of acceptance is smaller in larger populations. This latter theory predicts that smaller populations should evolve faster. Observations of the population dependence of the substitution rate are difficult. One approach is to examine the dependence of the substitution rate on the rate of recombination. In regions of the genome with low recombination rates, mutations in linked genes compete for fixation (Hill and Robertson 1966), which has a similar effect as a lower effective population size. High recombination rates reduce this effect, so that regions of the genome that recombine rapidly are characterized by an increased effective population size. It has been observed that regions in Drosophila genome with high recombination rates evolve slower than regions of low recombination rates, consistent with the predictions of the nearly neutral theory (Larracuente et al. 2008; Arguello et al. 2010; Campos et al. 2012). Other studies have looked at the difference in the evolution of genes on sex chromosomes, observing higher rates of nonsynonymous substitutions on the nonrecombining chromosome (Wyckoff et al. 2002; Berlin and Ellegren 2006). These types of analysis assume that the mutation process in these two different types of regions, as well as the properties of the encoded proteins (e.g., expression levels, structure, and function), are not systematically dissimilar in ways that affect the substitution rate. There are, for instance, correlations between recombination rate, GC content, mutation rate, and rate of biased gene conversion that remain to be elucidated (Hardison et al. 2003; Duret 2006; Duret and Arndt 2008). Sex chromosomes might also be subject to specific adaptive selection that cannot be easily distinguished from reduced selection. A more direct approach is to examine how the substitution rate differs in different lineages. For instance, substitution rates have been compared in primates and rodents (Wu and Li 1985; Ohta 1995; Weinreich 2001), although such comparisons are compromised by differences in, for example, generation time, cell division rate, metabolic rate, mating behavior, ecological niche, and DNA repair mechanisms (Bromham et al. 1996; Bromham 2011). Faster evolution has been observed in endosymbiotic bacteria and fungi compared with their free-living relatives (Moran 1996; Woolfit and Bromham 2003). Endosymbiotic bacteria and fungi would have their effective population size reduced by the lower population sizes of their host, and would also be expected to undergo population bottlenecks when relatively few endosymbionts are transmitted to progeny, reducing the intra-host variation; this process is generally modelled as reducing the effective population size (Rispe and Moran 2000). The faster evolution of the endosymbionts is again consistent with the nearly neutral theory, although other biological or ecological characteristics of endosymbionts might complicate the analysis. Comparisons have been made of the rates of evolution of island and mainland populations, with the island populations again having a smaller population size due to the population bottleneck that occurs during colonialization as well as due to habitat restriction; some studies have concluded that the smaller populations evolve faster (Johnson and Seger 2001; Woolfit and Bromham 2005), although other studies have reached different or more nuanced conclusions (Charlesworth and Eyre-Walker 2007; Wright et al. 2009). Theoretical models of these effects are often based on simple models of the fitness landscape, such that the distribution of fitness effects is constant (Ohta 1977) or that the fitness of the mutant alleles has a fixed distribution (Kingman 1978). These models can break down if the population is far from a fitness optimum due to mutation–selection balance, where the preponderance of deleterious mutations is balanced by the greater fixation probabilities of advantageous mutations (Hartl et al. 1985; Cherry 1998; Wylie and Shakhnovich 2011; Charlesworth 2013). Mutation–selection balance cannot be achieved if the distribution of selective coefficients is independent of fitness; a stable equilibrium requires that there be an increased tendency toward acceptance of deleterious mutations as the fitness increases (Cherry 1998). In the case of a fitness function that plateaus as the fitness increases, increasing the population size would result in a higher equilibrium fitness, which can reduce the fitness impact of mutations, resulting in a narrower distribution of fitness effects. Under some conditions, this contraction of the distribution in s can exactly cancel the explicit population size dependence of S, so that the distribution of population-scaled fitness effects [], and therefore the substitution rate, is independent of population size, even in the nearly neutral model (Cherry 1998; Charlesworth 2013). The evolutionary process involves modifications of interacting biological macromolecules. By creating evolutionary models that explicitly include the properties of these evolving biomolecules, we can develop more realistic models of the evolutionary process, better understand how the evolutionary dynamics depends on biological context, and improve our understandings of how the properties of these biological molecules arose. To fulfill these ambitions, we need to create computational models that capture the salient aspects of the biology while still being computationally tractable. It has been noted that much of the selection pressure on coding regions involves maintaining an adequate degree of thermodynamic stability for the resulting expressed proteins (Wang and Moult 2001; Zeldovich et al. 2007; Drummond and Wilke 2008; Serohijos et al. 2012). This has led to studies investigating how these genetic regions would evolve where the fitness corresponds to a simple function of stability, such as the fraction of proteins that would be folded at equilibrium (Williams et al. 2006; Chen and Shakhnovich 2009; Goldstein 2011; Wylie and Shakhnovich 2011; Pollock et al. 2012). We investigate the distribution of selective effects generated by such a simple model, including its dependence on the effective population size. Our results correspond roughly to the nearly neutral theory. We find that the distribution of population-scaled fitness effect [] is essentially independent of the effective population size, suggesting that evolutionary dynamics, at least of regions of protein-coding genes where selection for stability dominates, should be similarly independent of population size. We observe, however, a strong dependence of the distribution of fitness effects, as well as overall substitution rate, on fluctuations in population size. This result can explain why differences in substitution rates have been observed in different lineages, and indicate that the effect of population bottlenecks on substitution rates cannot be modeled through an appropriate modification of the effective population size.

Materials and Methods

Protein Model

We consider a genome containing a 900-nucleotide gene, coding for a 300-residue protein, as described in earlier publications (Williams et al. 2006; Goldstein 2011; Pollock et al. 2012). We use a simple fitness function based on protein stability, where the fitness w of a protein is equal to the probability that the protein is folded at equilibrium PFolded, given by where is the difference in free energy between the folded and unfolded states, T is the temperature, and k is Boltzmann’s constant. (Note that more negative values of correspond to higher stability.) The free energy of a protein with sequence in any given conformation is computed by summing the contact energies of all of the pairs of residues which are in contact in that conformation, where we use the contact energies determined by Miyazawa and Jernigan (1985) based on frequencies of contacts in known protein structures; residues are in contact if their atoms ( for glycine) are closer than 7 Å to each other. We consider the native state of the protein to be the conformation of the purple acid phosphatase (PDB 1QHW; Lindqvist et al. 1999); the free energy for a given sequence in this particular native state is designated . We assume that the distribution of free energies for the large ensemble of unfolded states can be represented as a Gaussian distribution with mean and variance . We estimate and by calculating the free energy of the sequence in a set of 55 alternative structures. The free energy difference between the folded and unfolded states is then given by is set to 10160. T is set to 20 °C. The probability of folding, and thus the fitness, is then calculated using equation (2).

Evolutionary Model

We initialize a nucleic acid sequence to a set of 300 random codons (excluding stop codons). The codons are translated into a protein sequence using the standard genetic code, and the free energy of folding (and organismal fitness) calculated as described earlier. We simulate evolutionary dynamics where we assume that the mutation rate is slow relative to the fixation time, so that population variation can be ignored. We calculate the rate of all 3 × 900 possible single nucleotide substitutions, equal to the rate of mutation (using a K80 nucleotide substitution model [Kimura 1980] with a transition-transversion ratio of 2.0) times the probability of fixation of the mutation, calculated by computing the free energy of folding of the mutant and using equation (1). (Mutations resulting in stop codons are considered lethal.) We estimate the time to the next substitution by drawing from an exponential distribution with decay rate equal to the sum of all of the individual substitution rates, and choose a mutation to accept with probability proportional to its substitution rate. The protein sequence evolves with increasing stability (decreasing ) until the point of mutation–selection balance, where there is no further long-term change of stability. The simulation is then extended to an evolutionary interval of 10 nucleotide substitutions expected per nucleotide location for neutral substitutions. Only the data subsequent to the establishment of mutation–selection balance are used in the subsequent analysis. These simulations are repeated 100 times with = 104, = 106, and = 108. At each time point of the simulation, we calculate the effect of every possible single nucleotide mutation, and use all of these mutations to calculate the distribution of population-scaled fitness effects [] as well as the instantaneous substitution rate, represented by the ratio of nonsynonymous to synonymous substitution rates (dN/dS).

Explorations of Alternative Models

How sensitive are the results to a particular model? Two aspects of this model might be particularly relevant: 1) the specific relationship between fitness and stability; 2) the epistasis between various locations in the protein in calculating the fitness. We examine these aspects sequentially. The relationship between protein stability and organismal fitness is still unclear and is possibly complicated (Bershtein et al. 2012). In particular, there are indications that avoiding aggregation may be more important than the concentration of the folded state (Chen and Dokholyan 2008; Zhang et al. 2008; Johnson and Hummer 2011; Levy et al. 2012; Yang et al. 2012). A linear relationship has been observed between fitness cost and fraction of aggregated proteins (Geiler-Samerotte et al. 2011); such an effect would not greatly change the model discussed earlier, as it still results in a linear relationship between fraction of folded protein and fitness, albeit with a different slope. To see how sensitive the results are to variations in the fitness function, we consider a different fitness function representing a fitness penalty for aggregation, which we model using a quadratic function of the amount of unfolded protein where is chosen so that at is roughly similar to that in the original model represented by equation (2). Thirty simulations are made for each of the three values of using this fitness function.

Removing Epistasis

The contribution of every residue to the fitness depends on the amino acids at every other location in the protein. This is due to two different aspects of the model. First, the energetics are based on contact potentials, which are a function of pairs of amino acids which are in contact in the native or alternative structures. Second, the fitness is a nonlinear function of the free energy of folding, as indicated by equation (2). In particular, an amino acid substitution at one location in the protein will cause a change in the protein’s stability, but the effect of this change on the protein fitness will depend on the prior stability, which depends on the amino acids found in all other locations. This can be seen if we represent s as a function of the initial stability and change in stability (Wylie and Shakhnovich 2011) To remove these sources of epistasis, we construct a model where the fitness contribution of every amino acid at each location is fixed. We first compute the fitness wRef of the reference sequence. We then calculate the change in fitness for every single amino acid change, to generate , where is the fitness of an amino acid sequence differing from the reference sequence by the single replacement of for . The fitness of any arbitrary sequence is represented as The fitness effect corresponding to a mutation from amino acid to at position l is given by where we have taken advantage of the fact that the fitnesses during the simulation are all close to unity. For calculating the resulting distribution of population-scaled fitness effects, we take advantage of the simplicity of the model to calculate the distributions by summing over all possible mutations from all possible codons at each location, weighted by the equilibrium probability of the original codon and the mutation, using the approaches described in Tamuri et al. (2012). We averaged over the results obtained with 10 different reference sequences.

Variations in Effective Population Size: Bottlenecks

We also perform simulations with the original model (eq. 2) where the effective population size fluctuates between 106 and 104, representing periodic population bottlenecks, with equal amounts of evolutionary time spent at each population level. The period of the oscillations in units of evolutionary time vary between 0.001 and 1.0 expected neutral substitutions per location, with 10 simulations performed for each period.

Results

After an initial period, the free energy of folding of the proteins reached values of approximately ∼ −7 for = 104 to −12 kcal mol−1 for = 108. This degree of stability is roughly similar to that observed in real proteins. It is important to note that this stability is far from optimum; we can use hill-climbing algorithms to find sequences with stabilities in the order of −118 kcal/mol (Goldstein 2011). This marginal stability is also observed with natural proteins, which can be modified to have higher stabilities while retaining native-like activities (Serrano et al. 1993; Giver et al. 1998; Van den Burg et al. 1998; Zhao and Arnold 1999; Korkegian et al. 2005). The stability of these modelled proteins represents mutation–selection balance, where the greater number of destabilizing mutations is compensated by the higher acceptance rate for stabilizing mutations (Goldstein 2011). Figure 1 shows the distribution of population-scaled fitness effects of nonsynonymous mutations for three different population sizes varying over four orders of magnitude. As can be seen, the distributions are extremely similar. All of the distributions of deleterious mutations are strongly leptokurtic, fitting an inverted Gamma distribution with shape parameter . Approximately 25% of mutations are effectively neutral (), approximately 25% are mildly deleterious (), and 50% are strongly deleterious (). Although the distribution corresponds to the near-neutral theory, the substitution rate is nearly independent of population size, in agreement with the predictions of Cherry (1998): the average ratio of nonsynonymous to synonymous substitution rates () only changes minimally, from 0.350 for = 104 to 0.338 for = 108.

Distribution of population-scaled fitness effects for nonsynonymous mutations when fitness is proportional to the fraction of proteins folded at equilibrium, calculated using equation (2), for = 104 (green), = 106 (blue), and = 108 (red), on linear (A) and log (B) scales. To see how the results vary with the fitness function, we perform simulations using an alternative model based on aggregation, as represented by equation (4). The resulting distribution of population-scaled fitness effects (fig. 2A) is similar to the earlier model, with a rate of evolutionary change essentially independent of effective population size, with changing from 0.327 for = 104 to 0.323 for = 108.

(A) Distribution of population-scaled fitness effects for nonsynonymous mutations using a fitness model penalizing unfolded protein, based on equation (4), for = 104 (green), = 106 (blue), and = 108 (red). (B) Distribution of population-scaled fitness effects for a model where epistasis has been removed, based on equation (7), with the same color scheme as (A). (C) Distribution of population-scaled fitness effects for effective population size oscillating between 104 and 106, with the fitness calculated using equation (2), for various periods of the fluctuation: fixed at = 106 (blue), period = 0.1 (red), 0.01 (green), and 0.001 (cyan). Periods represent durations in evolutionary time corresponding to expected number of base substitutions per nucleotide position under conditions of neutral evolution. Removing epistasis through the use of the model represented by equation (7) results in a strong dependence of the distribution of S on the population size, as shown in figure 2B. There is also an extremely strong dependence of the substitution rate with effective population size, with changing from 0.725 for = 104 to 0.357 for = 106 to 0.027 for = 108. In contrast to differences in effective population size, there can also be fluctuations in effective population size. We perform simulations where the effective population size alternates between = 106 and = 104, as illustrated in the bottom of figure 3A, representing repetitive population bottlenecks. The resulting distribution of population-scaled fitness effects, for various timescales of population changes, is shown in figure 2C. With faster changes in population, there is a substantial increase in the number of advantageous mutations. The dependence of the rate of evolution on the period of the population changes is shown in figure 3B. With faster and faster changes, there is approximately a doubling of the average value of dN/dS. The instantaneous value of dN/dS for a period of fluctuation of 0.1, averaged over 1,000 cycles, is shown in figure 3A. Both increases and decreases in effective population sizes cause increases in the instantaneous value of dN/dS; when the population size increases, there is increased selection for greater stability, resulting in an increase in the number of stabilizing (adaptive) substitutions, while decreases in the population size results in a decrease in selective constraints, resulting in increased acceptance of slightly destabilizing substitutions.

(A) Instantaneous values of dN/dS during fluctuations in effective population size, when the fluctuations have a period of 0.1. Both increases and decreases in population size cause transient increases in the rate of evolution. Periods represent durations in evolutionary time corresponding to expected number of base substitutions per nucleotide position under conditions of neutral evolution. (B) Averaged values of for fluctuating population sizes, as a function of the period of the fluctuations (blue). The value of when the population is fixed at = 106 is shown for comparison (red).

Discussion

Using a simple but reasonable model of protein thermodynamics to provide a fitness function, we find that the distribution of the population-scaled fitness effects and the substitution rate are remarkably unaffected by the effective population size. In contrast to the small dependence of these evolutionary parameters on , we find a strong effect from time varying effective population sizes. There is a large transient increase in the number of adaptive substitutions when the population size increases, as the protein adapts to the greater degree of selective pressure; there is also a transient increase in the number of slightly deleterious substitutions when the population size decreases, as the selective pressure relaxes and the protein evolves to lower stabilities. This effect depends on the timescale of the population fluctuations, but is significant over a wide range.

Why the Lack of Dependence on Effective Population Size?

When a mutation occurs, the values of s and S corresponding to a given value of is approximately given by where we have assumed that the fitness of the wild type is close to unity (true of these simulations) and that the magnitude of is sufficiently small that a Taylor expansion is justified. (This is also not a bad assumption, as mutations with very large destabilizing effects will be evolutionarily unimportant, and mutations with very large stabilizing effects are extremely rare.) (A more exact but less general calculation can be performed starting with eq. 5.) For any given values of and , S is proportional to , so that distribution of population-scaled selective effects, , is then a stretched version of given by where is given by The stability is based on a large number of stabilizing and destabilizing interactions, involving residues throughout the protein. The magnitude of these interactions is on the same scale as the total free energy of folding, so that significant changes in can be caused by modifying only a few of these interactions. As a result the distribution of changes of free energy of folding, , is relatively unaffected by the stability , as long as the protein is not excessively (i.e., unrealistically) stable (Goldstein 2011), a conclusion that has been verified both by other simulations and experimental measurements (Bloom et al. 2005, 2006, 2007; Tokuriki et al. 2007). Although is independent , the slope of the fitness function of equation (2) will be dependent on the protein stability, becoming closer to zero as the protein stability increases, so that is dependent on as well as . As a protein evolves toward higher stability, the distribution of is constant but the selective pressure relaxes until the expected change in fitness, or alternatively the equilibrium average value of S for accepted mutations, is approximately zero. We can describe this equilibrium condition as where represents the value of S averaged over fixed substitutions, and is the fixation probability, which we are assuming, as in equation (1), is only a function of S. Note that, as long as is fixed, the only adjustable parameter in equation (12) is . There will be a certain value of where equation (12) is satisfied. (For the current model, this value is approximately .) More negative values result in a positive , moving the system to a flatter region of the fitness curve, making less negative, while less negative values result in a positive . The result is that the free energy of folding will change, modifying until , so that the change in the slope of the fitness landscape cancels the effect of the changing population size. At this equilibrium, the distribution of population-scaled fitness effects will be given by equation (9) with . The resulting distribution of S will be dependent on the value of , which will depend on the forms of and , but not on the value of . The generality of this argument indicates that this observation should not be dependent on a specific fitness function. What is required is for the fitness to be a concave function of some parameter and that this parameter is what Cherry terms an equimutable parameter, where the distribution of changes of this parameter with mutations is independent of its current value (Cherry 1998); for the current model, this is satisfied by the observed independence of on . It is likely that the characteristics of protein stability that provide for equimutability—that stability is a composite function that depends on contributions from many appropriately sized terms, where the rapidly declining number of increasingly stable sequences means the stability is far from optimal—is common in biology. We use fraction of proteins folded (eq. 2) as our fitness function, but alternative formulations (avoiding of self aggregation, eq. 4) give similar results. The calculation of the fitness is highly epistatic, where the contribution of each amino acid to the fitness depends on the rest of the protein sequence. This epistasis is required for this population size independence. When the epistasis is removed and the fitness becomes the sum of a large number of contributions from simple states, the fitness function ceases to be a concave function of a composite property, and a strong population size dependence results, as shown in figure 2B. The results presented here indicate that changes in population size affect evolutionary dynamics quite differently from differences in population size, as has been suggested by Charlesworth and Eyre-Walker (2007) and Cherry (1998). In particular, they noted that large increases in population size can cause adaptive bursts that overcome the decrease in substitution rate due to the stronger selective constraints. The work presented here also emphasizes the role of population changes, except in our model it is only the changes in population size that cause significant changes in the substitution rate. As a result, population increases of arbitrary size will cause increases in the substitution rate, as there is no decrease in the substitution rate to be overcome. Similarly, decreases in the population size will cause increases in the substitution rate, but this will only be a transient effect resulting from the change in population size, rather than the difference in population size.

How Does the Substitution Rate Depend on the Distribution of Mutational Effects?

In addition to being largely independence of population size, the substitution rate will also be independent of the magnitude of the effect of mutations on the protein stability; that is, scaling all of the values of by a constant factor will result in a change the stability of the protein so as to scale by , resulting in the same distribution of and thus the same substitution rate. The substitution rate, however, is dependent on the shape of To explore this dependence, we constructed a simpler model where a fraction of all mutations is destabilizing (with change in free energy of folding ), a fraction is neutral (), and a fraction is stabilizing (with change in free energy of folding ). The protein stability is adjusted until equation (12) is satisfied, and the relative fitness of the three different types of mutants and the corresponding acceptance rates calculated using equation (1). Figure 4A shows the dependence of the substitution rate on the fraction and effect of the destabilizing mutations (, , ). As shown, the rate is relatively insensitive to the magnitude of the destabilization, but extremely sensitive to the relative fraction. Increasing results in a corresponding stabilization of the protein, resulting in a movement to the flatter part of the fitness curve, reducing the impact of these deleterious mutations on the fitness. Compensation resulting from changes in are much weaker. Modifying the fraction and effect of stabilizing mutations shows a smaller effect, with the substitution rate increasing both with the fraction and magnitude of the stabilization, as shown in figure 4B (, , ).

(A) Effect on dN/dS of changing the fraction of mutations that are deleterious (), computed using a simple model, where the effect of the mutation on the free energy of folding is equal to (blue), (red), (green), (purple), and (orange). Other parameters as defined in the text. (B) Effect on dN/dS of changing the fraction of mutations that are advantageous (), computed using a simple model, where the effect of the mutation on the free energy of folding is equal to (blue), (red), (green), (purple), and (orange). Other parameters as defined in the text.

Comparison with Experimental Observations

The results presented here seem in contradiction with the observations described in the Introduction, where faster substitution rates are observed in 1) regions of the genome undergoing slow recombination compared with regions undergoing fast recombination, 2) endosymbionts compared with their free living relatives, and 3) island populations compared with mainland populations. In each of these three cases, there seems to be faster evolution in smaller populations, as would be predicted based on the nearly neutral model. As remarked earlier, there are many complicated issues in these comparisons, so that it is difficult to conclude that the only relevant differences between these two sets are differences in effective population size. We also note that the differences in substitution rates observed in these comparisons are generally quite modest. For instance, Woolfit and Bromham (2005) observed a median increase in dN/dS of only 20% in island populations compared with mainland populations, with no significant increase in overall substitution rate. Campos et al. (2012) observed that autosomal genes in nonrecombining regions in Drosophila had a dN/dS ratio of approximately 45% higher than similar genes in recombining regions of the genome. Although it is difficult to make quantitative comparisons with the simple models presented here, and it is difficult to estimate differences in effective population sizes (Gossmann et al. 2011, 2012), position-specific measures of fitness, as represented by equation (7), result in dN/dS increasing by a over factor of over 26 as the population size is reduced from 108 to 104, which suggests that it may be the weakness of effect of population size on substitution rate that requires an explanation. Additionally, these comparisons often interpret changes in effective population size as equivalent to differences in effective population size. Island populations undergo severe population bottlenecks, and bottlenecks are generally considered to reduce the effective population size. The analysis presented here suggests that population bottlenecks affect evolutionary dynamics quite differently from constant differences in population size, and it might be the population bottlenecks, with the resulting decrease and increase in selective constraints, that are affecting the substitution rate, while a static lower effective population size would have no such effect. Charlesworth and Eyre-Walker (2007), for instance, observed that differences in substitution rate between island and mainland populations depend upon whether a mainland population colonized an island (population size decrease in the island population) or an island population colonized a mainland (population size increase in the mainland population). Significantly, in the latter case, the mainland population generally had a higher rate of evolution than the island population, as would be predicted by the model presented here. This indicates that comparisons between the evolution of different lineages should be interpreted with care, as it would be difficult to disentangle the very different ways that static population size differences and population size fluctuations contribute to substitution rates. This is an inherent problem with this type of comparisons, as related lineages with different effective population sizes must have experienced the changes in population size that caused these differences. A similar argument can be made comparing the effect of recombination rates on effective population size. Competition between mutations occurring at different points on a genetically linked region of the genome may correspond to reduced effective population sizes, but these mutations would occur sporadically. In this case, there would be temporal fluctuations in this effective population size, as other mutations with different fitness effects occur in nearby genes. In this way, lack of recombination would result in variations in effective population size, increasing the rate of evolutionary change, as has been observed experimentally (Larracuente et al. 2008; Arguello et al. 2010; Campos et al. 2012). Regions of low or no recombination would also be more subject to selective sweeps, providing a further mechanism for rapid changes in effective population size. Again, as with lineage-specific substitution rates, it is difficult to disentangle differences from fluctuations in effective population sizes.

Why Do Some Proteins Evolve Faster than Others?

As pointed out by Cherry (1998), with the exception of conspicuous outliers, differences in the substitution rates in different proteins is surprisingly modest, varying by approximately an order of magnitude (Grishin et al. 2000). Differences in the mutation rate in different parts of the genome would contribute to this rate variation, as would differences in the number of sites under nonthermodynamic constraints such as the requirements of functionality (Zuckerkandl 1976). In addition, as described earlier, the substitution rate is strongly dependent on , in particular on the fraction of destabilizing mutations. It is likely that this distribution is dependent on the size, structure, composition, and environment of the protein, leading to variation in the substitution rates. Finally, there has been significant interest in proteins that are unfolded under physiological conditions, or contain significant unstructured regions (Wright and Dyson 1999; Dunker et al. 2008). The selective constraints on these proteins and regions are still poorly understood.

Limitations of the Model

Evolutionary dynamics are dependent on the mapping between genotype, phenotype, and fitness. In this article, we analyze a model of protein thermodynamics that provide a reasonable mapping between these quantities. It is known that achieving sufficient thermodynamic stability is an important selective constraint for many proteins (Wang and Moult 2001; Zeldovich et al. 2007; Drummond and Wilke 2008; Serohijos et al. 2012). Although our model of thermodynamics is, by necessity, simplistic, it does include many realistic aspects, such as the need for considering differences between the free energy of the native state and a large ensemble of alternative states and the stability being a holistic function of many epistatic energetic interactions. Importantly, it reproduces many known properties of proteins and their evolution, such as the observed marginal stability, the higher rate of evolution of exposed locations compared with buried locations, the tendency for hydrophobic residues to cluster in the interior, the dependence of protein stability on population size, and over-dispersion of the molecular clock (Goldstein 2011). Our measure of organismal fitness, the fraction of proteins folded at equilibrium, is certainly overly simple. There are likely to be specific requirements at particular locations in the protein, necessary for achieving functionality. These requirements on the protein sequence will, however, likely be sufficiently rigid so that changes in these locations would contribute minimally to the evolutionary dynamics. Other properties, such as resistance to aggregation, are also likely significant (Chen and Dokholyan 2008; Zhang et al. 2008; Johnson and Hummer 2011; Levy et al. 2012; Yang et al. 2012). Using a different fitness function based on avoiding aggregation, as represented by equation (4), did not significantly change the results. As long as the fitness function is a concave function of free energy of folding, increasing the population size will move the protein to a higher, and correspondingly flatter, region of the fitness landscape, and will result in the population-independent substitution rates described here. This is likely true even if the fitness is a concave function of some other quantity besides protein stability (such as saturation kinetics in biochemical reactions; Hartl et al. 1985), as long as this quantity is an aggregate quantity dependent on overall properties of the protein sequence that fulfils the equimutability criterion. An important caveat of this analysis is the assumption of a slow mutation rate, so that the time for fixation (or elimination) is short relative to the length of evolutionary time between mutations. Genetic variation in the population would affect the substitution rate and would also be dependent on the population size. Wylie and Shakhnovich (2011), for instance, have observed in a simple model that the distribution of fitness effects depends on the mutation rate, indicating that the presence of multiple mutations in the population has an effect. Similar complications can emerge if the timescale for fluctuations in population size become comparable with or shorter than the fixation time (Otto and Whitlock 1997). Neglecting this effect is a limitation of this work.

63 in total

1. Elevated rates of nonsynonymous substitution in island birds.

Authors: K P Johnson; J Seger
Journal: Mol Biol Evol Date: 2001-05 Impact factor: 16.240

Review 2. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm.

Authors: P E Wright; H J Dyson
Journal: J Mol Biol Date: 1999-10-22 Impact factor: 5.469

3. On the probability of fixation of mutant genes in a population.

Authors: M KIMURA
Journal: Genetics Date: 1962-06 Impact factor: 4.562

4. Population size and molecular evolution on islands.

Authors: Megan Woolfit; Lindell Bromham
Journal: Proc Biol Sci Date: 2005-11-07 Impact factor: 5.349

5. Fast accumulation of nonsynonymous mutations on the female-specific W chromosome in birds.

Authors: Sofia Berlin; Hans Ellegren
Journal: J Mol Evol Date: 2005-11-30 Impact factor: 2.395

6. Thermodynamic prediction of protein neutrality.

Authors: Jesse D Bloom; Jonathan J Silberg; Claus O Wilke; D Allan Drummond; Christoph Adami; Frances H Arnold
Journal: Proc Natl Acad Sci U S A Date: 2005-01-11 Impact factor: 11.205

7. Accelerated evolution and Muller's rachet in endosymbiotic bacteria.

Authors: N A Moran
Journal: Proc Natl Acad Sci U S A Date: 1996-04-02 Impact factor: 11.205

Review 8. The genome as a life-history character: why rate of molecular evolution varies between mammal species.

Authors: Lindell Bromham
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2011-09-12 Impact factor: 6.237

9. Constraints imposed by non-functional protein-protein interactions on gene expression and proteome size.

Authors: Jingshan Zhang; Sergei Maslov; Eugene I Shakhnovich
Journal: Mol Syst Biol Date: 2008-08-05 Impact factor: 11.429

10. The impact of recombination on nucleotide substitutions in the human genome.

Authors: Laurent Duret; Peter F Arndt
Journal: PLoS Genet Date: 2008-05-09 Impact factor: 5.917

18 in total

Review 1. Selection on protein structure, interaction, and sequence.

Authors: Peter B Chi; David A Liberles
Journal: Protein Sci Date: 2016-02-11 Impact factor: 6.725

2. Determining the factors driving selective effects of new nonsynonymous mutations.

Authors: Christian D Huber; Bernard Y Kim; Clare D Marsden; Kirk E Lohmueller
Journal: Proc Natl Acad Sci U S A Date: 2017-04-11 Impact factor: 11.205

Review 3. Merging molecular mechanism and evolution: theory and computation at the interface of biophysics and evolutionary population genetics.

Authors: Adrian W R Serohijos; Eugene I Shakhnovich
Journal: Curr Opin Struct Biol Date: 2014-06-19 Impact factor: 6.809

4. Accelerated simulation of evolutionary trajectories in origin-fixation models.

Authors: Ashley I Teufel; Claus O Wilke
Journal: J R Soc Interface Date: 2017-02 Impact factor: 4.118

Review 5. Bridging the physical scales in evolutionary biology: from protein sequence space to fitness of organisms and populations.

Authors: Shimon Bershtein; Adrian Wr Serohijos; Eugene I Shakhnovich
Journal: Curr Opin Struct Biol Date: 2016-10-31 Impact factor: 6.809

6. A systematic survey of an intragenic epistatic landscape.

Authors: Claudia Bank; Ryan T Hietpas; Jeffrey D Jensen; Daniel N A Bolon
Journal: Mol Biol Evol Date: 2014-11-03 Impact factor: 16.240

Review 7. Shifts in amino acid preferences as proteins evolve: A synthesis of experimental and theoretical work.

Authors: Noor Youssef; Edward Susko; Andrew J Roger; Joseph P Bielawski
Journal: Protein Sci Date: 2021-08-12 Impact factor: 6.993

8. Simple Biophysical Model Predicts Faster Accumulation of Hybrid Incompatibilities in Small Populations Under Stabilizing Selection.

Authors: Bhavin S Khatri; Richard A Goldstein
Journal: Genetics Date: 2015-10-04 Impact factor: 4.562

Review 9. Detecting selection on protein stability through statistical mechanical models of folding and evolution.

Authors: Ugo Bastolla
Journal: Biomolecules Date: 2014-03-07

10. Weak negative and positive selection and the drift load at splice sites.

Authors: Stepan V Denisov; Georgii A Bazykin; Roman Sutormin; Alexander V Favorov; Andrey A Mironov; Mikhail S Gelfand; Alexey S Kondrashov
Journal: Genome Biol Evol Date: 2014-05-14 Impact factor: 3.416