Daniel M Lyons1, Zhengting Zou1, Haiqing Xu1, Jianzhi Zhang2. 1. Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA. 2. Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA. jianzhi@umich.edu.
Abstract
Patterns of epistasis and shapes of fitness landscapes are of wide interest because of their bearings on a number of evolutionary theories. The common phenomena of slowing fitness increases during adaptations and diminishing returns from beneficial mutations are believed to reflect a concave fitness landscape and a preponderance of negative epistasis. Paradoxically, fitness decreases tend to decelerate and harm from deleterious mutations shrinks during the accumulation of random mutations, patterns thought to indicate a convex fitness landscape and a predominance of positive epistasis. Current theories cannot resolve this apparent contradiction. Here, we show that the phenotypic effect of a mutation varies substantially depending on the specific genetic background and that this idiosyncrasy in epistasis creates all of the above trends without requiring a biased distribution of epistasis. The idiosyncratic epistasis theory explains the universalities in mutational effects and evolutionary trajectories as emerging from randomness due to biological complexity.
Epistasis, or genetic interaction among a set of mutations, impacts the phenotypic effects of mutations and shapes fundamental evolutionary processes[1]. Epistasis is said to be positive (or negative) for a particular trait such as fitness if the trait value of the individual with multiple mutations is greater (or smaller) than the expectation from the corresponding single mutants under no epistasis[1]. A number of evolutionary theories such as the mutational deterministic hypothesis of the evolution of sexual reproduction[2] and the hypothesis of reduction in mutational load by truncation selection against deleterious mutations depend on assumptions of general trends of epistasis[3]. Universal patterns involving epistasis are emerging from decades of intense investigations[4,5]. For instance, many experimental evolution studies have shown that fitness increase slows during the organismal adaptation to a constant environment[6]. While the speed of fitness increase is typically measured per unit time[6], the same trend is observed when the speed is measured per mutation accrued[7]. This phenomenon of slowing adaptation is at least in part due to diminishing returns epistasis, a common observation that advantageous mutations are less beneficial on fitter genetic backgrounds[8-11]. Because diminishing returns epistasis is a form of negative epistasis, the above observations are thought to indicate a preponderance of negative epistasis between beneficial mutations and a concave fitness landscape[12]. Given the concave shape of the landscape inferred from ascent to a fitness peak, one would also expect to observe a concave shape during descent from the fitness peak—that is, accelerating fitness decline by mutation accumulation and negative epistasis between deleterious mutations. 
Contrary to this expectation, mutation accumulation experiments in the near absence of selection have revealed decelerating fitness declines[13-15] and manipulative experiments have demonstrated that deleterious mutations tend to be less harmful in less fit genetic backgrounds (a.k.a. increasing costs epistasis because of the higher costs of deleterious mutations in fitter genotypes)[16]. These observations concerning deleterious mutations are thought to indicate a convex fitness landscape and a predominance of positive epistasis[12,13].
Apparently, the shape of the fitness landscape and the distribution of epistasis inferred from climbing fitness peaks contrast with those inferred from descending fitness peaks[12]. To restate the problem: mutation accumulation and manipulative experiments suggest that the majority of mutational paths down a fitness peak are convex (positively epistatic). Thus, most mutational pathways up the peak during adaptation should be convex (positively epistatic) as well, but the opposite is observed. We term this contradiction the uphill-downhill paradox.
Although several theoretical models have been proposed to explain the inferred prevalence of either negative or positive epistasis[9,10,13,16-20], these models cannot simultaneously explain both in the same species, leaving the uphill-downhill paradox unresolved. For instance, the modular life model explains the negative epistasis among beneficial mutations by functional saturation of modules[17], but it also predicts negative epistasis among detrimental mutations. The metabolic control theory has been invoked in explaining the positive epistasis among deleterious mutations because a deleterious mutation in a linear pathway causes a smaller flux reduction when other enzymes in the pathway have already been adversely affected[16,21]. But this theory would also predict positive epistasis of beneficial mutations. Alternatively, one must additionally assume that adaptation is biased towards mutations that disrupt costly expendable pathways in order to explain diminishing returns[12]. Clearly, this assumption cannot be generally true. Theoretical work making no mechanistic assumptions has shown some promise[22,23]. Such work has found that pairwise epistasis between successive adaptive mutations is positively biased during late stages of adaptation even in a landscape of no overall bias of epistasis, suggesting that the epistasis between beneficial mutations may not represent the overall epistasis in the landscape. However, how this finding relates to the observed epistasis trends in adaptation and mutation accumulation is unclear. Rather than assuming a specific biological mechanism, below we propose and demonstrate that epistasis is generally idiosyncratic and that this idiosyncrasy is responsible for the general trends in both climbing and descending from fitness peaks.
RESULTS
Why epistasis could be highly idiosyncratic
Let g be the population growth rate (a.k.a. Malthusian fitness, logarithm of Wrightian fitness, or fitness for short) of a genotype in an environment and let n be the number of nucleotide sites in the genome that impact g. In general, g can be expressed as the sum of $2^n - 1$ terms of fitness effects, including the additive effect of every site, the interactive effect of every pair of sites, the interactive effect of every triplet of sites, and so on (see Methods). We refer to this model of fitness landscape as the n-order model, because it includes all terms up to the n-order interaction. It can be shown that a mutation at a single site changes up to $2^{n-1}/(2^n - 1) \approx 50\%$ of all terms of effects making up g. Under the assumption that the interactive terms are idiosyncratic (i.e., varying with the interacting nucleotides involved), a single mutation can differentially alter as many as $2^{n-2}$ (or ~25% of) terms of effects in two genotypes that differ by only one nucleotide; this number can rise up to $2^{n-1}$ (or ~50% of) terms if the two genotypes are more different (see Methods). Given the potential of such a large fraction of differentially affected terms of g, it is not surprising that the same mutation could have vastly different effects in different genotypes. As long as the idiosyncrasy assumption holds, the same argument can be made for any phenotypic trait whose value is expressed as the sum of all additive and interactive terms of effects. Of course, not all $2^n - 1$ terms of effects are of the same magnitude, which would increase or decrease the effective fraction of terms differentially altered by a mutation. Regardless, the above consideration elucidates why mutational effects could be highly sensitive to the genetic background when biological interactions are complex.
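The counting argument can be checked by brute force for a small number of sites. The sketch below is illustrative only: it assumes that every non-empty subset of the n sites contributes exactly one term to g, and the function names and the choice of n = 10 are ours rather than part of the original analysis.

```python
from itertools import combinations


def interaction_terms(n):
    """Every non-empty subset of sites 0..n-1, one subset per term of g."""
    return [frozenset(c)
            for order in range(1, n + 1)
            for c in combinations(range(n), order)]


def count_terms(n, mutated_site, differing_site):
    """Count all terms, the terms that involve the mutated site, and the
    terms that involve both the mutated site and a second site at which
    two genetic backgrounds differ."""
    terms = interaction_terms(n)
    total = len(terms)
    touched = sum(1 for t in terms if mutated_site in t)
    both = sum(1 for t in terms if mutated_site in t and differing_site in t)
    return total, touched, both


total, touched, both = count_terms(10, mutated_site=0, differing_site=1)
print(total, touched, both)          # 1023 512 256
print(touched / total, both / total) # roughly 0.50 and 0.25
```

With n = 10 the three counts are 2^10 − 1, 2^9 and 2^8, which matches the ~50% and ~25% fractions quoted in the text.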
Epistasis is highly idiosyncratic
To quantify the above sensitivity that originates from idiosyncratic epistasis, we define an idiosyncratic index (Iid) for a mutation as the variation in the fitness difference between genotypes that differ by the mutation, relative to the variation in the fitness difference between random genotypes for the same number of genotype pairs. Here, the variation may be measured by standard deviation (SD), range, or other statistics. We can further compute the Iid for a fitness landscape by averaging Iid of individual mutations considered. The Iid for a landscape varies from 0 to 1, corresponding to the minimal and maximal levels of idiosyncrasy, respectively. We first estimated Iid for the fitness landscape of a yeast tRNA gene that includes experimentally measured fitness of over 65,000 genotypes[24]. For example, the G-to-A mutation at site 10 has a fitness effect varying from −0.53 to 0.29 (SD = 0.13) on 88 different backgrounds. For comparison, the fitness difference between a randomly picked genotype and another randomly picked genotype varies from −0.59 to 0.74 (SD = 0.26) for 88 genotype pairs sampled (Fig. 1a). So, the ratio of the two SDs is 0.49. This analysis was repeated for 828 single mutations (considering reverse mutations) (Fig. 1b), and the average ratio of SD is Iid = 0.612 ± 0.005 (SE). Iid can be similarly defined for non-fitness traits, and we estimated Iid for a variety of empirical phenotype landscapes that are experimentally determined[24-32] and one that is computationally predicted (RNA-stability) (Supplementary Table 1). Overall, Iid varies from 0.18 to 0.80 among the 12 landscapes examined, with a mean of 0.43 (Fig. 1c). Hence, in an average phenotype landscape, a particular mutation’s effects across different backgrounds are 43% as variable as if they are randomly drawn from the effects of any number of mutations on any genetic background. 
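As an illustration of the definition, an SD-based index can be computed for any landscape stored as a mapping from genotype to phenotype. The sketch below is not the code used for the analyses reported here. It assumes binary sites (so a mutation and its reversion share the same SD and only the 0-to-1 direction is scored), and the function name, the two toy landscapes and the random seeds are hypothetical.

```python
import random
import statistics
from itertools import product


def idiosyncrasy_index(landscape, rng):
    """SD-based idiosyncrasy index of a landscape with binary sites.

    `landscape` maps genotype tuples of 0/1 to phenotype values.
    """
    genotypes = list(landscape)
    n_sites = len(genotypes[0])
    ratios = []
    for site in range(n_sites):
        # Effects of the 0 -> 1 mutation at this site on every background
        # for which both the background and the mutant are available.
        effects = []
        for g in genotypes:
            mutant = g[:site] + (1,) + g[site + 1:]
            if g[site] == 0 and mutant in landscape:
                effects.append(landscape[mutant] - landscape[g])
        if len(effects) < 2:
            continue
        # The same number of differences between randomly paired genotypes.
        random_diffs = []
        for _ in effects:
            a, b = rng.sample(genotypes, 2)
            random_diffs.append(landscape[a] - landscape[b])
        ratios.append(statistics.stdev(effects) / statistics.stdev(random_diffs))
    return statistics.mean(ratios)


rng = random.Random(0)
n = 8
all_genotypes = list(product((0, 1), repeat=n))

# Purely additive landscape: each mutation has one fixed effect everywhere.
site_effects = [rng.gauss(0, 1) for _ in range(n)]
additive = {g: sum(e * x for e, x in zip(site_effects, g)) for g in all_genotypes}

# House-of-cards landscape: each genotype gets an independent random fitness.
house_of_cards = {g: rng.gauss(0, 1) for g in all_genotypes}

print(idiosyncrasy_index(additive, rng))        # essentially 0
print(idiosyncrasy_index(house_of_cards, rng))  # close to 1
```

The two toy landscapes bracket the range of the index: no idiosyncrasy when all effects are additive, and near-maximal idiosyncrasy when neighboring genotypes are uncorrelated.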
To exclude the possibility that the observed idiosyncrasy is largely due to imprecise phenotyping, we computed Iid for the same 828 mutations in the tRNA landscape, but used fitness estimates from different numbers of experimental replicates, because the measurement error should decrease with the number of replicates. We found that Iid is insensitive to the number of replicates (Extended Data Fig. 1a), suggesting that the high idiosyncrasy is not explained by potentially imprecise phenotyping. Additionally, phenotypic values in the RNA-stability landscape were computed deterministically without measurement error, but mutational effects are still a quarter as idiosyncratic as the maximum (Iid = 0.25). We similarly observed high idiosyncrasy when range instead of SD of effects was used in estimating Iid (Extended Data Fig. 1b). Finally, another measure of idiosyncrasy is the frequency of sign epistasis in a landscape, or the proportion of mutations that are beneficial in some backgrounds but detrimental in others. In agreement with the high idiosyncrasy indices, nearly all mutations exhibit sign epistasis in all landscapes (93.7% of mutations, on average) (Extended Data Fig. 1b).
Fig. 1.
Idiosyncratic index of a wide variety of phenotype landscapes.
(a) Frequency distribution of the fitness effects of the mutation from G to A at position 10 across all available genetic backgrounds (purple) and the corresponding distribution of the fitness difference between two random genotypes for the same number of genotype pairs (grey) in the yeast tRNA fitness landscape. (b) Frequency distribution of the standard deviation (SD)-based idiosyncrasy index (Iid), which is the ratio of the SD of fitness effects of a particular mutation on different backgrounds to the SD of fitness differences between random genotype pairs, for all individual mutations in the yeast tRNA landscape. (c) SD-based Iid of various phenotype landscapes. Error bars show standard errors. Detailed information of each landscape is provided in Supplementary Table 1. (d) Schematic of a highly idiosyncratic fitness landscape. Genotypes are represented by circles, and the fitness of a genotype is represented by the circle size. The three black circles labeled with H, I, L respectively indicate three focal genotypes with relatively high, intermediate, and low fitness values, whereas the grey circles represent one-mutation neighbors of the focal genotypes. Each light-green outlined area encompasses a focal genotype and some one-mutation neighbors. Solid arrows indicate single mutations, whereas dotted arrows indicate multiple mutations. Solid arrows of the same color indicate the same mutation.
Extended Data Fig. 1
The high idiosyncrasy indices (Iid) observed are not due to phenotype measurement errors or the use of standard deviation (SD) instead of range of mutational effects.
(a) SD-based Iid of the yeast tRNA fitness landscape is insensitive to the number of experimental replicates used in the fitness estimation. Boxplots show the distribution of Iid values of 828 single mutations in the tRNA landscape, calculated based on different numbers of replicates. The lower and upper edges of a box represent the first (qu1) and third (qu3) quartiles, respectively, the horizontal line inside the box indicates the median (md), the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3 − qu1), and the grey dots represent values outside the inner fences (outliers). Violet dots show mean Iid of all mutations calculated based on respective numbers of replicates. (b) Range-based Iid for various phenotype landscapes. Error bars show standard errors. Detailed information of each landscape is provided in Supplementary Table 1. Shown in red is the fraction of mutations exhibiting sign epistasis in each phenotype landscape.
Expected consequences of idiosyncratic epistasis
Below we demonstrate the consequences of the substantial idiosyncrasy we find with regard to fitness, but the same applies to other traits. In a maximally idiosyncratic fitness landscape such as the one described by the house-of-cards model[33], fitness values (circle sizes in Fig. 1d) of neighboring genotypes connected through single mutations are uncorrelated. The fitness of a neighboring genotype of a high- or a low-fitness focal genotype is expected to be the same. Hence, the fitness difference between a neighboring genotype (grey circle) and the focal genotype (black circle) is expected to be less positive or more negative as the fitness of the focal genotype rises. In other words, beneficial mutations are less beneficial and deleterious mutations are more deleterious on fitter genotypes, causing diminishing returns and increasing costs, respectively. These arguments apply not only to the effects of the same mutation on different genetic backgrounds but also to the effects of different mutations on different backgrounds. That is, an arbitrary mutation on a relatively fit background is expected to be less beneficial or more detrimental than another arbitrary mutation on a relatively unfit background. Under the foregoing model of g, one can mathematically prove that, in the presence of idiosyncrasy of at least one interactive term, the correlation between the fitness effect of a mutation and the background fitness is negative, for both the same and different mutation(s) (see Methods). In the case of different mutations, among-site/state variation in the additive effect further contributes to the negative correlation (see Methods). Importantly, all of the above occurs even with no bias toward positive or negative epistasis in the fitness landscape and no fitness estimation error.
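For the maximally idiosyncratic case, the sign and size of this correlation can be worked out in a few lines. The sketch below is not the general proof given in Methods. It assumes that the background fitness g and the mutant fitness g' are independent draws from one distribution with mean μ and variance σ², and the symbols g', s, μ and σ² are introduced here only for illustration.

```latex
s = g' - g, \qquad \mathbb{E}[\,s \mid g\,] = \mu - g
\operatorname{Cov}(s, g) = \operatorname{Cov}(g', g) - \operatorname{Var}(g) = 0 - \sigma^2 = -\sigma^2
\operatorname{Var}(s) = \operatorname{Var}(g') + \operatorname{Var}(g) = 2\sigma^2
\operatorname{Corr}(s, g) = \frac{-\sigma^2}{\sqrt{2\sigma^2}\,\sqrt{\sigma^2}} = -\frac{1}{\sqrt{2}} \approx -0.71
```

The expected effect μ − g falls linearly as the background fitness rises, which is the regression-to-the-mean reading of diminishing returns and increasing costs, and no assumption about the sign of epistasis enters the calculation.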
Idiosyncratic epistasis causes the trends of diminishing returns and increasing costs
To examine whether the extent of idiosyncratic epistasis in an actual fitness landscape is sufficient to explain the observed diminishing returns and increasing costs, we simulated a series of 16 fitness landscapes with n = 16 binary sites, under the n-order model of g. In the kth landscape in the series (1 ≤ k ≤ 16), we considered up to the kth order of interaction. That is, each term of effect from the first to the kth order interaction is a random variable independently drawn from the standard normal distribution whereas all other terms are set to 0. When k rises from 1 to 16, Iid increases from 0 to 0.69 (Fig. 2a), which is close to the theoretically predicted value (see Methods). In all simulated landscapes except the one with Iid = 0, most if not all mutations exhibit a negative Pearson’s correlation between fitness effect and background fitness (boxes in Fig. 2a). In addition, the larger the k and Iid, the more negative the correlations (boxes in Fig. 2a), supporting the role of idiosyncratic epistasis in creating the negative correlations. For comparison, 87.8% of mutations from the yeast tRNA fitness landscape show a negative correlation between fitness effect and background fitness (Fig. 2b). A similar trend is seen in other empirical phenotype landscapes (Extended Data Fig. 2a, b). Separating mutations that are beneficial or detrimental on the wild-type background or an arbitrary background reveals the familiar patterns of diminishing returns and increasing costs in both the simulated and empirical landscapes (Extended Data Fig. 3).
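A scaled-down version of this simulation can be sketched as follows. The exact parameterization of the n-order model is given in Methods and is not reproduced here, so the code makes a labeled assumption: every subset of at most k sites contributes one term, and that term takes an independent standard normal value for each combination of states at its sites. The use of n = 8 rather than 16, the function names and the seed are also ours.

```python
import random
from itertools import combinations, product


def build_landscape(n, k, rng):
    """Fitness of all 2**n binary genotypes with interactions up to order k.

    Assumption: each subset of <= k sites adds one term whose value is an
    independent N(0, 1) draw for every state combination at those sites.
    """
    subsets = [c for order in range(1, k + 1)
               for c in combinations(range(n), order)]
    draws = {}

    def term(subset, genotype):
        key = (subset, tuple(genotype[i] for i in subset))
        if key not in draws:
            draws[key] = rng.gauss(0, 1)
        return draws[key]

    return {g: sum(term(s, g) for s in subsets)
            for g in product((0, 1), repeat=n)}


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def effect_background_correlations(landscape, n):
    """Per-site correlation between background fitness and the effect of
    the 0 -> 1 mutation at that site, across all backgrounds."""
    correlations = []
    for site in range(n):
        backgrounds, effects = [], []
        for g, fitness in landscape.items():
            if g[site] == 0:
                mutant = g[:site] + (1,) + g[site + 1:]
                backgrounds.append(fitness)
                effects.append(landscape[mutant] - fitness)
        correlations.append(pearson(backgrounds, effects))
    return correlations


n = 8
rng = random.Random(1)
full_landscape = build_landscape(n, n, rng)   # all orders of interaction
rs = effect_background_correlations(full_landscape, n)
mean_r = sum(rs) / len(rs)
print(mean_r)   # negative: mutations help less, or hurt more, on fitter backgrounds
```

With k = 1 every effect is constant across backgrounds, so the per-mutation correlation is undefined and `pearson` would divide by a near-zero variance; the sketch is therefore meant for k ≥ 2.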
Fig. 2.
Negative correlation between mutational effect and background fitness as a result of idiosyncratic epistasis.
(a) Boxplot showing the distribution of Pearson’s correlation coefficient (r) between mutational effect and background fitness for individual mutations in a series of n-order landscapes. The lower and upper edges of a box represent the first (qu1) and third (qu3) quartiles, respectively, the horizontal line inside the box indicates the median (md), the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3 − qu1), and the grey dots represent values outside the inner fences (outliers). Green diamonds represent r values for all mutations pooled in each landscape, while red circles indicate SD-based Iid of the landscapes. (b) Distribution of r for 828 individual mutations in the tRNA fitness landscape. (c) Relationship between background fitness and mutational effect for 414 mutations (reversions not considered) in the tRNA fitness landscape. The red line depicts the running mean in non-overlapping X-axis bins of width = 0.02, in all bins with more than 10 data points. To avoid a spurious correlation due to shared measurement error on the X- and Y-axis, we used three replicates of background fitness measures for the X-axis and three other replicates for the Y-axis in (b) and (c). For each mutation and its reversion, a randomly picked one is considered in (c).
Extended Data Fig. 2
Negative correlation between mutational effect and background phenotype in GFP and RNA-folding landscapes.
(a) Distribution of Pearson’s correlation coefficient (r) between mutational effect and background phenotype for individual mutations in the GFP landscape. (b) Distribution of r for individual mutations in the RNA-folding landscape. (c) Relationship between background phenotype and mutational effect for all mutations in the GFP landscape. (d) Relationship between background phenotype and mutational effect for all mutations in the RNA-folding landscape. MFE, minimum free energy. The red line depicts the running mean in non-overlapping X-axis bins of width = 0.02 and 2 in (c) and (d), respectively, in all bins with more than 10 data points. There is no measurement error in the RNA-folding landscape. Shared measurement error between mutational effect and background fitness cannot be controlled for in GFP as replicate fitness measurements are not available. For each mutation and its reverse, we considered a random one of them in (c) and (d).
Extended Data Fig. 3
Patterns of correlation between mutational effect and background fitness/phenotype for individual beneficial or deleterious mutations in various landscapes.
(a) Boxplots showing distributions of correlations in a series of n-order landscapes of 16 sites (where the highest order of nonzero interaction is indicated on the X-axis) for beneficial (blue) and deleterious (red) mutations, respectively. The lower and upper edges of a box represent the first (qu1) and third (qu3) quartiles, respectively, the horizontal line inside the box indicates the median (md), the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3 − qu1), and the dots represent values outside the inner fences (outliers). (b-d) Frequency distributions of correlations for individual beneficial mutations (blue) and deleterious mutations (red) in the tRNA (b), GFP (c), and RNA-folding (d) landscapes. Whether a mutation is beneficial or deleterious is determined in reference to the wild-type (tRNA and GFP) or an arbitrary reference genotype (n-order and RNA-folding). The wider distribution for deleterious than beneficial mutations is at least in part due to the larger number of deleterious than beneficial mutations.
Furthermore, the effects of different mutations also negatively correlate with background fitness in the simulated landscapes (green diamonds in Fig. 2a; see Methods), as well as in the tRNA fitness landscape (Fig. 2c) and other empirical phenotype landscapes (Extended Data Fig. 2c, d). As mentioned, the negative correlation in the simulated landscape with Iid = 0 (green diamond in Fig. 2a) is due to the contribution from the among-site/state variation in additive effect; in the absence of this variation, all genotypes are equally fit, so the correlation disappears.
Idiosyncratic epistasis causes slowing fitness drops in mutation accumulation
When random mutations accrue in a relatively fit population in the near absence of selection, population fitness is expected to decline. Because idiosyncratic epistasis renders random mutations on average less deleterious on relatively unfit genotypes than on relatively fit genotypes, the fitness drop of the population is expected to decelerate during mutation accumulation until it reaches the mean fitness of all genotypes in the landscape, around which the fitness should subsequently fluctuate. We confirmed this prediction in the simulated n-order landscapes: As k and Iid increase, the slowing curvature becomes more prominent (Fig. 3a). A change in mutational supply explains why the fitness decline is decelerating even when Iid = 0 (see Methods), and as expected, this trend diminishes as n rises (Extended Data Fig. 4). For comparison, decelerating fitness declines are apparent during simulated mutation accumulations in the tRNA fitness landscape (Fig. 3b) and other empirical phenotype landscapes (Extended Data Fig. 5).
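The mutation accumulation procedure can be sketched as a random walk on a simulated landscape. This is a minimal illustration rather than the simulation reported here. It assumes the same reading of the n-order model as above (one independent standard normal draw per subset of at most k sites and per state combination), starts every trajectory from the fittest genotype, and lets each step flip one randomly chosen site regardless of its fitness effect. The values n = 8, k = 4, 40 steps and 2000 trajectories are arbitrary.

```python
import random
from itertools import combinations, product


def build_landscape(n, k, rng):
    """All 2**n binary genotypes; assumed form of the k-th order model."""
    subsets = [c for order in range(1, k + 1)
               for c in combinations(range(n), order)]
    draws = {}

    def term(subset, genotype):
        key = (subset, tuple(genotype[i] for i in subset))
        if key not in draws:
            draws[key] = rng.gauss(0, 1)
        return draws[key]

    return {g: sum(term(s, g) for s in subsets)
            for g in product((0, 1), repeat=n)}


def mutation_accumulation(landscape, n, steps, n_trajectories, rng):
    """Mean fitness after 0..steps random mutations, starting at the peak."""
    start = max(landscape, key=landscape.get)
    totals = [0.0] * (steps + 1)
    for _ in range(n_trajectories):
        g = start
        totals[0] += landscape[g]
        for t in range(1, steps + 1):
            site = rng.randrange(n)          # no selection: any site may flip
            g = g[:site] + (1 - g[site],) + g[site + 1:]
            totals[t] += landscape[g]
    return [x / n_trajectories for x in totals]


n = 8
rng = random.Random(0)
landscape = build_landscape(n, 4, rng)
trajectory = mutation_accumulation(landscape, n, steps=40,
                                   n_trajectories=2000, rng=rng)
landscape_mean = sum(landscape.values()) / len(landscape)

first_drop = trajectory[0] - trajectory[1]
later_drop = trajectory[4] - trajectory[5]
print(first_drop, later_drop)              # the early drop is the larger one
print(trajectory[-1], landscape_mean)      # the walk settles near the mean
```

Because sites can mutate back, part of the deceleration in such a small landscape reflects the change in mutational supply noted in the text and not idiosyncrasy alone.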
Fig. 3.
Fitness declines decelerate during mutation accumulation as a result of idiosyncratic epistasis.
(a) Average fitness trajectories simulated in a series of n-order landscapes. (b) Ten thousand fitness trajectories of mutation accumulation simulated in the tRNA fitness landscape, with the average fitness of all trajectories at each step shown in black. Dotted lines indicate the mean fitness of all genotypes in the corresponding landscape.
Extended Data Fig. 4
Average fitness trajectories of mutation accumulation simulated in various n-order additive landscapes (k = 1) with different numbers of sites (n).
The mean trajectories are scaled so that the minimum fitness appearing in the trajectory is 0 and the maximum is 1 to allow direct comparison.
Extended Data Fig. 5
Fitness declines decelerate during mutation accumulation as a result of idiosyncratic epistasis.
(a) A total of 5000 fitness trajectories of mutation accumulation simulated in the GFP landscape, with the average trajectory shown in black, at each step when the trajectory number exceeds 10. (b) A total of 350 fitness trajectories of mutation accumulation simulated in the RNA-folding landscape, with the average trajectory shown in black. The dotted lines indicate the mean phenotypic value of all genotypes in the landscape, excluding non-active genotypes in the GFP landscape. For comparison, the dashed line in (a) or (b) represents the predicted linear decline given the slope in the first mutational step.
The drop in fitness to the mean of all genotypes during mutation accumulation is observed in the n-order landscapes (Fig. 3a) and RNA-stability landscape (Extended Data Fig. 5b), while the average mutation accumulation trajectory in tRNA (Fig. 3b) and GFP (Extended Data Fig. 5a) landscapes fluctuates above the mean of all genotypes. This latter phenomenon is due to preferential sampling of genotypes close to the wild-type in the experimental data, “trapping” many simulated mutation accumulation trajectories around the wild-type. The former two theoretically simulated/calculated landscapes do not have such biases.
Idiosyncratic epistasis causes slowing fitness gains in adaptation
Idiosyncratic epistasis, in combination with certain distributions of genotype fitness or interactive effects, creates the phenomenon of decelerating fitness gains during adaptation. In a solely additive landscape with Iid = 0, each adaptive trajectory is basically a random ordering of the beneficial mutations. Thus, the mean fitness increase of every step is the mean of the effect of all beneficial mutations, leading to a linear average trajectory regardless of the fitness distribution (Fig. 4a). In an idiosyncratic landscape with Iid > 0, the mutation fixed at each step during adaptation is a random draw from beneficial mutations instead of all mutations. Because of this bias, the shape of the adaptive trajectory, unlike that of mutation accumulation, is dependent on the distribution of genotype fitness or interactive effects. For example, in a house-of-cards model (a special case of n-order landscapes with only the highest-order interaction term), when fitness of all genotypes is gamma distributed with shape parameter >1, =1, or <1 (Extended Data Fig. 6a), fitness rises sublinearly, linearly, and superlinearly with the number of mutations accumulated, respectively (Extended Data Fig. 6b) (see also ref. [34]). Although not sufficient for creating a decelerating adaptive trajectory, idiosyncrasy causes a decelerating trajectory in a wide range of full n-order landscapes, including, for example, those with normal- (Fig. 4a), gamma- (Extended Data Fig. 6c), and beta-distributed (Extended Data Fig. 6d) interaction effects. As expected, adaptation slows more drastically with greater Iid (Fig. 4a; Extended Data Fig. 6c, d). Simulated adaptation also decelerates in the tRNA fitness landscape (Fig. 4b) as well as in other empirical phenotype landscapes (Extended Data Fig. 7), suggesting that the empirical cases fulfill both the idiosyncrasy and fitness distribution requirements.
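An adaptive walk of the kind described here can be sketched in the same framework. The code below is an illustration under stated assumptions, not the procedure used for Fig. 4. It uses the same assumed form of the n-order model as above with normally distributed terms, starts from the least fit genotype, fixes one uniformly chosen beneficial single mutation per step, and holds a trajectory at its fitness once it reaches a local peak (a simplification; averaging only over trajectories that are still moving is an alternative). The values n = 8, 10 steps and 1000 trajectories are arbitrary.

```python
import random
from itertools import combinations, product


def build_landscape(n, k, rng):
    """All 2**n binary genotypes; assumed form of the k-th order model."""
    subsets = [c for order in range(1, k + 1)
               for c in combinations(range(n), order)]
    draws = {}

    def term(subset, genotype):
        key = (subset, tuple(genotype[i] for i in subset))
        if key not in draws:
            draws[key] = rng.gauss(0, 1)
        return draws[key]

    return {g: sum(term(s, g) for s in subsets)
            for g in product((0, 1), repeat=n)}


def neighbors(g):
    """All genotypes that differ from g at exactly one site."""
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]


def adaptive_walks(landscape, steps, n_trajectories, rng):
    """Mean fitness after 0..steps substitutions, starting at the bottom."""
    start = min(landscape, key=landscape.get)
    totals = [0.0] * (steps + 1)
    for _ in range(n_trajectories):
        g = start
        totals[0] += landscape[g]
        for t in range(1, steps + 1):
            beneficial = [m for m in neighbors(g) if landscape[m] > landscape[g]]
            if beneficial:                   # otherwise stay at the local peak
                g = rng.choice(beneficial)
            totals[t] += landscape[g]
    return [x / n_trajectories for x in totals]


n = 8
rng = random.Random(2)
landscape = build_landscape(n, n, rng)
trajectory = adaptive_walks(landscape, steps=10, n_trajectories=1000, rng=rng)

first_gain = trajectory[1] - trajectory[0]
later_gain = trajectory[6] - trajectory[5]
print(first_gain, later_gain)   # the first step gains more than a late step
```

Replacing the normal draws with another distribution is the natural way to probe the dependence on the fitness distribution discussed in the text.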
Fig. 4.
Adaptation slows as a result of idiosyncratic epistasis.
(a) Average adaptive trajectories simulated in a series of n-order landscapes. (b) A total of 15,878 adaptive trajectories simulated in the tRNA fitness landscape, with the average fitness of all trajectories shown in black, at each step when the trajectory number exceeds 10.
Extended Data Fig. 6
Idiosyncratic epistasis is necessary but not sufficient to cause decelerating adaptations.
(a) Gamma distributions of genotype fitness for house-of-cards landscapes, with different values of the gamma shape parameter. (b) Theoretically computed mean fitness trajectories of adaptation on landscapes in (a) with corresponding colors. (c) Average adaptive trajectories starting from the genotype with the lowest fitness (0), simulated in a series of n-order landscapes of 16 sites where each nonzero interaction term of each genotype is drawn from a gamma distribution of = 1. (d) Average adaptive trajectories starting from the genotype with the lowest fitness (0), simulated in a series of n-order landscapes of 16 sites where each nonzero interaction term of each genotype is drawn from a beta distribution with For each landscape in (c) and (d), the distribution of epistasis between mutations is symmetrical with mean equal to 0.
Extended Data Fig. 7
Adaptation slows in empirical phenotype landscapes.
(a) A total of 5000 adaptive trajectories simulated in the GFP landscape, with the average trajectory shown in black, at each step when the trajectory number exceeds 10. (b) A total of 350 adaptive trajectories simulated in the RNA-folding landscape, with the average trajectory shown in black, at each step when the trajectory number exceeds 10. For comparison, the dashed line in (a) or (b) represents the predicted linear increase given the slope in the first mutational step.
DISCUSSION
In summary, we proposed a simple theory that uses the idiosyncrasy of epistasis to explain some of the most commonly observed patterns of mutational effects and evolutionary trajectories. Phenotype landscapes of a variety of genes and taxa confirm our assumption of idiosyncratic epistasis. Contrary to the common intuition, our work shows that diminishing returns and decelerating adaptations do not suggest a bias toward negative epistasis in the underlying fitness landscape or a concave landscape. Similarly, increasing costs and slowing fitness declines during mutation accumulation do not indicate a bias toward positive epistasis or a convex landscape. Thus, our theory resolves the uphill-downhill paradox.
Although the idiosyncrasy of epistasis is a major characteristic of empirical phenotype landscapes (Fig. 1c), biological interactions are not completely idiosyncratic. Rather, idiosyncratic epistasis should serve as a null model for the role of epistasis in mutational effects and evolution. For example, the relationship between the mutational robustness of a genotype (i.e., fitness insensitivity to mutation) and its adaptability/evolvability to environmental challenges is debated[35-37]. Our theory reveals an intrinsic positive correlation between robustness and adaptability due to idiosyncratic epistasis, because, as the fitness of a genotype rises, deleterious mutations are more detrimental (i.e., lower robustness) and advantageous mutations are less beneficial (i.e., lower adaptability). Deviations from this null expectation may reveal interesting forms of epistasis beyond idiosyncrasy. Similarly, because slowing fitness drops during mutation accumulation naturally emerge from idiosyncratic epistasis, such observations need not be explained by selection for “genomic buffering against the fitness reduction caused by accumulated mutations”[13]. Rather, when this trend is absent or when the opposite trend is observed, selection for mutational robustness of the wild-type may be invoked[38].
A major question is the relative contributions of idiosyncrasy and various biological mechanisms to the universal uphill/downhill observations in empirical data. Clonal interference[39] and changes in mutational supply[40] likely contribute to slowing adaptation. Interestingly, diminishing returns contribute more than changes in mutational supply to slowing adaptation even in a constant environment[41], implying the importance of idiosyncrasy. Comparing adaptations of sexually and asexually reproducing organisms may provide a way to test the relative importance of clonal interference and idiosyncrasy. We emphasize that the high yet incomplete idiosyncrasy we find means that there is room for the action of various biological mechanisms. For example, the arrangement of enzymes in a metabolic pathway obviously has effects on the epistasis of mutations of enzyme genes[21] and biological systems do show modularity[42]. It will be important to develop the idiosyncratic epistasis theory into a model that can be fit to empirical data and compared directly to other models. Given that the idiosyncratic epistasis theory makes only one assumption (that epistasis is at least somewhat idiosyncratic), such work will likely be fruitful in illuminating the causes and consequences of epistasis in a wide variety of systems.
How does the idiosyncrasy of epistasis arise from the underlying deterministic biological interactions? The n-order model reveals that the number of interactive terms determining the phenotype of a genotype is potentially astronomical and that the same mutation differentially alters a substantial fraction of these terms in even slightly different genotypes. Consequently, it is difficult to predict the mutational effect in any particular genotype despite the underlying deterministic biological interactions, much like the apparently random outcome of a die roll that is deterministically shaped by myriad factors such as the movement of air molecules. That the universal trends of mutational effects and evolutionary trajectories emerge from this randomness due to idiosyncratic epistasis is no more surprising than the tendency of observing a smaller number in a second roll of a die when the first roll yields a five.
METHODS
Number of terms of phenotypic effects altered by a mutation
Let $g$ be the Malthusian fitness (fitness for short) of a genotype in an environment and let $n$ be the number of nucleotide sites in a genome that are relevant to $g$. Here $g$ equals the sum of the additive fitness effect of every site (i.e., first order interaction), the interactive effect of every pair of sites (i.e., second order interaction), the interactive effect of every triplet of sites (i.e., third order interaction), and so on. That is, $g$ contains $\binom{n}{k}$ terms of effects of the $k$th order of interaction ($1 \le k \le n$), for a total of $2^n - 1$ terms. Among these terms, $2^{n-1}$ terms involve any particular site. Thus, a mutation at a single site potentially changes $2^{n-1}/(2^n - 1) \approx 50\%$ of all terms making up $g$.
Differentially altered terms of effects in two genotypes caused by the same mutation
When the same mutation of allele P changing to Q at site $k$ occurs in two different genotypes that differ at $m$ sites ($k$ is not one of the $m$ sites), the differentially altered terms of effects in these genotypes must involve site $k$ and at least one of the $m$ sites, and may also involve site(s) identical between the two genotypes. The total number of differentially altered terms equals the number of terms involving $k$ and other site(s) that may or may not be identical between the two genotypes, minus the number of terms involving $k$ and other site(s) that are identical between the two genotypes. The resulting number is $(2^{n-1} - 1) - (2^{n-m-1} - 1) = (2^m - 1)\,2^{n-m-1}$. When $m = 1$, the above number is $2^{n-2}$. That is, up to $2^{n-2}/(2^n - 1) \approx 25\%$ of terms are differentially altered by the same mutation in two genotypes that differ at only one site. When $m = n - 1$, the above number is $2^{n-1} - 1$. That is, up to $(2^{n-1} - 1)/(2^n - 1) \approx 50\%$ of terms are differentially altered by the same mutation in two maximally different genotypes.
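The term counts in the two preceding subsections can be checked by brute-force enumeration for a small genome. The sketch below is only an illustration: the function name `count_terms` and the labelling of sites (site 0 as the mutated site, sites 1 to m as the sites at which the two backgrounds differ) are assumptions made here, not part of the original analysis.

```python
from itertools import combinations


def count_terms(n, m):
    """Count interaction terms for n sites by explicit enumeration.

    Site 0 plays the role of the mutated site k; sites 1..m are the sites
    at which the two backgrounds differ (an arbitrary labelling).
    Returns (all terms, terms involving k, differentially altered terms).
    """
    differing = set(range(1, m + 1))
    total = with_k = diff_altered = 0
    for order in range(1, n + 1):
        for term in combinations(range(n), order):
            total += 1
            if 0 in term:
                with_k += 1
                # A term can respond differently in the two backgrounds only
                # if it also contains a site at which the backgrounds differ.
                if differing.intersection(term):
                    diff_altered += 1
    return total, with_k, diff_altered


n = 10
for m in (1, 4, n - 1):
    total, with_k, diff_altered = count_terms(n, m)
    assert total == 2**n - 1
    assert with_k == 2**(n - 1)
    assert diff_altered == (2**m - 1) * 2**(n - m - 1)
    print(f"m = {m}: fraction differentially altered = {diff_altered / total:.3f}")
```

For n = 10 the enumerated fractions are already close to the limiting values of 25% (m = 1) and 50% (m = n − 1) quoted above.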
The fitness effect of a given mutation is negatively correlated with background fitness
Let us consider the n-order landscape model and focus on the mutation from the P allele to the Q allele at site $k$ of the genome. We examine the fitness effect of this mutation on different genetic backgrounds. Let $X$ represent any genotype with the P allele at site $k$. Among them, $x_i$ is the $i$th genotype, whose Malthusian fitness $R_{x_i}$ can be written as $R_{x_i} = A_{x_i} + I_{x_i} + J_{x_i}$, where $A_{x_i}$ is the sum of additive (i.e., 1st order interactive) effects, $I_{x_i}$ is the sum of the 2nd to $n$th order interactive effects involving the focal site $k$, and $J_{x_i}$ is the sum of the 2nd to $n$th order interactive effects that do not involve site $k$.
Similarly, let $Y$ represent any genotype with the Q allele at site $k$. For each genotype $x_i$, we have a corresponding genotype $y_i$ that is identical to $x_i$ except that site $k$ now has the Q allele. $R_{y_i}$, the fitness of $y_i$, can be written as $R_{y_i} = A_{y_i} + I_{y_i} + J_{y_i}$, where $A_{y_i}$ is the sum of additive (i.e., 1st order interactive) effects, $I_{y_i}$ is the sum of the 2nd to $n$th order interactive effects involving site $k$, and $J_{y_i}$ is the sum of the 2nd to $n$th order interactive effects that do not involve site $k$.
Note that the difference in the additive effect between alleles P and Q is a constant that is not influenced by sites other than $k$. That is, $A_{y_i} - A_{x_i} = c$ for every $i$. Therefore, we have
$$\mathrm{Cov}(A_Y - A_X, A_X) = \frac{1}{N}\sum_{i=1}^{N}\left[(A_{y_i} - A_{x_i}) - c\right]\left(A_{x_i} - \bar{A}_X\right) = 0 \quad \text{and} \quad \mathrm{Var}(A_Y - A_X) = 0.$$
Here, $N$ is the total number of pairs of $(x_i, y_i)$ and equals $4^{n-1}$ for a genome with $n$ sites each with four states, Cov stands for covariance, and Var stands for variance.
Also note that, because $x_i$ and $y_i$ are the same except at site $k$, $J_{y_i} = J_{x_i}$. Let $\rho$ be the Pearson correlation between $I_X$ and $I_Y$. We have $\mathrm{Cov}(I_X, I_Y) = \rho\sqrt{\mathrm{Var}(I_X)\,\mathrm{Var}(I_Y)}$. Hence, $\mathrm{Cov}(I_Y - I_X, I_X) = \rho\sqrt{\mathrm{Var}(I_X)\,\mathrm{Var}(I_Y)} - \mathrm{Var}(I_X)$. Under the reasonable assumption that the corresponding interactive terms of $x_i$ and $y_i$ are sampled from the same distribution, $\mathrm{Var}(I_X)$ and $\mathrm{Var}(I_Y)$ are expected to be equal. Hence, $\mathrm{Cov}(I_Y - I_X, I_X) = (\rho - 1)\,\mathrm{Var}(I_X)$. Thus, $\mathrm{Cov}(I_Y - I_X, I_X) = 0$ when $I_X$ and $I_Y$ have a correlation of 1; otherwise $\mathrm{Cov}(I_Y - I_X, I_X) < 0$. When epistasis is to some extent idiosyncratic, $I_X$ does not correlate perfectly with $I_Y$, resulting in $\mathrm{Cov}(I_Y - I_X, I_X) < 0$.
Under the assumption of independence among the interactive terms of a genotype, we have Cov(mutational effect, fitness of the background genotype) = $\mathrm{Cov}(R_Y - R_X, R_X) = \mathrm{Cov}(I_Y - I_X, I_X) < 0$.
This mathematical result means that, when epistasis is to some extent idiosyncratic, for any given mutation, we expect a negative correlation between the background fitness and mutational effect, which is exactly what diminishing returns of beneficial mutations and increasing costs of deleterious mutations are. The above result holds when fitness is replaced with any phenotypic trait as long as the trait value of each genotype can be expressed as the sum of the $2^n - 1$ terms of effects.
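The covariance argument can be illustrated numerically. The sketch below draws, for many background pairs, the sum of interaction terms involving the focal site before and after the mutation, with equal variance and a chosen Pearson correlation between the two, and estimates the covariance between the change and the pre-mutation value. The normal distribution, the function name `cov_effect_background`, and the parameter values are illustrative assumptions only; the expected value of the estimate is (correlation − 1) × variance.

```python
import numpy as np


def cov_effect_background(rho, sigma=1.0, n_pairs=200_000, seed=0):
    """Estimate Cov(I_Y - I_X, I_X) when I_X and I_Y have equal variance
    sigma**2 and Pearson correlation rho (normality is an assumption)."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_pairs)
    z2 = rng.standard_normal(n_pairs)
    i_x = sigma * z1
    # Construct I_Y with correlation rho to I_X and the same variance.
    i_y = sigma * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)
    return np.cov(i_y - i_x, i_x)[0, 1]


for rho in (1.0, 0.9, 0.5, 0.0):
    estimate = cov_effect_background(rho)
    print(f"rho = {rho:.1f}: estimated = {estimate:+.3f}, expected = {rho - 1.0:+.3f}")
```

Only a perfect correlation (no idiosyncrasy) gives a covariance of zero; any lower correlation gives a negative value, in line with the derivation.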
Mutational effect is generally negatively correlated with background fitness
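Below we show that the preceding result about a given mutation also applies to different mutations. That is, we expect a negative correlation between the mutational effect and background fitness even when different mutations are considered. $R_t$, the Malthusian fitness of genotype $t$, can be expressed by
$$R_t = \sum_{i} I_t^{i} + \sum_{i<j} I_t^{ij} + \cdots + I_t^{12\ldots n}.$$
Here, the superscript indicates the site(s) involved in an additive or interactive term. For instance, $I_t^{2}$ stands for the additive effect of site 2 and $I_t^{12}$ stands for the interactive effect between sites 1 and 2.
Let $X$ represent an arbitrary genotype and $Y$ represent another genotype that differs from $X$ by a particular mutation named $W$ that occurs at site $k$. We have
$$R_Y - R_X = \sum_{i}\left(I_Y^{i} - I_X^{i}\right) + \sum_{i<j}\left(I_Y^{ij} - I_X^{ij}\right) + \cdots + \left(I_Y^{12\ldots n} - I_X^{12\ldots n}\right).$$
In the above, all corresponding terms between $I_X$ and $I_Y$ are equal except for the terms involving $k$. So,
$$R_Y - R_X = \left(I_Y^{k} - I_X^{k}\right) + \sum_{s \ne k}\left(I_Y^{sk} - I_X^{sk}\right) + \cdots + \left(I_Y^{12\ldots n} - I_X^{12\ldots n}\right).$$
Under the assumption that all $I$ terms in an $R$ are independent from one another,
$$\mathrm{Cov}(\text{mutational effect}, \text{background fitness}) = \mathrm{Cov}\left(I_Y^{k} - I_X^{k}, I_X^{k}\right) + \sum_{s \ne k}\mathrm{Cov}\left(I_Y^{sk} - I_X^{sk}, I_X^{sk}\right) + \cdots + \mathrm{Cov}\left(I_Y^{12\ldots n} - I_X^{12\ldots n}, I_X^{12\ldots n}\right).$$
According to the law of total variance and the law of total covariance, we can expand each term in the above equation. Let us use the second order interaction between site $s$ and site $k$ as an example:
$$\mathrm{Cov}\left(I_Y^{sk} - I_X^{sk}, I_X^{sk}\right) = E\left[\mathrm{Cov}\left(I_Y^{sk} - I_X^{sk}, I_X^{sk} \mid W\right)\right] + \mathrm{Cov}\left(E\left[I_Y^{sk} - I_X^{sk} \mid W\right], E\left[I_X^{sk} \mid W\right]\right).$$
As shown in the section about a given mutation, as long as there is some degree of idiosyncrasy, $\mathrm{Cov}(I_Y^{sk} - I_X^{sk}, I_X^{sk} \mid W) < 0$. So, $E[\mathrm{Cov}(I_Y^{sk} - I_X^{sk}, I_X^{sk} \mid W)] < 0$. Further, $\mathrm{Cov}(E[I_Y^{sk} - I_X^{sk} \mid W], E[I_X^{sk} \mid W]) = 0$ because $E[I_Y^{sk} - I_X^{sk} \mid W] = 0$ under the reasonable assumption that, given $W$, $I_X^{sk}$ and $I_Y^{sk}$ follow the same distribution. Hence, $\mathrm{Cov}(I_Y^{sk} - I_X^{sk}, I_X^{sk}) < 0$.
The same conclusion applies to all terms except the first-order interactive (additive) term, which is $\mathrm{Cov}(I_Y^{k} - I_X^{k}, I_X^{k})$. Because additive effects are independent of the genetic background, given $W$, $I_X^{k}$ and $I_Y^{k}$ are both fixed and are two randomly sampled values from the same distribution. Hence, $\mathrm{Cov}(I_Y^{k} - I_X^{k}, I_X^{k} \mid W) = 0$ and $E[I_Y^{k} - I_X^{k} \mid W] = I_Y^{k} - I_X^{k}$. So $E[\mathrm{Cov}(I_Y^{k} - I_X^{k}, I_X^{k} \mid W)] = 0$ and $\mathrm{Cov}(I_Y^{k} - I_X^{k}, I_X^{k}) = \mathrm{Cov}(I_Y^{k}, I_X^{k}) - \mathrm{Var}(I_X^{k})$, where the covariance and variance are taken over $W$. As $W$ varies, $I_X^{k}$ and $I_Y^{k}$ are two random variables from the same distribution. They have the same variance and are not usually completely correlated. So, $\mathrm{Cov}(I_Y^{k} - I_X^{k}, I_X^{k}) < 0$. Under the special case when all additive terms are equal, $\mathrm{Cov}(I_Y^{k} - I_X^{k}, I_X^{k}) = 0$.
Thus, Cov(mutational effect, background fitness) = $\mathrm{Cov}(I_Y^{k} - I_X^{k}, I_X^{k}) + \sum_{s \ne k}\mathrm{Cov}(I_Y^{sk} - I_X^{sk}, I_X^{sk}) + \cdots < 0$. Therefore, in the n-order model, mutational effect is negatively correlated with background fitness even for different mutations. 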
As shown in the above mathematical derivation, this negative correlation has two sources: unequal additive effects and idiosyncratic epistasis. Given the same additive effects, increasing the idiosyncrasy in epistasis strengthens the negative correlation. As in the preceding section, the result here applies to any phenotypic trait as long as the trait value of a genotype can be expressed as the sum of the $2^n - 1$ terms of effects.
Expected idiosyncrasy index under the n-order landscape model
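The variance of the effect of a particular mutation across all genetic backgrounds can be calculated as follows. Let $X$ represent an arbitrary genotype and $Y$ represent another genotype that differs from $X$ at site $k$ only. We have shown earlier that
$$R_Y - R_X = \sum_{i}\left(I_Y^{i} - I_X^{i}\right) + \sum_{i<j}\left(I_Y^{ij} - I_X^{ij}\right) + \cdots + \left(I_Y^{12\ldots n} - I_X^{12\ldots n}\right).$$
In the above, all corresponding terms between $I_X$ and $I_Y$ are equal except for the $2^{n-1}$ terms involving $k$. So, $R_Y - R_X$ is the sum of $(I_Y - I_X)$ over the terms involving $k$, and $\mathrm{Var}(R_Y - R_X)$ is the sum of $\mathrm{Var}(I_Y - I_X)$ over the same terms. If we assume that all interactive terms for $X$ and $Y$ are independent with the same variance $\sigma^2$, $\mathrm{Var}(R_Y - R_X) = 2^{n-1} \cdot 2\sigma^2 = 2^n\sigma^2$.
If there are $M$ states at each site, among the $\binom{n}{k}$ $k$th order interactive terms, $\binom{n}{k}/M^k$ terms are expected to be the same between two random genotypes. One can show that $\sum_{k=1}^{n}\binom{n}{k}M^{-k} = (1 + 1/M)^n - 1 < e^{n/M}$. Thus, two random genotypes are expected to differ by approximately $2^n - 1 - e^{n/M}$ terms. Hence, the variance of the fitness difference between two random genotypes is $\mathrm{Var} \approx 2\sigma^2\left(2^n - 1 - e^{n/M}\right)$. Because $M \ge 2$, $e^{1/M} < 2$. So, when $n$ is large, Var is approximately $2^{n+1}\sigma^2$. Therefore, the idiosyncrasy index becomes $\sqrt{2^n\sigma^2/(2^{n+1}\sigma^2)} = 1/\sqrt{2} \approx 0.71$. Our numerical finding (the rightmost red dot in Fig. 2a) confirms this result.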
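The limiting value can be evaluated directly for finite n. The sketch below assumes, as in the derivation, that a single mutation alters the 2^(n−1) terms involving the mutated site, that every term has the same variance, and that the number of terms shared by two random genotypes is the binomial sum (1 + 1/M)^n − 1. The function name `expected_iid` is introduced here for illustration.

```python
import math


def expected_iid(n, M):
    """Expected SD-based idiosyncrasy index under the full n-order model.

    The common term variance sigma**2 cancels in the ratio, so it is omitted.
    """
    # Variance of one mutation's effect: 2**(n-1) altered terms, each
    # contributing twice the per-term variance.
    var_mutation = 2.0 * 2 ** (n - 1)
    # Expected number of terms shared by two random genotypes:
    # sum over k of C(n, k) / M**k = (1 + 1/M)**n - 1.
    shared = (1.0 + 1.0 / M) ** n - 1.0
    # Variance of the fitness difference between two random genotypes.
    var_random_pair = 2.0 * (2 ** n - 1 - shared)
    return math.sqrt(var_mutation / var_random_pair)


for n in (8, 16, 32, 64):
    print(n, round(expected_iid(n, M=2), 4), round(expected_iid(n, M=4), 4))
```

The value approaches 1/√2 from above as n grows, and the approach is faster when there are more states per site.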
Mutational supply and evolutionary trajectories
During adaptation, if the supply of beneficial mutations diminishes as the fitness of a population rises, the speed of population fitness increase per unit time will decline. However, if the speed of fitness increase is measured per beneficial mutation accrued as in the present study, the diminishing supply of beneficial mutations will not reduce the speed of fitness increase.
During mutation accumulation in the near absence of selection, as the population fitness declines, the supply of beneficial mutations should increase and the supply of deleterious mutations should decrease. Thus, even under a purely additive model, the speed of population fitness drop slows. When only the first few mutations accrued are examined, however, this phenomenon of slowing fitness drops should be minimal under the purely additive model unless the number of possible mutations is very limited.
Empirical phenotype landscapes
An unbiased search for phenotype landscape data published between 2000 and 2019 was performed using Google Scholar with words such as “epistasis”, “fitness landscape”, or “genetic interaction”. A total of 18 datasets were found for which quantitative phenotype values were published or could be calculated without extensive analysis (e.g., studies reporting only sequencing reads were excluded) and which included genotypes with at least two mutations in comparison with the reference genotype. Measured phenotypes included protein function such as log(fluorescence), log(Wrightian fitness) or growth rate, and colony size. For landscapes reporting genotypes with nucleotide mutations, all 12 classes of single mutations were considered. For landscapes reporting genotypes with amino acid mutations, all 380 mutations between any two amino acids were considered as single mutations. Genotypes with fitness at the minimum detection limit (e.g., non-fluorescent GFP genotypes) or that were lethal or non-growing (e.g., tRNA genotypes with Wrightian fitness relative to the wild-type = 0.5) were excluded. A final set of 12 studies with at least 10 single mutations and at least an average of 10 fitness effects measured per mutation were used for further analysis. Supplementary Table 1 lists the basic information of these phenotype landscapes. The original study of the tRNA fitness landscape reported Wrightian fitness relative to the wild-type; we computed Malthusian fitness = log(Wrightian fitness) in the present study.
To investigate the evolution of an arbitrary RNA that has a complex phenotype with no measurement error, we mapped the RNA-stability landscape of RNAs of 72 nucleotides. Similar to ref. [36], we defined the fitness of a sequence as the absolute value of the minimum free energy (MFE) of its most stable secondary structure, calculated using ViennaRNA (https://www.tbi.univie.ac.at/RNA/). The starting sequence was taken from a yeast tRNA sequence. 
Mutants were randomly created on each of two million random background genotypes, and this set of genotypes was used for subsequent analyses.
Simulating idiosyncratic fitness landscapes
We simulated a series of 16-site fitness landscapes under n-order models with two states (A/T) per site, including all 65,536 genotypes. The fitness of a genotype is determined by additive effects (referred to as first order interactions) and interactive effects. For the $k$th order interaction ($1 \le k \le 16$), there are $\binom{16}{k}$ interactive terms. For each of these terms, there are $2^k$ possible state combinations. The fitness effect of each state combination of each interaction term for each order of interaction is drawn independently from the standard normal distribution, and the fitness of the genotype concerned is the sum of all these terms. Sixteen landscapes were made by including successively increasing orders of interactions. For instance, the first landscape contains only 1st order interactions (purely additive), the second landscape contains only 1st and 2nd order interactions, and the sixteenth landscape contains all orders of interactions. In each landscape, fitness values are linearly scaled to the interval of [0, 1]. As expected, Iid increased with the number of orders of interactions included (orange circles in Fig. 2a), because the numerator in the formula of Iid increased whereas the denominator stayed more or less constant. In each of these landscapes, epistasis between mutations is symmetrically distributed with the mean equal to 0. We also simulated additive landscapes with larger n values to examine the linearity of fitness drops during mutation accumulation.
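The construction described above can be sketched in a few lines. This is a minimal illustration rather than the code used in the study: the function name `simulate_landscape`, the bit encoding of genotypes, and the default of 8 sites (chosen so the example runs quickly, versus 16 sites in the text) are assumptions made here.

```python
from itertools import combinations

import numpy as np


def simulate_landscape(n_sites=8, max_order=8, seed=0):
    """Fitness of all 2**n_sites two-state genotypes under an n-order model
    truncated at max_order, linearly scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    # genotypes[g, s] is the state (0 or 1) of site s in genotype g.
    genotypes = (np.arange(2**n_sites)[:, None] >> np.arange(n_sites)) & 1
    fitness = np.zeros(2**n_sites)
    for order in range(1, max_order + 1):
        weights = 2 ** np.arange(order)
        for sites in combinations(range(n_sites), order):
            # One independent standard-normal effect for each of the
            # 2**order state combinations of this set of sites.
            effects = rng.standard_normal(2**order)
            combo_index = genotypes[:, list(sites)] @ weights
            fitness += effects[combo_index]
    return (fitness - fitness.min()) / (fitness.max() - fitness.min())


additive = simulate_landscape(max_order=1)    # 1st order terms only
full = simulate_landscape(max_order=8)        # all orders of interaction
print(additive[:4], full[:4])
```

With `max_order=1` the effect of a given mutation is identical on every background, whereas including higher orders makes it increasingly background dependent.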
Estimating idiosyncrasy index
For each single mutation in a fitness landscape, we calculated its fitness effects on all genetic backgrounds available. For each mutation, we also derived a control set of fitness effects by randomly sampling (with replacement) the same number of pairs of genotypes from the landscape as used for the mutation and computing the fitness difference for each pair. We then calculated the standard deviation (SD) of fitness effects and range of fitness effects for each mutation and its control dataset. For each mutation, we calculated the ratio in the SD (or range) between the actual data and the control data. The average ratio across all single mutations is the Iid of the landscape, and the error bars in Fig. 1c are the standard error of the mean (SE). The same method is used to estimate Iid of other phenotype landscapes. Although empirical phenotype landscape data typically include only a small fraction of nonrandomly sampled genotypes and their phenotypes, this nonrandom sampling is not expected to substantially affect Iid estimation, because both the variation of the effect of a mutation and the variation in the control data are estimated using the available landscape data. The theoretical minimum and maximum Iid for an individual mutation are, asymptotically, 0 and 1, respectively. However, because we use randomly sampled genotype pairs as the control when empirically estimating Iid, the Iid may be above 1 in some cases.
If a mutation’s range of effects in different backgrounds crosses 0, the mutation exhibits sign epistasis. For each landscape, we calculated the proportion of mutations that exhibited sign epistasis, excluding mutations with little reported information (i.e., those that appear on fewer than five backgrounds).
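The SD-based estimate can be sketched for a complete two-state landscape as follows. The function name `estimate_iid`, the bit encoding of genotypes, and the restriction to one direction of mutation per site are simplifying assumptions for illustration; empirical landscapes are incomplete and have more states per site.

```python
import numpy as np


def estimate_iid(fitness, n_sites, seed=0):
    """SD-based idiosyncrasy index of a complete two-state landscape.

    fitness[g] is the fitness of genotype g, whose bits encode site states.
    """
    rng = np.random.default_rng(seed)
    genotypes = np.arange(2**n_sites)
    ratios = []
    for site in range(n_sites):
        # All backgrounds carrying state 0 at this site.
        backgrounds = genotypes[((genotypes >> site) & 1) == 0]
        effects = fitness[backgrounds | (1 << site)] - fitness[backgrounds]
        # Control: fitness differences between the same number of genotype
        # pairs sampled at random with replacement.
        a = rng.integers(0, fitness.size, size=effects.size)
        b = rng.integers(0, fitness.size, size=effects.size)
        control = fitness[a] - fitness[b]
        ratios.append(effects.std(ddof=1) / control.std(ddof=1))
    return float(np.mean(ratios))


n = 8
rng = np.random.default_rng(1)
bits = (np.arange(2**n)[:, None] >> np.arange(n)) & 1
additive = bits @ rng.standard_normal(n)       # no epistasis
house_of_cards = rng.standard_normal(2**n)     # uncorrelated fitness values
print(estimate_iid(additive, n), estimate_iid(house_of_cards, n))
```

The additive landscape gives an index of essentially 0 and the uncorrelated landscape gives an index near 1; sampling noise in the control can push individual estimates slightly above 1, as noted in the text.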
Examining correlation between background fitness and mutational effect
For the simulated n-order landscapes and empirical landscapes (tRNA fitness, GFP activity, and RNA-stability), Pearson’s correlation coefficient was calculated between mutational effect on a particular trait and background trait value for each single mutation. Mutations appearing on fewer than four backgrounds were excluded. Pearson’s correlation coefficient was also calculated between all mutational effects and background trait values for each landscape.
In the tRNA fitness landscape, the fitness of each genotype was measured in six replicates. To exclude spurious correlation due to measurement error, the background fitness of each case of a single mutation is calculated using the mean fitness value from replicates 1–3, while the mutational effect is computed using mean fitness from replicates 4–6.
Additionally, two mutations which are the reverse of each other on the same backgrounds can automatically create a negative correlation between all mutational effects and background fitness. Hence, in each landscape where this could occur we randomly chose a mutation or its reversion when pooling all mutations together (green diamonds in Fig. 2a; Fig. 2c; Extended Data Fig. 2c–d).
For analysis of diminishing returns and increasing costs, mutations were deemed beneficial or detrimental depending on their effect on the wild-type genotype in GFP and tRNA, or a random arbitrary genotype in RNA-stability, or on a genotype with fitness value closest to the average fitness in the n-order landscapes.
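The reason for splitting the replicates can be shown with a small simulation. In the sketch below the true mutational effect is the same on every background, so any correlation with background fitness comes from measurement error alone. All numerical values (noise level, spread of background fitness, number of backgrounds) and the function name are illustrative assumptions, not estimates from the tRNA data.

```python
import numpy as np


def effect_background_correlations(n_backgrounds=5000, noise_sd=0.5, seed=0):
    """Return (naive, split) Pearson correlations between measured background
    fitness and measured mutational effect when the true effect is constant."""
    rng = np.random.default_rng(seed)
    true_background = rng.normal(0.0, 0.2, n_backgrounds)
    true_mutant = true_background + 0.1   # constant effect: no real correlation
    # Six noisy replicate measurements per genotype (rows: genotypes).
    shape = (n_backgrounds, 6)
    bg = true_background[:, None] + rng.normal(0.0, noise_sd, shape)
    mut = true_mutant[:, None] + rng.normal(0.0, noise_sd, shape)
    # Naive: the same background measurements enter both variables.
    naive = np.corrcoef(bg.mean(axis=1), mut.mean(axis=1) - bg.mean(axis=1))[0, 1]
    # Split: background from replicates 1-3, effect from replicates 4-6.
    split = np.corrcoef(
        bg[:, :3].mean(axis=1),
        mut[:, 3:].mean(axis=1) - bg[:, 3:].mean(axis=1),
    )[0, 1]
    return naive, split


naive, split = effect_background_correlations()
print(f"naive r = {naive:.2f}, split-replicate r = {split:.2f}")
```

Under these assumptions the naive estimate is strongly negative even though no true relationship exists, while the split-replicate estimate stays close to zero.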
Simulating evolutionary trajectories in mutation accumulation (MA)
For each empirical landscape, MA from an initial genotype was simulated by randomly choosing single mutations until the resulting genotype was non-functional (GFP) or for a maximum of 10 mutational steps (tRNA) or 50 mutational steps (RNA-stability). For all plots concerning MA, the mean phenotype value of the landscape was calculated from all genotypes.
For the GFP landscape, genotypes were not allowed to be revisited within a trajectory. If an MA trajectory was part of another simulated trajectory, the shorter trajectory was discarded. A total of 3,069 MA trajectories were simulated from each of 3,069 initial genotypes with activity equal to or greater than that of the wild-type. In the tRNA fitness landscape, 10,000 MA trajectories were simulated starting from the wild-type genotype. In the n-order fitness landscapes, 10,000 MA trajectories were simulated starting from the genotype with fitness closest to the 90th percentile. For the RNA-stability landscape, a total of 350 MA trajectories were simulated starting with the final genotypes from the simulated adaptations.
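A mutation accumulation walk of this kind can be sketched on a toy landscape. The example below uses a fully uncorrelated two-state landscape and starts from its fittest genotype so that the deceleration is easy to see; both choices, along with the function name `mean_ma_trajectory`, are assumptions for illustration and differ from the starting genotypes and landscapes used in the study. Genotypes may be revisited here.

```python
import numpy as np


def mean_ma_trajectory(fitness, n_sites, start, n_steps=10, n_traj=2000, seed=0):
    """Average fitness along random single-mutation walks without selection."""
    rng = np.random.default_rng(seed)
    traj = np.empty((n_traj, n_steps + 1))
    for t in range(n_traj):
        g = int(start)
        traj[t, 0] = fitness[g]
        for step in range(1, n_steps + 1):
            g ^= 1 << int(rng.integers(n_sites))   # mutate one random site
            traj[t, step] = fitness[g]
    return traj.mean(axis=0)


n = 12
fitness = np.random.default_rng(0).standard_normal(2**n)  # toy landscape
start = int(np.argmax(fitness))
mean_traj = mean_ma_trajectory(fitness, n, start)
print(np.round(mean_traj, 2))
```

The first mutation removes most of the starting advantage, and later steps change the mean fitness far less, which is the decelerating decline described for the empirical landscapes.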
Simulating adaptive trajectories
For each empirical landscape, adaptation from an initial genotype was simulated by randomly choosing a single beneficial mutation, which increased the value of the trait concerned, until no more single beneficial mutations were available. A total of 5,000 adaptive trajectories starting from 3,441 initial genotypes chosen from the bottom 15% of genotypes (activity ≤ −0.4) were simulated for the GFP landscape. A total of 350 adaptive trajectories starting from 350 initial genotypes chosen from the bottom 0.0175% of genotypes in the RNA-stability landscape were simulated. In the tRNA fitness landscape, we simulated five adaptive trajectories starting from each genotype with fitness = 0.5; trajectories longer than two steps were retained, totaling 15,878 trajectories. In the n-order fitness landscapes, adaptations start from all genotypes in the bottom 20% of fitness distribution; among 10 adaptation simulations starting from each genotype, trajectories equal to or longer than two steps were retained.
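The adaptive walk rule (pick a random beneficial single mutation until none remains) can be sketched as follows. The toy uncorrelated landscape, the two-state bit encoding, and the function name `adaptive_walk` are illustrative assumptions rather than the study's implementation.

```python
import numpy as np


def adaptive_walk(fitness, n_sites, start, rng):
    """Take random beneficial single mutations until a local peak is reached.

    Returns the list of genotypes visited, starting with `start`.
    """
    g = int(start)
    path = [g]
    while True:
        neighbors = [g ^ (1 << s) for s in range(n_sites)]
        beneficial = [h for h in neighbors if fitness[h] > fitness[g]]
        if not beneficial:
            return path
        g = beneficial[int(rng.integers(len(beneficial)))]
        path.append(g)


n = 12
rng = np.random.default_rng(0)
fitness = rng.standard_normal(2**n)        # toy landscape
start = int(np.argmin(fitness))
path = adaptive_walk(fitness, n, start, rng)
gains = np.diff(fitness[path])
print(len(path) - 1, "steps; fitness gain per step:", np.round(gains, 2))
```

Because fitness strictly increases at every step and the landscape is finite, the walk always terminates at a genotype with no beneficial single mutation.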
The high idiosyncrasy indices (Iid) observed are not due to phenotype measurement errors or the use of standard deviation (SD) instead of range of mutational effects.
(a) SD-based Iid of the yeast tRNA fitness landscape is insensitive to the number of experimental replicates used in the fitness estimation. Boxplots show the distribution of Iid values of 828 single mutations in the tRNA landscape, calculated based on different numbers of replicates. The lower and upper edges of a box represent the first (qu1) and third (qu3) quartiles, respectively, the horizontal line inside the box indicates the median (md), the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3 − qu1), and the grey dots represent values outside the inner fences (outliers). Violet dots show mean Iid of all mutations calculated based on respective numbers of replicates. (b) Range-based Iid for various phenotype landscapes. Error bars show standard errors. Detailed information of each landscape is provided in Table S1. Shown in red is the fraction of mutations exhibiting sign epistasis in each phenotype landscape.
Negative correlation between mutational effect and background phenotype in GFP and RNA-folding landscapes.
(a) Distribution of Pearson’s correlation coefficient (r) between mutational effect and background phenotype for individual mutations in the GFP landscape. (b) Distribution of r for individual mutations in the RNA-folding landscape. (c) Relationship between background phenotype and mutational effect for all mutations in the GFP landscape. (d) Relationship between background phenotype and mutational effect for all mutations in the RNA-folding landscape. MFE, minimum free energy. The red line depicts the running mean in non-overlapping X-axis bins of width = 0.02 and 2 in (c) and (d), respectively, in all bins with more than 10 data points. There is no measurement error in the RNA-folding landscape. Shared measurement error between mutational effect and background fitness cannot be controlled for in GFP as replicate fitness measurements are not available. For each mutation and its reverse, we considered a random one of them in (c) and (d).
Patterns of correlation between mutational effect and background fitness/phenotype for individual beneficial or deleterious mutations in various landscapes.
(a) Boxplots showing distributions of correlations in a series of n-order landscapes of 16 sites (where the highest order of nonzero interaction is indicated on the X-axis) for beneficial (blue) and deleterious (red) mutations, respectively. The lower and upper edges of a box represent the first (qu1) and third (qu3) quartiles, respectively, the horizontal line inside the box indicates the median (md), the whiskers extend to the most extreme values inside inner fences, md ± 1.5(qu3 − qu1), and the dots represent values outside the inner fences (outliers). (b-d) Frequency distributions of correlations for individual beneficial mutations (blue) and deleterious mutations (red) in the tRNA (b), GFP (c), and RNA-folding (d) landscapes. Whether a mutation is beneficial or deleterious is determined in reference to the wild-type (tRNA and GFP) or an arbitrary reference genotype (n-order and RNA-folding). The wider distribution for deleterious than beneficial mutations is at least in part due to the larger number of deleterious than beneficial mutations.
Average fitness trajectories of mutation accumulation simulated in various n-order additive landscapes (k = 1) with different numbers of sites (n).
The mean trajectories are scaled so that the minimum fitness appearing in the trajectory is 0 and the maximum is 1 to allow direct comparison.
Fitness declines decelerate during mutation accumulation as a result of idiosyncratic epistasis.
(a) A total of 5000 fitness trajectories of mutation accumulation simulated in the GFP landscape, with the average trajectory shown in black, at each step when the trajectory number exceeds 10. (b) A total of 350 fitness trajectories of mutation accumulation simulated in the RNA-folding landscape, with the average trajectory shown in black. The dotted lines indicate the mean phenotypic value of all genotypes in the landscape, excluding non-active genotypes in the GFP landscape. For comparison, the dashed line in (a) or (b) represents the predicted linear decline given the slope in the first mutational step.
Idiosyncratic epistasis is necessary but not sufficient to cause decelerating adaptations.
(a) Gamma distributions of genotype fitness for house-of-cards landscapes, with different values of the gamma shape parameter. (b) Theoretically computed mean fitness trajectories of adaptation on landscapes in (a) with corresponding colors. (c) Average adaptive trajectories starting from the genotype with the lowest fitness (0), simulated in a series of n-order landscapes of 16 sites where each nonzero interaction term of each genotype is drawn from a gamma distribution with shape parameter equal to 1. (d) Average adaptive trajectories starting from the genotype with the lowest fitness (0), simulated in a series of n-order landscapes of 16 sites where each nonzero interaction term of each genotype is drawn from a beta distribution. For each landscape in (c) and (d), the distribution of epistasis between mutations is symmetrical with mean equal to 0.
Adaptation slows in empirical phenotype landscapes.
(a) A total of 5000 adaptive trajectories simulated in the GFP landscape, with the average trajectory shown in black, at each step when the trajectory number exceeds 10. (b) A total of 350 adaptive trajectories simulated in the RNA-folding landscape, with the average trajectory shown in black, at each step when the trajectory number exceeds 10. For comparison, the dashed line in (a) or (b) represents the predicted linear increase given the slope in the first mutational step.