David B Saakian1,2, Tatiana Yakushkina3, Chin-Kun Hu1,4,5. 1. Institute of Physics, Academia Sinica, Nankang, Taipei 11529, Taiwan. 2. A.I. Alikhanyan National Science Laboratory (Yerevan Physics Institute) Foundation, Alikhanian Brothers St. 2, Yerevan 375036, Armenia. 3. National Research University Higher School of Economics, Moscow, 101000, Myasnitskaya Street 20, Russia. 4. National Center for Theoretical Sciences, National Tsing Hua University, Hsinchu 30013, Taiwan. 5. Business School, University of Shanghai for Science and Technology, Shanghai 200093, China.
Abstract
We propose a modification of the Crow-Kimura and Eigen models of biological molecular evolution to include a mutator gene that causes both an increase in the mutation rate and a change in the fitness landscape. This mutator effect relates to a wide range of biomedical problems. There are three possible phases: mutator phase, mixed phase and non-selective phase. We calculate the phase structure, the mean fitness and the fraction of the mutator allele in the population, which can be applied to describe cancer development and RNA viruses. We find that depending on the genome length, either the normal or the mutator allele dominates in the mixed phase. We analytically solve the model for a general fitness function. We conclude that the random fitness landscape is an appropriate choice for describing the observed mutator phenomenon in the case of a small fraction of mutators. It is shown that the increase in the mutation rates in the regular and the mutator parts of the genome should be set independently; only some combinations of these increases can push the complex biomedical system to the non-selective phase, potentially related to the eradication of tumors.
We propose a modification of the Crow-Kimura and Eigen models of biological molecular evolution to include a mutator gene that causes both an increase in the mutation rate and a change in the fitness landscape. This mutator effect relates to a wide range of biomedical problems. There are three possible phases: mutator phase, mixed phase and non-selective phase. We calculate the phase structure, the mean fitness and the fraction of the mutator allele in the population, which can be applied to describe cancer development and RNA viruses. We find that depending on the genome length, either the normal or the mutator allele dominates in the mixed phase. We analytically solve the model for a general fitness function. We conclude that the random fitness landscape is an appropriate choice for describing the observed mutator phenomenon in the case of a small fraction of mutators. It is shown that the increase in the mutation rates in the regular and the mutator parts of the genome should be set independently; only some combinations of these increases can push the complex biomedical system to the non-selective phase, potentially related to the eradication of tumors.
The concepts of genome instability1234567891011 and clonal evolutionary dynamics121314 are key ideas in understanding the causes and behavior of cancer. Genome instability is a hallmark of cancer11, plays a significant role in bacterial evolution1516171819, and has been a focus of physics-community research on oncology20. We need more complex nonlinear models12131421 than those for the viral evolution22232425 for the cancer analysis.Models with relatively simple evolutionary dynamics can be applied to cancers that arise from one or two driver mutations, e.g., inherited retinoblastoma; however, most cancers require three or more. It has been proposed126 that some cancers may be associated with a substantial increase in the mutation rate of the genome. The question here is how the dynamics and phase structure of the evolving clonal population.There have been some attempts to construct and solve evolutionary dynamics systems with mutator genes1617. The earlier work16 on this subject, influenced by experimental results on E. coli27, discusses the problem of how selection in natural populations favors more rapid mutation. The authors suggest a simple model with a linear fitness function to identify the conditions that lead to selection for or against mutator states. Some approximate estimates were made for the steady state distribution and the dynamics of the mutator fraction in the population. The later paper17 used a quasispecies model with a multiplicative fitness function.Classical quasispecies models2223 describe the evolutionary process of an infinite population of binary sequences caused by two forces: mutation and selection; the former is controlled by the mutation rate and the latter by the fitness function. One of the most impressive results for these models is the existence of a phase transition2328, known as the “error threshold” in some versions of the fitness landscape. When the mutation rate is small, a selective phase is observed, where sequences organize into a “quasispecies” near the sequence with the maximum fitness value. The non-selective phase observed with higher mutation rates shows dilution throughout the sequence space.Returning to mutator dynamics, Nagar and Jain17 first studied the possible phases for these systems. Here, the fraction of mutator genome sequences in the population defines different phases: a normal phase with only normal sequences (a degenerate case when there are no back-mutations in a mutator gene), a mixed phase in which both mutators and non-mutators coexist, and a pure mutator phase with only mutators.Among other findings, it has been shown that the small mutation rates for the mutator gene lead to the mixed phase with a small fraction of the mutator allele, but an increase in these rates turns the system to the mutator phase. The infinite population model with a constant mutation rate has been solved in ref. 29 (see also the recent review30).When we map the evolutionary dynamics to statistical physics, the genome length plays the role of lattice size. In statistical physics, different phases arise as different analytical expressions for free energy at a large lattice size. Similarly, we need the large genome length to define the evolutionary dynamics phases. In this paper, we derive these different expressions for the mean fitness in mixed and mutator phases. The most realistic case, when back mutation for the mutator gene can be neglected due to its small impact during the time of the experiment, gives a very simple situation: in a mutator phase in a steady state, there is a zero fraction of normal sequences, even for small genome length, but there is a finite fraction of normal sequences in the mixed phase.Several different classes of evolution models form the basis of mutator phenomenon analysis:Class I Phenomenological models56. Such models consider deterministic or stochastic equations for a set of several genotypes. Models in this class are relatively simple but useful for describing the specific types of experiments, e.g., investigations that show how the mutator mechanism competes with the ordinary evolution scheme with a high mutation rate56.In ref. 5, a simple infinite population model without selection has been exposed, and it has been assumed that cancer arises after a given number of mutations in oncogenes. The probability of the start of cancer development at a fixed time has been calculated for the mutator mechanism as well as without mutators. The accuracy of the derivations has been well controlled using a small parameter: the ratio of the ordinary mutation rate to the mutation rate due to mutators. A different phenomenological model has been proposed without derivation details6, which discusses three types of fitness landscapes. This concept is of interest, and we consider one of the fitness landscapes in our work. However, the phenomenological models as a methodology cannot describe the different phases of the evolutionary dynamics accurately.Class II Infinite population microscopic models. Models in this class are defined in the binary sequence space with a finite genome length1617. The Crow-Kimura182231 and Eigen23242532 models, which are the quasispecies models mentioned above, have been widely investigated over recent decades, especially in the context of virus evolution. Mathematically, each of these models is described via a system of nonlinear master equations, which can be mapped to the chain of linear ordinary differential equations with some nonlinear transformation. In the Crow-Kimura setting, mutation and selection are parallel processes, while in the Eigen model, mutation is connected to selection. In smooth fitness landscapes, the steady-state distributions of the Crow-Kimura and Eigen models can be mapped to each other. In the large genome limit, the Crow-Kimura model is equivalent to a discrete-time Eigen model33 and is very close to branching processes. Moreover, the Wright-Fisher model3435 for large populations can be mapped to a discrete-time Eigen model33.Class III Finite population microscopic models36. In this case, models are also defined in the binary sequence space36. In ref. 4, numerical simulations have been performed for the Moran model by considering the Muller ratchet in the presence of mutators. In ref. 7, the growth population version of the Wright-Fisher model3435 for the four-dimensional landscape has been formulated; there are neutral, deleterious and advantageous mutations, as well as a sufficient number of mutations in the part of the genome that brings about the mutator effect. According to these results, the mutator phenotype evolves only at a moderate level of selection for the advantageous mutations.Class IV Infinite population and infinite number of genotypes models without backward mutations2930. These models are useful to describe the virus evolutionary dynamics for a short period of time. Compared with the microscopic models, which represent different possible phases and collective effects in evolution, class IV models are less fundamental.In the current paper, we consider the microscopic infinite population models (class II). The discrete-time version of the Eigen model232425 was originally introduced to describe the self-replication of macromolecules at the origin of life (see also3738). In this model, the selection and mutation happen together (mutation directly follows replication), while in the Crow-Kimura model, originally proposed to describe biological speciation, mutation and selection happen in parallel. In both models, the genome is represented by a chain of L letters (genes), which take the values ±1, similar to the Ising spin3940. There is a mutation rate of μ per letter. We take the sequence with all “+” letters as the reference sequence. All of the genomes with the same total number l of “−” letters (l point mutations from the reference sequence) have fitness r ≡ f(1 − 2l/L), where f(x) as a fitness function, x = 1 − 2l/L. For the initial symmetric distribution, the model can be described via differential equations of L + 1 variables P, corresponding to the total probabilities of the sequences with l mutations in the genome. The mean fitness of the Crow-Kimura model has been calculated using algebraic methods41 and the Hamilton-Jacobi equation (HJE) method4243. The exact dynamics had been derived using the HJE44. Here, we construct a mutator gene model on the basis of the Crow-Kimura model182231 and solve it for the general fitness case, by providing also the solution for the Eigen model version of the mutator phenomena.Our solutions for both models are consistent with each other.We find the analytical solution for the mean fitness for the general fitness function, including the multidimensional case, and clarify and calculate the order parameters and the phases of the model, which was an open problem until now. Previous work56 has been focused on the comparison between the efficiencies of cancer development via ordinary evolutionary dynamics and the evolutionary dynamics using the mutator mechanism. Our quantitative results of the phase structure (shown in Fig. 1) depict the borders of different phases. The procedure to obtain Fig. 1 is described below.
Figure 1
The phase structure of the mutator model with single peak landscape with zero value for any argument except l = 0: f(x = 1) = g(x = 1) = J.
The parameters of the model are α1 = a, α2 = 0, μ1 = 1, and μ2 = μ. There are three phases: mixed phase with 0 < s < 1, 0 < q < 1, non-selective phase with s = 0, 0 < q ≤ 1 and mutator phase with 0 < s, q = 1. The border between non-selective and mutator phases is given by μ = J, the border between non-selective and mixed phases is given by a = J − 1, between mixed and mutator phases is given by a + 1 = μ line. From the bio-medical perspectives we distinguish the mutator I and mutator II, mixed I and mixed II subphases. From the mutator I, the system transforms to the non-selective phase simply increasing the μ. From the mixed II the system transformers to the non-selective phase simply increasing the a ≡ α1. From the mutator II and mixed I subphases we need change both a and μ to transform the system to the non-selective phase. We have the same picture in case of Eigen model with a fitness A for the peak sequence and fitness 1 for other sequences. We have for the mixed phase R = e−(A, for the mutator phase R = e−A and for the non-selective phase R = 1. The border between non-selective and mutator phases is given by μγ = lnA, the border between non-selective and mixed phases is given by h = lnA − γ, between mixed and mutator phases is given by h = (μ − 1)γ line.
The phase structure may have biomedical implications. In ref. 9, it was proposed that the transition from the mixed phase to the non-selective phase is associated with the possible eradication of the tumor. We provide the solution for two types of fitness landscapes; a symmetric case, where the fitness landscape depends on the total number of mutations from the reference sequences, and a random fitness landscape.Key technical terms and notations are listed in Tables 1 and 2, respectively.
Table 1
Terms.
Lethal mutations
A mutation that results in the premature death of the organism carrying it.
Selective phase
A state of evolutionary system of binary sequences, when the majority of the population is concentrated near the peak genome (with the maximum fitness value) in the sequence space.
Non-selective phase
A state of evolutionary system of binary sequences, when the population is diluted in the sequence space.
Solvable cases of fitness landscape
Random fitness landscape and symmetric fitness landscape.
Random fitness landscape
The values of fitness defined on a sequence space are random numbers with some distribution.
Symmetric fitness landscape
The values of fitness defined on a sequence space by a function of the total mutation number.
Mutator phase
The absolute majority in the population has a mutator-type (mutator allele of the mutator gene) after long enough time of evolution.
Mixed phase
There is a finite fraction q of mutator sequences in the population after longtime evolution (0 < q < 1).
Table 2
Notations.
L
The genome length.
l
The number of mutations.
ξθ, θ = 0,1, …, L
ξ0 = ±1 — the state of the mutator gene, ξθ = ±1 (1 ≤ θ ≤ L) — the states of genes in the regular part of the genome.
Si, i = 1, …, 2L
The genome sequence.
rl
The fitness value after l mutations in the Crow-Kimura model.
Pl, Ql
The probability of the wild-types and mutator-types with l mutations.
xl = 1 − 2l/L
The average gene state in the genome sequence that corresponds to l mutations from the reference sequence.
f(xl), g(xl)
The fitness functions of the wild-type and mutator-type genome sequences which assigns the reproductive rate to the average gene state xl
μ1, μ2
The mutation rate for the wild-types and mutator-types.
μ
The substitution used to denote briefly: μ2 = μ for the choice μ = 1.
α1, α2
The transition rates from the wild-type to mutator-type and vise versa.
α
The substitution used to denote briefly: α = α1 for the choice α2 = 0.
s
The surplus of the whole population: the average state of the gene.
s1, s2
The surpluses for the wild-types and mutator types.
The average number of mutations, which is equal to (1 − s)L/2.
q
The total probability of the mutator-types.
R
The mean fitness.
u(x, t)
The probability of l mutations is , Eq. (5).
v1(x, t), v2(x, t)
The relative probabilities of the wild and mutator types after l mutations, x = 1 — 2l/L, Eq. (5).
V±(x)
The potential of evolutionary dynamics, Eq. (26).
k
The coefficient in the linear fitness, f(x) = kx.
J
The peak fitness in the single peak fitness landscape for the Crow-Kimura model.
U
The total number of mutations for one generation in population genetics.
h
The transition probability wild-type to mutator-type for one generation in population genetics.
ε
The time period of one generation.
pi, qi
The probabilities of the i-th sequences for the wild type and mutator type in the Eigen model.
The transition probabilities between the sequences in wild-type and mutator-type in the Eigen model.
The probabilities of errorless replication per nucleotide for the wild-type and mutator-type.
The probabilities of errorless replication for the genome the wild-type and mutator-type.
d(i, j)
The Hamming distance between the i-th and j-the genomes
A
The peak fitness in single peak fitness landscape for the Eigen model.
τ
The generation period of the virus.
yi
The relative length of the i-th piece of genome in multi-dimensional model.
Results
The model with a symmetric fitness landscape
In this section, we consider a modification of the Crow-Kimura model182231 for evolutionary processes under the influence of a mutator gene. The chain of (L + 1) genes defines the genome, which can be formally subdivided into two parts; a regular part with L genes and a mutator gene, whose mutations change the fitness and the mutation rate in the regular part of the genome. We represent each gene with two alleles by ξ = ±1, θ = 0, 1, …, L. The regular set of genes is denoted by the sequences S = (ξ1, …, ξ), i = 1, …, 2. The mutator gene in the state ξ0 = +1 determines the normal dynamics of the regular part with a mutation rate μ1. This wild-type form of the gene corresponds to the most common phenotype in the population. If the state of the mutator gene changes to ξ0 = −1, then the evolutionary process switches to a different regime with a mutation rate μ2. The mutant-type gene defines an altered phenotype, which is called a mutator phenotype. We use α1 and α2 to designate forward (ξ0: +1 → −1) and backward (ξ0: −1 → +1) mutation rates of the mutator gene.We assume that the fitness landscape is symmetric, which means it depends solely on the number of mutations, l, from the reference sequence (without loss of generality, we take the reference sequence to be S = (+1, …, +1)). For this purpose, it is natural to use the notion of Hamming distance between S and because all sequences in the same Hamming class have the same value of fitness. To facilitate further analysis, we introduce the variable for the mean value, . The state of the system is represented by two probability distributions. We use to denote the relative frequency of normal genome sequences (wild-types) with l mutations (in the l—th Hamming class) at the time moment, t, and for the relative frequency of sequences with an increased mutation rate (mutator-types) with l mutations, and we have a probability balance condition for any time moment, . All admissible transitions between system states can be seen as arrows in Fig. 2 4546.
Figure 2
The scheme of available transitions for the system states (arrows denote transitions).
The upper chain corresponds to the genome without a mutator allele (wild-type); the lower chain corresponds to the genome with a mutator allele (mutator-type). l is the number of mutations in the regular part of genome.
To describe the evolution of the probability distribution under consideration, we use the following system of 2(L + 1) differential equations:Here, f(x) is a fitness function for the normal genome sequence, and g(x) is a fitness function for the mutator sequence. As long as changes in the mutator gene significantly affect both the fitness and mutation rate in the regular part, fitness functions f(x) and g(x) can be different, and μ2 is commonly 10–100 times larger than μ1210. Note that the rates α1 and μ1 also differ because that α1 is proportional to the number of possible mutagenic loci. The coefficients (L − l + 1) and (l + 1) appear in the Eq. (1) transition terms according to the combinatorial formulae for Hamming class probabilities4147. To investigate Eq.(1), we consider a nonlinear transformation48:It is easy to show that P and Q are the solutions of the following system of linear equations:In Eqs (1) and (2), and are the probabilities under the normalization constraint, while P and Q of Eq. (3) are not normalized.In this paper, we focus on the following characteristics of the model in a steady state: the mean fitness R, the total surplus s (the expected value of x), the surplus for the wild-type part of the population s1 and for the mutator-type part s2, and the fraction of the mutator sub-population q:The surplus s shows the average number of mutations in the population, The maximum of the distributions P and Q is attained at points L(1 − s1)/2 and L(1 − s2)/244. Therefore, the value of the mean fitness for wild-type sequences is equal to f(s1), and that for mutator-type sequences is equal to g(s2), as the distribution is narrow due to the large constant L in the exponent in Eq. (5) below. Assuming a smooth distribution, we obtain the following from Eqs (1) and (2):Calculating the mean fitness in growing populations is crucial for understanding evolutionary dynamics and is the primary concern of this investigation. Depending on the values of the main model parameters, we can obtain different analytical expressions for the mean fitness, which correspond to different phases of the system. The key problem is to first solve the large L limit, and clarify the following items:A. Is it possible to have selective and non-selective phases in the system with a phase transition between them?B. Does the average number of mutations depend on the state of the mutator gene or is it the same in both cases (s1 = s2)?C. Is it possible to have a significant fraction of normal genomes in the population: 1 − q > 0?D. Are there large finite L corrections for q at L ~ 10000?These four questions are discussed in this paper; we show that both negative and positive answers exist for each. The second question (B) leads to the most considerable distinction in the analysis. In the Results section, we first investigate the case s1 = s2; then, in the Methods section, we describe a more complex situation with s1 ≠ s2. Moreover, contrary to the classical evolution models for quasispecies, here, the evolutionary picture depends on the genome length.
The case of forward and backward mutations of the mutator gene
Let us consider a smooth solution of Eq. (3) at the limit L → ∞ after the simplification mentioned in the previous section. We assume the following ansatz424449:where we denoted x = 1 — 2l/L for large values of L. This form of expressions for P and Q allows the application of the HJE method to the system of equations (5) (see Methods). Putting our ansatz (5) into Eq. (3), we get a system of two equations for the variables v1, v2, du/dx and du/dt. Looking at these equations as a system of two linear algebraic equations for v1 and v2, we get a zero determinant condition for the non-zero solution of v1 and v2, so the final equation contains only du/dt and du/dx. The fact that L disappears in the Hamilton-Jacobi equation in the large L limit shows the correctness of our ansatz. We derive the mean fitness investigating the Hamiltonian without solving it and calculating the function u(x) in the steady state.We obtain the following formulae for the mean fitness R using the potential function V±44:Here, we use μ1 = 1, μ2 = μ, and . The numerical simulation for the system (1) supports the analytical result (6) well, as seen from Table 3.
Table 3
The comparison of the results for f(x) = g(x) = 3x2/2, μ1 = 1, μ2 = μ, α1 = α2 = 1, L = 400.
μ
3.5
3.0
2.5
2.0
1.5
1.0
0.5
Rnum
0.1907
0.2514
0.32405
0.4127
0.5240
0.6684
0.8626
Rth
0.1811
0.2436
0.3180
0.4084
0.5212
0.6666
0.8615
R is the numerical result and R is given by Eq. (6).
Having the expression for the mean fitness R, we can calculate the surplus of distribution s from the following equation (see Methods):At the time limit t → ∞, we denote v(x) ≡ v(x, t), k = 1,2.It is important to note that Eq. (7) is valid whether s1 = s2 or not. However, for α1 · α2 ≠ 0, we have that s1 = s2. In this case, substituting the solution of R and s, we get the following system of equations for the surplus s and ratio v2(s)/v1(s) in the steady state (see Methods) :In the limit of large t and L, we have q = v2(s)/(v1(s) + v2(s)). For the case f(x) ≡ g(x), we get a simple relationship from (8):
The model with uni-directional mutations of the mutator gene
The probability of back mutation (from mutant to wild-type) is small and can be neglected in finite populations. Let us generalize this idea, and consider the infinite population model with α2 = 0, α1 = a (μ1 = 1, μ2 = μ). In this case, the expressions obtained for potential functions V±(x) are still valid, but it is necessary to take into account both branches of the potential, V+ and V−, and define the mean fitness as their maxima.Taking V−(x) in Eq. (6), we get the mutator phase (here and below, we use terminology from17):The latter equation for the surplus has the same form as for the ordinary Crow-Kimura model41. It can be derived directly from system (1) if we assume that the vast majority of the population has the mutator allele and omit the contribution of P in the second equation of Eq. (1). In the mutator phase, the mean fitness is defined by the fitness landscape for the mutator genome. Further, we will consider the case when two fitness landscapes are identical, e.g., f(x) = g(x).Taking V+(x) in Eq. (6) under the condition α2 = 0, α1 = a, μ1 = 1, μ2 = μ leads to the following expression for the mean fitness of the mixed phase (see Methods):In the mixed phase, the mean fitness is defined by the wild-type fitness function and the mutation rate. However, numerical calculations for the linear fitness landscapes f(x) = kx or simple form quadratic fitness landscapes f(x) = kx2/2 show that the vast majority of the population has the mutator allele for a large enough genome length. It is worth pointing out that this result (the majority of population has a mutator-type) is correct for any smooth fitness landscapes for a long enough genome length. We verify that for the single peak fitness function, there is a finite fraction of wild-type sub-population at the large genome limit. Such results will be presented below.Two phases have been found for the discrete-time Eigen model in ref. 17, where the mean fitness has been calculated for f(x) = k(x − 1), k ≫ 1. This model can be exactly mapped into the Crow-Kimura model at the large L limit, when the mutation rate per gene is fixed. Figure 3 gives a comparison of our analytical results for the mean fitness R with the numerics. Figure 3 shows that our analytical results are quite reliable.
Figure 3
The mean fitness R versus k of the model with linear fitness landscape f(x) = kx, μ1 = 1, μ2 = 10, α2 = 0, α1 = 0.02.
There are two phases in the model for the general values of parameters: mixed phase with and mutator phase with . The border between two phases is given by equation . In our case k ≈ 0.212.
The single peak model with uni-directional mutations of the mutator gene
Let us consider a single peak fitness function, f(x = 1) = g(x = 1) = J and f(x ≠ 1) = g(x ≠ 1) = 0. We have three possible phases in a steady state (Fig. 1). We consider either the steady state of Eq. (1), or the asymptotic solution of Eq. (3), replacing 25. We provide the following expressions for the mean fitness in the mutator phase R, mixed phase R and non-selective phase R; see Fig. 1:We first calculate P0 considering the equation for dP0/dt in Eq. (1) and ignore P1 term. The fraction of the wild-type sequences in the l-th Hamming class in the population is P = P0(1/J), which can be derived from Eq. (55) of ref. 50; therefore, for the fraction of the population with a normal mutator gene, we have . Considering analogously the equation for dQ0/dt and ignoring the Q1 term, we obtain . Conversely, the definition of R gives P0 + Q0 = R/J ≡ (J − a − 1)/J. Thus, we derive for P0 the following equation:This expression allows us to calculate the mutator allele probability q using the equivalence :At L = 5000, the accuracy of our analytical result is approximately 0.1%, as seen in Fig. 4. Figure 4 and Eq. (14) show that the fraction of the wild allele does not approach to 0 even for a large L.
Figure 4
The dependence of the mutator probability q on the genome length L.
The the single peak fitness model (smooth line) with J = 1.05, μ1 = 1, μ2 = 10, a = α1 = 0.001. For the L = 5000 the single peak model’s numerical result coincides with the analytical result for L = ∞ with the relative accuracy about 0.1%.
The model with uni-directional mutations of the mutator gene and a smooth fitness landscape
As previously mentioned, we have the mixed and mutator phases. In the mutator phase, the numerics give that the mutator sub-population dominates with q = 1, even for finite L, so we can ignore the first chain of transitions and reduce the system to the standard Crow-Kimura model with mutation rate μ2 and fitness function g(x).Let us now consider the mixed phase. If α1 = a, α2 → 0, then Eq. (8) gives q → 1 for this case. Assuming an ansatz , we obtain:The smooth curves in Figs 5 and 6 obtained by numerical calculations illustrate that the 1 − q strongly depends on the genome length, L. For the small genome length L, Fig. 7 obtained by numerical calculations support the behavior
Figure 5
The dependence of the mutator probability q on the genome length L for the linear fitness model: f(x) = kx, k = 1, μ1 = 1, μ2 = 10, α2 = 0, a = α1 = 0.3.
The smooth line corresponds to the numerics, the solid dots correspond to our analytical result.
Figure 6
The dependence of the mutator probability q on the genome length L for 10 < L < 10000.
We consider the linear fitness model: f(x) = kx, k = 1, μ1 = 1, μ2 = 10, α2 = 0, a = α1 = 0.05.
Figure 7
The dependence of the mutator probability q on the a = α1 for the linear fitness model: f(x) = kx, k = 1, μ1 = 1, μ2 = 10, α2 = 0, L = 1000.
As has been derived in ref. 29, for the large L, we get 1 − q ≪ 1. In the Methods section, we calculate (1 − q) for the system with a general smooth symmetric fitness function and parameters a, μ. For linear fitness functions f(x) = kx and small values of a and μ1/μ2 ≡ 1/μ, we obtain a simple expression:This result is well supported by the numerics, as seen in Fig. 5 and Table 4 for 1 − q ≪ 1, when the exponent is a large negative number. The transition between two regimes (16) and (17) is for the values of L where the expression of the exponent in Eq. (17) is ~1.
Table 4
The results for f(x) = g(x) = kx, μ1 = 1, μ2 = μ, α1 = a, α2 = 0.
L
1000
1000
1000
1000
k
0.3
0.3
0.3
1
a
0.0001
0.001
0.01
0.3
Rn
0.0439
0.0430
0.0340
0.1142
R
0.0439
0.0430
0.0340
0.1142
1 − q
0.9945
0.9460
0.530
6/107
K
0.999996
0.9994
0.930
1/105
. R is the numerical result.
In population genetics, the fitness difference for one mutation is denoted by S and corresponds to the linear coefficient k; U is the total number of mutations per generation. We use the equalities U = εμ1, S = ε2k/L and h = εα1, where ε is the discrete time step of one generation and h is the transition probability to the mutator allele during one generation. Applying the discrete-time Eigen model to Crow-Kimura model mapping, we get a condition for the exponent:Otherwise, the numerics give q ≪ 1 instead of Eq. (17).The linear fitness case can be solved exactly. We introduce the generation functions Q(z) = ∑Qz and P(z) = ∑Pz, so rewriting Eq. (1) with these variables gives a system of equations for the generation functions:It is easy to calculate R from the first equation with the constraint that P(z) is an L-th order polynomial. Another constraint is the probability balance condition, P(1) + Q(1) = 1. Solving this system, we can calculate P, Q with relative accuracy . The generation function method has been applied before for investigation of the Crow-Kimura model51.
The Eigen model with a mutator gene and random fitness landscape
The probability distribution for the Eigen model is defined for genome sequences with the wild and mutator type (p and q, respectively, 0 ≤ i < 2). Consider now the following system of equationswhere r is the fitness function, Q and are probabilities of mutation from S to S, Q = w(1 − w), w is the probability of errorless replication per nucleotide for sequences with normal mutator gene and 1 − e− ≈ h is the transition probability from the wild type to the mutator type. The diagonal terms of the mutation matrix are Q = w ≡ Q ≡ e−, where γ = −Lln(w) ≈ L(1 − w) is the mutation parameter in the Eigen model, , is the probability of errorless replication per nucleotide for sequences with a mutator allele. The mutation parameter for the mutator type is . Here, d(i, j) is the Hamming distance between two sequences, S and S.For the single peak fitness function with r0 = A and r = 1, i ≥ 1 the mean fitness is similar to the one obtained for the Crow-Kimura model: QAe− for the mixed phase, for the mutator phase and 1 for the non-selective phase. The random fitness landscape is a reasonable approximation for real biological data52. However, it was shown that it is almost equivalent to the single peak fitness landscape33. If we assume a log-normal distribution of Wrightian fitnesses, r:then, we have the maximal value of fitness 33, while for the majority of the population, fitness is equal to 1. Using the formulae for the single peak fitness case from the Methods section, we obtain:where we dropped the small term, h/μ. The steady state distribution is the same for the continuous-time Eigen model (20) and for the discrete-time Eigen model. For small values of h ≪ 1, U ≡ γ = L(1 − q) ≪ 1 and (A − 1) ≡ S ≪ 1, we get:Equation (23) coincides with the Crow-Kimura model result in Eq. (14) after mapping τJ = S, U = τ, h = aτ, where τ is the growing time scale of the replicator, and we identified the Crow-Kimura model dynamics after the time τ with the dynamics of the discrete time Eigen model33.It is interesting that there is no μ dependence in our expression. Taking the case of strong selection, s ≫ U, we obtain the situation when q ≪ h/U, as in the case of refs 53 and 54. According to ref. 53 (Table 1), we have U = 0.003, q = 0.00001 and h/U = 0.0001, while for ref. 54 (see abstract), we have q = 3/105, h = 5/106. The case of large q = 1/408 = 0.245%55, p. 109, perhaps is related to the smooth fitness landscape.
Discussion
In this work, we have investigated the mutator phenomenon in the evolutionary process in a stable environment for infinite population size. Previous studies have obtained some results for a microscopic model with uni-directional mutation (wild-type → mutator-type) and linear Malthusian fitness in a 1-dimensional landscape. The mutator and mixed phases have been considered theoretically in the literature, so we continued to focus on the phase structure in this research. We have calculated exactly, for the first time, the phase structure of the model for the general fitness function, the analytical expression for the mean fitness for any scheme of mutations between normal and mutator states in the d-dimensional fitness landscape (see Methods). Moreover, we have discussed the important case of the random fitness landscape. We have derived the formula for the mutator-type probability, q, for the large genome length. This mathematical problem is highly non-trivial, as it is non-perturbative via the value of the back mutation rate: see Eqs (8) and (17). Equation (17) shows that the mutator-type takes over the entire population for the model with a linear fitness function. This effect has been observed experimentally2756.In ref. 17, an expression for the mutator allele probability has been derived when the mutator-type is the minority and the fitness function is linear. Our results confirm these findings17; furthermore, our method provides an analytical solution for the model in this case. When condition (18) is broken, the wild-type takes over the population, which has also been proven experimentally54. The value of q depends on the genome length (q ≪ 1 or 1 − q ≪ 1; see Fig. 5), and for the surplus, both alternatives hold (s1 = s2 or s1 ≠ s2). Thus, the finite size corrections could be large, even for the large genome length L ~ 104.A careful investigation of the evolutionary models is essential for biological applications. Looking at possible phases with order parameters is needed to provide an appropriate level of accuracy. As long as cancer is assumed to be a collective phenomenon, even with some collective intellect57, it is necessary to use analytical methods for complex systems. Let us compare our results with those by Desai and Fisher29 who solved a phenomenological model, while we rigorously solved the microscopic model. For “small” genome sizes, the mutator fraction is proportional to the small parameter α, as has been found in ref. 29. What we claim is q → 1 for sufficiently large genome length for any smooth fitness function. The latter case is a non-perturbative phenomenon, and it is not represented in the approach of29, as they neglected the backward mutation between different classes of wild-types. The approach by Desai and Fisher29 cannot be applied for the steady state of our model, as it is completely wrong for that case; however, Desai and Fisher obtained29 a useful (especially for the cancer case) analytical estimate for the short-time dynamics for the involved case of general fitness function.Theoretical results concerning the error threshold58 have a practical impact for viruses and cancer, and it has been recently suggested to use an error catastrophe as a new therapeutic strategy for the solid cancer treatment10. A phase structure shown in Fig. 1 qualitatively describes the different stages of cancer; the mixed phase s ≠ 0 characterizes the early non-aggressive version of tumor, and the mutator phase is an aggressive stage of cancer, possibly with metastasis. To eradicate the tumor according to the strategy suggested in ref. 9, it should be transferred to the non-selective phase s = 0. As seen in Fig. 1, there are four different situations. In the mixed II subphase, we can push the tumor to the non-selective phase (the eventual goal of error-catastrophe therapy) to increase the forward mutation rate for the mutator gene α1, which is responsible for the genome stability. In the mutator I subphase, we should increase the mutation rate μ to push the tumor to the non-selective phase. In the mutator II and mixed I subphases, we need to increase both versions of mutation rates. The phase structure has the same form for the random fitness case and a similar form for other fitness landscapes.Further research should focus on the finite population version of the mutator model, as the population size can drastically affect the evolutionary dynamics8. The problem has already attracted attention19, but it is reasonable first to solve a simpler case of infinite population limit.To summarize, let us directly reply to the four questions stated in the beginning of the Results section.A. There is a phase transition between two phases, selective and non-selective, for the random fitness landscape and for the last two types of fitness landscapes considered in ref. 6 (the fitness landscape is flat until some critical number of mutations, then jumps and later is flat again; the fitness decreases linearly until some number of mutations and then jumps). For the linear fitness landscape, there is no phase transition.B. For small enough genome length s1 ≠ s2, see the Methods section for a detailed explanation.C. For a small genome length, it is possible to have some non-zero fraction of wild-types (see the Methods section).D. There are large finite L corrections for q; see Figs 5 and 6.
Methods
To perform the analytical investigation Eq. (3), we use the HJE approach. Ignoring O(1/L) correction terms and using Eq. (5) and the formulae P = v1e and Q = v2e in Eq. (3), we get:Here, we denoted and . We consider Eq. (24) as a homogeneous system of equations for v1 and v2, so its determinant is equal to 0 for the non-zero solution of v1 and v2. First, we find from the latter condition and write an equation, , where H± is a Hamiltonian. Then, we derive the potential function asFor the general case we have the following Hamiltonian:Deriving V±(x) according to Eqs (25) and (26), we get Eq. (6) to calculate the mean fitness R.At the maximum point x = s of the distribution, we take in Eq. (26) to obtain Eq. (7) for the surplus s. Putting to Eq. (24) the expression for the R, and looking the point x = s, we derive Eq. (8).If f(x) = g(x), α1 = a, α2 = 0 (μ1 = 1, μ2 = μ), then, taking into account the ansatz P = e, we derive an equation:and the second equation for the mixed phase in Eq. (11): f(s) = R.
The mutator-type probability
Consider the general case s1 ≠ s2 and parameters α1 = a, α2 = 0, μ1 = 1, μ2 = μ and f(x) = g(x). For Q ~ P, we consider the second equation in (1) with the substitution P = e in a steady state. We derive an equation for Q:We can write an approximate solution:If P ≪ Q, then, substituting , we get the following equation for :The expression for s2 can be derived from R = f(s2). We took the solution (27) at the interval [s3, 1] (where s3 is a switching point between s1 and s2) and the solution (30) at the interval [s1, s3]. We use the concatenation of these two solutions, assuming the smoothness of the derivative , thus we obtain the following equation for s3:Putting a condition u(s1) = 0 (which is equivalent to ∑P = 1) and , l0 = (1 + s2)L/2, we get:Here, u′ and are calculated using Eqs (27) and (30), respectively. The expression in the exponent in Eq. (32) is exact at the limit L → ∞.These analytical results allow us to predict, which allele takes over the population. The analysis of Table 4 reveals, that if , then the mutator-type dominates. If 1 − K ≪ 1, then the wild-type takes over the population.
The case of small mutation rate for mutator allele a
Let us suppose that forward mutation rate a is sufficiently small. This assumption leads to the following equations:From (31), we obtainWe are looking the case of large μ, therefore, we can replace s3 by s1:ThusFor the linear fitness f(x) = kx, we get Eq. (17):This solution is correct when the following condition is met:
The mean fitness for the model with d-dimensional fitness landscape and mutator gene
The multidimensional model has been discussed in ref. 19, where two parts of genome were considered; the first part is with lethal mutations and the second part is with deleterious and advantageous mutations. The genome is formally fractured into d parts with the lengths Ly. For the wild-type, the fitness is equal to ∑f(x), and for the mutator-type, the fitness is ∑g(x), where x = 1−2l/(Ly) and l is the number of “−” spins (alleles) in the i-th part of genome. We denote mutation rates as μ for the wild-type and ν for the mutator-type, see ref. 49 for the multidimensional fitness model without a mutator gene. For these modified expressions, we calculate the mean fitness R as a maximum of the potential function V±, defined as:Here α1, α2 have the same meaning as in Eq. (1). The model considered in ref. 19 corresponds to the values: d = 2, f1(1) = g1(1) = 0, f1(x) = g1(x) = −∞ for −1 ≤ x < 1 and f2(x) = g2(x) = k2(1 − x), for the case of finite population.
The Eigen setting of the mutator model
Consider the first equation in the system of Eq. (20) as an equation for the eigenvalue R. Compared with the ordinary Eigen model, the matrix of the linear system is multiplied by e−. Thus, we can directly use the results from58 and write the expression for the mean fitness of the mixed phase for the fitness function r = f(x):For the single peak fitness, we obtain for the selective phase58:For the non-selective phase, we simply have R = 1.In the mutator phase, we ignore the first chain of equations in Eq. (20) and get R as the mean fitness of the Eigen model with the value:Consider now the distribution of population in the single peak fitness case. Ignoring the back mutations for the equation for q0 (because they give O(1/L) terms), we obtain:As all the sequences, besides p0, q0, have a fitness value 1, we have an equation (q0 + p0)A + (1 − q0 − p0) = R. ThereforeWe derive eventually:In the ordinary Eigen modelThis equation is valid in our case as all the equations in the first chain of Eq. (20) are derived from the ordinary model58 by multiplying on e−, moreover, we have that Re = QA. These two equations give together Eq. (22) of the text.For the discrete-time Eigen model with a time step n, we have the probabilities p(n), q(n) and the following iteration equations:
Additional Information
How to cite this article: Saakian, D. B. et al. The rich phase structure of a mutator model. Sci. Rep. 6, 34840; doi: 10.1038/srep34840 (2016).