Computability, Gödel's incompleteness theorem, and an inherent limit on the predictability of evolution.

Abstract

The process of evolutionary diversification unfolds in a vast genotypic space of potential outcomes. During the past century, there have been remarkable advances in the development of theory for this diversification, and the theory's success rests, in part, on the scope of its applicability. A great deal of this theory focuses on a relatively small subset of the space of potential genotypes, chosen largely based on historical or contemporary patterns, and then predicts the evolutionary dynamics within this pre-defined set. To what extent can such an approach be pushed to a broader perspective that accounts for the potential open-endedness of evolutionary diversification? There have been a number of significant theoretical developments along these lines but the question of how far such theory can be pushed has not been addressed. Here a theorem is proven demonstrating that, because of the digital nature of inheritance, there are inherent limits on the kinds of questions that can be answered using such an approach. In particular, even in extremely simple evolutionary systems, a complete theory accounting for the potential open-endedness of evolution is unattainable unless evolution is progressive. The theorem is closely related to Gödel's incompleteness theorem, and to the halting problem from computability theory.

Year: 2011 PMID: 21849390 PMCID: PMC3284140 DOI: 10.1098/rsif.2011.0479

Source DB: PubMed Journal: J R Soc Interface ISSN: 1742-5662 Impact factor: 4.118

Introduction

Much of evolutionary theory is, in an important sense, fundamentally historical. The process of evolutionary diversification unfolds in a vast genotypic space of potential outcomes, and explores some parts of this space and not others. Nevertheless, a great deal of current theory restricts attention to a relatively small subset of this space, chosen largely based on historical or contemporary patterns, and then predicts evolutionary dynamics. Although this can work well for making short-term predictions, ultimately it must fail once evolution gives rise to genuinely novel genotypes lying outside this predefined set [1]. This potential limitation on the predictive ability of many models of evolution has been noted on various occasions throughout the development of evolutionary theory [1-4], perhaps most famously by the Dutch biologist Hugo de Vries when he remarked that ‘Natural selection may explain the survival of the fittest, but it cannot explain the arrival of the fittest’ [5]. Such statements hint at the notion that many models of evolution are what we might call ‘local’, or ‘closed’, in the sense that they focus attention on a very small (local) region of the evolutionary tree and do not account for the possibility that evolution is an open-ended process. The distinction between ‘closed’ and ‘open-ended’ models of evolution will be discussed in more detail below, but in recent years there have been several interesting studies published that are beginning to push the boundaries of analyses towards what we might naturally call open-ended models. These studies include models of abstract replicator populations [3,6-9], models exploring the space of evolutionary possibilities [10-12], analyses of evolutionary transitions [13,14], models for predicting the distribution of allelic effects during evolution [15-18] and studies of evolvability [4].
Similarly, there have also been many in silico and artificial life experiments that explore generic, emergent, properties of evolution [3,19-27]. In general, these analyses have demonstrated that, once we allow for more open-ended evolution, a much richer suite of evolutionary possibilities arises. The above studies collectively suggest that accounting for open-ended evolution in theory can yield interesting new insights, and it can also yield new testable predictions [15-18]. Nevertheless, there is still a relative paucity of theoretical studies that allow for open-ended evolution, and so we might expect that much is yet to be learned by broadening evolutionary theory further in this way. My purpose with this article is therefore twofold. First, I simply wish to highlight the fact that there is an important distinction to be made between open-ended and closed models of evolution (defined more precisely below), and to suggest that open-ended models might more faithfully represent the evolutionary process. Second, and more significantly, I wish to consider whether a push towards a predictive theory that embraces the potential open-endedness of evolution is likely to face additional obstacles, over and above those faced by closed models of evolution. Put another way, I ask the question: to what extent is the development of a predictive, open-ended evolutionary theory possible? Although a complete answer to the above question is not possible, in what follows I will provide at least a partial answer. Furthermore, I demonstrate that this answer has interesting connections to the halting problem from computability theory and to Gödel's incompleteness theorem from mathematical logic. In particular, I will use results from these areas to prove a theorem that formally links the concept of progressive evolution to the possibility of developing such a predictive open-ended theory. 
There remains debate over if, and when, evolution might be progressive [28-30] and part of this debate stems from the lack of a precise yet general definition of progression. Thus, another way to view the results presented here is as providing such a definition. I will return to this point more fully in §4.

A motivating example

To sharpen the focus on these somewhat abstract ideas, it is worth beginning with a concrete motivating example involving evolutionary prediction. This section does so, focusing primarily on the broad conceptual issues involved. The section that follows then addresses these issues more precisely. Consider trying to use evolutionary theory to predict the dynamics of human influenza. Specifically, consider trying to answer the following question: is it likely that a pandemic with the 1918 Spanish influenza strain will ever occur again? This is obviously a difficult, and still somewhat loosely defined, question so let us narrow things down further. One reason we might be sceptical about our ability to make such predictions is because of uncertainty in initial conditions and parameter values, as well as uncertainty about the evolutionary processes involved. In other words, perhaps we lack all of the information required to make such predictions. Furthermore, unexpected contingencies might thwart what would otherwise be accurate predictions. For example, an unanticipated volcanic eruption might temporarily alter commercial air travel patterns, and this might thereby alter the epidemiological and evolutionary dynamics of influenza. These practical limitations are clearly important, but are they the only obstacle in making accurate evolutionary predictions or are there other, ‘inherent’, limitations as well? Does the difficulty in making evolutionary predictions stem simply from our lack of knowledge of the evolutionary processes involved or are there reasons why, even in principle, such evolutionary predictions are not possible? It is the last question that is the focus of this article, and therefore I will, at least temporarily, put the above practical concerns aside. Specifically, let us assume that we can build a model that adequately captures all of the relevant evolutionary processes, and that we can obtain all parameter estimates necessary to use such a model.
Without getting too much into the specifics, one of the first things we would need to decide is the relevant strain space for the model. The simplest scenario would consider only two strains (e.g. the 1918 strain and the current, predominant, strain). More sophisticated scenarios might instead include several strains that are thought to be important in the dynamics. In either case, both such resulting models would be ‘closed’ in the sense described in §1 because they focus only on a finite (and relatively small) number of strains. Furthermore, given that there is a discrete and finite number of people who can be infected at any given time, there is then also a finite (and relatively small) number of possible evolutionary outcomes. As will be detailed more precisely later, this then implies that the process either will reach a steady state or will display periodic behaviour (see appendix E). Hence, if a closed model is an accurate description of the evolutionary process, then, in principle, we can answer the above question by simply running the model until one of these two outcomes occurs. At that point, we need only observe whether or not a 1918 Spanish flu pandemic ever occurred during the run of the model (or if it occurred with significant probability). But what if the evolutionary process is, instead, open-ended? To explore this possibility we need to be more specific about what is meant by open-ended. Consider again the example of influenza. Influenza A has a genome size of more than 12 000 nucleotides, and therefore the number of possible genotypes is enormous. To gain some perspective on just how many genotypes are possible, let us restrict attention to only the smallest of the eight genomic segments of influenza. In this case, there are then only approximately 800 nucleotides and therefore approximately 4^800 different possible genotypes.
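The size of this number is easy to check. The following is a minimal sketch of the arithmetic: four possible nucleotides at each of roughly 800 sites gives 4^800 genotypes. The figure of 10^80 atoms in the observable universe is an assumption (a commonly quoted order-of-magnitude estimate), not a value taken from the article.

```python
import math

SITES = 800          # approximate length of the smallest segment (from the text)
ALPHABET = 4         # four possible nucleotides per site
ATOMS_LOG10 = 80     # ASSUMPTION: ~10^80 atoms in the observable universe

# Python integers are unbounded, so the count can be computed exactly.
genotypes = ALPHABET ** SITES
digits = len(str(genotypes))

# Work in log10 to compare orders of magnitude.
log10_genotypes = SITES * math.log10(ALPHABET)
log10_ratio = log10_genotypes - ATOMS_LOG10

print(f"4^{SITES} has {digits} decimal digits")
print(f"log10(4^{SITES}) is about {log10_genotypes:.1f}")
print(f"ratio to the assumed atom count is about 10^{log10_ratio:.0f}")
```

Under that assumption the ratio comes out at roughly 10^400, in line with the order of magnitude quoted in the text.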
To put this number in perspective, it is approximately 10^400 times larger than the estimated number of atoms in the universe. For a model to be open-ended, it would have to allow for such a vast set of possible evolutionary outcomes so that, as in reality, evolutionary change could continue unabated, producing potentially novel outcomes essentially indefinitely. The simplest way we might try to capture this theoretically is to assume that the space of possible genotypes is infinite. Given these considerations, if evolutionary theory is to capture an open-ended evolutionary process, then its state space must be effectively infinite. This is necessary but it is not a sufficient condition for open-ended evolution. For example, many stochastic Markovian models in population genetics have an infinite state space (e.g. the infinite alleles model [31]) but, nevertheless, do not display open-ended evolution. Rather, further assumptions are often made, such as the assumption that the Markov chain is irreducible and positively recurrent. These assumptions are usually made primarily for mathematical convenience but they rule out the possibility of open-ended evolution as they then guarantee the existence of a single unique equilibrium or stationary distribution. As a result, such models cannot capture the possibility that evolutionary change might continue indefinitely. What if we relax these assumptions and allow for truly open-ended evolution in the theory that we develop? Are there then even further problems associated with making evolutionary predictions? For example, does this make answering the question about influenza evolution laid out at the start of this section more difficult? You might suspect that the answer is ‘yes’; at least, the approach suggested above for closed models will no longer suffice because the evolutionary process is no longer guaranteed to settle down to an equilibrium or stationary distribution.
Thus, the best we can possibly hope for is that there is some way to prove, using the structure of the model, whether or not such an outcome will occur. Thus, all practical difficulties of predicting evolution aside, it is not obvious whether we can answer the above sort of question about influenza evolution, even in principle. These issues are now starting to tread heavily into the fields of computability and mathematical logic and, roughly speaking, a theory that can answer the above kind of question about influenza evolution is referred to as a negation-complete theory. This terminology reflects the idea that the theory is complete in the sense of one being able to determine whether a given statement is true, or whether its formal negation is true instead. For example, in the context of influenza, a negation-complete theory would be able to predict whether the statement ‘the Spanish flu will happen again’ is true or whether its formal negation ‘it is not true that the Spanish flu will happen again’ is true instead. More generally, a negation-complete evolutionary theory would be one from which we could determine those parts of genotypic space that will be explored by evolution and those that will not. Is such a negation-complete theory possible once we allow for open-ended evolution? In the remainder of this article, I show that the answer to this question is closely related to the idea of progressive evolution. In particular, even if the system of evolution was simple enough for us to understand everything about how its genetic composition changes from one generation to the next, the following is proven: A negation-complete evolutionary theory is possible if, and only if, the evolutionary process is progressive. The above statement will be made more precise shortly, but, as already alluded to above, it stems from the fact that DNA affords evolution a mechanism of digital inheritance. 
As Maynard Smith & Szathmáry [13] have noted, the combinatorial complexity that arises thereby allows evolution to be effectively open-ended. Indeed, as will be argued below, digital inheritance allows one to characterize evolution (i.e. the change in genetic composition of a population) as a dynamical system on the natural numbers, and therefore the theorem proved below holds for any such dynamical system, not just those meant to model evolution. As a result, the theorem is closely related to other results from mathematics and computer science, namely Gödel's incompleteness theorem [32-35] and the halting problem from computability theory [36,37].
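The closed-model argument made earlier in this section (a finite state space forces a steady state or a cycle, so we can simply run the model and look) can be sketched as follows. This is an illustrative sketch only: the three-state map `toy_step` and the function name `run_closed_model` are hypothetical and are not taken from the article.

```python
def run_closed_model(step, initial_state, target):
    """Iterate a deterministic map on a finite state space until a state repeats.

    Returns (target_visited, steps_taken). The loop terminates because a
    finite state space forces the trajectory to revisit some state, after
    which the dynamics are a fixed point or a cycle and nothing new appears.
    """
    seen = set()
    state = initial_state
    steps = 0
    while state not in seen:
        seen.add(state)
        state = step(state)
        steps += 1
    return target in seen, steps


def toy_step(state):
    # HYPOTHETICAL closed model with three states: 0 -> 1 -> 2 -> 1 -> 2 -> ...
    return {0: 1, 1: 2, 2: 1}[state]


visited, steps = run_closed_model(toy_step, initial_state=0, target=2)
print(visited, steps)

# Starting from state 1, state 0 is never reached, and the run proves it.
never_visited, _ = run_closed_model(toy_step, initial_state=1, target=0)
print(never_visited)
```

The point of the sketch is the termination guarantee: in a closed model, a negative answer ("this state is never visited") is obtained in finite time. That guarantee is what is lost once the state space is infinite and no state need ever repeat.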

Statement and proof of theorem

In order to give precision to the above statement, we must specify what is meant by ‘the evolutionary process’, as well as what it means for evolutionary theory to be negation-complete. The goal is to determine whether, even in extremely simple evolutionary processes, there is some inherent limitation on evolutionary theory. To this end, consider a simplified evolutionary process in which there is a well-mixed population of replicators with some maximal population size, and in which each replicator contains a single piece of DNA. This genetic code can mutate in both composition and length, with no pre-imposed bounds. Suppose that each replicator survives and reproduces in a way that depends only on the current genetic composition of the population. For additional simplicity, suppose that generations are discrete. All conclusions hold if events occur in continuous time instead (appendix E). Finally, for simplicity of exposition, I will usually assume that the evolutionary dynamics are deterministic in the main text. Again, all results generalize to the case of stochastic evolutionary dynamics, albeit with a few additional assumptions (appendix E). With the above evolutionary dynamics, the genetic composition of the system will evolve over time, and we can characterize the state of the system at any time by the number of each type of replicator (e.g. the number of infections with each possible genotype of influenza). The goal then is to determine if it is possible to construct an evolutionary theory that can predict which parts of the space of potential evolutionary outcomes will be explored during evolutionary diversification, and which will not. Formally, the results presented below are valid for any theory whose derived statements are recursively enumerable. Axiomatic theories are one such example but (roughly speaking) any theoretical approach that can, in principle, be implemented by a computer falls into this category (appendix A). 
Indeed, the statement and proof of the theorem rely on several ideas from computability theory (appendix B). The digital nature of inheritance provided by DNA means that, in principle, the number of distinct kinds of replicators that are possible is discrete and unbounded, a property Maynard Smith & Szathmáry [13] refer to as ‘indefinite’ heredity. It is indefinite heredity that allows for open-ended evolution. As a result, in principle, the set of possible population states during evolution is isomorphic to the positive integers, i.e. there exists a one-to-one correspondence between the set of possible population states and the positive integers. Such sets are called denumerable, and in fact the set of population states is effectively denumerable in a computability sense (appendix C). Thus, we can effectively assign a unique integer-valued ‘code’ to every possible population state. In practice, of course, there are limits on the number of kinds of replicators possible, if only because of a finite pool of the required chemical building blocks. Nevertheless, as mentioned earlier, the combinatorial nature of indefinite heredity means that the actual number of possible population states is so large as to be effectively infinite. For simplicity of exposition, it is assumed in the main text that the set of possible population states is truly infinite; however, appendix F makes the notion of ‘effectively infinite’ precise and provides the analogous results for this case. With the above coding, we can formalize evolution mathematically as a mapping of the positive integers to themselves. For example, in the deterministic case, we might start with a model (e.g. a mapping F) that tells us the number of individuals of each genotype in the next time step, as a function of the current numbers. 
Then, under the above coding, if E(n) denotes the population state (formally, its integer code number) at time n, the model can be recast as a single-variable, integer, mapping E(n + 1) = G(E(n)) for some function G, along with some initial condition. Similarly, in the stochastic case, if we start with a probabilistic mapping F, then it can be recast as a mapping E(n + 1) = H(E(n)), where H gives the probability distribution over the set of code numbers in the next time step as a function of its current distribution (and E is then a vector of probabilities over the integers). Therefore, in general, we can view the evolutionary trajectory as being simply an integer-valued function with an integer-valued argument. Of course, different ways of coding the population states will correspond to different maps, G or H, and thus different functions E(n). Also note that the domain of G or H need not be all of the positive integers, and in fact different initial conditions might give rise to different domains as well. This would correspond to there being different basins of attraction in the evolutionary process. It is also worth noting that, although we have assumed the evolutionary mapping (i.e. G or H) is a function of the current genetic composition of the population only, we can relax this assumption and allow evolutionary change to depend on other aspects of the environment as well. In particular, we might expand our definition of ‘population state’ to include both the genetic state and the state of other variables associated with the environment in which the genes exist. Again, as long as such generalized processes can be recast as dynamical systems on the natural numbers, all of the results presented here continue to hold. The above arguments illustrate how we can view evolution as a dynamical system on the natural numbers, and they also now allow us to formalize the notion of open-ended evolution. 
In the deterministic setting, evolution is open-ended if the mapping G never revisits a previously visited state. Likewise, in the stochastic setting, evolution is open-ended if the mapping H always admits at least one new state in each generation with positive probability. Because we can view evolution as a dynamical system on the natural numbers, evolutionary theory can be viewed as a set of specific rules for manipulating and deducing statements about such numbers. Computability theory deals with functions that map positive integers to themselves, and thus provides a natural set of tools to analyse the problem. A function is called ‘computable’ if there exists some algorithmic procedure that can be followed to evaluate the function in a finite number of steps (appendix B). Again, focusing on the deterministic case, given the assumption that we are able to predict the state of the population from one time step to the next, the function E(n) is computable (see appendix B). Furthermore, the set of all computable functions is denumerable [37]. Therefore, denoting the kth such function by ϕ_k(n), it is clear that the evolutionary process, E(n), must correspond to a member of this set. Denote this specific member by ϕ(n), and again note that, if we change the integer coding used to identify specific population states, we will obtain a different function, Ẽ(n), and thus a different member of the set, ϕ̃(n) (figure 1).
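The idea of an effective integer coding can be made concrete with a small sketch. The example below codes single genotypes (nucleotide strings of any length) by the positive integers using bijective base-4 numeration. This particular scheme and the function names are assumptions chosen for illustration; the article only requires that some computable one-to-one coding exists. It also codes population states (the number of each type of replicator), which would need a further step, such as a pairing function, that is not shown here.

```python
ALPHABET = "ACGT"

def genotype_to_code(seq):
    """Map a nucleotide string of any length to a unique positive integer."""
    n = 0
    for base in seq:
        # Bijective base-4: each position contributes a digit in 1..4.
        n = n * 4 + ALPHABET.index(base) + 1
    return n + 1  # shift by one so that the empty string maps to 1

def code_to_genotype(code):
    """Inverse of genotype_to_code: recover the string from its code."""
    n = code - 1
    bases = []
    while n > 0:
        n, r = divmod(n - 1, 4)
        bases.append(ALPHABET[r])
    return "".join(reversed(bases))

for s in ["", "A", "T", "AA", "GATTACA"]:
    code = genotype_to_code(s)
    print(repr(s), code, repr(code_to_genotype(code)))
```

Because every positive integer decodes to exactly one string and every string encodes to exactly one integer, genotypes of unbounded length are placed in one-to-one correspondence with the positive integers, and both directions are computable.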

Figure 1.

A schematic of the coding of population states, and the theorem. The middle irregular shape represents the space of population states, S, with four states depicted (the ovals). Roman numerals indicate the time when each state is visited during evolution (the silver-shaded state, s = {T,T,T}, is never visited). Vertical ovals on the right and left represent two different codings by the positive integers, along with their respective evolutionary mappings, ϕ(n) and ϕ̃(n), over the first three time steps. If evolution is progressive, then coding 2 is possible, and the theorem then says that we can ‘decide’ any population state, s ∈ S. For example, we can decide state ‘T,T,T’ by finding its code (i.e. ‘1’), and then iterating the map, ϕ̃(n), until we obtain an output greater than ‘1’ (this occurs at time step 1 because ϕ̃(1) > 1). If ‘1’ has not yet been visited by this time, it never will be. Conversely, if all population states are decidable, then, under coding 1, we can apply the algorithm provided in part 2 of the theorem's proof to obtain coding 2, thereby demonstrating that evolution is progressive.

During evolution, a set of population states will be visited over time (in the stochastic case, we consider a state as being visited if the probability of it occurring at some point is larger than a threshold value; appendix E). These will be referred to as ‘evolutionarily attainable’ states. In terms of our formalism, this corresponds to the function ϕ(n) taking on various values of its range, R, as n increases (figure 1). A negation-complete evolutionary theory would be one that can determine whether a code, x, satisfies x ∈ R or whether it satisfies x ∉ R instead. In the language of computability theory, this corresponds to asking whether the predicate ‘x ∈ R’ is decidable (appendix B; [37]).
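A short sketch may help show why deciding ‘x ∈ R’ is harder than it looks. Iterating a computable trajectory can confirm that a code is attained, but a search that has not yet found the code is inconclusive. The trajectory `toy_phi` and the function `visited_within` below are hypothetical illustrations, not part of the article's formal development.

```python
def visited_within(phi, x, max_steps):
    """Return the first time step n <= max_steps with phi(n) == x, else None.

    A returned integer proves that x is attainable. A return value of None
    proves nothing: x might simply be visited after max_steps.
    """
    for n in range(1, max_steps + 1):
        if phi(n) == x:
            return n
    return None


def toy_phi(n):
    # HYPOTHETICAL open-ended trajectory that never revisits a code:
    # 4, 2, 8, 6, 12, 10, ...
    return 2 * (n + 1) if n % 2 == 1 else 2 * (n - 1)


print(visited_within(toy_phi, 6, 100))   # found at some finite step
print(visited_within(toy_phi, 7, 100))   # None: inconclusive by itself
```

For this toy trajectory we happen to know from its structure that odd codes are never visited, but the bounded search alone cannot tell us so. A negation-complete theory is one that can always supply that missing negative answer.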
In terms of the influenza example presented earlier, if x is the population state corresponding to a pandemic with the 1918 strain, then the statement ‘the Spanish flu will happen again’ corresponds to the number-theoretic statement x ∈ R. Likewise, the statement ‘it is not true that the Spanish flu will happen again’ corresponds to the number-theoretic statement x ∉ R. Finally, we can give a precise definition of progressive evolution. Intuitively, evolution is progressive if there is some quantifiable characteristic of the population that increases through evolutionary time. In terms of the above formalization, this means that there is a way to recode the population states such that the code number increases during evolution. Formally, evolution is progressive if there exists a computable, one-to-one, coding of the population states by positive integers, C̃, such that the corresponding description of the evolutionary process, ϕ̃(n), satisfies ϕ̃(n + 1) > ϕ̃(n) for all n. Again, in terms of the influenza example presented earlier, if evolution were progressive, then there would be some way to a priori code the population states such that, as influenza evolution occurs, the code number of the population increases (I will return to this definition of progression in more detail in §4). We can now state a theorem in terms of precise, technical language:

Theorem 3.1. ‘x ∈ R’ is decidable if, and only if, there exists a computable, one-to-one, coding of the population states by positive integers, C̃, such that the corresponding description of the evolutionary process, ϕ̃(n), satisfies ϕ̃(n + 1) > ϕ̃(n) for all n.

Proof (figure 1; see appendices B and D for additional details).

Part 1: if there exists a coding such that ϕ̃(n + 1) > ϕ̃(n) for all n, then the predicate ‘x ∈ R’ is decidable. By hypothesis, there exists a computable bijection, C̃, such that, for the corresponding description of the evolutionary process, ϕ̃(n + 1) > ϕ̃(n) for all n. For any population state, x, in the original coding, let x̃ be the corresponding code under the bijection C̃, and define n* = μi(ϕ̃(i) > x̃), where μi(H(i)) denotes the minimum value of i for which the argument H(i) is true (appendix B). Further, define R(n) = {x: ϕ(i) = x, i ≤ n} (i.e. the range of ϕ(n) visited by step n; appendix B), and let R̃ and R̃(n) denote the corresponding sets for ϕ̃(n). Clearly, ‘x̃ ∈ R̃(n*)’ is decidable since R̃(n*) is finite and can be enumerated, and, furthermore, x̃ ∈ R̃ if and only if x̃ ∈ R̃(n*), owing to the progressive nature of evolution. Therefore, ‘x̃ ∈ R̃’ is decidable as well. Finally, using S to denote the set of population states that are evolutionarily attainable, we have that x ∈ R if and only if C−1(x) ∈ S, and x̃ ∈ R̃ if and only if C̃−1(x̃) ∈ S. Noting that, by definition, x̃ = C̃(C−1(x)), we obtain x ∈ R if and only if x̃ ∈ R̃. Thus, ‘x ∈ R’ is decidable as well.

Part 2: if the predicate ‘x ∈ R’ is decidable then there exists a coding such that ϕ̃(n + 1) > ϕ̃(n) for all n. We can construct the required computable bijection between population states and an appropriate coding as follows. First, take any effective coding of population states. By hypothesis ‘x ∈ R’ is decidable and therefore we can proceed through the population states, x, in increasing order, applying the following algorithm: (i) if x ∉ R and it is the kth such state up to that point, use the kth odd number as its new code; (ii) if x ∈ R, calculate i = μi(ϕ(i) = x), and use the ith even number as its new code. Thus, R̃ is the set of even numbers, and they are visited in increasing order as evolution proceeds. In particular, using C̃(C−1(x)) to denote the above mapping described in points (i) and (ii), where C−1 is the inverse mapping of the coding that generated x (i.e. it takes code x and returns the corresponding population state, s), we have ϕ̃(n + 1) = C̃(C−1(ϕ(n + 1))) = 2(n + 1). The last equality follows from the fact that C̃ determines the time at which state ϕ(n + 1) occurs (which is n + 1), and assigns it a new code equal to twice this value (point (ii) above). Therefore, ϕ̃(n + 1) > ϕ̃(n). ▪
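Both directions of the proof are constructive, so they can be tried out on a toy system. The sketch below is illustrative only: the trajectory `phi` and the membership test `in_range` are hypothetical stand-ins for an evolutionary process whose range happens to be decidable. `recode` follows the odd/even recoding of Part 2, and `decide` follows the bounded search of Part 1.

```python
from itertools import count

def phi(n):
    """HYPOTHETICAL open-ended trajectory under the original coding: 4, 2, 8, 6, ..."""
    return 2 * (n + 1) if n % 2 == 1 else 2 * (n - 1)

def in_range(x):
    """ASSUMED decision procedure for 'x in R' (here R is the positive even numbers)."""
    return x > 0 and x % 2 == 0

def recode(x):
    """Part 2: unattainable codes get odd numbers, attainable codes get twice their visit time."""
    if in_range(x):
        # mu_i(phi(i) == x): the search halts because x is known to be in R.
        i = next(i for i in count(1) if phi(i) == x)
        return 2 * i
    # x is the kth unattainable code when codes are scanned in increasing order.
    k = sum(1 for y in range(1, x + 1) if not in_range(y))
    return 2 * k - 1

def phi_tilde(n):
    """The same trajectory described under the new coding."""
    return recode(phi(n))

def decide(x_tilde):
    """Part 1: with an increasing trajectory, search only until it passes x_tilde."""
    n_star = next(i for i in count(1) if phi_tilde(i) > x_tilde)
    return any(phi_tilde(j) == x_tilde for j in range(1, n_star + 1))

print([phi(n) for n in range(1, 7)])        # not monotone under the original coding
print([phi_tilde(n) for n in range(1, 7)])  # strictly increasing under the new coding
print(decide(recode(6)), decide(recode(7)))
```

In this toy case the original trajectory jumps up and down, yet the recoded trajectory is strictly increasing, and `decide` returns a definite answer for every code in finite time. The sketch takes the decidability of the range as given; the theorem says that without it no such recoding exists.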

Discussion

This article has two main goals. The first goal is to highlight the distinction between open-ended and closed models of evolution, and to suggest that open-ended models might better capture real evolutionary processes. The second goal is to explore the extent to which the development of a predictive, open-ended theory of evolution is possible. The above theorem illustrates that there is an interesting connection between this question and analyses from computability theory and mathematical logic. It also draws a formal connection between the extent to which such a theory is possible and the notion of progressive evolution. Because the theorem states an equivalence relationship between the possibility of developing a negation-complete theory and progressive evolution, it can be read in two distinct ways. First, it states that if evolution is progressive, then a negation-complete theory is possible. This is, perhaps, not too surprising. If evolution is progressive, then there would be a good deal of regularity to the process that one ought to be able to exploit in constructing theory. The second way to read the theorem is from the perspective of the reverse implication. This is somewhat more surprising; it states that if evolution is not progressive then a negation-complete theory will not be possible. These results rest on the fact that digital inheritance allows evolution to be open-ended [13]. If, instead, the hereditary system allowed for only a finite number of discrete possible types, then evolution would either display periodic behaviour or would reach an equilibrium (possibly with stochastic fluctuations; appendix E). A negation-complete theory of evolution would then be trivially possible in such cases because, in principle, we could simply develop a finite list of all evolutionary outcomes that can occur (as described in the influenza example earlier). 
Of course, despite the existence of digital inheritance, there is nevertheless presumably a bound on the number of population states possible for a variety of reasons. Even so, however, the combinatorial nature of digital inheritance means that the number of possible population states might be considered effectively infinite. An analogous theorem can be proven in such cases by replacing the notion of infinite with a precise notion of effectively infinite instead (appendix F). Likewise, although the main results of the text assume that evolution is deterministic, an analogous theorem holds that accounts for the inherently stochastic nature of the evolutionary process (appendix E). The notion of progressive evolution is somewhat slippery, and there does not exist a general yet precise definition of progression that is universally agreed upon. As a result, this has led to disagreement over the extent to which progressive evolution occurs [28,29]. A complete discussion of the idea of progressive evolution is beyond the scope of this article but a few points are worth making here. Most discussions of progressive evolution involve quantities such as mean fitness, body size, complexity or other relatively conspicuous biological measurements. Many such discussions are also retrospective in the sense that they look at historical patterns when attempting to find patterns of progression. But both of these aspects of discussions of progression are problematic. First, although it would be nice to readily identify some obvious, and biologically meaningful, characteristic of a population that changes in a directional way, there is no reason to expect that we have currently thought of all the possibilities. Thus, when defining progression, it would seem desirable to do so in a very general way, leaving open the possibility that some biologically interesting, but as yet undiscovered, quantity increases over time. 
Second, looking towards historical patterns for definitions of progression is essentially looking at data and then designing an hypothesis to fit. Progression ought to be defined prospectively rather than retrospectively, meaning that it ought to have predictive value; if evolution is progressive, then we ought to be able to define, a priori, a quantity that will increase. The definition of progression used here was purposefully chosen to deal with the abovementioned difficulties. Thus, as it stands, it is necessarily not linked to any specific biological measurement. By the definition used here, the quantity that might increase over time need not have any obvious biological interpretation outside of the role that it plays in progressive evolution. This level of generality seems desirable if we are asking questions about the existence of such a quantity without necessarily knowing anything specific about what it might be. Such generality does mean, however, that if evolution is progressive in this sense, then the progressive trait might well be some highly complicated characteristic of the population that does not necessarily correspond to any biological attribute of an organism that is a priori natural. In this way, some readers might prefer to view the theorem presented here as a definition of progressive evolution rather than as a statement about the limitation of theory. In other words, we might define progressive evolution as an evolutionary process for which we could, in principle, construct a negation-complete evolutionary theory. The theorem then says that this definition is equivalent to there existing some quantity that increases over evolutionary time. Decidability results, such as those presented here, are often prone to misinterpretation [38]. Therefore, it is important to be clear about what the above theorem says as well as what it does not say. First, the theorem does not imply that developing a predictive theory of evolution is impossible. 
A very large portion of current research in evolutionary biology is directed towards developing such predictive capacity and therefore the theorem takes the existence of such a theory as a starting point. The rationale is to determine whether there might still be other, inherent, limits on the kinds of questions that can be answered even if we are successful in pushing the development of current research in this direction. The theorem demonstrates that there are such inherent limits, and, in essence, the problem arises from a difficulty in predicting the places that evolution does not go. In other words, although a predictive theory can always be used to map out the course of evolution, interestingly, it cannot always be used to map out the courses that evolution does not take. The theorem presented here, in effect, demonstrates that doing the latter is not possible unless evolution is progressive. How are these considerations to be interpreted in the context of examples like that of influenza evolution discussed earlier? First, as already mentioned in that example, the analysis would begin by taking what is essentially a best case scenario, and supposing that we have enough knowledge of the system to develop an open-ended model that perfectly predicts (possibly in a probabilistic way) the genetic composition of the influenza population in the next time step, as a function of its current composition. Then we ask, is there a significant probability that another flu pandemic with the 1918 strain will ever occur? The above theorem states that, even if we had such a perfect model, this kind of question is unanswerable unless influenza evolution is progressive. In other words, unless some characteristic of the influenza population changes directionally during evolution (e.g. some aspect of the antigenicity profile changes directionally), such a prediction will not be possible. 
Moreover, this limitation arises because, even though we can use our perfect model to map out the course of influenza evolution over time, this need not be enough to map out the parts of genotype space that influenza will not explore. The above limitations apply to predictions about the genetic evolution of the population, but what if we are interested only in phenotypic predictions? For example, could we predict whether or not an influenza pandemic similar in severity to that of 1918 will ever occur again, regardless of which strain(s) causes the pandemic? Likewise, could we predict whether or not resistance to antiviral medication will ever evolve, regardless of its genetic underpinnings? If the genotype–phenotype map is one-to-one, then predicting phenotypic evolution will be no different from predicting genotypic evolution. Even if many different genotypes can produce the same phenotype, however, predicting phenotypic evolution still involves predicting whether or not certain subsets of genotype space are visited during evolution. As a result, all of the aforementioned limitations should still apply to such cases. The only exception is if the genotype–phenotype map resulted in the dimension of phenotype space being finite even though the dimension of the genotype space was effectively infinite. Even in this case, however, the above limitations to prediction would still apply unless phenotypic knowledge alone was sufficient to predict the state of the population from one time step to the next (i.e. if we need not consider genetic state to understand evolution). While this might be possible for some phenotypes of interest, it seems unlikely that it would be possible for all phenotypes. One might argue, however, that some patterns of phenotypic evolution are very predictable. For example, the application of drug pressure to populations seems to lead inevitably to the evolution of resistance to the drug. 
How are these sorts of findings reconciled with the results presented here? First, although the evolution of resistance does appear to be somewhat predictable, we must distinguish between inductive and deductive predictions. One reason we feel confident about predicting the evolution of drug resistance is that we have seen it occur repeatedly. Therefore, by an inductive argument, we expect it to occur again. Such inductive predictions are conceptually similar to extrapolating predictions from a statistical model beyond the range of data available. On the other hand, deductive predictions are made by deducing a prediction from an underlying set of principles or mechanistic processes. In a sense, inductive predictions require no understanding of the phenomenon in question, whereas deductive predictions are based on some underlying model of how things work. The results presented here apply solely to deductive predictions. A second possibility with respect to the evolution of things like drug resistance, however, is that evolution is progressive (at least at this ‘local’ scale). For example, it might well be that if we formulated an accurate underlying model for how influenza evolution proceeds in the presence of antiviral drug pressure, there would be some population-level quantity that changes in a directional way during evolution. Indeed, it seems plausible that it is precisely this kind of directionality that makes us somewhat confident that we can predict evolution in such cases. It should be noted, however, that, even if evolution is not progressive, the theorem presented here does not rule out the possibility that some predictions can be made. For example, it is entirely possible that a theory could still be developed to make negation-complete predictions about the evolution of drug resistance. 
The theorem simply says that it will not be possible to make negation-complete predictions about any arbitrary aspect of evolution unless the evolutionary process is progressive. As already mentioned, all of the results presented here begin with the assumption that we can develop a theory to predict evolution from one time step to the next. Whether or not current theoretical approaches can be pushed to the point where this is true remains a separate, and open, question. There are certainly considerable obstacles to doing so unless the evolutionary system of interest is very simple [39]. In addition to the problem that historical contingencies raise, the role of uncertainty in initial conditions, much like those in weather forecasting, might preclude long-term predictions (although probabilistic statements might still be possible). This remains an important and active area of research on which the theorem presented here offers no perspective. Rather, it simply reveals that, in the event that theory is eventually developed to do so, it will still face inherent limitations on the kinds of questions it can answer unless evolution is progressive. Although a negation-complete theory for the entire evolutionary process of interest is not possible unless evolution is progressive, this also does not preclude the possibility that a perfectly acceptable negation-complete theory might be developed for short-term and/or local predictions. Indeed, just as similar inherent limitations in computability theory and mathematical logic have not prevented people from making astonishing progress in these areas of research, so too is the case for evolutionary biology. As mentioned in §1, many theoretical advances have already been made by focusing on subsets of the space of potential evolutionary outcomes. Continuing to push theoretical development in this direction by broadening the space considered will be possible regardless of the nature of the evolutionary process. 
The theorem does imply, however, that, unless evolution is progressive, it will not be possible to encompass all such developments within a single unified set of principles from which all negation-complete evolutionary predictions can be drawn. There are some previous theoretical results in the literature that consider the extent to which evolution exhibits a directional tendency and it is useful to consider how the present results relate to these previous works. For example, it has been shown previously with quite general stochastic models of evolution that a quantity termed ‘free fitness’ is always non-decreasing during evolutionary change [40]. The analysis, however, did not allow for open-ended evolution because the state space was assumed to be finite, and the Markov model used was (implicitly) assumed to be positively recurrent. As a result, a unique stationary distribution existed and thus continual evolution was precluded. It might be reasonably argued however that, although such analyses [40] do not allow for truly open-ended evolution, if the state space is large enough, and if the transient dynamics are long enough, then it is effectively an open-ended model. As such, should not the results with respect to free fitness still apply? In other words, does this not then suggest that there is some quantity (free fitness) that increases during evolution, and thus that a negation-complete theory is possible? The answer is no, and the reason is subtle but important. The definition of free fitness in the study of Iwasa [40], like other quantities that have been suggested to change directionally during evolution [30], is based on measures closely related to entropy. Importantly, the mapping between these measures of entropy and population states is not one-to-one because there are many (indeed, potentially infinitely many) biologically distinct population states that have the same value of entropy (or the same value of ‘free fitness’). 
As a result, even though measures such as free fitness might not decrease during evolution, an indefinite amount of biologically interesting and significant evolutionary change can still occur without any change in free fitness. Roughly speaking, although measures related to things like entropy provide an interesting physical quantity that might change directionally, the relationship between entropy and quantities that are of biological interest need not be simple. In a similar vein one might argue that, because biological evolution takes place within a physical system that is subject to the second law of thermodynamics, ultimately a general measure of entropy must provide a directionality to the system. Again, while this is true in terms of the system as a whole, the mapping between entropy and the population states of biological interest is not one-to-one. Thus, even though the total entropy of the entire physical system must always increase, the entropy of any component part (e.g. the biological part of interest) need not change in this way. What do all these considerations have to say about how the process of evolution is studied, or how current theoretical research is done? Should evolutionary biologists care about such results? For instance, do the results point to new ideas that might help us do theory better? Although there is no single answer to this question, there are two points worth making in this regard. First, the distinction between open and closed models seems like a useful, and currently somewhat under-appreciated, way to categorize models of evolution. As such, it does suggest some new directions in which evolutionary theory might be taken, particularly given that open-ended models are sometimes amenable to asking novel, and potentially very important, evolutionary questions that cannot be addressed with closed models [3,10,11,19-24]. 
Second, to the extent that one cares about developing theory for open-ended evolutionary processes, the theorem presented here then reveals that there is an inherent ‘upper bound’ on how far we can push the predictive capability of such theory. In particular, although such theory opens the door to asking new evolutionary questions, unless evolution is progressive, there will remain some such questions that are unanswerable. Furthermore, although it will probably be difficult to use the theorem as a means of proving that evolution is progressive (i.e. by developing a negation-complete theory) or to use the theorem to prove that a complete evolutionary theory is possible (i.e. by determining that evolution is progressive), the result does nevertheless reveal that these two important, and somewhat distinct, biological ideas are fundamentally one and the same thing. The theorem presented here has close ties to Gödel's incompleteness theorem for axiomatic theories of the natural numbers [32-35,41]. An axiomatic theory consists of a set of symbols, a logical apparatus (e.g. the predicate calculus), a set of axioms involving the symbols and a set of rules of deduction through which new statements involving the symbols can be derived (termed ‘theorems’ [41]). Given such a system, theorems can be derived through the repeated algorithmic application of the rules of deduction. In the early 1900s there was a concerted attempt to produce such an axiomatic theory that was meant to represent the natural numbers, with the proviso that it yields all true statements about the natural numbers, and no false ones [41,42]. Gödel's incompleteness theorem [32-35,41], however, revealed that this is impossible for any axiomatic system sufficiently rich that it can make simple number-theoretic statements. 
For example, it shows that if the axiomatic system is rich enough that it can express the number-theoretic statement corresponding to the predicate ‘x ∈ R’, then it cannot produce all true number-theoretic statements and no false ones [41]. For if it could, then it could always produce the number-theoretic statement corresponding to either ‘x ∈ R’ or ‘x ∉ R’ as a theorem, because one of the two must be true. But if it can do this, then it provides an algorithmic procedure for deciding the predicate ‘x ∈ R’, and we know that this is not always possible as the results presented here illustrate. The halting problem from computability theory [36,37] is also intimately related to the results presented here. As already detailed, the question of whether a population state is evolutionarily attainable is equivalent to the question of whether a given positive integer is in the range of a particular computable function. Moreover, this last question is directly connected to the analogous question of whether a given integer is in the domain of a computable function (i.e. whether, given a particular integer input, the function returns a value in finite time). The last problem is precisely the halting problem, and it is known that there is no general algorithmic procedure for solving the halting problem for arbitrary computable functions [36,37]. As mentioned earlier, in a very general sense, the results presented here are applicable to any system that can be faithfully described by a Markov dynamical system over an infinite set of discrete possibilities (i.e. an open-ended dynamical system). Therefore, one might ask whether there is anything in the results presented that is particular to evolution per se? In one sense, the answer is ‘no’, but therein lies the power of such mathematical abstraction; it reveals the underlying key structure of the process. 
Evolution will be an open-ended dynamical system whenever heredity is indefinite, and it therefore shares a fundamental similarity with all other processes that are also such open-ended dynamical systems. At the same time, however, the results do have special significance for evolution. There are, perhaps, relatively few other kinds of processes of interest that share the property of being such an open-ended dynamical system in a meaningful way. For example, a great many processes of interest have a relatively small space of potential outcomes, and are thus clearly not open-ended. Furthermore, for those processes that are potentially open-ended, it is sometimes of little theoretical interest to distinguish among all possible outcomes, and therefore the space of relevant outcomes can still be relatively small. Moreover, even when the space of potential outcomes of interest truly is open-ended, some processes (e.g. some physical processes) obey simple enough dynamics that such negation-complete predictions can readily be made (i.e. the system is ‘progressive’ in the sense considered here). Thus, the limitations detailed by the theorem are of interest primarily for those processes that are both open-ended and complex enough that the question of progression is unresolved (appendix D). Evolution under indefinite heredity might be a somewhat unique process in satisfying both of these criteria. There are, however, other processes for which such decidability results might be of interest. After all, in an important sense, biological evolution is nothing more than the emergent properties of physics and chemistry. In fact, such limitations on theory have been discussed previously, particularly as they relate to the so-called theory of everything in physics [43]. 
It is probably safe to say that no general consensus on this issue has yet been reached [38]; however, the theorem presented here has implications for any physical or chemical theory that aims to explain evolutionary phenomena. It demonstrates that a rational, deductive approach to such theory will necessarily face some inherent limitations on the answers that it can provide.
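
The asymmetry between mapping out where evolution goes and where it does not go can be illustrated with a small sketch. It is a toy under stated assumptions, not the construction used in the proof: population states are encoded as non-negative integers, the one-step model `toy_step` is invented for illustration, and `progress` stands in for a quantity assumed to increase strictly at every step (a simplified reading of progression, not the formal definition). The names `reaches`, `toy_step`, `progress` and the step budget `max_steps` are all hypothetical.

```python
from typing import Callable, Optional

State = int  # assumed encoding: each population state is a non-negative integer


def toy_step(state: State) -> State:
    """Hypothetical one-step model: the perfect predictor of the next state."""
    return state + 3


def reaches(
    start: State,
    target: State,
    step: Callable[[State], State],
    max_steps: int,
    progress: Optional[Callable[[State], int]] = None,
) -> Optional[bool]:
    """Ask whether `target` is ever visited when evolution starts at `start`.

    Returns True if the target is visited, False if it can be ruled out,
    and None if the step budget runs out without settling the question.
    """
    state = start
    for _ in range(max_steps + 1):
        if state == target:
            return True
        # Assumption: `progress` increases strictly at every step. Once its
        # value is at or past the target's value (and the state is not the
        # target), no later state can be the target.
        if progress is not None and progress(state) >= progress(target):
            return False
        state = step(state)
    return None  # simulation alone never certifies "never"


print(reaches(0, 6, toy_step, 50))                        # True
print(reaches(0, 7, toy_step, 50))                        # None
print(reaches(0, 7, toy_step, 50, progress=lambda s: s))  # False
```

Without `progress`, simulation can only ever confirm that a state is visited; for an unvisited state it returns `None` however large the budget is made. Knowledge of a strictly increasing quantity is what turns that `None` into a definite `False` in this toy setting.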
