A Family of Fitness Landscapes Modeled through Gene Regulatory Networks.

Chia-Hung Yang^1, Samuel V Scarpino^1,2,3,4,5,6.

Abstract

Fitness landscapes are a powerful metaphor for understanding the evolution of biological systems. These landscapes describe how genotypes are connected to each other through mutation and related through fitness. Empirical studies of fitness landscapes have increasingly revealed conserved topographical features across diverse taxa, e.g., the accessibility of genotypes and "ruggedness". As a result, theoretical studies are needed to investigate how evolution proceeds on fitness landscapes with such conserved features. Here, we develop and study a model of evolution on fitness landscapes using the lens of Gene Regulatory Networks (GRNs), where the regulatory products are computed from multiple genes and collectively treated as phenotypes. With the assumption that regulation is a binary process, we prove the existence of empirically observed, topographical features such as accessibility and connectivity. We further show that these results hold across arbitrary fitness functions and that a trade-off between accessibility and ruggedness need not exist. Then, using graph theory and a coarse-graining approach, we deduce a mesoscopic structure underlying GRN fitness landscapes where the information necessary to predict a population's evolutionary trajectory is retained with minimal complexity. Using this coarse-graining, we develop a bottom-up algorithm to construct such mesoscopic backbones, which does not require computing the genotype network and is therefore far more efficient than brute-force approaches. Altogether, this work provides mathematical results of high-dimensional fitness landscapes and a path toward connecting theory to empirical studies.

Keywords: biological computation; coarse-graining; fitness landscapes; gene regulatory networks; graph theory

Year: 2022 PMID: 35626507 PMCID: PMC9141513 DOI: 10.3390/e24050622

Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738

1. Introduction

Since its introduction by Wright [1], the concept of fitness landscapes has grown and matured into a cornerstone of biology [2,3,4]. A fitness landscape consists of a space of genotypes that are mutually accessible through mutations and a fitness value associated with the phenotype each genotype encodes. In this context, fitness describes the evolutionary potential of each genotype, and the set of navigable genotypes on these landscapes is termed the genotype network [5]. Continuing with this metaphor, the evolution of a population can be depicted as a trajectory wandering on the fitness landscape. As a consequence, the topography of a fitness landscape sheds light on various evolutionary processes, including constraints on adaptation [6,7,8,9], speciation via genetic incompatibilities [10,11], (dis)advantages of sexual reproduction and recombination [12,13,14], the repeatability/reversibility (or not) of evolutionary trajectories [15,16,17,18], and the role of neutral networks (components of the genotype network with the same fitness) in epochal evolution [19,20,21,22,23,24]. Although the fitness landscape concept was introduced by Wright [1], Fisher’s 1930 geometric model of adaptation was the first mathematical model of evolution on what we now call fitness landscapes [15,25,26]. Later work by Kingman [27] and Kauffman and Levin [28] constructed what they termed a “house of cards” (HoC) model, where fitness values for each genotype are drawn independently from a specified probability distribution. Building on the HoC model, Kauffman and Weinberger [29] introduced the NK model, in which each locus interacts with a fixed number of other loci and a genotype’s fitness is the sum of the fitness contributions of every interaction group. More recently, the “rough Mount Fuji model” [30,31] combined the HoC landscape with an additional field penalizing a genotype’s Hamming distance from a reference genotype with the optimal fitness.
The dependence of a genotype’s fitness on that of neighboring genotypes is thought to be a key feature of empirical fitness landscapes. Over the past three decades, the fitness landscapes for various organisms, including bacteria [32,33,34], fungi [35,36], and fruit flies [37], have been empirically reconstructed. While the number of genotypes included in these early landscapes was limited, modern sequencing techniques and high-throughput analyses have enabled the construction of many large landscapes. Notable studies have been conducted in HIV [38,39], yeast [40], E. coli [41], jellyfish [42], human cancers [43], human stem cells [44], and DNA/RNA networks [45,46,47,48,49]. Comprehensive landscapes for multiple eukaryotic species have also been analyzed based on the binding affinity of transcription factors [50] and after accounting for the ecological context the species experiences [51]. What emerged from these studies is a set of prominent topographical features conserved across diverse taxa [52]. Together, empirical and modeled fitness landscapes exhibit three key topographical features. First, fitness landscapes are more often “rugged” than smooth [52]. The degree of ruggedness can be assessed via a variety of measures, such as the roughness-to-slope ratio [53,54] and the number of local fitness maxima [55], which are often strongly correlated with each other [4]. Empirical studies typically show moderate ruggedness in the observed fitness landscapes [32,33,34,36,50,56]. The degree of ruggedness in these empirical landscapes is less than the HoC model assumes and comparable to a fine-tuned NK model or rough Mount Fuji model [4]. Second, fitness landscapes reveal mutational trajectories from one genotype to another along which the fitness is non-decreasing, which implies accessibility (typically to a fitness optimum) across the landscape [32,57,58,59,60].
Lastly, whereas the inaccessible region in the HoC model expands when distant from the fitness optimum [61], other models find accessible trajectories despite high genotypic dimensionality [62,63]. Due to the often pervasive interaction between loci, determining phenotype from genotype can have a high degree of computational complexity [64,65]. Many existing fitness landscape models have dealt with this complexity by strongly constraining the state-space of possible genotypic interactions and/or reducing the complexity of how information is processed when mapping genotype to phenotype. For example, studies have focused on the folded structure of short RNA sequences, where the resulting stability or affinity is a fitness proxy [66,67,68,69,70], networks of molecular/genetic pathways whose expression pattern or homeostasis determines fitness [71,72,73,74,75], and modular mutational effects at different loci in Fisher’s geometric model [76]. Here, we model the genotype–phenotype map using the pathway framework of gene regulatory networks (GRNs), where mechanistic knowledge of how phenotypes are computed from genotypes is encoded in the GRN (see [77,78] for a more formal introduction). To study the fitness landscapes induced by GRN evolution, we integrate the pathway framework into a family of fitness landscape models where the fitness value is uniquely determined by the phenotype corresponding to the regulatory outcome of a genotype. For a fitness landscape of GRNs, we first prove the existence of two key topographical features: (a) GRNs with the same phenotype are themselves connected in the underlying genotype network, and (b) there exist accessible trajectories between all pairs of GRNs with similar phenotypes.
Second, utilizing the idea of symmetries and automorphisms in the genotype network, we coarse-grain GRNs into groups with equivalent roles in the fitness landscapes and deduce an underlying mesoscopic structure with which we can predict the trajectory of evolution with minimal complexity. Lastly, using this coarse-graining, we develop a bottom-up algorithm for constructing the underlying fitness landscape of GRNs, which does not require computing the genotype network and is thus more efficient than the conventional brute-force approach.

2. Methods

Here, we introduce a family of fitness landscape models where the genotype–phenotype mapping is constructed from regulatory interactions. We first summarize a modeling framework of GRNs proposed in our previous work [77,78], termed the pathway framework, and then describe the fitness landscape models of GRNs built upon it.

2.1. Pathway Framework of GRNs

Genotypes in the pathway framework of GRNs contain all necessary information to construct a regulatory network [77,78]. More specifically, alleles at each locus include both a transcription activator and a protein product, which means the regulatory interactions among the loci can be deduced by connecting genes whose expression product corresponds to the activator of another. Compared to existing work on regulatory circuits—where mutations are modeled as rewiring a single interaction between genes [79,80]—the pathway framework considers a mutation as changing the activator/product of a gene. Lastly, the phenotype is determined by the set of loci reached in a regulatory cascade induced by external stimuli. These stimuli could be completely external to the individual or simply come from another regulatory network in the organism. For additional details on the pathway framework, see [77,78] and Figure 1 for illustration.

Figure 1

Cartoon illustration of the pathway framework of GRNs adapted from [77,78]. Under our four simplifying assumptions, a GRN (genotype) consists of a fixed number of proteins as nodes and a constant number of directed edges depicting the activator/product pairs of genes. The phenotype is modeled as the Boolean states of proteins (colored), which are determined by their reachability from the external stimulus (lightning icon).

In this work, when building the family of fitness landscape models, we restrict the pathway framework with four assumptions. First, we consider a fixed set of genes underlying the genotypes, i.e., gene duplication and deletion events are excluded. Second, we assume a fixed underlying collection of proteins that can possibly exist in the organism. Third, we consider the case where a gene’s expression is activated by a specific protein, and it generates only one protein product. Fourth, we assume that the associated chemical state of each protein is modeled as a Boolean/binary variable (present or absent), and external environmental signals stimulate the existence of specific proteins in the organism. As a consequence, the Boolean state of a phenotype-related protein is determined by whether it is reached by a regulatory cascade starting from an initial stimulus. While the above assumptions seem naive, as we will show in Section 3, this simplified model still predicts the topographical features observed in empirical landscapes (see Section 1). As a result of these assumptions, we are able to derive rigorous theoretical insights into GRN evolution and obtain fitness landscapes consistent with far more complicated models. We believe these assumptions are conservative with respect to the biology and a justified starting point for modeling fitness landscapes, and we discuss the implications of these assumptions and possible extensions to the model in Section 4.

2.2. Fitness Landscape of GRNs under the Pathway Framework

Let $\mathcal{L}$ and $\Omega$ be the fixed, underlying collections of loci and proteins, respectively. A genotype is represented by its GRN $g$ such that every locus $\ell \in \mathcal{L}$ is associated with a protein activator/product pair $g(\ell) = (u, v)$, $u, v \in \Omega$. Equivalently, any GRN is a directed graph with nodes labeled by the proteins $\Omega$ and edges labeled by the loci $\mathcal{L}$. In the rest of this paper, we will use the terminology “source/target” node of edge $\ell$ interchangeably to refer to the protein activator/product of locus $\ell$. We also write $\mathcal{G}(\mathcal{L}, \Omega)$ for the set of all GRNs with the underlying loci $\mathcal{L}$ and proteins $\Omega$. The backbone of a fitness landscape of GRNs, i.e., the genotype network, is an undirected network of networks encoding the mutational relationship between the GRNs. Let $G$ be the genotype network, and denote its mega-nodes by $V(G) = \mathcal{G}(\mathcal{L}, \Omega)$ and its edges by $E(G)$. There is an edge between any GRNs $g, g' \in V(G)$ when they differ only by the allele of a single locus $\ell$, $g(\ell) \neq g'(\ell)$. In other words, $g$ and $g'$ are connected in $G$ when they can be transformed into each other through one edge rewiring. Furthermore, we write $\phi_g(u) \in \{0, 1\}$ for the binary state of protein $u$ in GRN $g$, where $\phi_g(u) = 1$ indicates the presence of $u$, and $\phi_g(u) = 0$ designates its absence. We also partition $\Omega$ into three disjoint groups: (a) proteins $\Omega_s$ whose presence is externally stimulated by the given environment, (b) proteins $\Omega_f$ whose states influence the fitness value, and (c) the remaining ones $\Omega_d$, which we call the dummy proteins since their specific identities are irrelevant to the external environment and the resultant phenotypes/fitness. (In this paper, we assume that the stimuli $\Omega_s$ must be proteins that cannot be produced by expression, and we place no constraint on the fitness-relevant and dummy proteins $\Omega_f$ and $\Omega_d$.) A phenotype $\phi$ is then treated as a vector of zeros and ones, where each entry corresponds to the binary state of a protein in $\Omega_f$. The resultant phenotype $\phi_g$ of a GRN $g$ is determined by the reachability in $g$: For any $u \in \Omega_f$, $\phi_g(u) = 1$ if and only if there is a stimulus $s \in \Omega_s$ and a path from $s$ to $u$ in $g$, which represents a chain of sequentially expressed genes that generates protein $u$.
Finally, the fitness $f$ is simply a function of the phenotype $\phi$. Combined, a fitness landscape of GRNs is characterized by three key elements: the genotype network $G$, the external stimuli $\Omega_s$, and the fitness function of phenotype $f$ (which implicitly identifies the fitness-relevant proteins $\Omega_f$). The genotype network $G$ serves as the skeleton of the fitness landscape, whereas the environment-dependent stimuli $\Omega_s$ and fitness function $f$ determine the phenotypes of GRNs and their selective advantages.
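
The genotype-to-phenotype-to-fitness map described above can be illustrated with a minimal Python sketch. The dictionary encoding of a GRN, the integer protein labels, and the function names `phenotype` and `fitness` are assumptions made for this illustration and are not taken from the paper; the fitness function is an arbitrary placeholder.

```python
from collections import deque


def phenotype(grn, stimuli, fitness_relevant):
    """Binary states of the fitness-relevant proteins, in sorted protein order.

    `grn` maps each locus to its (activator, product) pair (hypothetical encoding).
    A protein is present iff it is reachable from some stimulus.
    """
    successors = {}
    for source, target in grn.values():
        successors.setdefault(source, []).append(target)

    reached = set(stimuli)
    queue = deque(stimuli)
    while queue:  # breadth-first search along the regulatory cascade
        protein = queue.popleft()
        for product in successors.get(protein, []):
            if product not in reached:
                reached.add(product)
                queue.append(product)
    return tuple(int(p in reached) for p in sorted(fitness_relevant))


def fitness(phi):
    """Placeholder fitness: number of present fitness-relevant proteins."""
    return sum(phi)


# Protein 0 is the stimulus, proteins 1 and 2 are fitness-relevant, protein 3 is a dummy.
grn = {"A": (0, 3), "B": (3, 1), "C": (2, 2)}
phi = phenotype(grn, stimuli={0}, fitness_relevant={1, 2})
```

Here protein 1 is reached through the chain 0 → 3 → 1, whereas protein 2 only carries a self-loop and stays absent.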

3. Results

In this work, we derive three theoretical insights into fitness landscape models using GRNs as the embedded genotype–phenotype mapping. First, we show that the resulting family of fitness landscapes must always exhibit two topographical properties: connectivity, i.e., GRNs with the same phenotype can be mutually reached via mutations, and accessibility, i.e., any GRN can be reached from an arbitrary less-fit GRN (once a certain similarity criterion is met). Second, we propose a mesoscopic coarse-graining of fitness landscapes, which offers a more compact basis for analyzing evolutionary processes than the original landscape. This mesoscopic backbone recognizes “symmetries” in the genotype network, and it aggregates GRNs with the same role in the fitness landscape into a single representative genotype. Third, we provide a bottom-up approach to algorithmically construct this mesoscopic backbone and demonstrate its efficiency over coarse-graining the genotype network using brute force.

3.1. Connectivity and Accessibility in a Fitness Landscape of GRNs

A fitness landscape model of GRNs features a handful of properties that have either been discovered in empirical fitness landscapes or investigated mathematically. First, its underlying space, i.e., the genotype network G, presents immense dimensionality. Second, the fitness function f is flexible and can effectively tune the ruggedness of the fitness landscape. For example, a highly rugged “holey” landscape can be modeled by a binary f such that any GRN $g$ has high fitness once some single protein $u$ is present, i.e., its binary state is $\phi_g(u) = 1$, and otherwise, g has low/zero fitness. Because one can always find several mutational neighbors of g whose phenotype shows the opposite state of $u$, the resultant fitness landscape is inevitably rugged. In what follows, we further show that fitness landscape models of GRNs must have the characteristics of connectivity and accessibility. Let $\phi$ be a phenotype and denote by $V_\phi$ the set of all GRNs with phenotype $\phi$, i.e., $\phi_g = \phi$ for $g \in V_\phi$, under the given external stimuli $\Omega_s$. We also write $\Omega_\phi$ for the required-present proteins in the phenotype $\phi$, so $\phi(u) = 1$ for $u \in \Omega_\phi$ and $\phi(u) = 0$ for any other fitness-relevant protein $u$. Note that the number of required-present proteins $|\Omega_\phi|$ is bounded from above by the number of loci $|\mathcal{L}|$ since any present protein that is not a stimulus must be triggered by the expression of some locus. We observe that some GRNs play a “central” role among GRNs with the same phenotype $\phi$. Specifically, for any such central GRN $g^* \in V_\phi$, all the edges in $g^*$ point from the stimuli $\Omega_s$ to the required-present proteins $\Omega_\phi$, and each $u \in \Omega_\phi$ is targeted by at least one edge in $g^*$. We demonstrate an example of such $g^*$ in Figure 2a. These $g^*$ are deemed central because they can be reached from any GRN $g \in V_\phi$ through mutations that stay within $V_\phi$: First, for every edge in g that points to a $u \in \Omega_\phi$, we rewire the edge such that it still points to $u$ but now from an $s \in \Omega_s$. Arbitrarily rewiring the remaining edges between $\Omega_s$ and $\Omega_\phi$ then leads to some central GRN in $V_\phi$ (see Figure 2a).
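
For a very small system, the connectivity claim can be checked by brute force. The sketch below enumerates every GRN, groups GRNs by phenotype, and counts connected components within each group under single-locus mutations. The tuple encoding of GRNs (one activator/product pair per locus) and all function names are assumptions made for this illustration; the approach is the brute-force enumeration, not the proof technique used in the text.

```python
from itertools import product


def phenotype(grn, stimuli, relevant):
    """States of the fitness-relevant proteins; present iff reachable from a stimulus."""
    reached = set(stimuli)
    changed = True
    while changed:  # simple fixed-point iteration, adequate for tiny GRNs
        changed = False
        for source, target in grn:
            if source in reached and target not in reached:
                reached.add(target)
                changed = True
    return tuple(int(p in reached) for p in sorted(relevant))


def all_grns(n_loci, proteins, stimuli):
    # Stimuli cannot be produced by expression, so they never appear as targets.
    alleles = [(s, t) for s in proteins for t in proteins if t not in stimuli]
    return list(product(alleles, repeat=n_loci))


def is_mutational_neighbor(g, h):
    return sum(a != b for a, b in zip(g, h)) == 1


def components_per_phenotype(n_loci, proteins, stimuli, relevant):
    """Number of connected components among the GRNs of each phenotype."""
    classes = {}
    for g in all_grns(n_loci, proteins, stimuli):
        classes.setdefault(phenotype(g, stimuli, relevant), set()).add(g)

    counts = {}
    for phi, members in classes.items():
        unvisited = set(members)
        counts[phi] = 0
        while unvisited:
            counts[phi] += 1
            stack = [unvisited.pop()]
            while stack:  # depth-first search restricted to this phenotype
                g = stack.pop()
                linked = {h for h in unvisited if is_mutational_neighbor(g, h)}
                unvisited -= linked
                stack.extend(linked)
    return counts


# Two loci; protein 0 is the stimulus, protein 1 is fitness-relevant, protein 2 is a dummy.
counts = components_per_phenotype(2, [0, 1, 2], {0}, [1])
```

With one fitness-relevant protein and two loci, every phenotype requires fewer present proteins than there are loci, so each phenotype is expected to form a single connected component.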

Figure 2

Connectivity exists between all GRNs of the same phenotype. (a) Any GRN can be rewired/mutated into a “central” GRN (shown on the right). (b) A redundant edge (dark green) makes it feasible to turn any central GRN into another via edge rewiring. (c) There is a mutational trajectory between any GRNs of the same phenotype through the central GRNs.

In addition, if the phenotype has strictly fewer required-present proteins than the number of loci, the central GRNs are mutually reachable by edge rewiring among . There is always a redundant edge whose rewiring makes no change to the phenotype, and it helps us rewire each edge to any desired source/target pair between and (see Figure 2b), which subsequently creates a chain of mutations between any . These results imply that, for any phenotype with and any , there is always a mutational trajectory between and that only traverses GRNs in , especially through the central ones (see Figure 2c). In the extreme case where , however, fragments into multiple connected components (detailed in Appendix A). Next, we turn to accessibility between GRNs of different phenotypes and , where without loss of generality . We observe that, if , there are always two “peripheral” GRNs and , which only differ by one edge rewiring. To be more specific, there are two independent chains in , one of which begins with a stimulus and sequentially connects the proteins required to be present in but not in , i.e., , while the other consecutively joins . The rest of the edges in merely point from to , and each is targeted by at least one edge (see example in Figure 3, left). The other GRN only differs from g by the first edge in the chain of , which is rewired such that it points from the stimulus to the first node in the chain of (Figure 3, right).

Figure 3

Example of peripheral GRNs connecting two different phenotypes—a peripheral GRN of phenotype in this example. There is a chain that triggers the presence state of proteins . However, the other peripheral GRN of phenotype contains a chain of proteins . and are mutational neighbors since they only differ by rewiring the dark green edge, i.e., the first edge in either chain.

Our observation suggests that there is a sequence of mutations with non-decreasing fitness from any GRN to any GRN , as long as . In particular, when , the mutational trajectory starting at g first traverses within to a peripheral GRN and then transitions into to reach . An analogous trajectory exists even under the extreme scenario (see Appendix A). We also note that if the number of fitness-relevant proteins is , then the condition is assuredly satisfied for any two phenotypes and . As a corollary, if , the fitness optimum will always be accessible.
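
Accessibility of the fitness optimum can likewise be examined by brute force on a toy system with fewer fitness-relevant proteins than loci. The sketch below works backwards from the fittest GRNs and repeatedly adds any GRN that has an equally fit or fitter mutational neighbor already known to reach the optimum. The encoding, the function names, and the fitness table are assumptions chosen for this illustration only.

```python
from itertools import product


def phenotype(grn, stimuli, relevant):
    """States of the fitness-relevant proteins; present iff reachable from a stimulus."""
    reached = set(stimuli)
    changed = True
    while changed:
        changed = False
        for source, target in grn:
            if source in reached and target not in reached:
                reached.add(target)
                changed = True
    return tuple(int(p in reached) for p in sorted(relevant))


def neighbors(g, alleles):
    """All GRNs that differ from g by the allele of exactly one locus."""
    for locus, current in enumerate(g):
        for allele in alleles:
            if allele != current:
                yield g[:locus] + (allele,) + g[locus + 1:]


def accessible_fraction(n_loci, proteins, stimuli, relevant, fitness):
    """Fraction of GRNs with a non-decreasing-fitness path to a fittest GRN."""
    alleles = [(s, t) for s in proteins for t in proteins if t not in stimuli]
    grns = list(product(alleles, repeat=n_loci))
    f = {g: fitness[phenotype(g, stimuli, relevant)] for g in grns}

    best = max(f.values())
    can_reach = {g for g in grns if f[g] == best}
    changed = True
    while changed:  # backward propagation until no new GRN can be added
        changed = False
        for g in grns:
            if g in can_reach:
                continue
            if any(h in can_reach and f[h] >= f[g] for h in neighbors(g, alleles)):
                can_reach.add(g)
                changed = True
    return len(can_reach) / len(grns)


# Three loci; protein 0 is the stimulus, proteins 1 and 2 are fitness-relevant.
# The fitness table is deliberately non-monotonic in the number of present proteins.
toy_fitness = {(0, 0): 2, (1, 0): 0, (0, 1): 1, (1, 1): 3}
fraction = accessible_fraction(3, [0, 1, 2], {0}, [1, 2], toy_fitness)
```

In this toy case every GRN can reach the optimum through neutral steps followed by a fitness-increasing mutation, so `fraction` evaluates to 1.0.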

3.2. Mesoscopic Skeleton Derived from “Symmetries” in the Genotype Network of GRNs

Because the number of possible GRNs grows super-exponentially as the underlying sets of loci and proteins expand, constructing the genotype network becomes extremely challenging beyond a small number of loci and proteins. Here, we present a more compact skeleton of the fitness landscape of GRNs based on “symmetries” in the genotype network. As the underlying space of a fitness landscape of GRNs, the genotype network G appears to contain redundant information. On the one hand, GRNs leading to the identical phenotype are deemed to have equal fitness. On the other hand, given any GRN, for example, the mega-node circled in orange in Figure 4a, one can always find some other GRN such that their neighborhoods in G are locally similar, e.g., the mega-node circled in blue. This simple demonstration suggests that the structure of the genotype network G is not arbitrary; instead, some structural symmetries exist.

Figure 4

The genotype network has symmetry such that multiple GRNs have similar local neighborhoods, as we demonstrate in (a) since the corresponding GRNs only differ by exchanging the role of loci A and B. More formally, these GRNs constitute an equivalence class under phenotype-preserving automorphisms, which can be found by graphical operations of (b) permuting loci, (c) permuting dummy proteins (circles), (d) exchanging edges pointing from two different stimuli (squares), and (e) exchanging self-loops at two different nodes.

In graph theory, symmetries in a network are formally described through the network’s automorphisms. An automorphism of a graph is a way to shuffle the labels of its nodes such that the graph remains identical before and after shuffling. For instance, in Figure 5b, exchanging nodes 2 and 3 generates the same network and is thus an automorphism, whereas exchanging nodes 2 and 4 is not because there is an edge from 2 to 3 after shuffling. Formally, an automorphism of the genotype network G is a permutation $\sigma$ of all plausible GRNs $V(G)$ such that, for any $g, g' \in V(G)$, $(g, g') \in E(G)$ if and only if we also have $(\sigma(g), \sigma(g')) \in E(G)$. (A permutation of $V(G)$ is a mapping $\sigma: V(G) \to V(G)$ where no two GRNs are mapped to the same GRN, i.e., $\sigma(g) \neq \sigma(g')$ if $g \neq g'$ for any $g, g' \in V(G)$.) Once two GRNs $g$ and $g'$ are related through an automorphism of G, e.g., $g' = \sigma(g)$, they share the same mega-node properties that are fully determined by the connections in the genotype network (see Proposition A1).
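
The definition of an automorphism can be made concrete with a short check on an undirected graph. The example graph (a four-node path) and the function name `is_automorphism` are assumptions made for this illustration and do not correspond to Figure 5b.

```python
def is_automorphism(edges, perm):
    """Check whether the node relabeling `perm` maps the undirected edge set onto itself."""
    # `perm` must be a bijection on the nodes it mentions.
    if sorted(perm.keys()) != sorted(perm.values()):
        return False
    original = {frozenset(edge) for edge in edges}
    shuffled = {frozenset((perm[u], perm[v])) for u, v in edges}
    return shuffled == original


# Path graph 2 - 1 - 3 - 4.
path_edges = [(1, 2), (1, 3), (3, 4)]

# Reversing the path keeps the edge set intact.
reversal = {1: 3, 3: 1, 2: 4, 4: 2}
# Swapping an end node with an interior node does not.
swap_2_3 = {1: 1, 2: 3, 3: 2, 4: 4}
```

Reversing the path is an automorphism, while swapping nodes 2 and 3 creates the edge 2 - 4, which is absent from the original graph.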

Figure 5

(a) As a minimal example, imagine an operation that rotates a geometric object 90 degrees clockwise. The rotation maps one object onto another (dashed arrows), and it leads to equivalence classes where objects are grouped by their symmetry under rotation (pink rectangles). (b) An automorphism of a graph is a permutation of nodes that retains the same graph. (c) Equivalence classes under graph automorphisms bring together nodes that have similar roles connection-wise in the graph.

Furthermore, automorphisms partition the GRNs by their roles in the genotype network through the mathematical concept of equivalence classes. For a high-level and general description, imagine a set of elements and a group of operations acting on them. Each operation turns one element into another, and these two elements are related by the operation, which describes the similarity between them. An equivalence class consists of elements that are mutually related by any operation, and the set of elements is said to be partitioned into equivalence classes under the action of the operations (see Figure 5a for an illustrative example). For automorphisms of a graph , the equivalence classes of nodes under the action of then gather nodes with a similar “structural position” in (Figure 5c). However, to reveal GRNs with identical roles in a fitness landscape, these automorphisms also need to preserve the phenotype. Denote by the set of such automorphisms of G, i.e., for any , and GRN , and g have the same phenotype. The equivalence classes of mega-nodes under the action of phenotype-preserving automorphisms then unite GRNs that (a) show similar mutational relationships with others and (b) lead to the same fitness due to their identical phenotype. We will mildly abuse the terminology to call them the equivalence classes of GRNs, which we denote by , and each is a set of GRNs related through . Crucially, since the mutational relationship and the resultant phenotype are the two components that characterize a GRN in the fitness landscape, GRNs in a are deemed equivalent semantically, and they can be reduced to an arbitrary representative among them. Therefore, the equivalence classes of GRNs provide an efficient way to depict the underlying space of the fitness landscape. However, what exactly composes the phenotype-preserving automorphisms of the genotype network? 
From a sufficiency direction, we show that there exist a few graphical operations on the GRNs that produce phenotype-preserving automorphisms. These graphical operations involve permuting/shuffling different sorts of elements in a GRN:
(i) The identities of loci, e.g., exchanging edge labels of loci A and B in Figure 4b;
(ii) The identities of dummy proteins, e.g., exchanging node labels of proteins 3 and 4 in Figure 4c.
Then, potentially rewiring a given edge (see details in Definitions A1 and A2):
(iii) Change the source node of an edge from one stimulus to another stimulus and vice versa, e.g., in Figure 4d, moving an edge pointing from node 1 to node 3 to pointing from node 2. (Note that this operation is not necessarily equivalent to permuting the identities of stimuli since at most only the single focal edge will be affected.)
(iv) Move a self-loop at one node to another node and vice versa, for example, re-allocating a self-loop at node 3 to node 4 in Figure 4e.
For the formal proofs, we point the reader to Theorem A1. Additionally, from a necessity direction, one can computationally obtain a partition of the GRNs that is coarser than the equivalence classes. (A partition $P$ is coarser than another partition $P'$ if any group in $P'$ is included in some group in $P$.) Specifically, start with a partition $P_0$ where GRNs with the same resultant phenotype are grouped together. We create a sequence of partitions $P_0, P_1, P_2, \ldots$ of the GRNs through the following iterative procedure: Given the partition $P_t$, the next partition $P_{t+1}$ is obtained by further dividing groups in $P_t$ (if needed) such that for each group $A \in P_{t+1}$ and $B \in P_t$, any two GRNs in $A$ have the same number of neighbors among $B$. This iterative procedure is terminated when no further division is required, i.e., $P_{k+1} = P_k$ for some integer k (see Figure 6a for an illustrative cartoon of the iterative procedure). We then take $P_k$ to be our desired partition of GRNs.
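
The iterative procedure above can be sketched as a standard partition-refinement loop on any graph. The adjacency-dictionary input, the integer group labels, and the function name `refine_partition` are assumptions made for this illustration; in the setting of the text, the nodes would be GRNs and the initial labels their phenotypes.

```python
def refine_partition(adjacency, initial_label):
    """Split groups until all nodes in a group see the same number of neighbors per group.

    `adjacency` maps each node to a list of its neighbors; `initial_label` maps each
    node to a hashable starting label (e.g., its phenotype).
    """
    ids = {}
    labels = {node: ids.setdefault(initial_label[node], len(ids)) for node in adjacency}

    while True:
        signatures = {}
        for node, nbrs in adjacency.items():
            counts = {}
            for nb in nbrs:
                counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            # Keeping the old label in the signature means groups are only ever split.
            signatures[node] = (labels[node], tuple(sorted(counts.items())))

        ids = {}
        new_labels = {node: ids.setdefault(sig, len(ids)) for node, sig in signatures.items()}
        if len(set(new_labels.values())) == len(set(labels.values())):
            return new_labels  # no group was divided: the partition is stationary
        labels = new_labels


# Toy graph: the path a - b - c - d, with all nodes initially in one group.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
groups = refine_partition(path, {node: 0 for node in path})
```

On the path graph the procedure separates the two end nodes from the two interior nodes and then stops, since no further division is needed.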

Figure 6

(a) Consider a toy example genotype network of GRNs. (Here, we omit the exact content of GRNs.) Given the partition , note that mega-nodes in a group (dashed orange rectangle) may share a different number of connections among other groups (blue shaded circles), and they are further divided to generate the next partition . (b) Both the equivalence classes of GRNs and the stationary partition from our iterative procedure are equitable, e.g., each mega-node in group (1) has one connection among (1), another connection with (2), and none with other groups.

To see why the proposed iterative procedure generates a coarser partition than the equivalence classes of GRNs, we stress that the equivalence classes under automorphisms always form an equitable partition. A partition $\{B_1, \ldots, B_m\}$ of the nodes of a graph is equitable [81] if, for every pair $i, j$, any two nodes in group $B_i$ have the same number of neighbors in $B_j$ (Figure 6b). Since GRNs in an equivalence class must have the same number of neighbors for each different phenotype, we inductively show that any two GRNs in the same equivalence class are never separated during the iterative procedure (see Theorem A2). Therefore, any equivalence class must be included in a group of the stationary partition. Figure 7 demonstrates the coarser partition generated by the iterative procedure for an arbitrary toy example. The obtained partition contains 154 groups of GRNs, and the size of groups ranges from 2 to 96. We also count the number of different kinds of GRNs that cannot be transformed into one another through graphical operations (i) and (ii), and this number varies from 1 to 4 in our example. Moreover, for every group in the obtained partition, we observe that those different kinds of GRNs can be related by changing the stimulus that an edge is pointing from and re-allocating self-loops (e.g., see Figure 7b). The obtained partition is thus not simply coarser than the equivalence classes; according to (i)–(iv), we know that its groups are exactly the equivalence classes. This arguably general toy example suggests that no other graphical operations are needed to determine the equivalence classes of GRNs.
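
The equitable-partition condition is straightforward to verify directly. The following sketch checks it for a given grouping of nodes; the adjacency-dictionary input and the function name `is_equitable` are assumptions made for this illustration.

```python
def is_equitable(adjacency, groups):
    """True if, within every group, all nodes have the same neighbor count in each group."""
    group_of = {node: index for index, group in enumerate(groups) for node in group}
    for group in groups:
        profiles = set()
        for node in group:
            counts = [0] * len(groups)
            for nb in adjacency[node]:
                counts[group_of[nb]] += 1
            profiles.add(tuple(counts))
        if len(profiles) > 1:  # two nodes in this group disagree on some count
            return False
    return True


# Toy graph: the path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

ends_vs_interior = [{"a", "d"}, {"b", "c"}]
left_vs_right = [{"a", "b"}, {"c", "d"}]
```

Grouping the end nodes against the interior nodes is equitable, whereas splitting the path into a left half and a right half is not, because `a` and `b` have different numbers of neighbors in the right half.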

Figure 7

Example partition coarser than the equivalence classes of GRNs. We run the proposed iterative procedure with and , where stimuli are drawn as squares and the present/absent states of fitness-relevant proteins are colored orange/blue. (a) The number of GRNs and the number of isomorphism classes of GRNs in each group of the obtained partition, where the dashed lines separate groups of different phenotypes, and (b) isomorphism classes of GRNs in a group.

As a result, we conjecture that all the phenotype-preserving automorphisms of the genotype network can be generated by combining graphical operations (i) to (iv) on the GRNs. In other words, two GRNs $g$ and $g'$ belong to the same equivalence class if and only if, after removing all the self-loops and merging stimuli into a single node, there exist permutations of loci and dummy proteins that jointly transform $g$ into $g'$. This condition is consistent with the concept of isomorphisms between graphs. Whereas an automorphism is a mapping of nodes such that a graph preserves itself, an isomorphism is a mapping of nodes that transforms one graph into another. We will borrow the terminology and call the two permutations of loci and dummy proteins together a phenotype-preserving isomorphism from $g$ to $g'$.
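
The conjectured condition can be tested by brute force for small GRNs: drop self-loops, merge all stimuli into one node, ignore locus labels by comparing edges as multisets, and try every relabeling of the dummy proteins. The tuple encoding, the placeholder node name `"stimulus"`, and the function names are assumptions made for this illustration, and the check inherits the conjectural status of the condition in the text.

```python
from collections import Counter
from itertools import permutations


def reduced_edges(grn, stimuli, relabel=None):
    """Multiset of non-self-loop edges with stimuli merged and dummy proteins relabeled."""
    relabel = relabel or {}
    edges = Counter()
    for source, target in grn:
        if source == target:
            continue  # self-loops are removed before comparison
        source = "stimulus" if source in stimuli else relabel.get(source, source)
        target = relabel.get(target, target)
        edges[(source, target)] += 1
    return edges


def same_class(g, h, stimuli, dummies):
    """True if some permutation of dummy proteins turns g into h (up to locus labels)."""
    goal = reduced_edges(h, stimuli)
    dummies = sorted(dummies)
    return any(
        reduced_edges(g, stimuli, dict(zip(dummies, perm))) == goal
        for perm in permutations(dummies)
    )


# Protein 0 is the stimulus, protein 1 is fitness-relevant, proteins 2 and 3 are dummies.
g = ((0, 2), (2, 1), (3, 3))
h = ((3, 1), (0, 3), (2, 2))
k = ((0, 1), (2, 2), (3, 3))
```

Here `g` and `h` differ only by exchanging the dummy proteins 2 and 3 and reordering loci, whereas `k` activates protein 1 directly and has a different reduced edge set.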

3.3. Algorithmic Construction of the Mesoscopic Backbone of GRN Fitness Landscape

Next, we investigate algorithmic approaches to construct the mesoscopic backbone of a fitness landscape based on equivalence classes, where a representative GRN replaces all other GRNs in an equivalence class due to their identical role. In particular, the desired algorithm must (a) acquire the equivalence classes from scratch and (b), for a representative GRN in any equivalence class, count the number of its mutational neighbors in other equivalence classes and also within the class it belongs to. To avoid any confusion, we emphasize that, although drawing mutational connections between equivalence classes can be achieved by grouping mega-nodes in the genotype network G, this naive exercise is unsuitable. First and foremost, grouping mega-nodes demands prior knowledge of the genotype network itself, but its construction is computationally heavy. Second, in contrast to coarse-graining nodes in a graph where the groups of nodes are pre-specified, listing all GRNs in an equivalence class requires examining pairs of GRNs and assuring a phenotype-preserving isomorphism between them after removing self-loops and merging stimuli. Determining the equivalence classes from all the GRNs can thus be costly as well. These reasons again show the value of the equivalence classes , which consolidate GRNs into their equivalent representatives. Here, we present a bottom-up approach that enumerates each equivalence class of GRNs and simultaneously computes the number of mutational connections among them. To begin, recall from Section 2.2 that a mutation from a GRN to another corresponds to rewiring a single edge in , where may rewire a self-loop/non-self-loop edge to a self-loop/non-self-loop edge in . We observe that the number of non-self-loop edges in mutational neighbors and differ at most by one. We denote by the loci representing the non-self-loop edges in the GRN g, and the number of those non-self-loop edges. 
In other words, given equivalence classes and representative GRNs and , g has no mutational neighbors in if . We can therefore build the mesoscopic backbone by incrementally examining each equivalence class with an increasing number of non-self-loop edges in the representative GRN. This strategy is envisioned in Figure 8, where the backbone can be viewed as “layers” of equivalence classes of GRNs. Let be the set of equivalence classes where for every , the representative GRN has exactly k non-self-loop edges, . We start with layer , which consists of the only equivalence class with no non-self-loop edges. Then, with layers and all the mutational connections among them, we will find the equivalence classes in the next layer and their mutational connections with layer and within themselves up until , where all the edges are non-self-loops.
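
The layering idea can be illustrated on individual GRNs rather than on equivalence classes. The sketch below counts non-self-loop edges, groups all GRNs of a toy system into layers, and confirms that a single mutation changes the count by at most one. The tuple encoding and the function names are assumptions made for this illustration; it enumerates all GRNs by brute force, which the bottom-up algorithm in the text is designed to avoid.

```python
from itertools import product


def non_self_loop_count(grn):
    return sum(source != target for source, target in grn)


def alleles_for(proteins, stimuli):
    # Stimuli cannot be produced by expression, so they never appear as targets.
    return [(s, t) for s in proteins for t in proteins if t not in stimuli]


def layer_sizes(n_loci, proteins, stimuli):
    """Number of GRNs in each layer, keyed by the number of non-self-loop edges."""
    sizes = {}
    for g in product(alleles_for(proteins, stimuli), repeat=n_loci):
        k = non_self_loop_count(g)
        sizes[k] = sizes.get(k, 0) + 1
    return sizes


def max_layer_jump(n_loci, proteins, stimuli):
    """Largest change in the non-self-loop count caused by any single mutation."""
    alleles = alleles_for(proteins, stimuli)
    worst = 0
    for g in product(alleles, repeat=n_loci):
        for locus in range(n_loci):
            for allele in alleles:
                if allele != g[locus]:
                    h = g[:locus] + (allele,) + g[locus + 1:]
                    worst = max(worst, abs(non_self_loop_count(g) - non_self_loop_count(h)))
    return worst


# Two loci; protein 0 is the stimulus, proteins 1 and 2 can be produced.
sizes = layer_sizes(2, [0, 1, 2], {0})
jump = max_layer_jump(2, [0, 1, 2], {0})
```

Each locus has two self-loop alleles and four non-self-loop alleles in this toy system, which gives layers of 4, 16, and 16 GRNs.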

Figure 8

Layering the GRNs by their number of non-self-loop edges. A GRN’s mutational neighbors must fall into the same or the adjacent layers. For ease of illustration, we only show the non-self-loop edges and neglect the protein states in GRNs.
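
The layering argument can be checked on a toy example. The following Python sketch is only an illustration under simplifying assumptions that are not part of the model itself: a GRN is reduced to a tuple of (source, target) pairs with one pair per locus, any node may serve as a source or a target, and `non_self_loops` and `mutational_neighbors` are hypothetical helper names. It enumerates every single-edge rewiring of a small GRN and records how the number of non-self-loop edges changes.

```python
from itertools import product


def non_self_loops(grn):
    """Count the edges whose source differs from their target."""
    return sum(1 for source, target in grn if source != target)


def mutational_neighbors(grn, nodes):
    """Yield every GRN obtained by rewiring exactly one locus (edge)."""
    for locus, old_edge in enumerate(grn):
        for new_edge in product(nodes, repeat=2):
            if new_edge != old_edge:
                yield grn[:locus] + (new_edge,) + grn[locus + 1:]


# Toy GRN (an assumption for illustration): three loci, one edge per locus.
nodes = ("a", "b", "c")
g = (("a", "b"), ("b", "b"), ("c", "a"))

k = non_self_loops(g)
# Layer offsets reached by a single mutation, relative to the layer of g.
offsets = {non_self_loops(h) - k for h in mutational_neighbors(g, nodes)}
print(sorted(offsets))  # [-1, 0, 1]
```

In this toy setting every mutational neighbor falls into the same layer or an adjacent one, which is the property that lets the backbone be built one layer at a time.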

To be more precise, we introduce the concept of a neighborhood: for any GRN g, its neighborhood is the set of mutational neighbors of g that have one more non-self-loop edge than g. Neighborhoods are sufficient to capture the relationship between two mutational neighbors g and g′: (a) if g′ has one more non-self-loop edge than g, then g′ is in the neighborhood of g; (b) if g′ has one less non-self-loop edge than g, then g is in the neighborhood of g′; (c) if g′ has the same number of non-self-loop edges as g and the edge in which they differ is not a self-loop, then they share a common mutational neighbor in which the only edge that differs between g and g′ is rewired to a self-loop, and thus both g and g′ are in the neighborhood of that common neighbor. The mutational connections between equivalence classes can hence be uncovered by examining the neighborhoods of the representative GRNs. Moreover, the neighborhoods of the representative GRNs in layer k reveal the equivalence classes in layer k + 1, because any GRN in layer k + 1 must have a mutational neighbor with one less non-self-loop edge.

All that remains is to join different neighbors into equivalence classes. In particular:

(A) For an equivalence class and its representative GRN g, under what condition will two GRNs in the neighborhood of g belong to the same equivalence class in layer k + 1?

(B) For two distinct equivalence classes and their representative GRNs, under what condition will a GRN in the neighborhood of one representative and a GRN in the neighborhood of the other belong to the same equivalence class in layer k + 1?

For ease of illustration, we hereafter choose all the GRNs under consideration such that only one stimulus node is incident to outgoing edges.

To address (A), let two GRNs in the neighborhood of g belong to the same equivalence class, so that there is a phenotype-preserving isomorphism from one to the other after self-loop removal. Recall from Section 2.2 the notation stating that “the source–target pair of an edge is in GRN g.” Furthermore, we write each of the two neighbors as the combination of g and the non-self-loop edge “added” to g to form it. A few observations follow:

1. There is an integer p such that and ;
2. There is another integer such that and ;
3. for ;
4. for ;
5. For any locus and non-self-loop source–target pair such that for , we have if and only if .

We detail the reasoning behind these observations in Lemmas A1–A3.
Critically, our fifth observation implies that, after self-loop removal, the isomorphism between the two neighbors is in fact a phenotype-preserving automorphism of a subgraph of the GRN g. In addition, observations 3 and 4 show that those edges that are in g but not in this subgraph are sequentially mapped from one to another via the automorphism, and that they bridge the two newly added edges. We show that the converse is also true (see Theorem A3): after self-loop removal, if we find a phenotype-preserving automorphism of a subgraph of g under which one added edge is consecutively mapped to the other through the edge differences, then a phenotype-preserving isomorphism between the two neighbors is guaranteed. The necessary and sufficient condition for two neighbors of g to be in the same equivalence class thus lies, intriguingly, in the phenotype-preserving automorphisms of subgraphs of the representative GRN g.

We demonstrate a few simple examples in Figure 9a. In the top row, an automorphism of g directly maps between the two additional edges. In the middle row, one additional edge is consecutively mapped to the other through an edge of g, and the latter is consecutively mapped back to the former through a non-edge of g. As a mixture of both, in the bottom row, one additional edge is consecutively mapped to the other through an edge of g, and this isomorphism is exactly an automorphism of the subgraph of g in which that edge is removed.

Figure 9

Sufficient conditions for two neighbors to belong to the same equivalence class. For illustration purposes, we only show the dummy proteins and omit the protein states, edge labels, and self-loops in (a) three examples in which two neighbors of a GRN g are isomorphic, and (b) an example in which the neighbors of two GRNs in different equivalence classes are isomorphic.
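
The simplest case in Figure 9a, where an automorphism of g maps one added edge directly onto the other, can be sketched in a few lines of Python. This is a brute-force illustration rather than the authors' implementation, and it rests on assumptions that are not stated in this form in the text: edges are treated as unlabeled (source, target) pairs, "phenotype-preserving" is approximated by keeping every node other than the dummy proteins fixed, and `automorphisms` and `directly_mapped` are hypothetical helper names.

```python
from itertools import permutations


def automorphisms(edges, movable):
    """Permutations of the `movable` nodes that map the edge set onto itself."""
    # Self-loops are dropped first, mirroring "after self-loop removal".
    edge_set = frozenset((s, t) for s, t in edges if s != t)
    found = []
    for image in permutations(movable):
        sigma = dict(zip(movable, image))  # nodes outside `movable` stay fixed
        mapped = frozenset((sigma.get(s, s), sigma.get(t, t)) for s, t in edge_set)
        if mapped == edge_set:
            found.append(sigma)
    return found


def directly_mapped(edges, movable, e1, e2):
    """True if some automorphism sends the candidate added edge e1 onto e2."""
    return any(
        (sigma.get(e1[0], e1[0]), sigma.get(e1[1], e1[1])) == e2
        for sigma in automorphisms(edges, movable)
    )


# Toy GRN (assumed): stimulus "s", dummy proteins "x" and "y", product "p".
g = [("s", "x"), ("s", "y"), ("x", "p"), ("y", "p"), ("x", "x")]
dummies = ("x", "y")

print(len(automorphisms(g, dummies)))                       # 2
print(directly_mapped(g, dummies, ("x", "y"), ("y", "x")))  # True
print(directly_mapped(g, dummies, ("x", "y"), ("s", "p")))  # False
```

Exhaustive enumeration is only practical for a handful of dummy proteins. The other two cases in Figure 9a would additionally require searching automorphisms of subgraphs of g, which this sketch does not attempt.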

Switching gears to the remaining question (B), suppose that two GRNs with the same number of non-self-loop edges are the representatives of two different equivalence classes, and that a neighbor of the first representative and a neighbor of the second representative belong to the same equivalence class. Let each neighbor be generated by adding a new non-self-loop edge to its representative, and let there be a phenotype-preserving isomorphism from the first neighbor to the second after self-loop removal. We observe that applying this permutation to the first representative transforms it into another GRN in the same equivalence class. Since the first representative simply has one less edge than its neighbor, the transformed representative and the second neighbor differ only by a missing edge. Namely, the second neighbor is the transformed representative with one additional edge, the image of the edge added to the first representative. Moreover, since the second neighbor also belongs to the neighborhood of the second representative, with its own additional edge, removing both extra edges from the second neighbor yields a GRN whose neighborhood contains both the transformed first representative and the second representative.

We again present an illustrative example in Figure 9b. Here, a GRN in the equivalence class of the first representative can be found via the isomorphism between the two neighbors. We note that the edge newly added to the first representative is transformed into an edge that is missing in the second representative. Removing both this edge and the edge newly added to the second representative from the second neighbor produces a GRN that is a common neighbor, with one less non-self-loop edge, of the transformed first representative and the second representative.

Our observation resolves the necessary condition of (B): for the representative GRNs of two different equivalence classes, if a neighbor of one and a neighbor of the other belong to the same equivalence class, then we can always find two GRNs such that (a) the first falls into the equivalence class of one representative, and (b) the first GRN and the other representative are both neighbors of the second GRN. Moreover, the converse is true as well (Theorem A4). Therefore, whether the neighborhoods of the two representatives reveal a common equivalence class depends on the existence of a GRN from which both equivalence classes stem.

Our strategy to build the mesoscopic backbone is now complete, and here we detail our algorithm that incrementally generates the equivalence classes of GRNs and establishes the mutational connections among them. Suppose that we have already built layers 0 through k of equivalence classes and determined the mutational connections among them.
For each representative GRN g in layer k and every GRN g′ in its neighborhood, we view g′ as the combination of g and an additional non-self-loop edge, i.e., as a GRN–edge pair. All such combinations form the collection of neighbors of the representative GRNs in layer k, which, with a slight abuse of notation, we call the neighborhood of layer k. We initially put each neighbor into an individual group, and we define a collection of operations that join groups of neighbors:

(I) For every representative GRN g in layer k and every phenotype-preserving automorphism of g, there is an operation that joins together the groups of two neighbors of g, where the automorphism maps the additional edge of one onto the additional edge of the other;

(II) For every representative GRN g in layer k and every phenotype-preserving automorphism of each subgraph of g such that the edge differences are sequentially connected via the automorphism, there is an operation that joins together the groups of two neighbors of g, where the automorphism consecutively transforms the additional edge of one into the additional edge of the other through the edge differences;

(III) For every representative GRN in layer k − 1 and every two GRNs in its neighborhood that fall into two different equivalence classes, such that we have phenotype-preserving isomorphisms from these two GRNs to the representative GRNs of their respective equivalence classes after self-loop removal, there is an operation that joins together the groups of the corresponding neighbors of the two representative GRNs.

The resulting groups of neighbors, after applying the joining operations, constitute the equivalence classes in the next layer k + 1. We hereafter identify each equivalence class in layer k + 1 with its corresponding group. We then choose an arbitrary neighbor in the group as the representative GRN of the equivalence class, such that only one stimulus node is incident to outgoing edges in the chosen representative GRN.

The joining operations further provide useful information to count the number of mutational neighbors that a representative GRN has in any equivalence class. Let us first consider a representative GRN g in layer k and an equivalence class in layer k + 1. A GRN in this class is a mutational neighbor of g if it can be viewed as a combination of g and an arbitrary extra non-self-loop edge, and hence the count equals the number of such combinations of g and an extra edge that fall into the group of this equivalence class (Equation (1)). Note that, in this case, the count is easily acquired when building up layer k + 1 through the joining operations.
Second, for a representative GRN g′ in layer k + 1 and an equivalence class in layer k, the count can be computed from the count in the opposite direction, i.e., the number of mutational neighbors that the representative GRN g of the layer-k class has in the class of g′. Since the equivalence classes generate an equitable partition of the genotype network G (see Section 3.2), the size of an equivalence class multiplied by the number of mutational neighbors that its representative has in the other class equals the total number of mutational connections between the two classes. Moreover, the size of an equivalence class is given by Equation (2) (see Appendix D), whose terms involve (a) the set of all permutations of dummy proteins and the set of automorphisms of the representative GRN g after self-loop removal that only permute those proteins; (b) the number of ways to allocate labeled self-loops among the proteins; (c) the number of ways to re-distribute the edges pointing from stimuli in g; and (d) the number of ways to divide loci into self-loops, non-self-loop edges pointing from stimuli, and others. As a result, the count for g′ follows from the count for g and the sizes of the two equivalence classes (Equation (3)).

Third, we turn to the case where the equivalence class lies in the same layer as the representative GRN g but is not the class of g itself. Recall that, if any GRN g′ in this class is a mutational neighbor of g, then there is a GRN in the layer below whose neighborhood contains both g and g′, and such a GRN is unique up to arbitrary self-loop re-allocation. Additionally, the extra edges in g and g′ must correspond to the same locus, which yields the count in Equation (4). Lastly, if the equivalence class is that of g itself, we also need to include the scenario in which the mutational neighbor of g is generated by rewiring a self-loop to another self-loop, which yields the count in Equation (5).

In Algorithm 1, we summarize our proposed approach that constructs the mesoscopic backbone. It is apparent that the core of our algorithm is determining the joining operations for a given layer. This task can be achieved by pre-computing the phenotype-preserving automorphisms of every representative GRN once it is chosen. In addition, since these joining operations reflect the mutational neighbors and the phenotype-preserving isomorphisms in previous layers, the type-(III) operations for a layer are generated as compositions of the already uncovered operations.
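Furthermore, the remaining operations of type (II) consist of combinations of the uncovered joining operations and the newly computed automorphisms of the representative GRNs in the current layer. As a result, the only prerequisite in our proposed algorithm is producing the phenotype-preserving automorphisms of a GRN.

Algorithm 1

▹ initialization
Choose a GRN with no non-self-loop edges, whose equivalence class is the only equivalence class in layer 0.
Store the phenotype-preserving automorphisms of this GRN.
Compute its number of mutational neighbors within its own equivalence class via Equation (5).
while the last layer has not been reached do ▹ incrementally find layer k + 1
  Construct and store the joining operations for layer k.
  Group the neighbors of layer k by applying the joining operations; the resulting groups correspond to the equivalence classes in layer k + 1.
  for all equivalence classes in layer k + 1 do
    Choose a GRN in the group as the representative GRN. ▹ choose the representative GRN
    Store the phenotype-preserving automorphisms of the representative GRN.
  end for
  for all equivalence classes in layers k and k + 1 do ▹ count the number of mutational neighbors
    Compute the numbers of mutational neighbors between the two layers via Equations (1) and (3).
  end for
  for all equivalence classes in layer k + 1 do
    Compute the numbers of mutational neighbors within the layer via Equations (4) and (5).
  end for
end while
Set any remaining, not computed number of mutational neighbors to zero.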
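
The grouping step of the algorithm, where every GRN–edge pair starts in its own group and joining operations merge groups, maps naturally onto a disjoint-set (union-find) structure. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the pairs and the list of joins are made-up placeholders, and the class `DisjointSet` and the variable names are hypothetical. In a full implementation the joins would come from the type-(I), type-(II), and type-(III) operations.

```python
class DisjointSet:
    """Union-find over hashable items (here: hypothetical GRN-edge pairs)."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self):
        out = {}
        for x in self.parent:
            out.setdefault(self.find(x), []).append(x)
        return list(out.values())


# Placeholder data: (representative GRN, added edge) pairs and joining operations.
pairs = [("g1", "e1"), ("g1", "e2"), ("g1", "e3"), ("g2", "e4")]
joins = [(pairs[0], pairs[1]), (pairs[2], pairs[3])]

ds = DisjointSet(pairs)
for a, b in joins:
    ds.union(a, b)

classes = ds.groups()                          # groups = classes of the next layer
representatives = [grp[0] for grp in classes]  # an arbitrary member of each group
# Tally, per group, how many pairs of representative "g1" landed in it.
counts_g1 = [sum(1 for rep, _ in grp if rep == "g1") for grp in classes]
print(len(classes), counts_g1)  # 2 [2, 1]
```

The final tally is one way to read the first counting case in the text, where the number of neighbors that a representative has in a next-layer class is obtained while the layer is being built.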

4. Conclusions

In this work, we integrate mechanistic knowledge of how phenotypes are computed from genotypes via regulatory interactions into fitness landscape models. The resulting family of fitness landscape models features flexibility for tunable ruggedness and accessibility among phenotypes. Furthermore, we introduce the concept of equivalence classes of GRNs, where GRNs of the same phenotype and with similar structural positions in the genotype network are coarse-grained into a group. These equivalence classes of GRNs lead to a compact and informative description of the fundamental space of a fitness landscape. Using this coarse-graining, we develop a bottom-up, efficient algorithm for constructing the underlying space of a fitness landscape based on the equivalence classes. Critically, this algorithm does not require pre-computing the genotype network and therefore permits the exploration of substantially larger GRNs.

Naively, ruggedness and accessibility would seem to be contradictory characteristics of a fitness landscape. Indeed, reciprocal sign epistasis has been shown to exert a strong influence on a landscape’s ruggedness and was regarded as an impediment to evolutionary accessibility when first introduced [2,32,55]. Nevertheless, recent studies suggest that, in the fitness landscape models most closely aligned with empirical observations, sign epistasis (and thus ruggedness) can co-exist with accessibility [63,82]. In addition to demonstrating that ruggedness and accessibility are not mutually exclusive, our model is compatible with three additional empirical observations. First, GRNs result in high-dimensional genotype–phenotype maps [63]. Second, selection acts on the superposition of mutations and the background GRN rather than a few pairs of mutations [60]. Third, and perhaps most importantly, a GRN may experience a series of neutral mutations and then evolve into a nearby phenotype [3,8,83,84].
The accessibility induced in fitness landscapes of GRNs via neutral evolution agrees with the phenomenon of punctuated equilibrium/epochal evolution [23,85,86].

Our derived equivalence classes for GRNs provide a novel, mesoscopic, and optimally descriptive skeleton of a fitness landscape. Neither the genotypic space nor the phenotypic space alone fully characterizes a fitness landscape; however, models with even a relatively simple genotype–phenotype map are computationally intensive because they must retain all plausible genotypes [70,73,74,75]. Intuitively, the complexity of a genotype–phenotype map can be reduced by combining similar phenotypes into high-level descriptors [87]. The equivalence classes of GRNs, on the other hand, serve as an intermediate level between the genotypic and phenotypic spaces, which provides an optimal coarse-graining that encodes all the information necessary to predict the evolutionary trajectory on the fitness landscape.

We argue that our proposed algorithm for coarse-graining GRN fitness landscapes is more efficient than brute-force approaches. First, because we consolidate an equivalence class into a single representative GRN, our method is less costly in memory and requires fewer computations when finding mutational neighbors. Second, even if all plausible GRNs were organized into layers by the number of non-self-loop edges (see Section 3.3), every layer would still contain super-exponentially many GRNs. Our algorithm instead finds the equivalence classes in each layer iteratively. To construct the (k + 1)-th layer, we only have to exhaust the representative GRNs in the k-th layer, along with any plausible additional non-self-loop edge(s), and this amount is significantly smaller than the number of GRNs in the (k + 1)-th layer.
Lastly, existing heuristics for graph automorphisms [88,89] can be used to produce the phenotype-preserving automorphisms of the representative GRNs, which is the only prerequisite when joining together different GRN–edge pairs. Because the set of automorphisms becomes more limited as the complexity of GRNs increases, we expect only a minor overhead in the joining procedure as compared to the exhaustive, brute-force approach.

Although our model is constrained to the pathway framework of GRNs [77,78] and relies on a few naive assumptions described in Section 2, we believe our methodology to be flexible and, in what follows, we outline some potential directions to extend the framework. First, when GRNs are modeled through more complex computation, e.g., with different logic gates connecting multiple expression activators/suppressors/products, those GRNs that only consist of naive interactions are never excluded. Thus, the current model represents a subset of the complete landscape built by more complex gene regulation. The derived connectivity and accessibility among the naive GRNs still hold, and we expect these topographical features to manifest for complex GRNs if mutations between the simple and complex expressions are permitted. Second, hypergraphs [90] could be used to describe the expression behavior of genes where multiple activators/products appear. Third, stable motif identification [91] and target control [92] for Boolean network models could be used to explore the phenotypes of mutational neighbors of a focal complex GRN. Lastly, our methodologies are likely applicable to other classes of genotype–phenotype maps [93,94]. In particular, once the mapping and the genotype network are determined, one can simply follow the proposed iterative procedure (Figure 6) to obtain a genotype partition coarser than the equivalence classes.
More broadly, this work showcases the potential of combining biological computation across different scales along the hierarchy of living systems. Computing biological functionality on the organism level with genotype–phenotype mapping provides a blueprint of the overall fitness landscape, where evolutionary processes occur/compute on the population level. Furthermore, several intriguing perspectives arise from the proposed mesoscopic backbone if we consider evolution to be a random walk on the fitness landscape. The process of evolution not only manifests genotypes with higher fitness values but also reveals genotypes whose mutational neighbors are more fit [19,23,78]; in other words, the prevalence of different genotypes would reflect the connection counts between equivalence classes of GRNs. In addition, these “connection counts” could become associated with an analogous theory of computation in evolution that addresses questions such as how likely a genotype in an equivalence class is to evolve into a specified phenotype, as well as how likely it is to “reset” to another genotype in the same equivalence class and recover its position in the fitness landscape.

78 in total

1. Metastable evolutionary dynamics: crossing fitness barriers or escaping via neutral paths?

Authors: E van Nimwegen; J P Crutchfield
Journal: Bull Math Biol Date: 2000-09 Impact factor: 1.758

Review 2. The evolutionary enigma of sex.

Authors: Sarah P Otto
Journal: Am Nat Date: 2009-07 Impact factor: 3.926

