Katharina C Wollenberg Valero.
Abstract
BACKGROUND: Functional constraint through genomic architecture is suggested to be an important dimension of genome evolution, but quantitative evidence for this idea is rare. In this contribution, existing evidence and discussions on genomic architecture as a constraint on convergent evolution, rapid adaptation, and genic adaptation are summarized into alternative, testable hypotheses. Network architecture statistics from protein-protein interaction networks are then used to calculate differences in evolutionary outcomes, using genomic evolution in yeast as an example, and the results are used to evaluate statistical support for these longstanding hypotheses.
Keywords: Adaptation; Constraint; Evolution; Systems biology
Year: 2020 PMID: 32448114 PMCID: PMC7245893 DOI: 10.1186/s12862-020-01613-8
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Glossary of terms
| The portion of evolutionary constraint that is determined at the level of genes or their gene products, for example codon constraint or developmental genetic pathways. | |
| The portion of network constraint attributed to the structure or architecture of gene interactions that can be expressed in the form of a network. Networks consist of nodes (genes) and edges (functional interactions between these genes). | |
| The phenomenon of different evolutionary outcomes resulting from independent mutation and selection events in different genes. For example, convergent evolution in two diverging populations, where the convergent phenotype is caused by evolution in distinct genes in each population. | |
| The phenomenon of adaptive change in the allele frequencies of a population in response to natural selection, taking place within just a few generations. | |
| Traditionally defined as similar phenotypes evolving from similar selective pressure in response to similar environments. | |
| A variable to estimate gene essentiality. The less dispensable a gene is for organismal growth and function, the more essential it is. An estimator for the mean fitness effect of all possible mutations of a gene across environments the cell is likely to encounter. In yeast, this is experimentally determined through knockouts. | |
| Traditionally defined as one gene influencing more than one trait. In the papers cited in this study, it has been defined as gene products having more than one functional interaction with other gene products, with the link to pleiotropy of phenotypic traits being implied. It is therefore called “gene pleiotropy”. | |
| The amount of mRNA produced by each gene in regular somatic cells. CAI (Codon Adaptation Index) is used as a substitute variable in this paper, and is derived from codon use bias in yeast that correlates with mRNA levels. | |
| The ratio of nonsynonymous to synonymous substitutions dN/dS. It is assumed that dS remains constant, and dN is used here as an estimator for directional evolution. | |
| A score developed for estimating events of rewiring functional connections between network nodes over the course of evolution. Developed on the example of five species of yeasts. | |
| A network statistic used to describe the structure of a functional genetic network. Describes the number of connections of all neighbors of each node. Highest values are expected in intermediately located nodes within a network. | |
| A network statistic used to describe the structure of a functional genetic network, describing where a node lies within paths between other nodes. Nodes with many paths progressing through them may be important in transmitting information. Highest values are expected in nodes central to a network. | |
| A network statistic used to describe the structure of a functional genetic network. Shortest distance between a node and other nodes. Highest values are expected in peripheral nodes of a network. |
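The three network statistics in the glossary can be illustrated on a toy network. The following is a minimal sketch, not the paper's pipeline: it assumes an unweighted, undirected, connected graph stored as an adjacency dictionary, takes neighborhood connectivity to be the mean degree of a node's neighbors (one common convention, assumed here), and uses Brandes' algorithm for unnormalized betweenness. The function names and the toy graph are hypothetical.

```python
from collections import deque


def neighborhood_connectivity(adj):
    """Mean degree of each node's neighbors (assumed definition)."""
    return {
        v: (sum(len(adj[w]) for w in adj[v]) / len(adj[v])) if adj[v] else 0.0
        for v in adj
    }


def average_shortest_path_length(adj):
    """Mean hop distance from each node to every other reachable node."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # plain breadth-first search
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        others = [d for node, d in dist.items() if node != source]
        result[source] = sum(others) / len(others) if others else 0.0
    return result


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected graph (Brandes' algorithm)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)  # number of shortest paths from s
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical toy network: one hub, one connector, three leaves.
toy = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "p"},
    "b": {"hub"},
    "c": {"hub"},
    "p": {"a"},
}
nc = neighborhood_connectivity(toy)
aspl = average_shortest_path_length(toy)
bc = betweenness_centrality(toy)
```

In this toy network the hub has the highest betweenness and the outermost leaf `p` has the highest average shortest path length, which matches the glossary's expectations for central and peripheral nodes.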
Fig. 1 Examples of different levels of genetic constraint. Linkage is a transient constraint, which is broken up through recombination or other chromosome rearrangements. If a gene arises through duplication, phylogenetic constraint means that the function of its gene product may be non-independent with relation to the ancestral gene product. Codon constraint describes the likelihood of the different codon positions to produce beneficial mutations. Protein functional site constraint describes constraint located in genomic regions that code for functional sites of proteins versus other regions of the proteins. This is related to the idea that gene products form a functional genomic network. Within this network, interactions of these gene products also pose an element of constraint on evolution, but this is not well researched.
Fig. 2 Proposed testable relationship between functional genomic network architecture, network node position, and evolutionary outcomes. SN are subnetworks within the functional genomic network of a population with distinct functions (e.g., metabolic pathways). Standing genetic variation exists within nodes, but depends on their position within the network. Black nodes (H) are essential for organismal function and not likely to accumulate non-synonymous mutations; grey nodes (I) are functionally connected with many others and constrained in accumulating non-synonymous mutations. White nodes (P) are functionally connected to fewest others and most likely to accumulate non-synonymous mutations. Resulting from this, three evolutionary outcomes can be explained: Rapid adaptation is facilitated in white nodes through their high standing genetic variation. Selection being constrained to operate on these nodes in a specific subnetwork increases the speed of adaptation. Convergent evolution is facilitated through the finite number of networks that are related to specific functions and shared among species through common ancestry. The likelihood of convergent evolution within one subnetwork in response to selection increases through the moderate level of genetic variance, combined with constraint posed by the high number of connections to other nodes. Genic evolution is facilitated through the selection pressure only having an effect in the subnetwork with organismal functions related to it but not in others. Selection is likely to operate on standing genetic variation, which is likely concentrated in white nodes (shown as blue squares). These different processes can explain the coexistence of convergent and divergent (rapid, genic) evolution within the genomes of a population.
Hypotheses relating network constraint to evolutionary outcomes and results of hypothesis assessment using a node classification scheme in yeast
| Evolutionary outcome | Hypothesis (H) | Alternative Hypothesis (HA) | Results in this paper following assessment with hierarchical node classification scheme. |
|---|---|---|---|
| | Indispensable or essential genes are more constrained and evolve slowly. | Functionally important and thus functionally constrained genes evolve slowly, independent of dispensability. Highly expressed genes evolve slowest. | |
| | Central nodes have the highest number of edges and evolve very slowly, because any change will lead to maladaptive pleiotropic effects, causing balancing selection through cost of complexity. | Intermediate nodes evolve fastest, as their higher number of edges allows for evolution through rewiring. | |
| | Nodes with a low number of edges evolve fastest due to higher degrees of freedom, which allows for genetic adaptations minimizing pleiotropic effects. | – | |
| | Nodes with a low number of edges should be the prime target of convergent evolution. Pleiotropic negative effects are expected to be low, and mutations in them can maximize adaptation. | Peripheral nodes have the highest degrees of freedom and thus divergence is more likely than convergence in them. Convergent evolution should instead be favored in nodes that allow for genetic variance, while having reduced degrees of freedom (I-nodes) (This contribution). | |
| | Adaptations can be characterized (either causative or correlative for the speciation process) by any number of divergent genes within the genome, whereas other genes are not associated with adaptation. | Only the complete phenotype is selected, the genic component is less important. | |
Fig. 3 Distribution of yeast interactome nodes within network parameter space (neighborhood connectivity, average shortest path length, and betweenness centrality). The top values for each axis are colored in shades of red (light, filled: P-nodes; light, open: I-nodes; dark, filled: H-nodes). Convergent evolution nodes are indicated in dark blue. These top values for each axis formed the basis to classify the remaining nodes based on discriminant function analysis.
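The two-step scheme described in this caption (seed the categories with the top values on each axis, then classify the remaining nodes) can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: the number of seed nodes per axis (`top_n`) is a hypothetical parameter, nodes that rank at the top of more than one axis are left unseeded, and a nearest-centroid rule on z-scored values stands in for the discriminant function analysis, which is a different and more elaborate method. The mapping of axes to categories follows the glossary (neighborhood connectivity: I, betweenness centrality: H, average shortest path length: P).

```python
from statistics import mean, pstdev

# Column order assumed for each node's statistics tuple:
# (neighborhood connectivity, betweenness centrality, average shortest path length)
AXIS_LABEL = {0: "I", 1: "H", 2: "P"}


def seed_labels(stats, top_n):
    """Label the top_n nodes on each axis; drop nodes claimed by two axes."""
    labels, ambiguous = {}, set()
    for axis, label in AXIS_LABEL.items():
        ranked = sorted(stats, key=lambda n: stats[n][axis], reverse=True)
        for node in ranked[:top_n]:
            if node in labels and labels[node] != label:
                ambiguous.add(node)
            labels[node] = label
    return {n: lab for n, lab in labels.items() if n not in ambiguous}


def classify_remaining(stats, seeds):
    """Nearest-centroid stand-in for DFA, computed on z-scored statistics."""
    columns = list(zip(*stats.values()))
    mu = [mean(c) for c in columns]
    sd = [pstdev(c) or 1.0 for c in columns]  # guard against zero variance
    z = {n: tuple((x - m) / s for x, m, s in zip(v, mu, sd))
         for n, v in stats.items()}
    centroids = {}
    for label in sorted(set(seeds.values())):
        members = [z[n] for n, lab in seeds.items() if lab == label]
        centroids[label] = tuple(mean(c) for c in zip(*members))
    result = dict(seeds)
    for node in stats:
        if node not in result:
            result[node] = min(
                centroids,
                key=lambda lab: sum((a - b) ** 2
                                    for a, b in zip(z[node], centroids[lab])),
            )
    return result


# Hypothetical statistics for five nodes (not taken from the paper).
example = {
    "n1": (10.0, 0.90, 2.0),
    "n2": (40.0, 0.10, 3.0),
    "n3": (5.0, 0.00, 5.0),
    "n4": (38.0, 0.12, 3.1),
    "n5": (6.0, 0.01, 4.8),
}
seeds = seed_labels(example, top_n=1)
categories = classify_remaining(example, seeds)
```

With these made-up values, `n1`, `n2`, and `n3` seed the H, I, and P categories, and the two remaining nodes are assigned to the category whose seed they most resemble.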
Discriminant function analysis summary to assign node categories H, I, P to nodes within dataset. Wilks’ Lambda: 0.0704; approx. F(6,2152) = 992.780; p < 0.001
| | Wilks’ Lambda | Partial Lambda | F-remove (2,1076) | p | Toler. | 1-Toler. (R-sqr.) |
|---|---|---|---|---|---|---|
| Neighborhood connectivity | 0.137 | 0.514 | 507.835 | < 0.001 | 0.988 | 0.012 |
| Betweenness centrality | 0.105 | 0.673 | 261.039 | < 0.001 | 0.994 | 0.006 |
| Average shortest path length | 0.133 | 0.528 | 480.907 | < 0.001 | 0.983 | 0.017 |
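The columns of this summary are arithmetically linked, which allows a quick consistency check. The sketch below assumes the usual convention for this kind of output: the per-variable Wilks' Lambda is the lambda of the model after removing that variable, partial lambda is the overall lambda divided by it, and F-remove follows from partial lambda with 2 and 1076 degrees of freedom. Those conventions and degrees of freedom are assumptions, and the function names are hypothetical; the numbers are copied from the table.

```python
OVERALL_LAMBDA = 0.0704          # overall Wilks' Lambda reported above the table
DF_EFFECT, DF_ERROR = 2, 1076    # assumed F-remove degrees of freedom

# variable -> (Wilks' Lambda, partial Lambda, F-remove, tolerance), as reported
TABLE = {
    "Neighborhood connectivity": (0.137, 0.514, 507.835, 0.988),
    "Betweenness centrality": (0.105, 0.673, 261.039, 0.994),
    "Average shortest path length": (0.133, 0.528, 480.907, 0.983),
}


def partial_lambda(overall, lambda_without_variable):
    """Unique contribution of one variable: overall lambda / lambda without it."""
    return overall / lambda_without_variable


def f_remove(partial, df_effect, df_error):
    """F statistic associated with a partial lambda."""
    return (1.0 - partial) / partial * df_error / df_effect


def r_squared_from_tolerance(tolerance):
    """R-squared of a variable regressed on the other predictors."""
    return 1.0 - tolerance


for name, (lam, reported_partial, reported_f, tol) in TABLE.items():
    recomputed_partial = partial_lambda(OVERALL_LAMBDA, lam)
    recomputed_f = f_remove(reported_partial, DF_EFFECT, DF_ERROR)
    print(f"{name}: partial lambda {recomputed_partial:.3f} "
          f"(reported {reported_partial}), F-remove {recomputed_f:.1f} "
          f"(reported {reported_f}), R-sqr {r_squared_from_tolerance(tol):.3f}")
```

Under these assumptions the recomputed values agree with the reported ones to within the rounding of the three-decimal table entries. The low R-squared values indicate that the three network statistics are nearly uncorrelated as predictors.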
List of yeast genes that were found to adapt to novel environments, and were additionally shown to evolve these adaptations convergently across populations or species of yeast. Node hierarchy categories after discriminant function analysis (DFA) are shown in the first column. P - peripheral nodes, I - intermediate nodes
| DFA estimated Node hierarchy | Gene symbol | ORF ID | Reference |
|---|---|---|---|
| | STE11 | YLR362W | Lang et al., 2013 |
| | STE12 | YHR084W | Lang et al., 2013 |
| | STE4 | YOR212W | Lang et al., 2013 |
| | KRE6 | YPR159W | Lang et al., 2013 |
| | SFL1 | YOR140W | Lang et al., 2013 |
| | STE5 | YDR103W | Lang et al., 2013 |
| | ANP1 | YEL036C | Lang et al., 2013 |
| | GCN1 | YGL195W | Lang et al., 2013 |
| | ERG5 | YMR015C | Gerstein et al., 2012 |
| | ERG7 | YHR072W | Gerstein et al., 2012 |
| | CNE1 | YAL058W | Lang et al., 2013 |
| | GPB1 | YOR371C | Lang et al., 2013 |
| | KEG1 | YFR042W | Lang et al., 2013 |
| | KRE5 | YOR336W | Lang et al., 2013 |
| | TOH1 | YJL171C | Lang et al., 2013 |
| | SUL4 | YBR294W | Gresham et al., 2008 |
| | GAL3 | YDR009W | Hittinger et al., 2004 |
| | GIN4 | YDR507C | Gresham et al., 2008 |
| | PDR1 | YGL013C | Anderson et al., 2003 |
| | SGF73 | YGL066W | Gresham et al., 2008 |
| | SET4 | YJL105W | Lang et al., 2013 |
| | SIR1 | YKR101W | Gresham et al., 2008 |
| | ACE2 | YLR131C | Lang et al., 2013 |
| | GAS1 | YMR307W | Lang et al., 2013 |
| | WHI2 | YOR043W | Lang et al., 2013 |
| | CKA2 | YOR061W | Gresham et al., 2008 |
Fig. 4 Visualization of node classification scheme in yeast interactome. Values of a) average shortest path length, b) neighborhood connectivity, and c) betweenness centrality within the yeast interactome (left panels), and values for the DFA-derived hierarchical node categories P, I and H, and for nodes known to be under convergent evolution in yeasts (C, N = 18). The small inset network shows the location of convergently evolved genes (C-nodes) within the interactome (yellow nodes).
Multivariate Wilks tests of significance and powers for network parameters to explain protein evolutionary rate (ω), gene expression (Codon Adaptation Index CAI), and evolutionary rewiring between species of yeast (γ). All predictors were significant
| | Wilks’ Lambda | F | Effect df | Error df | p | Observed power (alpha) |
|---|---|---|---|---|---|---|
| Intercept | 0.317 | 1569.597 | 3 | 2188 | < 0.001 | 1.000 |
| Neighborhood connectivity | 0.924 | 59.892 | 3 | 2188 | < 0.001 | 1.000 |
| Betweenness centrality | 0.995 | 3.931 | 3 | 2188 | 0.008 | 0.832 |
| Average shortest path length | 0.961 | 29.553 | 3 | 2188 | < 0.001 | 1.000 |
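The F values in this table can be related to Wilks' Lambda with a short worked equation. This assumes that each effect has a single hypothesis degree of freedom tested against the three response variables, in which case the conversion from Lambda to F is exact; that assumption is inferred from the effect and error df in the table and is not stated in the source.

```latex
% Assumed relation for an effect with one hypothesis degree of freedom:
F = \frac{1 - \Lambda}{\Lambda} \cdot \frac{df_{\text{error}}}{df_{\text{effect}}}

% Worked example for neighborhood connectivity, using the rounded table value:
F \approx \frac{1 - 0.924}{0.924} \cdot \frac{2188}{3} \approx 60.0
```

The result is close to the reported 59.892; the small difference is consistent with Lambda being rounded to three decimals in the table.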
Fig. 5 Relationship between hierarchical node structure of yeast interactome and evolutionary parameters. Node types are designated as peripheral (P), intermediate (I), or hub (H) based on discriminant function analysis, and nodes that were found to evolve convergently (C; N = 21) in yeasts. Three evolutionary outcomes, (a) substitution rate, (b) expression level, approximated through Codon Adaptation Index (CAI), and (c) evolutionary rewiring score, differ significantly among node categories (see text). C-node boxes are sorted by median. Double red line: outliers above median not shown in figure but included in tests. Symbols: triangles, raw data points; circles, outliers; stars, extreme values; squares, medians; boxes, 25-75% of data; whiskers, non-outlier range.