
A structural perspective of compensatory evolution.

Dmitry N Ivankov¹, Alexei V Finkelstein², Fyodor A Kondrashov³.

Abstract

The study of molecular evolution is important because it reveals how protein functions emerge and evolve. Recently, several types of studies have indicated that substitutions in molecular evolution occur in a compensatory manner, whereby the occurrence of a substitution depends on the amino acid residues at other sites. However, the molecular or structural basis of the compensation often remains obscure. Here, we review studies at the interface of structural biology and molecular evolution that have revealed novel aspects of compensatory evolution. In many cases structural studies benefit from evolutionary data, while structural data often add a functional dimension to the study of molecular evolution.

Substances: Proteins

Year: 2014 PMID: 24981969 PMCID: PMC4141909 DOI: 10.1016/j.sbi.2014.05.004

Source DB: PubMed Journal: Curr Opin Struct Biol ISSN: 0959-440X Impact factor: 6.809

Current Opinion in Structural Biology 2014, 26:104–112. This review comes from a themed issue on Sequences and topology. Edited by L Aravind and Christine A Orengo. Available online 28th June 2014. 0959-440X/© 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

Introduction

Efforts in the field of molecular evolution have recently focused on the role of epistasis among fixed substitutions (see [1,2]). Epistasis, a phenomenon in which fitness is influenced by the interaction of amino acid residues at different sites, has been suggested to be prevalent and to play a role in determining which substitutions occur in evolution. Such interdependence of amino acids bears a strong resemblance to the nature of protein structures: in principle, the amino acid sequence of a protein is sufficient to determine its tertiary structure [3] through specific amino acid interactions. Here, we review recent cases in which evolutionary interactions between amino acid substitutions have been linked to their joint, non-independent impact on protein structure. For the structural biologist, we provide a brief introduction to the different evolutionary mechanisms that have been postulated to describe compensatory evolution. For the evolutionary biologist, we provide background on physical interactions in protein structures.

Mechanisms of compensatory evolution

Substitutions at interdependent sites may proceed via several different evolutionary mechanisms. The first, described by the evolutionary geneticists Dobzhansky and Muller [4], considers substitutions that arise and reach fixation in the population independently. In a simplified haploid model with two loci (A and B) and two alleles (0 and 1), the Dobzhansky–Muller model holds three of the four possible genotypes to be fit (00, 10 and 11, listing the alleles at loci A and B, respectively), with one genotype conferring low fitness (01). If evolution cannot proceed through the fixation of deleterious mutations, then for the genotype to evolve from 00 to 11 the 0 → 1 substitution at site A must occur first, with the substitution at site B depending on that event. The substitution at site A is, therefore, permissive, in that it allows for evolution at site B. The second evolutionary mechanism rejects the assumption that deleterious substitutions cannot be fixed and reverses the order of substitutions described in the previous example. Under that scenario the conditionally deleterious 0 → 1 substitution at site B is fixed first. Then the 0 → 1 substitution occurs at site A, so that this substitution is compensatory sensu stricto: it reverses the deleterious effects of a previously fixed substitution. This scenario is unlikely to take place in nature owing to the very low probability of fixation of a deleterious mutation [5]. It is conceivable, however, that the successive fixation of slightly deleterious alleles, those with a selection coefficient smaller in magnitude than 1/(2N) (where N is the population size), may be compensated by a single substitution that negates the effects of the entire series of previous slightly deleterious substitutions. Such a large-effect compensatory substitution may thus be driven by positive selection.
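The Dobzhansky–Muller argument amounts to asking which orders of single substitutions avoid low-fitness intermediates. A minimal Python sketch, assuming a haploid two-locus landscape in which fitness is reduced to fit/unfit and encoded in a hypothetical `DM_LANDSCAPE` dictionary, enumerates the substitution orders from 00 to 11 that never visit an unfit genotype:

```python
from itertools import permutations

# Hypothetical two-locus haploid landscape following the Dobzhansky-Muller
# example: each key lists the alleles at loci A and B; True = fit, False = unfit.
DM_LANDSCAPE = {"00": True, "10": True, "11": True, "01": False}


def mutate(genotype: str, locus: int) -> str:
    """Flip the allele (0 <-> 1) at the given locus (0 = A, 1 = B)."""
    alleles = list(genotype)
    alleles[locus] = "1" if alleles[locus] == "0" else "0"
    return "".join(alleles)


def accessible_orders(landscape: dict, start: str = "00") -> list:
    """Orders of single substitutions (as locus names) leading from start to the
    fully substituted genotype without passing through an unfit genotype."""
    names = "AB"
    orders = []
    for order in permutations(range(len(start))):
        genotype = start
        ok = True
        for locus in order:
            genotype = mutate(genotype, locus)
            if not landscape[genotype]:
                ok = False   # this path visits a low-fitness intermediate
                break
        if ok:
            orders.append("".join(names[i] for i in order))
    return orders


print(accessible_orders(DM_LANDSCAPE))  # ['AB']
```

With the landscape of the text, only the order A then B survives, which is what makes the substitution at A permissive. Marking both 01 and 10 as unfit, as in the third scenario, leaves no accessible order at all, so 00 → 11 then requires a double mutant or a transiently deleterious intermediate.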
The third evolutionary scenario of compensatory evolution is based on the population dynamics of the interacting sites, leading to the simultaneous fixation of two substitutions. When genotypes 00 and 11 confer high fitness and 01 and 10 confer low fitness, evolution from 00 to 11 may occur when two mutations arise simultaneously in the same genome, which then has a chance of fixation. Given the low rates of spontaneous mutation [6], such a double mutation is thought to be so rare (10⁻¹⁴ to 10⁻²⁰) as to prevent it from contributing substantially to molecular evolution. However, even a substantially deleterious mutation has a chance to segregate in the population, especially if the mutation is recessive [5]. A more likely mechanism of simultaneous fixation of interacting alleles is therefore the acquisition of the second mutation by a haplotype that already carries the first, deleterious mutation before it has been purged by selection [7]. The prevalence of these mechanisms in protein evolution remains unaddressed and is not discussed here. We therefore use the term ‘compensatory evolution’ in the broad sense, meaning the evolution of any epistatically interacting amino acid substitutions, without reference to the actual evolutionary mechanism behind their fixation.
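The quoted range for simultaneous double mutations is consistent with squaring per-site mutation rates of roughly 10⁻⁷ to 10⁻¹⁰ per generation; those per-site rates are our assumption for illustration, not figures from the text. The sketch below compares that simultaneous route with the sequential route described above, assuming a haploid population in which the first deleterious mutation (hypothetical selection coefficient s = 0.01) is held at its mutation–selection balance frequency of about μ/s:

```python
# Assumed per-site mutation rates (per generation) bracketing the quoted range.
MUTATION_RATES = (1e-7, 1e-10)
SELECTION_COEFFICIENT = 0.01  # hypothetical cost of the first mutation alone


def simultaneous_double_rate(mu: float) -> float:
    """Chance that one genome gains both specific mutations in the same
    generation, assuming the two mutations arise independently at rate mu."""
    return mu * mu


def sequential_double_rate(mu: float, s: float) -> float:
    """Per-genome rate of producing the double mutant when the first, deleterious
    mutation is kept at its haploid mutation-selection balance frequency (~mu/s)
    and the second mutation then arises in a carrier. Valid only for mu << s."""
    carrier_frequency = mu / s
    return carrier_frequency * mu


for mu in MUTATION_RATES:
    simultaneous = simultaneous_double_rate(mu)
    sequential = sequential_double_rate(mu, SELECTION_COEFFICIENT)
    print(f"mu={mu:.0e}  simultaneous={simultaneous:.0e}  sequential={sequential:.0e}")
```

Under these assumptions the sequential route produces double mutants roughly 1/s times more often than the simultaneous one. The sketch ignores drift and the subsequent probability that the double mutant actually fixes.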

Structural understanding of site interactions

Amino acid residues can interact in different ways, providing stabilizing physical interactions in proteins (see [8]). We list the different types of physical interactions and describe which of them can be broken and, potentially, compensated in protein structures (Table 1). In nucleic acid structures the interactions are more straightforward: they consist almost exclusively of hydrophobic interactions between nucleotide rings, commonly referred to as stacking interactions, and the hydrogen bonds responsible for Watson–Crick (WC) pairing. The three-dimensional structures of ribozymes and riboswitches are additionally stabilized by interactions that are more characteristic of proteins [9,10]. Specific interactions in proteins are rarer and usually stronger, while non-specific interactions are more numerous and subtle (Table 1).

Table 1

Typical physical interactions stabilizing protein structures (a).

Type	Covalent/noncovalent	Occurrence	Abundance of interactions	Stabilizing free energy (kcal/mol)	Specificity	Comments
Van der Waals interactions (b, c)	Noncovalent	All proteins	Numerous	For methyl (d) and NH groups, and O, N and S atoms: in water ∼0.0 to −0.05; in vacuum ∼−0.2 to −0.5	Nonspecific	Electrons oscillate around the positively charged nucleus, making each atom an oscillating dipole. In two interacting atoms these oscillations are correlated, and the atoms attract each other with an energy proportional to r⁻⁶, r being the distance between the atoms.
Hydrophobic interactions (e)	Noncovalent	All proteins	Numerous	For methyl (d) groups: in water ∼−0.3 (f); in vacuum: –	Less specific	They arise because water molecules are partly constrained as they minimize their interaction with hydrophobic (non-polar) groups in order to preserve their hydrogen bonds.
Hydrogen bonds (e)	Noncovalent	All proteins	Moderate	In water ∼−1.5; in vacuum ∼−5	More specific	Electrostatic interactions between directed H-containing dipoles (–O⁻–H⁺ or –N⁻–H⁺ groups) and partially negatively charged O⁻ or N⁻ atoms.
Interaction of charged with uncharged groups (e)	Noncovalent	All proteins	Moderate	Unit charge interacting with a methyl group (d): at the protein/water interface ∼+0.1; inside the protein in water ∼+1; at the protein/vacuum interface or inside the protein in vacuum ∼−1	Less specific	Electrostatic repulsion of charged Lys⁺, Arg⁺, His⁺, Asp⁻ and Glu⁻ from the weakly polarizable protein medium toward the more polarizable water, and attraction of these groups into the weakly polarizable protein medium from non-polarizable vacuum.
Salt bridges between two charged atoms (e)	Noncovalent	Most proteins	Few	At the protein/water interface ∼−2; at the protein/vacuum interface ∼−40; inside the protein (water or vacuum) ∼−25	Specific	Electrostatic interaction between positively charged Lys⁺, Arg⁺ or His⁺ and negatively charged Asp⁻ or Glu⁻.
Coordinate bonds (e)	Covalent	Metal-binding proteins	Very few	In water ∼−6 and stronger; in vacuum very high (∼−100), as for a usual covalent bond	Highly specific	One metal cation is coordinated by several (e.g. six) O and/or N atoms of the protein (or balanced by interacting H₂O molecules in water).
Disulfide bonds (e)	Covalent	Mostly secreted proteins	Very few	Inside the cell ∼0; outside the cell very high (∼−100), as for a usual covalent bond	Highly specific	Inside the cell, a special enzyme and glutathione make the formation of disulfide bonds reversible. Outside the cell, disulfide bonds are fixed.

(a) Data were compiled from or calculated after [8,70–73]. Covalent bonds (except for coordinate bonds and disulfides) are not included, since they are the same in the native and unfolded protein structures and cancel out. The strength of residue–residue and atom–atom contacts depends on the distance cutoff defined between interacting atoms. Usually the cutoff is defined as ≈4–8 Å for VdW, hydrophobic interactions and interactions between charged and uncharged atoms (e.g. see [45,74]), ≈5 Å for salt bridges, ≈4 Å for H-bonds and ≈2.5 Å for coordinate and disulfide bonds. For in-vacuum interactions the stabilizing effect is energetic in nature, while for in-water interactions the stabilizing free energy (i.e. the potential of mean force) is mainly entropic. Water is considered implicitly, as a medium rather than as particles.

(b) Van der Waals interactions are London dispersion forces, present in both the folded and unfolded states of the protein. In the folded state many interactions are between amino acid residues (and with water molecules at the surface); in the unfolded state the interactions are mostly between amino acid residues and the surrounding water molecules.

(c) The term van der Waals interactions is ‘sometimes used loosely for the totality of nonspecific attractive or repulsive intermolecular forces’ [72].

(d) For aromatic rings, the interaction is approximately twofold stronger.

(e) The stabilizing effect in water is mostly entropic in nature [8]: when a protein folds, the entropy of water molecules increases, stabilizing the folded structure, while the enthalpy of the protein–water system remains relatively unchanged. The formation of disulfide bonds inside the cell is assisted by thiol–disulfide exchange, which increases the entropy of glutathione molecules and preserves the enthalpy of S–S bonds.

(f) The free energy of hydrophobic interaction of nonpolar atoms with water was estimated as 45 cal/mol per Å² of the molecular surface (i.e. 20–25 cal/mol per Å² of the water-accessible surface area) [71]. An isolated methyl group (approximated by a sphere with a radius of ∼2 Å) can contact about twelve neighboring groups of the same dimensions and, thus, has about −0.3 kcal/mol per contact as the free energy of excluding water from the contact.
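The per-contact estimate for a methyl group can be roughly reproduced with a few lines of arithmetic. This sketch assumes (our assumption, not stated in the table notes) that the 20–25 cal/mol per Å² coefficient is applied to the water-accessible surface of a 2 Å sphere computed with a conventional 1.4 Å water probe, and that this free energy is shared equally among twelve neighbors:

```python
import math

METHYL_RADIUS = 2.0          # Å, sphere approximation of a methyl group
WATER_PROBE_RADIUS = 1.4     # Å, conventional probe radius (assumption)
NEIGHBORS = 12               # contacts available to an isolated methyl group


def per_contact_free_energy(cal_per_a2: float) -> float:
    """Hydrophobic free energy per contact (kcal/mol), spreading the free energy
    of burying the methyl group's water-accessible surface equally over its
    neighbors. Negative values mean stabilization."""
    accessible_radius = METHYL_RADIUS + WATER_PROBE_RADIUS
    area = 4.0 * math.pi * accessible_radius ** 2     # Å^2 (about 145)
    total_kcal = area * cal_per_a2 / 1000.0           # cal/mol -> kcal/mol
    return -total_kcal / NEIGHBORS


for coefficient in (20.0, 25.0):   # cal/mol per Å^2 of accessible surface
    print(coefficient, round(per_contact_free_energy(coefficient), 2))
```

The result, about −0.24 to −0.30 kcal/mol per contact, is in line with the ∼−0.3 kcal/mol quoted for a methyl contact; a different surface definition or probe radius would shift it.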

All of the information necessary to specify the tertiary structure is present in the protein sequence [3]. Nevertheless, predicting the tertiary structure from sequence remains an unresolved problem in theoretical biophysics for the vast majority of proteins [11]. However, recent advances in computer hardware, software and algorithms have made a solution of the folding problem feasible for some very small proteins [11,12]. As computational power has improved, the insufficient accuracy of available force fields has become the main obstacle to a general solution of protein (and RNA) folding [13]. A relatively old idea that has only recently been applied successfully is to use information from protein evolution to predict structural interactions. Using a multiple sequence alignment, it is possible to identify amino acids that co-occur among homologous sequences. Amino acid substitutions that lead to co-occurring amino acid states are referred to as correlated substitutions or, incorrectly from an evolutionary point of view, correlated mutations (e.g. [14]). Analysis of pair-wise correlated substitutions may be used similarly to NMR analysis, in which experimentally measured pair-wise interactions between amino acid residues serve as constraints that define the protein tertiary structure [15]. If the underlying cause of correlated substitutions is the evolutionary maintenance of close physical interactions in the tertiary structure [14,16], then such sites may be located in close proximity, and information on their co-evolution may be used to infer the tertiary structure of the aligned protein [17,18]. Alternatively, coordinated changes in a multiple sequence alignment can be used to filter and score predicted structures [19]. The first papers on the use of correlated substitutions for tertiary structure prediction appeared about 20 years before this review [14,20–22].
However, the lack of sequence information and difficulties in distinguishing direct from indirect physical interactions initially prevented the successful application of such methods. Recent improvements in correlated substitution analysis have made it possible to identify co-evolving clusters of residues [23], to disentangle direct and indirect physical interactions [24,25] and, crucially, to predict the structures of transmembrane proteins [18,26] and small globular proteins [17]. For a detailed review of methods that use correlated substitutions to predict protein contacts and 3D protein structure, see [27].
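Correlated substitutions are commonly scored from pairs of alignment columns. As a minimal sketch, using plain mutual information rather than the direct-coupling methods of [24,25] (and therefore unable to separate direct from indirect couplings), the following ranks the column pairs of a hypothetical toy alignment `TOY_MSA`:

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical toy alignment (not real sequences): one string per homolog,
# all of equal length, one character per aligned column.
TOY_MSA = ["AKLE", "AKLE", "SRLE", "SRVE", "AKVE", "SRLE"]


def mutual_information(msa: Sequence[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    freq_i = Counter(seq[i] for seq in msa)
    freq_j = Counter(seq[j] for seq in msa)
    freq_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def rank_pairs(msa: Sequence[str]) -> List[Tuple[Tuple[int, int], float]]:
    """All column pairs, sorted from highest to lowest mutual information."""
    length = len(msa[0])
    scores = [((i, j), mutual_information(msa, i, j))
              for i, j in combinations(range(length), 2)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


for pair, score in rank_pairs(TOY_MSA)[:3]:
    print(pair, round(score, 3))
```

Columns 0 and 1 co-vary perfectly (A always pairs with K, S with R) and score ln 2 ≈ 0.69 nats, while the constant column 3 and the independently varying column 2 score zero. Real analyses additionally need sequence weighting and corrections for phylogeny and alignment depth.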

Classification of available approaches for the study of compensatory evolution

The interaction of amino acid residues in protein structures forms the basis of our understanding of structural biology, yet our understanding of the contribution of such interactions to evolution is limited (but see [28]). Recent approaches used to investigate instances or the extent of compensatory evolution can be broadly classified into four categories. First, a reverse engineering approach, which aims to reconstruct in the laboratory the sequence of substitutions that occurred in evolution. Second, a forward engineering approach, which aims to find experimental compensations for a substitution suspected of being involved in compensatory evolution, regardless of whether the solutions found in the laboratory have taken place in nature. Third, an opportunistic approach, which relates information on the phenotypic impact of specific changes in the genotype to the substitutions that occurred in evolution. Finally, an experimental evolution approach, in which compensatory interactions are detected among novel substitutions that arise in the confines of a laboratory experiment.

Reverse engineering

Reverse engineering is a powerful approach that has been used on several occasions to reveal the structural basis of adaptation and molecular evolution. The study of adaptation to different altitudes in deer mouse populations through differences in hemoglobin–oxygen affinity [29] is one such concerted effort. Hemoglobin of deer mice native to higher altitudes in the Rocky Mountains has a higher affinity for oxygen [30], which leads to better performance under oxygen-deprived conditions [31]. In total, twelve amino acid substitutions are known to separate the highland and lowland populations: five in α-hemoglobin exon 2 [32], three in α-hemoglobin exon 3 and four in β-hemoglobin [33]. The amino acid substitutions within each of these groups are in almost perfect linkage disequilibrium; however, there is some recombination between the groups [29]. Structurally, each group of substitutions occurs near the oxygen-binding region of hemoglobin. The five substitutions in exon 2 of α-hemoglobin are located at or near the E-helix, which directly interacts with the heme, and one of the five substitutions is known to cause an especially strong increase in binding affinity [32]. The three substitutions in exon 3 of α-hemoglobin are in the G-helix, which is located at the interface of the α and β subunits. Interestingly, two of the four substitutions in β-hemoglobin are located in the same E-helix, with one substitution occurring at the same site, and two more are in the H-helix, which neighbors the E-helix and participates in the α–β subunit interface [30]. Because the substitutions within a group are in linkage disequilibrium, the amino acid differences within one group are inherited together as a block. Therefore, to understand the population dynamics of interbreeding between the high (H) and low (L) altitude populations, it is sufficient to consider recombination among the groups of substitutions [29].
All eight H–L combinations were created, from all amino acid states corresponding to the low altitude populations (LLL, the letters representing the amino acid sets found in α-hemoglobin exon 2, α-hemoglobin exon 3 and β-hemoglobin, respectively) to all states matching the high altitude populations (HHH). The oxygen-binding affinity of the combinations was epistatic: the HHH, LLL and LHH combinations showed the highest affinity, whereas lower values were reported for all other combinations. The worst performers were the LLH and HHL combinations, those that combined the low altitude amino acid states in α-hemoglobin with the high altitude states in β-hemoglobin and vice versa. Hemoglobin is a tetramer with two α and two β subunits. Given that many of the adaptive substitutions appear in helices near the subunit interface [30,32], it is perhaps not surprising that there is a compensatory interaction between the groups of substitutions. A structural analysis of the α–β interface reveals that three of the L → H substitutions, αH50P, αL113H and βS128A, lead to three hydrogen bond differences. Specifically, the LLL structure has a hydrogen bond between the H-helix of β-hemoglobin and the B-helix of α-hemoglobin (β128S–α34C, respectively). Another hydrogen bond is present within α-hemoglobin, between the H-helix and the loop immediately adjacent to the E-helix (α50H–α30E). This interaction affects the orientation of the E-helix, which favors the oxygen-binding affinity of the entire structure [29]. The HHH structure lacks both of these hydrogen bonds (β128A–α34C and α50P–α30E) but has a hydrogen bond between the B- and G-helices (α113H–α24Y, versus α113L–α24Y in the L population). Interestingly, in all three instances only one of the two interacting amino acids differs between the high and the low altitude populations.
Therefore, the epistasis between the three groups of amino acid substitutions is mostly due to allosteric compensatory interactions that affect the interaction of the α- and β-hemoglobin subunits in the quaternary structure. The large number of substitutions between the two populations made it impractical to investigate all pair-wise combinations of alleles, so the order in which the adaptive substitutions accumulated may not be resolved at this time. The lack of phylogenetic resolution of the exact order in which substitutions occurred has hindered other researchers as well. The reverse engineering approach has been used to great effect in the study of glucocorticoid receptor evolution in early chordates [34,35]. In a series of studies, the authors focused on a phylogenetic branch with 37 amino acid changes, which, as in the case of the deer mice, had too many combinations of individual changes to be surveyed simultaneously. As a result, the authors could not initially identify all of the compensatory interactions [34], and only a more thorough search led to the identification of an important set of compensatory interactions among these 37 changes [35]. When the phylogenetic resolution of the order of amino acid substitutions is high enough, the reverse engineering approach has more power to study the evolutionary process in detail [36]. In this study [36], the authors started from the 1968 influenza nucleoprotein sequence and, taking advantage of the availability of influenza sequences for every year between 1968 and 2007, reconstructed the sequence of substitutions with reasonable precision using sequence data alone. In a few cases the relative order of a small number of substitutions could not be resolved. The 1968 and 2007 influenza nucleoprotein sequences used in the study [36] differed by only 39 amino acid substitutions.
The authors first tested whether any of the amino acid states of the extant 2007 nucleoprotein are deleterious when introduced individually into the ancestral 1968 virus, finding three such instances. Having determined that the impact of these three mutations was independent of the genetic background outside the nucleoprotein, the authors tested the impact of these three substitutions in different reconstructed ancestral states along the evolutionary trajectory from the 1968 to the 2007 virus. The interval in which the substitutions deleterious to the ancestral 1968 virus became tolerated was reasoned to contain the compensatory substitutions. Interestingly, the three substitutions deleterious to the 1968 strain appeared to have occurred in close structural proximity to the mutations that negated their deleterious effect. Additionally, the three substitutions did not appear to directly affect the function of the protein but rather its stability, with the compensatory mutations providing a compensating increase in the thermostability of the protein structure [36]. The study of recent virus evolution has revealed other examples of compensatory interactions, including the restoration of receptor-binding affinity in H1N1 influenza [37]. We predict that the reverse engineering approach will continue to be a powerful tool for studying the evolutionary process, especially when the order of amino acid substitutions can be resolved to a high degree of accuracy.
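With 39 substitutions there are 2³⁹ (more than 5 × 10¹¹) possible combinations, so an ordered trajectory is what makes the influenza experiment tractable: each deleterious mutation only needs to be tested in a series of reconstructed ancestors. The sketch below shows one way to locate the interval containing the compensatory substitutions with few assays, assuming (our assumption) that tolerance, once gained, is never lost along the trajectory. Binary search is an illustration, not necessarily the procedure used in [36], and `toy_assay` is a hypothetical stand-in for a fitness measurement:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_tolerant(backgrounds: Sequence[T], tolerated: Callable[[T], bool]) -> int:
    """Index of the earliest background on an ordered evolutionary trajectory in
    which an introduced mutation is tolerated, or len(backgrounds) if none is.

    Assumes tolerance is monotone along the trajectory: once a compensatory
    substitution is fixed, every later background also tolerates the mutation.
    """
    lo, hi = 0, len(backgrounds)
    while lo < hi:
        mid = (lo + hi) // 2
        if tolerated(backgrounds[mid]):
            hi = mid          # tolerated here: the switch is at mid or earlier
        else:
            lo = mid + 1      # not yet tolerated: the switch is later
    return lo


# Hypothetical example: yearly ancestral reconstructions 1968-2007 and a made-up
# assay in which the mutation becomes tolerated from 1985 onwards.
years = list(range(1968, 2008))
assays = []


def toy_assay(year: int) -> bool:
    assays.append(year)       # record which ancestors were "tested"
    return year >= 1985


idx = first_tolerant(years, toy_assay)
print(years[idx - 1], years[idx], len(assays))
```

Here six assays narrow the switch to the interval between the 1984 and 1985 reconstructions, which would contain the compensatory substitutions. If compensation can be gained and later lost, the monotonicity assumption fails and every time point has to be tested.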

Forward engineering

When the number of combinations to test is too large, a forward engineering approach can provide insights into the evolution of the system, as was done recently for phosphoglycerate kinase, PGK [38]. PGK is one of the ATP-generating enzymes of glycolysis, found in all living organisms, and is essential for life [39]. Its mechanism of action involves a hinge-bending motion that leads to domain closure once the substrate has bound to the active site. Several sites, including site 219 (numbered according to the human sequence), play a crucial role in the domain closure process through their interaction with the substrate. Predictably, mutations at site 219 disrupt the function of the protein [40]. Site 219 is occupied by lysine in eukaryotes and bacteria, whereas archaea mostly have serine or threonine at the same site [38]. In the human PGK sequence, the K219S substitution results in loss of activity. Under the epistatic paradigm, serine at site 219 allows archaeal enzymes to perform the same hinge-bending motion in domain closure because other sites in the protein carry different amino acid residues whose interactions allow serine to perform the function. After the K219S substitution was introduced into the human PGK sequence, laboratory evolution revealed several substitutions that contribute towards restoring wild-type fitness [38]. Two of the substitutions, M239I and E403D, were consistently reproduced in the experiments. Only one of these sites (239) is in physical proximity to site 219 (3.5 Å), suggesting a direct structural compensation, while site 403 is distant (∼9 Å), suggesting an allosteric interaction. Although the residue at site 403 is not located in the vicinity of site 219 or the active site, it is known to play a role in the inter-domain motions that relate to substrate binding [41].
Furthermore, amino acid states at site 403 appear to be correlated with those at site 219: 535 of the 547 sequences that have K219 also have E403. Similarly, the amino acid states at site 239 are correlated with those at site 219, although the correlation is not as strong as that between sites 403 and 219. Taken together, these data suggest that the forward engineering approach, although blind to actual evolutionary trajectories, nevertheless has the power to recapitulate solutions used in evolution. Such a situation may be common, especially if the number of available evolutionary trajectories is limited [42].
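The 535-of-547 co-occurrence reported for sites 219 and 403 is a conditional frequency (about 97.8%). A minimal sketch of how such a number can be computed from an alignment, using a hypothetical toy alignment rather than real PGK sequences:

```python
from typing import Sequence


def conditional_frequency(msa: Sequence[str], i: int, a: str, j: int, b: str) -> float:
    """Fraction of aligned sequences with residue a in column i that also carry
    residue b in column j (0-based alignment columns)."""
    with_a = [seq for seq in msa if seq[i] == a]
    if not with_a:
        raise ValueError(f"no sequence has {a!r} in column {i}")
    return sum(seq[j] == b for seq in with_a) / len(with_a)


# Hypothetical toy alignment: column 0 stands in for site 219 and column 2
# for site 403; real PGK alignment columns would be located differently.
toy = ["KAEV", "KAEL", "KSEV", "KADV", "SAQV", "TAQL"]
print(conditional_frequency(toy, 0, "K", 2, "E"))   # 0.75
print(round(535 / 547, 3))                          # 0.978
```

A conditional frequency is asymmetric (the fraction of E403 given K219 need not equal the fraction of K219 given E403), which is one reason symmetric measures such as mutual information are often preferred for genome-wide scans.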

Opportunistic approach

The opportunistic approach has been widely used to relate information on human disease mutations to the genotypes of non-human species. Many instances have been described in which a mutation that causes disease in humans is found in other species without disease manifestations [16,43–45]. In such instances the disease mutation is said to be compensated by other genetic changes in the species that harbor the human disease state. The general hypothesis proposed for such compensations is the maintenance of thermostability [16,47]. Compensatory evolution for the maintenance of thermostability is best documented in mt-tRNAs, which we will use as the leading example. The underlying structural mechanisms in proteins are the same (see e.g. [45]). Mitochondrial tRNAs harbor a disproportionate number of pathogenic mutations, which typically lead to muscular or neurological disorders [48]. A majority of the pathogenic mutations occur in the stems of the mt-tRNA structure (Figure 1a) and typically reduce the thermostability of the RNA tertiary, and especially secondary, structure [49]. When the disease mutations are mapped against the sequences found in other species, compensatory mutations that restore the thermostability lost through the disease mutation are typically found [43]. Most cases of structural compensation involve a WC interaction in the double helix of the stem structure (Figure 1b); however, in some instances the loss of thermostability in one WC pair can be compensated by an increase in thermostability in another pair in the same stem (Figure 1c). To our knowledge, no compensations beyond the same stem-loop structure have been described in tRNAs.
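Direct WC compensation in an mt-tRNA stem can be illustrated by checking base pairing along the stem. This sketch uses a hypothetical five-pair stem, not the real mt-tRNATrp sequence, and only classifies pairs; estimating thermostability would additionally require a nearest-neighbor free-energy model:

```python
# Watson-Crick pairs, plus the G-U wobble pair that also occurs in tRNA stems.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE_PAIRS = {("G", "U"), ("U", "G")}


def pair_type(x: str, y: str) -> str:
    """Classify a single base pair as 'WC', 'wobble' or 'mismatch'."""
    if (x, y) in WC_PAIRS:
        return "WC"
    if (x, y) in WOBBLE_PAIRS:
        return "wobble"
    return "mismatch"


def stem_pairs(five_prime: str, three_prime: str) -> list:
    """Classify each pair of a stem; both strands are written 5'->3', so the
    3' strand is read backwards to align antiparallel partners."""
    return [pair_type(x, y) for x, y in zip(five_prime, reversed(three_prime))]


# Hypothetical stem, not a real mt-tRNA sequence.
print(stem_pairs("GCAUG", "CAUGC"))  # intact stem: all WC
print(stem_pairs("GCCUG", "CAUGC"))  # A->C "disease" change: third pair mismatched
print(stem_pairs("GCCUG", "CAGGC"))  # U->G on the partner strand restores WC
```

A substitution that turns an A–U pair into a C–U mismatch is restored to WC pairing by a second substitution on the opposite strand (U → G), the pattern of a direct WC compensation.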

Figure 1

Thermostability compensation in mammalian mitochondrial tRNAs. (a) The secondary structure of the human mt-tRNATrp, with the known pathogenic mutations from Mitomap [68] shown in red. All instances in which the disease mutation is found to be compensated in another species, as judged by a secondary structure-based alignment [69], are shown in orange. (b) An example of two direct WC compensations in the secondary structure of the Philippine tarsier mt-tRNATrp. (c) An example of an allosteric compensation in the anticodon stem of the sei whale mt-tRNATrp.

Compensatory evolution in mt-tRNAs is prevalent [43,50], as indicated by the observation that most pathogenic mutations, with the exception of those in the anticodon loop, can be found in a compensated state in some other vertebrate species (Figure 1a). The maintenance of thermostability has also been hypothesized as the evolutionary mechanism behind the compensation of human pathogenic mutations in proteins of other species [16,47]. However, the structural integrity of proteins does not appear to be governed by a simple secondary structure, as in the case of tRNAs, and examples of such compensations are harder to find. In general, compensated disease mutations are more likely to have a milder effect on protein structure than an average disease mutation [44,45]. From a structural perspective, a mutation with a mild structural effect is less likely to disrupt a specific interaction (Table 1) and, therefore, may be compensated by a greater diversity of other substitutions [23]. In contrast, a mutation that disrupts a strong specific interaction, such as a salt bridge or a disulfide bond, is likely to be compensated through a direct interaction, in a manner similar to the WC compensations in mt-tRNAs. Many of the compensated disease mutations have a substitution nearby in the structure that may be recognized as the compensatory substitution [45]. Despite the co-localization of compensatory interactions, at this point it is not clear whether these compensations are specific, as expected for strong interactions, or general, as may be expected for mutations of weak effect. Generally, mutations tend to co-occur in time when they are located near each other in the protein structure [53], supporting the idea that compensation is a local structural phenomenon.
However, only local interactions were assayed [44,45], in part because of the statistical challenges in searching for a signal of distant interactions. If general, non-specific interactions predominate, then compensatory evolution is more likely to be driven by co-interacting substitutions in a complicated network [23]. In that case, much greater statistical power than is typically available would be necessary to detect such complex compensatory interactions. Some examples of distant compensations have been described [38]; however, the overall extent of non-specific or allosteric interactions in compensatory evolution remains unclear.

Experimental evolution approach

A revolutionary set of experiments was performed by Poon and Chao [56]. The authors introduced several deleterious mutations one by one into the genome of the DNA bacteriophage φX174 and then propagated the mutant genomes on an Escherichia coli host, selecting for recovery of fitness. By sequencing the selected strains, the authors were able to measure the fraction of fitness recovery achieved by reversion versus compensatory mutation. The experiments were largely ahead of their time, as they made it possible to address questions that are only now entering the mainstream. For example, by modeling the frequency with which compensatory mutations emerged in the experiments, the authors concluded that an average deleterious mutation can be compensated by ∼9 different substitutions [57]. This question is still relevant and far from resolved today. Regarding structural considerations, they showed that at least one deleterious mutation in the φX174 procapsid closely interacts with the three mutations that compensate for it. Similarly, a deleterious mutation affecting an α-helix important in the interaction of two proteins was compensated by several different mutations in or near the same α-helix. In general, the compensated and compensatory mutations were found close to each other in the genome sequence, suggesting a structural basis for the compensation. However, the lack of further structural data prevented a more detailed analysis. The experimental approach goes beyond our intent to describe the role of structural interactions in evolution, as it is not necessarily clear that the compensations observed in the laboratory adequately represent those that occur in nature. However, compensations observed in nature and in the laboratory may have a similar structural basis, and lessons from experimental evolution will certainly enrich our understanding of evolution (e.g. [57]).
In any case, experimental evolution and an understanding of the principles of compensatory interactions have clearly influenced protein engineering [58] and the study of protein function (e.g. [59]).
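One way to see why the number of available compensatory substitutions matters in such experiments is a toy competing-risks model: a single reversion and k compensatory mutations arise as independent Poisson processes, and whichever appears first restores fitness. This is our simplifying assumption (equal rates, equal fitness benefit, no clonal interference), not the model used in [56,57]:

```python
def reversion_fraction(k: int, reversion_rate: float = 1.0,
                       compensation_rate: float = 1.0) -> float:
    """Probability that the first fitness-restoring mutation is the reversion when
    one reversion and k compensatory mutations arise as independent Poisson
    processes (competing exponential waiting times) with the given relative rates."""
    return reversion_rate / (reversion_rate + k * compensation_rate)


for k in (1, 3, 9):
    print(k, round(reversion_fraction(k), 2))
```

With k = 9, only about one in ten recoveries would be a reversion under these assumptions; unequal mutation rates or fitness effects would change that fraction.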

Conclusions

Despite the apparent connection between structural biology and molecular evolution, few studies work at the interface of these disciplines. Furthermore, the few that have been published so far tend to focus on specific examples rather than general issues. Nevertheless, several tentative conclusions may be drawn. It appears that local structural interactions supporting either the thermostability or the function of the protein play a major role in compensatory evolution. On the other hand, although only a handful of examples of allosteric interactions playing a role in evolution are known, the possibility that complex and distant structural interactions have a major evolutionary influence cannot be ruled out at present. The statistical complications of a comprehensive search for allosteric interactions, especially in long-term evolution, preclude a general conclusion. The relative importance of local versus allosteric interactions in evolution may be one of the next big questions for both molecular evolution and structural biology. The extent of compensatory changes in the course of evolution remains a controversial issue [1,2]. Further incorporation of a structural component into the study of molecular evolution (such as [45]) may help to resolve this controversy. Conversely, incorporating an evolutionary approach into structural studies may advance the understanding of structural biology. Specifically, the global interdependence of amino acid residue interactions may emerge as an important feature of protein structure evolution. Protein structures may guide compensatory evolution through inherent cascades of substitutions, whereby a substitution at site A initiates a substitution at site B, which in turn influences a substitution at site C, and so on. Therefore, site A may be interdependent with practically all sites in the protein, not directly but through cascades of site interdependencies.
The relationship between the protein surface and core has been suggested to show such cascade properties [23], which concurs with the macroevolutionary observations of a complex, net-like fitness landscape [61]. Compensatory interactions in protein complexes [62-64], networks [65,66] or environmentally-dependent epistasis [37,67] may extend such cascades beyond single structures implying that everything in the cell, and in evolution, may be interdependent.
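The cascade argument above can be restated as reachability in a graph. If sites are nodes and direct interdependencies are edges, then the sites with which site A is indirectly interdependent are exactly those reachable from A through chains of edges. The Python sketch below illustrates this with a breadth-first search. The function names `build_graph` and `cascade_reach`, the toy contact list, and the treatment of each interaction as symmetric and all-or-none are hypothetical simplifications chosen for illustration. They are not a model taken from the studies reviewed here.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Iterable


def build_graph(pairs: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected graph of direct site interdependencies.

    Each (a, b) pair is treated as a symmetric, all-or-none interaction,
    a simplifying assumption made for illustration only.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def cascade_reach(graph: dict[str, set[str]], start: str) -> dict[str, int]:
    """Breadth-first search from `start`.

    Returns every site connected to `start` through a chain of direct
    interdependencies, mapped to the length of the shortest such chain
    (0 for `start` itself, 1 for its direct partners, and so on).
    """
    steps = {start: 0}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        # .get avoids inserting new keys into the defaultdict for unknown sites
        for partner in graph.get(site, set()):
            if partner not in steps:
                steps[partner] = steps[site] + 1
                queue.append(partner)
    return steps


if __name__ == "__main__":
    # Hypothetical toy protein with sites A-G; each site has at most two direct partners.
    contacts = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("F", "G")]
    reach = cascade_reach(build_graph(contacts), "A")
    print(reach)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}
```

In this toy example site A has only one direct partner (B), yet it is linked through cascades to four other sites, while F and G remain independent of it. Real site interdependencies are weighted and context-dependent, so a realistic analysis would need a quantitative interaction model rather than this all-or-none graph.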

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as: • of special interest •• of outstanding interest


1. NMR studies of structure and function of biological macromolecules (Nobel Lecture). Kurt Wüthrich. J Biomol NMR, 2003-09.

2. Development and testing of PFFSol1.1, a new polarizable atomic force field for calculation of molecular interactions in implicit water environment. Leonid B Pereyaslavets, Alexey V Finkelstein. J Phys Chem B, 2012-04-09.

3. Mitochondrial DNA mutations in human disease.

4. Characterization of compensated mutations in terms of structural and physico-chemical properties.

5. Energy landscape of knotted protein folding.

6. Emerging methods in protein co-evolution.

7. Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations?

8. Compensated pathogenic deviations: analysis of structural effects.

9. Widespread compensatory evolution conserves DNA-encoded nucleosome organization in yeast.

10. An enhanced MITOMAP with a global mtDNA mutational phylogeny.
