Literature DB >> 32282205

Bottom-Up Nonempirical Approach To Reducing Search Space in Enzyme Design Guided by Catalytic Fields.

Abstract

Currently developed protocols of theozyme design still lead to biocatalysts with much lower catalytic activity than enzymes existing in nature, and, so far, the only avenue of improvement was the in vitro laboratory-directed evolution (LDE) experiments. In this paper, we propose a different strategy based on "reversed" methodology of mutation prediction. Instead of common "top-down" approach, requiring numerous assumptions and vast computational effort, we argue for a "bottom-up" approach that is based on the catalytic fields derived directly from transition state and reactant complex wave functions. This enables direct one-step determination of the general quantitative angular characteristics of optimal catalytic site and simultaneously encompasses both the transition-state stabilization (TSS) and ground-state destabilization (GSD) effects. We further extend the static catalytic field approach by introducing a library of atomic multipoles for amino acid side-chain rotamers, which, together with the catalytic field, allow one to determine the optimal side-chain orientations of charged amino acids constituting the elusive structure of a preorganized catalytic environment. Obtained qualitative agreement with experimental LDE data for Kemp eliminase KE07 mutants validates the proposed procedure, yielding, in addition, a detailed insight into possible dynamic and epistatic effects.

Entities: Chemical Disease Gene Species

Mesh：

Substances：
Amino Acids
Lyases

Year: 2020 PMID： 32282205 PMCID： PMC7467639 DOI： 10.1021/acs.jctc.0c00139

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

In recent years, intense research has been conducted in pursuit of harnessing the versatile protein capabilites for application in industrial biocatalysis. The major challenge arises from the fact that living organisms produce enzymes tailored for sustaining their own existence, rather than the needs of human society. Therefore, the key to the future of rational biocatalyst engineering lays in the redesigning of existing enzymes and, ultimately, de novo design with the aid of new computational methods.[1−4] Major breakthroughs in the theoretical design of enzymes for new, non-natural reactions (so-called “theozymes”) have occurred in the past two decades.[5−12] However, after initial success, the progress stalled,[13,14] because theozymes that were designed to maximize transition-state stabilization (TSS) displayed rather low catalytic activity and conventional top-down models requiring numerous assumptions were not able to fully explain the role of additional mutations in the second coordination sphere, introduced by LDE experiments.[13,14] Therefore, such LDE-mutated theozymes represent an excellent proving ground for testing new methodologies and interpreting the deficiencies of older methods. This problem was investigated by several research groups with various methods, including extensive molecular dynamics,[15,16] QM/MM,[17−19] and EVB[20−23] simulations. All previously mentioned methods belonging to the top-down category require consideriation of the entire protein and numerous assumptions regarding protonation states, QM/MM boundaries, always arbitrary selection of force field, etc. Although commonly used methods are able to explain some of the differences between designed and evolved enzymes, they are rather unsuitable for efficient scan over vast combinatorial space. Therefore, a method simple enough for high-throughput prescreening and yet deeply rooted in the physics of the problem without involving any empirical factors is needed to improve the existing protocols. The catalytic field technique that has been used since we developed it in our laboratory in 1981[24−26] has been recently supplemented by Dittner and Hartke with extensive optimization techniques, allowing one to generate and explore various characteristics of the optimal catalytic environment.[27,28] However, extensive brute force optimization of all variables involved would still make theoretical design of theozyme composed of several hundreds of amino acids extremely costly and impractical. We believe that systematic nonempirical analysis of the physical nature of interactions involved in the catalysis and development of additional techniques specific for enzymes presented in this contribution could lead to rational construction of simpler models useful in theozyme design. Other rare inverse catalyst design attempts that have been recently summarized in a review[29] are mostly based on artificial intelligence or data mining. Although such methods could be very useful in the automated scanning of vast chemical spaces, it would be surprising to obtain a deeper understanding of physical phenomena responsible for extraordinary catalytic activity of enzymes this way, while many essential structural details are missing in currently available databases. Recent experimental results that employ Stark effects for a specific band of infrared (IR) spectra for mutated enzymes indicate the electrostatic nature of KSI catalytic activity.[30] This permits one to monitor electric fields exerted by specific mutations on certain reactant bonds in top-down fashion, yielding valuable but still limited information.[16,31] This perspective is reversed here in bottom-up fashion by the use of a catalytic field derived from the reactant and transition state wave functions, yielding general characteristics of charge distribution of the optimal catalytic environment for every point in space outside reagents and enabling one to monitor lowering of activation barrier directly, as a result of amino acid mutations, side-chain rotations, rate-promoting vibrations, or proton transfers in hydrogen bond chains. Although the catalytic field technique has been derived in our laboratory[24−26] using the theory of intermolecular interactions, it could be also applied to systems involving intramolecular interactions. The ability to predict catalytic effects resulting from hydrogen to fluorine (H → F) substitutions has been illustrated for two simple model organic reactions,[32] which demonstrates qualitative applicability of this technique to model substrate-assisted catalysis.[33] The present paper constitutes the first application of a catalytic field approach to real case intermolecular catalysis. For this purpose, here, we have chosen the artificial Kemp eliminase KE07 and its mutants.[9] Among other Kemp eliminases based on different scaffolds, this system has the largest number of mutants with gradually increasing activities. Several KE07 mutants introduced by LDE experiments, which apparently were missed in earlier theozyme designs,[13,14] constitute a suitable benchmark to examine new biocatalyst design methodologies. The mutants obtained from LDE experiments differ mainly by the number of charged amino acids located in the second coordination sphere,[13,14] which may change conformation of its side chain upon docking with the substrate or nearby mutation.[34,35] The importance of considering amino acid side-chain rotamers in biocatalysis has been recently demonstrated for Kemp eliminase[36] and triosephosphate isomerase mutants.[37] A more general catalytic field technique,[25,26] considering simultaneously both TSS[8,9] and ground-state destabilization,[23] combined with the atomic multipole rotamer library scan, allows one to obtain valuable insight into active site preorganization. This contribution also constitutes the first attempt to test and validate the catalytic field technique and apply it to explore possible mutation nonadditivity, as well as dynamic and epistatic effects in biocatalysis resulting from side-chain rotations.

Methods

Here, we present a bottom-up strategy based on a concept of catalytic fields derived from the DTSS approach.[25,26] Within this ab initio-based framework, activation barrier lowering is partitioned into intrinsic (vacuum in the original paper[25]) contribution of reactants and the difference of enzyme-transition state(TS) and enzyme–substrate(S) interaction energies EDTSS. Whenever the DTSS approach is applied to enzyme reactions, the term “substrate” refers to the ground state (alternatively, the reactant complex or Michaelis complex). In principle, the DTSS approach is more general and could be applied to other catalytic systems, such as zeolites.[38] Technically speaking, the “intrinsic” contribution is a barrier calculated for reaction path coordinates as they are in catalysts, but without inclusion of molecular environment in quantum mechanics (QM) calculation. Despite the virtual character of this reference state and the DTSS energy for a single catalyst, it can be used as the reference in quantitative comparison of a sequence of mutants of the same enzyme. Provided that mutation does not change the reaction mechanism, taking the difference between mutant and wild-type (WT) DTSS energies cancels out the “intrinsic” contribution, leaving a value corresponding to the difference of reaction barriers of mutated and WT enzyme (see Figure ):

Figure 1

Differential transition state stabilization (DTSS) energy EDTSS ≡ EIE(TS) – EIE(S), as a measure of catalytic efficiency; under the assumption that mutation does not change reaction mechanism, the difference EDTSSmut – EDTSSWT corresponds to the difference of activation barriers Bmut – BWT. Furthermore, DTSS could be partitioned into the electrostatic multipole ΔEEL,MTP, electrostatic penetration ΔEEL,PEN exchange ΔEEX, delocalization (induction) ΔEDEL, and correlation (dispersion) ΔECORR terms defined within symmetry-adapted perturbation or hybrid variation-perturbation theories of intermolecular interactions.[39,40] This provides not only insight into the physical nature of catalytic interactions induced upon mutation, but it also allows one to derive, in a systematic way, more approximate methods based on leading terms. Recent experimental results[30] and ab initio analysis of several enzyme systems[26,41,42] indicate clearly the dominant contribution of electrostatic term ΔEEL and, in particular, its long-range multipolar electrostatic component ΔEEL,MTP accurately reflecting, at an atomic level, the molecular charge redistribution during the progress of the reaction.[43] The electrostatic multipolar DTSS term,[26] constituting a more-general approach, represents TSS[44] and, at the same time, ground-state destabilization (GSD)[23] effects, postulated earlier as the important contributions to enzyme catalysis. Our recent analysis of several mutants of ketosteroid isomerase[45] validated the utility of simple model, based on cumulative atomic multipole expansion (CAMM),[46] in agreement with recent experimental data,[30] confirming the electrostatic nature of catalytic activity.

Catalytic Fields

Further consideration of electrostatic component of DTSS energy leads to the static catalytic field Δ(r⃗), which is defined[25,26] as the difference between electrostatic potentials V(r⃗) of TS and substrate S (see eq ). Its value denotes activation barrier change resulting from the presence of unit point charge in any location of catalytic site outside reactants. Because of the additive nature of electrostatic interactions, catalytic field values obtained in one computational step constitute the solution of inverse catalysis problem in the form of Δ(r⃗), i.e., complementary charge distribution of optimal molecular environment q(r⃗).[25,26] Therefore, a catalytic field may serve as a general guide for rational selection of mutations or side chain orientations or rate-promoting vibrations by providing clues about optimal static and dynamic charge distribution of catalytic environment reducing the activation barrier. The values of Δ(r⃗) could be rapidly calculated from CAMM derived directly from transition-state and substrate quantum-chemical wave functions.[46] Therefore, calculations involving catalytic field scale as O(NR · NA · Nrot) only, where NR and NA indicate the number of atoms in reactants and considered mutated amino acid residue, respectively. Nrot denotes the average number of rotamers per mutation site (typically a few dozens). In the case when charged amino acid is represented by single unit charge located at the charge gravity center (for example, located near the terminal side chain atom of lysine or arginine), scaling could be further reduced to O(NR · Nrot), which must be calculated only once for the mutation site. The validity of such approximation is illustrated in Figure .

Figure 2

Activation energy changes obtained using cumulative atomic multipole moments DTSS(CAMM) versus products of catalytic fields Δ and formal entire amino acid charges – qf located at its side-chain charge gravity center (Arg- guanidyl carbon, Lys- 6N, Asp- Cγ, Glu- Cδ). This way, the search space in biocatalyst design is considerably reduced within purely nonempirical approach. The use of atomic multipoles is essential to represent the anisotropy of molecular charge distribution properly at the atomic level, in particular for transition states, where bonds are formed or broken.[43] In addition, always arbitrary atomic charge definitions are compensated by the use of higher-order atomic multipoles, considerably reducing the basis set dependency of atomic charges.[46] Note that Δ gradient field defined as dynamic catalytic field Δ, indicates direction of the favorable unit probe charge q(r⃗) movement coupled with the reaction coordinate and resulting in additional decrease of the activation barrier.[47] Such coupling may involve proton dislocations in hydrogen-bonded networks. Recent application of the dynamic electrostatic catalytic field (DECF) approach allowed us to determine that each of two steps of reaction proceeding in ketosteroid isomerase (KSI) is catalyzed by concerted proton dislocations proceeding in reverse directions in the network of six hydrogen bonds.[48] Therefore, observed exceptional KSI catalytic activity could be related to ultrafast proton switching.[48]

Exploration of Rotamer Space

In this study, we will concentrate on some unexplained mutations introduced by LDE into artificial Kemp eliminase, which are not in direct contact with the substrate.[13,14] According to the aformentioned assumption about second-coordination sphere mutations (that is, their negligible effect on the reaction mechanism), the following calculations base on transition state obtained for KE07 enzyme with Gaussian09[49] ONIOM implementation[50] (details are provided in section S2 of the Supporting Information). The QM region of that simulation, consisting of 5-nitrobenzoixazole (5NBI) and acetate representing part of Glu101 residue, was superimposed on 5NBI-enzyme complexes of the mutants. The space of possible amino acid conformations was explored in two ways. In the first, conventional molecular dynamics (MD) has been used with solvent effects modeled within the Molecular Mechanics Poisson–Boltzmann Surface Area (MMPBSA) model,[51] which is regarded as a reliable free-energy simulation method to investigate protein–ligand interactions. This provides a baseline representing a commonly used approach to the problem. In the second one, the library of atomic multipoles for amino acid side chain rotamers has been scanned using catalytic fields, providing a proof of concept of a proposed approach (note that, with proper implementation, this involves much less computational time than MD simulation and avoids one having to always use arbitrary force fields).

MMPBSA Approach

First, 30-ns MD trajectories were prepared for each of eight mutants, followed by MMPBSA calculation with the usage of the gMMPBSA program[51] MD simulations were performed in Gromacs 4.6.[52] For the sake of consistency with ONIOM calculations, Amber94 force field with TIP3 water model was used. Mutant structures were generated from KE07 by a homemade script, according to published data[9](see Table S1 in the Supporting Information). All structures were submerged in water box at least 1 nm from the protein atoms, with Na and Cl ions for charge neutralization. In all simulations, periodic boundary conditions and Particle Mesh Ewald long-range electrostatics (1 nm cutoff) were used. For MD simulations, a leapfrog algorithm was used, and the temperature and pressure were set to 300 K and 1 bar, respectively. After initial steepest descent minimization, NVT and NPT equilibration stages were performed, each one 200 ps long. In this phase, a modified Berendsen thermostat and a Parinello-Rahman isotropic barostat were chosen. In subsequent 30 ns production run, distance restraints between Glu101 OE2 and ligand C3 and H3 atoms were used, in order to prevent random dissociation of the complex. Then, trajectories ware used in MMPBSA calculations done with gMMPBSA program.[51] MD simulations performed for KE07 and its mutants concerned only the structure of the enzyme–substrate complex. It was assumed that the reaction event, when it occurs, is much faster than conformational changes of the surrounding amino acid residues. Therefore, these simulations were supposed to provide an ensemble of protein conformations rather than to simulate the reaction itself. Furthermore, the application of catalytic field Δ could be tested in this way. Taking advantage of the minor changes in geometry between reactant and TS (see Figure S3b in the Supporting Information) and keeping the assumption mentioned above, the same trajectory was assumed for TS. Different energies were obtained by substitution of charges assigned to the “QM region” (as defined in ONIOM calculations) in the topology files. Since g_mmpbsa evaluates electrostatic and implicit solvation contributions to binding energy (we included only electrostatic components), DTSS energy with implicit solvation was obtained by putting differential charges (the difference between each atomic charge in TS and the reactant), which effectively are an atomic monopole representation of Δ. Atomic charges for RS and TS were calculated using the Merz–Kollman scheme[53] in the same basis as that used in ONIOM calculation. The contribution of QM region to solvation energy was subtracted from the result, since this method of calculation provides incorrect value of this term; on the other hand, it may be assumed to be negligible or at least constant among mutants (for a more-detailed derivation, see Section S3 of the Supporting Information).

Scanning the CAMM Library of Rotamers

As the major approach in this study, we used the Dynameonics Rotamer Database (backbone-independent, updated in 2015)[54] to prepare a library of CAMM for each amino acid rotamer. All CAMMs were calculated with GAMESS (US)[55] with HF method and 6-311G(3d,2p) basis set. CAMMs for the QM layer in both RS and TS states were obtained separately. The CAMM expansion was then used in the calculation of catalytic fields and interaction energies. For each mutation, rotamers with the best and worst TSS/DTSS contribution were determined (hereafter, we will refer to the rotamer with the best DTSS contribution as optimal). Finally, we explored the configuration space resulting from combining rotamers at different sites (mathematically speaking, a Cartesian product of individual site rotamer spaces in a given mutant). In order to test the importance of solvation effects on electrostatic interaction energies, we examine the version of library incorporating Generalized Born (GB) model, where the solvation energy is modeled according to eq . The Born radii were calculated using bornRadius software developed by Onufriev’s group, utilizing an improved algorithm described in refs (56 and 57). The atomic charges for amino acid residues were taken from Amber94 force field, whereas charges on the QM region were calculated with the Merz–Kollman method for the HF/6-311G(3d,2p) wavefunction.

Algorithms for the Exploration of Rotamer Configurational Space

Since the number of rotamer combinations grows exponentialy with the number of amino acids, it is important to restrict the size search space, especially in the early steps of calculation. In particular, elimination of rotamers that are unlikely to participate in final combination is of great importance. In our experiments, we found that the average number of rotamers per site is in the range of 60–80. Thus, elimination of highly improbable rotamer (e.g., having short contact with the protein backbone) reduces the size of search space by 70(N – 1) configurations (with N being the number of included residues). Here, two criteria were applied: (i) geometric and (ii) energetic. The first one basically involves exclusion of rotamers which side chains have close contanct (≤1 Å) with protein backbone (neglecting, obviously, contacts corresponding to chemical bonds). The second one is the well-known Goldstein criterion for singles elimination.[58]Here, subscripts k,l ∈ [1,N] denote the position in the amino acid sequence, whereas superscripts “A” and “B” indicate the rotamer at a particular site. However, the value ϵ in the original work was equal to zero; by choosing a positive nonzero value, one can “soften” the criterion and thus obtain a larger final space. Such an approach may be useful when one is interested in a set of low-energy conformers, which may be particularly desirable when considering dynamic effects. The search algorithm generally can be described by the following steps. Loading of the amino acid rotamers into specified positions in a protein structure. At this stage, criterion (i) is applied. Calculation of an interaction energy table. Elimination of the rotamers according to different criteria (e.g., criterion (ii)). Systematic search over resulting space. Throughout the presented work, we refer to the method without additional elimination criteria in point 3 as “multiscan”. The second algorithm tested, denoted as Dead End Elimination (DEE), involves criterion (ii) with negligence of the rotamer self-energy (E(r)) and ϵ set to either 0.0 or 0.005 hartree. The performance of both approaches was tested on P6 mutant represented by five charged residues: Asp7, Glu19, Lys146, Arg202, and Asp224. A potential bottleneck may be the calculation of CAMM interaction energies (which can be several times slower than that observed for the point-charge model) or loading CAMM from library (since the rotation of large multipole arrays may be computationally expensive). Therefore, we compared the performance of the RESP model (Table S1 in the Supporting Information) and CAMM (Table S2 in the Supporting Information). We ensured that both approaches lead to the same rotamer configuration. Two major observations can be drawn from the presented results. Elimination of rotamers according to Goldstein criterion reduces the search space by 2–4 orders of magnitude, depending on the assumed threshold. This, in turn, leads to a similar reduction of time spent on scanning the rotamer space. Application of CAMM mutipoles leads to an ∼2-fold increase of loading time (due to the required transformation of multipole arrays) and an ∼6-fold increase in the time spent on the computation of interaction energy table. In fact, this time is comparable to that required to scan the entire (unreduced) rotamer space. However, for larger sets of amino acid residues, the problems with application of CAMM expansion can be tackled by initial reduction of search space, based on atomic charges, maybe with a nonzero epsilon. This should allow one to avoid situations when inclusion of the charge anisotropy leads to different configurations. Furthermore, there are at least two possibilities, with regard to further improvement of the presented algorithm: The elimination of unlikely pairs from the search space (analogous to the Goldstein criterion for singles). Further approximation of the energy table. For larger rotamer sets, this could become a serious bottleneck, since its full preparation is required in all presented algorithms. However, for residues distant in space, there should be a little difference in interaction energy between their different rotamers (in other words, the rotation of side chains should not change the interaction energy). In such a case, an entire block of the interaction energy table could be filled at once.

Flowchart of Computational Procedure

(A) Structure preparation Preparation of CAMM amino acid library (HF/6-31G*); structures of rotamers taken from the Dynameonics database[54] Calculation of reaction path in KE07 (ONIOM B3LYP/6-311+G**:Amber94) MD simulations of KE07 with distance restraint put on hydrogen bonds between Asp and 5-nitrobenzoisoxazole (5NBI) Clusterization of production trajectory with g-clust algorithm (available in Gromacs). The main cluster was then used for analysis of the conformational space (B) Exploration of rotamer space Only amino acid residues that varied in directed evolution experiments[9] were included. Rotamers from the CAMM library were loaded into these positions, excluding those having short contacts with either backbone or ligand (short-contact is defined as the shortest interatomic distance below 1.25 Å) Evaluation of interaction energy table in either the GB model (see below) or with the corresponding point charge model without solvation Elimination of pairs of rotamers (of different residues) yielding short contacts Search of the lowest energy configuration with different algorithms brute-force algorithm (direct iteraton over all possible configurations) (referred to as “multiscan”) Dead End Elimination, DEE (the elimination of rotamers, according to Goldstein criterion for singles.[58] Mean-Field algorithm.[59] Evaluation of DTSS(CAMM) or DTSS(GB) energy for the found configurations

Results

The Catalytic Field and Optimal Locations of Crucial Amino Acids

The three charged residues—Asp7, Lys146, and Arg202, conserved in directed evolution[9] (see Table S1)—seem to play the most important role. What’s even more remarkable, these findings are consistent with the catalytic field corresponding to this reaction. As can be seen in Figure , charged amino acids conserved in directed evolution reside exactly where they should be located according to catalytic field Δ, and the range of their possible rotamers fits the corresponding “basins” well. The position of Asp7 on the edge between the two areas of Δ provides a straightforward exploration of both its low contribution and range varying from positive to negative (the best and worst DTSS from the rotamer scan are −1.1 kcal/mol and 0.0 kcal/mol, respectively).

Figure 3

Catalytic field in Kemp eliminase. (a) 3D view of surface colored with the value of Δ (red indicates that positively charged catalyst residues will be beneficial, blue indicates that negatively charged residues will be beneficial); (b) Kohonen map[60] of the catalytic field (values given in kcal/(mol e)) (positions of amino acids at a distance of 7 Å above the van der Waals surface are marked with circles (optimal DTSS) and squares (the worst DTSS); residues predicted to provide further improvement are marked with stars (optimum DTSS) and diamonds (the worst DTSS); the arrows indicate the transition from the position with the worst DTSS to that with the best one.

Side-Chain Movements and the Ranges of DTSS Contributions

Charged amino acid side chains, especially of lysines and arginines, belong to most flexible upon ligand binding,[34,35] so it is necessary to consider entire range of their dislocations in more detail. Results obtained using library of atomic multipoles for amino acid side chains indicate that activation energy changes may vary between 1 kcal/mol and 4 kcal/mol for charged residues, in contrast to neutral amino acids (0–0.5 kcal/mol). The possible movements of charged amino acid terminal atoms representing extreme values of catalytic activity measured by DTSS are illustrated on a 2D Kohonen map of the Kemp eliminase catalytic field in Figure b. In order to make a 2D representation of catalytic field, we used a self-organizing map (SOM) based on the Kohonen neural network.[60] First, a surface at a distance of 7 Å from the vdW surface of TS was generated with the point density set to 25 points/Å2. It was then used to train a neural network consisting of 3600 neurons arranged into a square grid with a toroidal topology. This special topology means that the grid (and thus the resulting map) does not have an edge: the “rightmost” point is topologically adjacent to the “leftmost” point, while the “topmost” point neighbors the “bottom-most” point. Another consequence of this fact is that replicas of such map may be placed next to each other, similar to tiles, showing the topological surroundings of each point. After training, the map was colored according to the value of catalytic field Δ, and terminal atoms of amino acids (chosen in the same manner as presented on Figure ) were projected onto resulting SOM. These side chain conformation changes could be regarded as the important part of a preorganized catalytic environment. Relatively large activation barrier changes resulting from amino acid side chain movements reinforce the conclusions from previous experimental and computational studies, indicating the importance of the precise positioning of catalytic residues.[61]

Multiscan Results Including Solvent Effects—Comparison with Experiment

Enzyme catalytic activity could be expressed in terms of TS and S interaction energies with the enzyme treated as a rigid body. However, the docking substrate may induce significant changes in side-chain conformation of charged amino acids,[34,35] which may strongly interact between themselves as well as with TS and S. Whereas the interaction energy for typical neutral hydrogen-bonded dimer is 3–5 kcal/mol, it can be 1 or 2 orders of magnitude higher, if at least one of the monomers bear nonzero electric charge (specifically, the absolute value of interaction energy may reach 20 kcal/mol in the case of neutral molecules bonded to a charged one or even 100 kcal/mol when both partners are charged). Therefore, the interactions involving charged amino acids are expected to be dominant and a model relying solely on this contribution seems to be a reasonable first approximation. In the case of Kemp eliminase, six positions varied in LDE experiments involved charged residues: 7, 19, 123, 146, 202, and 224 as shown in Tables and 2. The change of the charged state of these residues located beyond the first coordination sphere and directly involved in reactant binding has the greatest chance to modulate a catalytic field without affecting the reactant binding pose. In addition, they may strongly interact with each other and substrate binding may affect the conformation of their charged side chains. Therefore, multiscan procedure seems to a reasonable way to model these effects, which could not be determined by currently available conventional experimental or theoretical methods.

Table 1

DTSS (CAMM) Values Resulting from Multiscan*

Values given in units of kcal/mol. For noncharged residues, DTSS has been assumed to be zero. The residue charge code: red, positive; blue, negative; black, neutral.

Table 2

Contributions of Individual Amino Acids to Total Activation Barrier Lowering DTSS (GB) Including Solvent Effects Estimated via the Generalized Born Approach*

Values given in units of kcal/mol.For noncharged residues, DTSS has been assumed to be zero. The residue charge code: red, positive; blue, negative; black, neutral.

For all mutants, residues at these positions have been taken into account in a search for the lowest-energy rotamer configuration using a CAMM rotamer library (http://camm.pwr.edu.pl/CAMM). In order to take into account solvent presence, the GB model has been used with selection of effective Born radii computed with the bornRadius program developed by Onufriev and co-workers.[56,62] With this energy function (denoted henceforth as GB), a search over rotamer space was performed in exactly the same way as the CAMM model. Interactions between amino acid residues were included. Finally, we compare those results to MMPBSA[51] calculations based on 30-ns MD simulations performed for each mutant separately (more details are given in the section devoted to the MMPBSA approach). Experimental activation free energies for Kemp eliminase mutants are compared in Figure with all three aforementioned models. Remarkably, both models involving a scan over rotamer space—that is, DTSS(CAMM) and DTSS(GB)—outperform the DTSS(MMPBSA) approach, in terms of the number of correctly ordered pairs Npred (See Tables and 2). Npred of a model[63] is a ratio of correctly predicted pairs to all pairs in a test set. A “correctly predicted” or concordant pair is a pair of mutants P,Q for which EDTSS < EDTSS and kcat > kcat or EDTSS > EDTSS and kcat < kcat; otherwise, such a pair is counted as nonconcordant. Given the total number of pairs Ntot in a test set consisting of N enzymes (eq ), the value of Npred (usually expressed as a percentage) is given by eq . It is related to Kendall’s tau τ by the relation described in eq ).While DTSS results obtained within GB, CAMM, and MMPBSA models, accounting for continuous solvent, correlate with experimentally observed trends (Spearman correlation coefficients 0.92, 0.80, or 0.79 and predictivity rates of 89.3%, 82.1%, or 78.6%), the TSS model anticorrelates (Spearman coefficients −0.65, −0.35, or −0.67 with predictivity rates of 25.0%, 35.7%, or 25%) confirming the essential role of neglected ground-state destabilization.[23] This was probably one of the reasons of limited success of initial theozyme design.[13,14]

Figure 4

Comparison of theoretical DTSS and TSS results with experimental data for Kemp eliminase mutants obtained from directed evolution experiments.[8] In the CAMM approach, reactant–amino acid side-chain interactions were calculated by employing cumulative atomic multipole moment expansion. In the GB approach, corresponding ESP atomic charges have been used, including solvent effects estimated from the Generalized Born method. MMPBSA denote mean interaction energies using MD trajectories where solvent effects were obrained from the Poisson–Boltzmann model.[51] Values given in units of kcal/mol. For noncharged residues, DTSS has been assumed to be zero. The residue charge code: red, positive; blue, negative; black, neutral. Values given in units of kcal/mol.For noncharged residues, DTSS has been assumed to be zero. The residue charge code: red, positive; blue, negative; black, neutral. Because of the approximations involved, the correspondence is not linear; however, this does not disqualify the overall approach, which is intended to determine the ranking or narrow the large set of possible mutations, so it would be manageable for more sophisticated methods. Thus, it seems to be sufficient to indicate a general qualitative trend, favoring the DTSS model over the TSS model.

Nonadditive Effects of Mutations

Activation barrier changes (DTSS) originating from specific residue could sometimes vary as much as 1.25 kcal/mol, indicating possible side-chain conformation changes, resulting from interactions between other residues (see Table S1 in the Supporting Information). Such variations responsible for nonadditivity of mutations could be examined in detail using the multiscan technique presented here. Let us discuss in detail ASP7 catalytic activity in P3, P4, P5, P6, and P7 mutants. Figure a–e presents schematic side-chain orientations obtained using a multidimensional scaling technique, preserving Euclidean distances in the space of lower dimensionality. Mutation GLU146LYS in P4 forces change of ASP7 conformation in P3 and, as a result, a loss of its activity measured by DTSS from −1.12 kcal/mol to +0.02 kcal/mol. Then, mutations LYS19GLU and LYS146XX (where XX rerpesents a neutral amino acid) introduced into P4 restore ASP7 activity in P5. In P6, the previous mutation is reversed, resulting in the same activity loss as that observed in P4. Finally, the LYS19XX mutation in P6 restores ASP7 activity in P7 again. All of these results indicate the essential role of molecular context, which can be explored by the multiscan technique presented in the previous chapter. Additional analysis of nonadditive effects of LYS146 and Arg202 is presented in section S4 of the Supporting Information.

Figure 5

Optimal side-chain orientations in Kemp eliminase active site determined for P3–P7 mutants. DTSS values are given in units of kcal/mol.

Discussion

The presented scheme correctly explains the effect of mutations leading from KE07 to R7–R10/11G, providing an atomistic insight into the possible role of each residue contribution to the activation barrier lowering, emerging from first principles. The most significant result of this analysis is an effective approximate solution of the “inverse catalysis problem” confronted and confirmed for the first time by experimental results. We have shown that, based on the information provided by the catalytic field, the search space could be narrowed down to a limited number of computational steps. The underlying electrostatic differential transition state stabilization DTSS concept[25,26] has been confirmed later in recent independent studies.[18,23,31] Jindal and co-workers analyzed the differences between evolutionary pathways of two Kemp eliminase families: descendants of KE07 and HG3.[23] They noted that, in the first group, the improvement arises from ground-state destabilization, whereas, in the second one, te improvement results from an increase in transition-state stabilization (TSS). This is consistent with our more general DTSS and catalytic field approach, which covers both effects simultaneously.[25,26] Moreover, some elements of differential stabilization of the transition state approach appeared later in two other works,[18,22] which provides further reinforcement for this idea. In summary, the role of residues 7, 19, 123, 146, 202, and 224 modified in laboratory-directed evolution experiments, which are not in direct contact with reactants, seems to be modulation of the catalytic field to minimize activation barrier, without changing the reactant binding pose determined primarily by the amino acids constituting the first coordination sphere. The essential advantage of the present approach is the use of cumulative atomic multipole moments to describe molecular charge redistribution during the progress of the reaction, capturing the anisotropic character of molecular electrostatic potentials,[64] which may be more significant in neutral systems.[43] It may be still crucial in proper description of the catalytic field, since Δ has at least dipole character, because of the charge conservation principle. Another important feature of the catalytic field is that, despite special emphasis placed on charged residues in the present analysis, it can provide more general information; by considering its gradient Δ⃗, one gains insight into the optimal distribution of catalytic dipoles in a way analogous to that presented above for charges (see Figures S1 and S2 in the Supporting Information). Finally, as mentioned earlier, whenever necessary, the DTSS energy can always be generalized to include remaining interaction energy terms defined in interaction energy decomposition schemes, such as SAPT[39] or HVPT.[40] For example, exchange repulsion and dispersion terms could be estimated from the corresponding nonempirical atom–atom functions approximating accurate SAPT results.[65,66] An important issue is a nonadditivity of mutations reported recently.[23] Within the presented framework, it could be included by exploration of the effect of mutation on rotamer distribution of other residues (and corresponding DTSS energy changes). An example of such analysis is presented in Figure , where we found that considering the mutual interactions of residues Asp7, Lys19, Lys146, Arg202, and Arg2246 in P3, P4, P5, P6, and P7 mutants changes the optimal DTSS contribution of Asp7 several times from −1.1 (P3), 0.0 (P4), −1.1 (P5), 0.0 (P6), and −1.1 (P7) kcal/mol. Such an approach can be effectively used to obtain rapid estimates of nonadditive effects in multiple mutants. The general idea presented in this work may be easily introduced into other existing protocols, such as an ensemble of short MD[67] or Monte Carlo simulations.[36] Application of static catalytic field gradients (dynamic electrostatic catalytic fields DECF)[47] allows one to predict the catalytic role of proton transfer in enzyme hydrogen-bond networks.[48]

45 in total

1. Exploring challenges in rational enzyme design by simulating the catalysis in artificial kemp eliminase.

Authors: Maria P Frushicheva; Jie Cao; Zhen T Chu; Arieh Warshel
Journal: Proc Natl Acad Sci U S A Date: 2010-09-09 Impact factor: 11.205

2. Computational redesign of a mononuclear zinc metalloenzyme for organophosphate hydrolysis.

Authors: Sagar D Khare; Yakov Kipnis; Per Greisen; Ryo Takeuchi; Yacov Ashani; Moshe Goldsmith; Yifan Song; Jasmine L Gallaher; Israel Silman; Haim Leader; Joel L Sussman; Barry L Stoddard; Dan S Tawfik; David Baker
Journal: Nat Chem Biol Date: 2012-02-05 Impact factor: 15.040

3. Low cost prediction of relative stabilities of hydrogen bonded complexes from atomic multipole moments for overly short intermolecular distances.

Authors: Wiktor Beker; Karol M Langner; Edyta Dyguda-Kazimierowicz; Mikołaj Feliks; W Andrzej Sokalski
Journal: J Comput Chem Date: 2013-05-21 Impact factor: 3.376

4. Protein design automation.

Authors: B I Dahiyat; S L Mayo
Journal: Protein Sci Date: 1996-05 Impact factor: 6.725

5. The Importance of the Scaffold for de Novo Enzymes: A Case Study with Kemp Eliminase.

Authors: Asmit Bhowmick; Sudhir C Sharma; Teresa Head-Gordon
Journal: J Am Chem Soc Date: 2017-04-17 Impact factor: 15.419

Review 6. The coming of age of de novo protein design.

Authors: Po-Ssu Huang; Scott E Boyken; David Baker
Journal: Nature Date: 2016-09-15 Impact factor: 49.962

7. Holy Grails for Computational Organic Chemistry and Biochemistry.

Authors: K N Houk; Fang Liu
Journal: Acc Chem Res Date: 2017-03-21 Impact factor: 22.384

8. Application of a self-consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy.

Authors: P Koehl; M Delarue
Journal: J Mol Biol Date: 1994-06-03 Impact factor: 5.469

Review 9. Optimization of reorganization energy drives evolution of the designed Kemp eliminase KE07.

Authors: A Labas; E Szabo; L Mones; M Fuxreiter
Journal: Biochim Biophys Acta Date: 2013-02-01

10. Catalytic mechanism and performance of computationally designed enzymes for Kemp elimination.

Authors: Anastassia N Alexandrova; Daniela Röthlisberger; David Baker; William L Jorgensen
Journal: J Am Chem Soc Date: 2008-11-26 Impact factor: 15.419

3 in total

Review 1. Advances in optimizing enzyme electrostatic preorganization.

Authors: Matthew R Hennefarth; Anastassia N Alexandrova
Journal: Curr Opin Struct Biol Date: 2021-07-17 Impact factor: 6.809

2. Extreme Catalytic Power of Ketosteroid Isomerase Related to the Reversal of Proton Dislocations in Hydrogen-Bond Network.

Authors: Paweł Kędzierski; Maria Zaczkowska; W Andrzej Sokalski
Journal: J Phys Chem B Date: 2020-04-27 Impact factor: 2.991

3. Extension of an Atom-Atom Dispersion Function to Halogen Bonds and Its Use for Rational Design of Drugs and Biocatalysts.

Authors: Wiktoria Jedwabny; Edyta Dyguda-Kazimierowicz; Katarzyna Pernal; Krzysztof Szalewicz; Konrad Patkowski
Journal: J Phys Chem A Date: 2021-02-23 Impact factor: 2.781

3 in total