Currently developed protocols of theozyme design still lead to biocatalysts with much lower catalytic activity than enzymes existing in nature, and, so far, the only avenue of improvement was the in vitro laboratory-directed evolution (LDE) experiments. In this paper, we propose a different strategy based on "reversed" methodology of mutation prediction. Instead of common "top-down" approach, requiring numerous assumptions and vast computational effort, we argue for a "bottom-up" approach that is based on the catalytic fields derived directly from transition state and reactant complex wave functions. This enables direct one-step determination of the general quantitative angular characteristics of optimal catalytic site and simultaneously encompasses both the transition-state stabilization (TSS) and ground-state destabilization (GSD) effects. We further extend the static catalytic field approach by introducing a library of atomic multipoles for amino acid side-chain rotamers, which, together with the catalytic field, allow one to determine the optimal side-chain orientations of charged amino acids constituting the elusive structure of a preorganized catalytic environment. Obtained qualitative agreement with experimental LDE data for Kemp eliminase KE07 mutants validates the proposed procedure, yielding, in addition, a detailed insight into possible dynamic and epistatic effects.
Currently developed protocols of theozyme design still lead to biocatalysts with much lower catalytic activity than enzymes existing in nature, and, so far, the only avenue of improvement was the in vitro laboratory-directed evolution (LDE) experiments. In this paper, we propose a different strategy based on "reversed" methodology of mutation prediction. Instead of common "top-down" approach, requiring numerous assumptions and vast computational effort, we argue for a "bottom-up" approach that is based on the catalytic fields derived directly from transition state and reactant complex wave functions. This enables direct one-step determination of the general quantitative angular characteristics of optimal catalytic site and simultaneously encompasses both the transition-state stabilization (TSS) and ground-state destabilization (GSD) effects. We further extend the static catalytic field approach by introducing a library of atomic multipoles for amino acid side-chain rotamers, which, together with the catalytic field, allow one to determine the optimal side-chain orientations of charged amino acids constituting the elusive structure of a preorganized catalytic environment. Obtained qualitative agreement with experimental LDE data for Kemp eliminase KE07 mutants validates the proposed procedure, yielding, in addition, a detailed insight into possible dynamic and epistatic effects.
In
recent years, intense research has been conducted in pursuit
of harnessing the versatile protein capabilites for application in
industrial biocatalysis.The major challenge arises from the
fact that living organisms
produce enzymes tailored for sustaining their own existence, rather
than the needs of human society. Therefore, the key to the future
of rational biocatalyst engineering lays in the redesigning of existing
enzymes and, ultimately, de novo design with the aid of new computational
methods.[1−4]Major breakthroughs in the theoretical design of enzymes for
new,
non-natural reactions (so-called “theozymes”) have occurred
in the past two decades.[5−12] However, after initial success, the progress stalled,[13,14] because theozymes that were designed to maximize transition-state
stabilization (TSS) displayed rather low catalytic activity and conventional
top-down models requiring numerous assumptions were not able to fully
explain the role of additional mutations in the second coordination
sphere, introduced by LDE experiments.[13,14] Therefore,
such LDE-mutated theozymes represent an excellent proving ground for
testing new methodologies and interpreting the deficiencies of older
methods. This problem was investigated by several research groups
with various methods, including extensive molecular dynamics,[15,16] QM/MM,[17−19] and EVB[20−23] simulations. All previously mentioned methods belonging
to the top-down category require consideriation of the entire protein
and numerous assumptions regarding protonation states, QM/MM boundaries,
always arbitrary selection of force field, etc. Although commonly
used methods are able to explain some of the differences between designed
and evolved enzymes, they are rather unsuitable for efficient scan
over vast combinatorial space. Therefore, a method simple enough for
high-throughput prescreening and yet deeply rooted in the physics
of the problem without involving any empirical factors is needed to
improve the existing protocols.The catalytic field technique
that has been used since we developed
it in our laboratory in 1981[24−26] has been recently supplemented
by Dittner and Hartke with extensive optimization techniques, allowing
one to generate and explore various characteristics of the optimal
catalytic environment.[27,28] However, extensive brute force
optimization of all variables involved would still make theoretical
design of theozyme composed of several hundreds of amino acids extremely
costly and impractical. We believe that systematic nonempirical analysis
of the physical nature of interactions involved in the catalysis and
development of additional techniques specific for enzymes presented
in this contribution could lead to rational construction of simpler
models useful in theozyme design. Other rare inverse catalyst design
attempts that have been recently summarized in a review[29] are mostly based on artificial intelligence
or data mining. Although such methods could be very useful in the
automated scanning of vast chemical spaces, it would be surprising
to obtain a deeper understanding of physical phenomena responsible
for extraordinary catalytic activity of enzymes this way, while many
essential structural details are missing in currently available databases.Recent experimental results that employ Stark effects for a specific
band of infrared (IR) spectra for mutated enzymes indicate the electrostatic
nature of KSI catalytic activity.[30] This
permits one to monitor electric fields exerted by specific mutations
on certain reactant bonds in top-down fashion, yielding valuable but
still limited information.[16,31] This perspective is
reversed here in bottom-up fashion by the use of a catalytic field
derived from the reactant and transition state wave functions, yielding
general characteristics of charge distribution of the optimal catalytic
environment for every point in space outside reagents and enabling
one to monitor lowering of activation barrier directly, as a result
of amino acid mutations, side-chain rotations, rate-promoting vibrations,
or proton transfers in hydrogen bond chains.Although the catalytic
field technique has been derived in our
laboratory[24−26] using the theory of intermolecular interactions,
it could be also applied to systems involving intramolecular interactions.
The ability to predict catalytic effects resulting from hydrogen to
fluorine (H → F) substitutions has been illustrated
for two simple model organic reactions,[32] which demonstrates qualitative applicability of this technique to
model substrate-assisted catalysis.[33]The present paper constitutes the first application of a catalytic
field approach to real case intermolecular catalysis. For this purpose,
here, we have chosen the artificial Kemp eliminase KE07 and its mutants.[9] Among other Kemp eliminases based on different
scaffolds, this system has the largest number of mutants with gradually
increasing activities. Several KE07 mutants introduced by LDE experiments,
which apparently were missed in earlier theozyme designs,[13,14] constitute a suitable benchmark to examine new biocatalyst design
methodologies.The mutants obtained from LDE experiments differ
mainly by the
number of charged amino acids located in the second coordination sphere,[13,14] which may change conformation of its side chain upon docking with
the substrate or nearby mutation.[34,35] The importance
of considering amino acid side-chain rotamers in biocatalysis has
been recently demonstrated for Kemp eliminase[36] and triosephosphate isomerase mutants.[37] A more general catalytic field technique,[25,26] considering simultaneously both TSS[8,9] and ground-state
destabilization,[23] combined with the atomic
multipole rotamer library scan, allows one to obtain valuable insight
into active site preorganization.This contribution also constitutes
the first attempt to test and
validate the catalytic field technique and apply it to explore possible
mutation nonadditivity, as well as dynamic and epistatic effects in
biocatalysis resulting from side-chain rotations.
Methods
Here, we present a bottom-up strategy based on a concept of catalytic
fields derived from the DTSS approach.[25,26] Within this
ab initio-based framework, activation barrier lowering is partitioned
into intrinsic (vacuum in the original
paper[25]) contribution of reactants and
the difference of enzyme-transition state(TS) and enzyme–substrate(S)
interaction energies EDTSS.Whenever
the DTSS approach is applied to enzyme reactions, the
term “substrate” refers to the ground state (alternatively,
the reactant complex or Michaelis complex). In principle, the DTSS
approach is more general and could be applied to other catalytic systems,
such as zeolites.[38]Technically speaking,
the “intrinsic” contribution
is a barrier calculated for reaction path coordinates as they are
in catalysts, but without inclusion of molecular environment in quantum
mechanics (QM) calculation. Despite the virtual character of this
reference state and the DTSS energy for a single catalyst, it can
be used as the reference in quantitative comparison of a sequence
of mutants of the same enzyme. Provided that mutation does not change
the reaction mechanism, taking the difference between mutant and wild-type
(WT) DTSS energies cancels out the “intrinsic” contribution,
leaving a value corresponding to the difference of reaction barriers
of mutated and WT enzyme (see Figure ):
Figure 1
Differential transition state stabilization (DTSS) energy EDTSS ≡ EIE(TS) – EIE(S), as a measure of
catalytic efficiency; under the assumption that mutation does not
change reaction mechanism, the difference EDTSSmut – EDTSSWT corresponds to the difference of activation barriers Bmut – BWT.
Differential transition state stabilization (DTSS) energy EDTSS ≡ EIE(TS) – EIE(S), as a measure of
catalytic efficiency; under the assumption that mutation does not
change reaction mechanism, the difference EDTSSmut – EDTSSWT corresponds to the difference of activation barriers Bmut – BWT.Furthermore,
DTSS could be partitioned into the electrostatic multipole
ΔEEL,MTP, electrostatic penetration
ΔEEL,PEN exchange ΔEEX, delocalization (induction) ΔEDEL, and correlation (dispersion) ΔECORR terms defined within symmetry-adapted perturbation
or hybrid variation-perturbation theories of intermolecular interactions.[39,40] This provides not only insight into the physical nature of catalytic
interactions induced upon mutation, but it also allows one to derive,
in a systematic way, more approximate methods based on leading terms.
Recent experimental results[30] and ab initio
analysis of several enzyme systems[26,41,42] indicate clearly the dominant contribution of electrostatic
term ΔEEL and, in particular, its
long-range multipolar electrostatic component ΔEEL,MTP accurately reflecting, at an atomic level, the
molecular charge redistribution during the progress of the reaction.[43] The electrostatic multipolar DTSS term,[26] constituting a more-general approach, represents
TSS[44] and, at the same time, ground-state
destabilization (GSD)[23] effects, postulated
earlier as the important contributions to enzyme catalysis. Our recent
analysis of several mutants of ketosteroid isomerase[45] validated the utility of simple model, based on cumulative
atomic multipole expansion (CAMM),[46] in
agreement with recent experimental data,[30] confirming the electrostatic nature of catalytic activity.
Catalytic
Fields
Further consideration of electrostatic component of
DTSS energy
leads to the static catalytic field Δ(r⃗),
which is defined[25,26] as the difference between electrostatic
potentials V(r⃗) of TS and substrate S (see eq ). Its value denotes activation barrier change
resulting from the presence of unit point charge in any location of
catalytic site outside reactants. Because of the additive nature of
electrostatic interactions, catalytic field values obtained in one
computational step constitute the solution of inverse catalysis problem
in the form of Δ(r⃗), i.e., complementary charge
distribution of optimal molecular environment q(r⃗).[25,26]Therefore, a catalytic field may serve as a
general guide for rational
selection of mutations or side chain orientations or rate-promoting
vibrations by providing clues about optimal static and dynamic charge
distribution of catalytic environment reducing the activation barrier.
The values of Δ(r⃗) could be rapidly calculated
from CAMM derived directly from transition-state and substrate quantum-chemical
wave functions.[46] Therefore, calculations
involving catalytic field scale as O(NR · NA · Nrot) only, where NR and NA indicate the number of atoms
in reactants and considered mutated amino acid residue, respectively. Nrot denotes the average number of rotamers per
mutation site (typically a few dozens). In the case when charged amino
acid is represented by single unit charge located at the charge gravity
center (for example, located near the terminal side chain atom of
lysine or arginine), scaling could be further reduced to O(NR · Nrot), which must be calculated only once for the mutation site. The
validity of such approximation is illustrated in Figure .
Figure 2
Activation energy changes
obtained using cumulative atomic multipole
moments DTSS(CAMM) versus products of catalytic fields Δ and formal entire amino acid charges – qf located at its side-chain charge gravity center
(Arg- guanidyl carbon, Lys- 6N, Asp- Cγ, Glu- Cδ).
Activation energy changes
obtained using cumulative atomic multipole
moments DTSS(CAMM) versus products of catalytic fields Δ and formal entire amino acid charges – qf located at its side-chain charge gravity center
(Arg- guanidyl carbon, Lys- 6N, Asp- Cγ, Glu- Cδ).This way, the search space in biocatalyst design is considerably
reduced within purely nonempirical approach. The use of atomic multipoles
is essential to represent the anisotropy of molecular charge distribution
properly at the atomic level, in particular for transition states,
where bonds are formed or broken.[43] In
addition, always arbitrary atomic charge definitions are compensated
by the use of higher-order atomic multipoles, considerably reducing
the basis set dependency of atomic charges.[46]Note that Δ gradient field
defined
as dynamic catalytic field Δ, indicates
direction of the favorable unit probe charge q(r⃗) movement coupled with the reaction coordinate and resulting
in additional decrease of the activation barrier.[47] Such coupling may involve proton dislocations in hydrogen-bonded
networks. Recent application of the dynamic electrostatic catalytic
field (DECF) approach allowed us to determine that each of two steps
of reaction proceeding in ketosteroid isomerase (KSI) is catalyzed
by concerted proton dislocations proceeding in reverse directions
in the network of six hydrogen bonds.[48] Therefore, observed exceptional KSI catalytic activity could be
related to ultrafast proton switching.[48]
Exploration of Rotamer Space
In this study, we will concentrate
on some unexplained mutations
introduced by LDE into artificial Kemp eliminase, which are not in
direct contact with the substrate.[13,14] According
to the aformentioned assumption about second-coordination sphere mutations
(that is, their negligible effect on the reaction mechanism), the
following calculations base on transition state obtained for KE07
enzyme with Gaussian09[49] ONIOM implementation[50] (details are provided in section S2 of the Supporting Information). The QM region of
that simulation, consisting of 5-nitrobenzoixazole (5NBI) and acetate
representing part of Glu101 residue, was superimposed on 5NBI-enzyme
complexes of the mutants.The space of possible amino acid conformations
was explored in
two ways. In the first, conventional molecular dynamics (MD) has been
used with solvent effects modeled within the Molecular Mechanics Poisson–Boltzmann
Surface Area (MMPBSA) model,[51] which is
regarded as a reliable free-energy simulation method to investigate
protein–ligand interactions. This provides a baseline representing
a commonly used approach to the problem. In the second one, the library
of atomic multipoles for amino acid side chain rotamers has been scanned
using catalytic fields, providing a proof of concept of a proposed
approach (note that, with proper implementation, this involves much
less computational time than MD simulation and avoids one having to
always use arbitrary force fields).
MMPBSA Approach
First, 30-ns MD trajectories were prepared for each of eight mutants,
followed by MMPBSA calculation with the usage of the gMMPBSA program[51] MD simulations were performed in Gromacs 4.6.[52] For the sake of consistency with ONIOM calculations,
Amber94 force field with TIP3water model was used. Mutant structures
were generated from KE07 by a homemade script, according to published
data[9](see Table S1 in the Supporting Information). All structures were submerged in
water box at least 1 nm from the protein atoms, with Na and Cl ions
for charge neutralization. In all simulations, periodic boundary conditions
and Particle Mesh Ewald long-range electrostatics (1 nm cutoff) were
used. For MD simulations, a leapfrog algorithm was used, and the temperature
and pressure were set to 300 K and 1 bar, respectively. After initial
steepest descent minimization, NVT and NPT equilibration stages were
performed, each one 200 ps long. In this phase, a modified Berendsen
thermostat and a Parinello-Rahman isotropic barostat were chosen.
In subsequent 30 ns production run, distance restraints between Glu101
OE2 and ligand C3 and H3 atoms were used, in order to prevent random
dissociation of the complex. Then, trajectories ware used in MMPBSA
calculations done with gMMPBSA program.[51]MD simulations performed for KE07 and its mutants concerned
only
the structure of the enzyme–substrate complex. It was assumed
that the reaction event, when it occurs, is much faster than conformational
changes of the surrounding amino acid residues. Therefore, these simulations
were supposed to provide an ensemble of protein conformations rather
than to simulate the reaction itself. Furthermore, the application
of catalytic field Δ could be tested
in this way. Taking advantage of the minor changes in geometry between
reactant and TS (see Figure S3b in the
Supporting Information) and keeping the assumption mentioned above,
the same trajectory was assumed for TS. Different energies were obtained
by substitution of charges assigned to the “QM region”
(as defined in ONIOM calculations) in the topology files. Since g_mmpbsa
evaluates electrostatic and implicit solvation contributions to binding
energy (we included only electrostatic components), DTSS energy with
implicit solvation was obtained by putting differential charges (the difference between each atomic charge in TS and the reactant),
which effectively are an atomic monopole representation of Δ. Atomic charges for RS and TS were calculated
using the Merz–Kollman scheme[53] in
the same basis as that used in ONIOM calculation. The contribution
of QM region to solvation energy was subtracted from the result, since
this method of calculation provides incorrect value of this term;
on the other hand, it may be assumed to be negligible or at least
constant among mutants (for a more-detailed derivation, see Section S3 of the Supporting Information).
Scanning
the CAMM Library of Rotamers
As the major approach in this
study, we used the Dynameonics Rotamer
Database (backbone-independent, updated in 2015)[54] to prepare a library of CAMM for each amino acid rotamer.All CAMMs were calculated with GAMESS (US)[55] with HF method and 6-311G(3d,2p) basis set. CAMMs for the QM layer
in both RS and TS states were obtained separately. The CAMM expansion
was then used in the calculation of catalytic fields and interaction
energies. For each mutation, rotamers with the best and worst TSS/DTSS
contribution were determined (hereafter, we will refer to the rotamer
with the best DTSS contribution as optimal). Finally, we explored
the configuration space resulting from combining rotamers at different
sites (mathematically speaking, a Cartesian product of individual
site rotamer spaces in a given mutant).In order to test the
importance of solvation effects on electrostatic
interaction energies, we examine the version of library incorporating
Generalized Born (GB) model, where the solvation energy is modeled
according to eq . The
Born radii were calculated using bornRadius software
developed by Onufriev’s group, utilizing an improved algorithm
described in refs (56 and 57). The atomic charges for amino acid residues were taken from Amber94
force field, whereas charges on the QM region were calculated with
the Merz–Kollman method for the HF/6-311G(3d,2p) wavefunction.
Algorithms for the Exploration of Rotamer Configurational Space
Since the number of rotamer combinations grows exponentialy with
the number of amino acids, it is important to restrict the size search
space, especially in the early steps of calculation. In particular,
elimination of rotamers that are unlikely to participate in final
combination is of great importance. In our experiments, we found that
the average number of rotamers per site is in the range of 60–80.
Thus, elimination of highly improbable rotamer (e.g., having short
contact with the protein backbone) reduces the size of search space
by 70(N – 1) configurations (with N being the number of included residues). Here, two criteria
were applied: (i) geometric and (ii) energetic. The first one basically
involves exclusion of rotamers which side chains have close contanct
(≤1 Å) with protein backbone (neglecting, obviously, contacts
corresponding to chemical bonds). The second one is the well-known
Goldstein criterion for singles elimination.[58]Here, subscripts k,l ∈ [1,N] denote the position
in the amino acid sequence, whereas superscripts “A”
and “B” indicate the rotamer at a particular site. However,
the value ϵ in the original work was equal to zero; by choosing
a positive nonzero value, one can “soften” the criterion
and thus obtain a larger final space. Such an approach may be useful
when one is interested in a set of low-energy conformers, which may
be particularly desirable when considering dynamic effects.The search algorithm generally can be described by the following
steps.Loading
of the amino acid rotamers
into specified positions in a protein structure. At this stage, criterion
(i) is applied.Calculation
of an interaction energy
table.Elimination
of the rotamers according
to different criteria (e.g., criterion (ii)).Systematic search over resulting space.Throughout the presented work, we refer
to the method without additional
elimination criteria in point 3 as “multiscan”. The
second algorithm tested, denoted as Dead End Elimination (DEE), involves
criterion (ii) with negligence of the rotamer self-energy (E(r)) and ϵ set to either 0.0 or 0.005 hartree.
The performance of both approaches was tested on P6 mutant
represented by five charged residues: Asp7, Glu19, Lys146, Arg202,
and Asp224. A potential bottleneck may be the calculation of CAMM
interaction energies (which can be several times slower than that
observed for the point-charge model) or loading CAMM from library
(since the rotation of large multipole arrays may be computationally
expensive). Therefore, we compared the performance of the RESP model
(Table S1 in the Supporting Information)
and CAMM (Table S2 in the Supporting Information).
We ensured that both approaches lead to the same rotamer configuration.Two major observations can be drawn from the presented results.Elimination of rotamers according
to Goldstein criterion
reduces the search space by 2–4 orders of magnitude, depending
on the assumed threshold. This, in turn, leads to a similar reduction
of time spent on scanning the rotamer space.Application of CAMM mutipoles leads to an ∼2-fold
increase of loading time (due to the required transformation of multipole
arrays) and an ∼6-fold increase in the time spent on the computation
of interaction energy table. In fact, this time is comparable to that
required to scan the entire (unreduced) rotamer space.However, for larger sets of amino acid residues, the
problems with
application of CAMM expansion can be tackled by initial reduction
of search space, based on atomic charges, maybe with a nonzero epsilon. This should allow one to avoid situations when
inclusion of the charge anisotropy leads to different configurations.Furthermore, there are at least two possibilities, with regard
to further improvement of the presented algorithm:The elimination of unlikely pairs
from the search space
(analogous to the Goldstein criterion for singles).Further approximation of the energy table. For larger
rotamer sets, this could become a serious bottleneck, since its full
preparation is required in all presented algorithms. However, for
residues distant in space, there should be a little difference in
interaction energy between their different rotamers (in other words,
the rotation of side chains should not change the interaction energy).
In such a case, an entire block of the interaction energy table could
be filled at once.
Flowchart of Computational
Procedure
(A) Structure preparationPreparation of CAMM
amino acid library
(HF/6-31G*); structures of rotamers taken from the Dynameonics database[54]Calculation of reaction path in KE07
(ONIOM B3LYP/6-311+G**:Amber94)MD simulations of KE07 with distance
restraint put on hydrogen bonds between Asp and 5-nitrobenzoisoxazole
(5NBI)Clusterization
of production trajectory
with g-clust algorithm (available in Gromacs). The main cluster was
then used for analysis of the conformational space(B) Exploration of rotamer spaceOnly amino acid
residues that varied
in directed evolution experiments[9] were
included. Rotamers from the CAMM library were loaded into these positions,
excluding those having short contacts with either backbone or ligand
(short-contact is defined as the shortest interatomic distance below
1.25 Å)Evaluation
of interaction energy table
in either the GB model (see below) or with the corresponding point
charge model without solvationElimination of pairs of rotamers (of
different residues) yielding short contactsSearch of the lowest energy configuration
with different algorithmsbrute-force
algorithm (direct iteraton over all possible
configurations) (referred to as “multiscan”)Dead End Elimination, DEE (the elimination
of rotamers,
according to Goldstein criterion for singles.[58]Mean-Field algorithm.[59]Evaluation of DTSS(CAMM) or DTSS(GB)
energy for the found configurations
Results
The Catalytic
Field and Optimal Locations of Crucial Amino Acids
The three
charged residues—Asp7, Lys146, and Arg202, conserved
in directed evolution[9] (see Table S1)—seem to play the most important
role. What’s even more remarkable, these findings are consistent
with the catalytic field corresponding to this reaction. As can be
seen in Figure , charged
amino acids conserved in directed evolution reside exactly where they
should be located according to catalytic field Δ, and the range of their possible rotamers fits the
corresponding “basins” well. The position of Asp7 on
the edge between the two areas of Δ provides a straightforward exploration of both its low contribution
and range varying from positive to negative (the best and worst DTSS
from the rotamer scan are −1.1 kcal/mol and 0.0 kcal/mol, respectively).
Figure 3
Catalytic
field in Kemp eliminase. (a) 3D view of surface colored
with the value of Δ (red indicates
that positively charged catalyst residues will be beneficial, blue
indicates that negatively charged residues will be beneficial); (b)
Kohonen map[60] of the catalytic field (values
given in kcal/(mol e)) (positions of amino acids at a distance of
7 Å above the van der Waals surface are marked with circles (optimal
DTSS) and squares (the worst DTSS); residues predicted to provide
further improvement are marked with stars (optimum DTSS) and diamonds
(the worst DTSS); the arrows indicate the transition from the position
with the worst DTSS to that with the best one.
Catalytic
field in Kemp eliminase. (a) 3D view of surface colored
with the value of Δ (red indicates
that positively charged catalyst residues will be beneficial, blue
indicates that negatively charged residues will be beneficial); (b)
Kohonen map[60] of the catalytic field (values
given in kcal/(mol e)) (positions of amino acids at a distance of
7 Å above the van der Waals surface are marked with circles (optimal
DTSS) and squares (the worst DTSS); residues predicted to provide
further improvement are marked with stars (optimum DTSS) and diamonds
(the worst DTSS); the arrows indicate the transition from the position
with the worst DTSS to that with the best one.Figure a presents
catalytic field Δ(r⃗) for the Kemp eliminase, calculated
according to eq on
the van der Waals surface of reactants, as well as optimal rotamers
of crucial charged residues.
Side-Chain Movements and the Ranges of DTSS
Contributions
Charged amino acid side chains, especially
of lysines and arginines,
belong to most flexible upon ligand binding,[34,35] so it is necessary to consider entire range of their dislocations
in more detail. Results obtained using library of atomic multipoles
for amino acid side chains indicate that activation energy changes
may vary between 1 kcal/mol and 4 kcal/mol for charged residues, in
contrast to neutral amino acids (0–0.5 kcal/mol). The possible
movements of charged amino acid terminal atoms representing extreme
values of catalytic activity measured by DTSS are illustrated on a
2D Kohonen map of the Kemp eliminase catalytic field in Figure b.In order to make a
2D representation of catalytic field, we used a self-organizing map
(SOM) based on the Kohonen neural network.[60] First, a surface at a distance of 7 Å from the vdW surface
of TS was generated with the point density set to 25 points/Å2. It was then used to train a neural network consisting of
3600 neurons arranged into a square grid with a toroidal topology.
This special topology means that the grid (and thus the resulting
map) does not have an edge: the “rightmost” point is
topologically adjacent to the “leftmost” point, while
the “topmost” point neighbors the “bottom-most”
point. Another consequence of this fact is that replicas of such map
may be placed next to each other, similar to tiles, showing the topological
surroundings of each point. After training, the map was colored according
to the value of catalytic field Δ, and terminal atoms of amino acids (chosen in the same manner as
presented on Figure ) were projected onto resulting SOM.These side chain conformation changes could be regarded as the
important part of a preorganized catalytic environment. Relatively
large activation barrier changes resulting from amino acid side chain
movements reinforce the conclusions from previous experimental and
computational studies, indicating the importance of the precise positioning
of catalytic residues.[61]
Multiscan Results
Including Solvent Effects—Comparison
with Experiment
Enzyme catalytic activity could be expressed
in terms of TS and S interaction energies with the enzyme treated
as a rigid body. However, the docking substrate may induce significant
changes in side-chain conformation of charged amino acids,[34,35] which may strongly interact between themselves as well as with TS
and S. Whereas the interaction energy for typical neutral hydrogen-bonded
dimer is 3–5 kcal/mol, it can be 1 or 2 orders of magnitude
higher, if at least one of the monomers bear nonzero electric charge
(specifically, the absolute value of interaction energy may reach
20 kcal/mol in the case of neutral molecules bonded to a charged one
or even 100 kcal/mol when both partners are charged). Therefore, the
interactions involving charged amino acids are expected to be dominant
and a model relying solely on this contribution seems to be a reasonable
first approximation. In the case of Kemp eliminase, six positions
varied in LDE experiments involved charged residues: 7, 19, 123, 146,
202, and 224 as shown in Tables and 2. The change of the charged
state of these residues located beyond the first coordination sphere
and directly involved in reactant binding has the greatest chance
to modulate a catalytic field without affecting the reactant binding
pose. In addition, they may strongly interact with each other and
substrate binding may affect the conformation of their charged side
chains. Therefore, multiscan procedure seems to a reasonable way to
model these effects, which could not be determined by currently available
conventional experimental or theoretical methods.
Table 1
DTSS (CAMM) Values Resulting from
Multiscan*
Values given in units of kcal/mol.
For noncharged residues, DTSS has been assumed to be zero. The residue
charge code: red, positive; blue, negative; black, neutral.
Table 2
Contributions of
Individual Amino
Acids to Total Activation Barrier Lowering DTSS (GB) Including Solvent
Effects Estimated via the Generalized Born Approach*
Values given in units of kcal/mol.For
noncharged residues, DTSS has been assumed to be zero. The residue
charge code: red, positive; blue, negative; black, neutral.
For all mutants,
residues at these positions have been taken into account in a search
for the lowest-energy rotamer configuration using a CAMM rotamer library
(http://camm.pwr.edu.pl/CAMM).In order to take into account solvent presence, the GB model
has
been used with selection of effective Born radii computed with the bornRadius program developed by Onufriev and co-workers.[56,62] With this energy function (denoted henceforth as GB), a search over
rotamer space was performed in exactly the same way as the CAMM model.
Interactions between amino acid residues were included.Finally,
we compare those results to MMPBSA[51] calculations
based on 30-ns MD simulations performed for
each mutant separately (more details are given in the section devoted
to the MMPBSA approach). Experimental activation free energies for
Kemp eliminase mutants are compared in Figure with all three aforementioned models. Remarkably,
both models involving a scan over rotamer space—that is, DTSS(CAMM)
and DTSS(GB)—outperform the DTSS(MMPBSA) approach, in terms
of the number of correctly ordered pairs Npred (See Tables and 2). Npred of a model[63] is a ratio of correctly predicted pairs to all pairs in
a test set. A “correctly predicted” or concordant pair
is a pair of mutants P,Q for which EDTSS < EDTSS and kcat > kcat or EDTSS > EDTSS and kcat < kcat; otherwise, such a pair is
counted as nonconcordant. Given the total number of pairs Ntot in a test set consisting of N enzymes (eq ), the
value of Npred (usually expressed as a
percentage) is given by eq . It is related to Kendall’s tau τ by the relation described in eq ).While DTSS results
obtained
within GB, CAMM, and MMPBSA models, accounting for continuous solvent,
correlate with experimentally observed trends (Spearman correlation
coefficients 0.92, 0.80, or 0.79 and predictivity rates of 89.3%,
82.1%, or 78.6%), the TSS model anticorrelates (Spearman coefficients
−0.65, −0.35, or −0.67 with predictivity rates
of 25.0%, 35.7%, or 25%) confirming the essential role of neglected
ground-state destabilization.[23] This was
probably one of the reasons of limited success of initial theozyme
design.[13,14]
Figure 4
Comparison of theoretical DTSS and TSS results
with experimental
data for Kemp eliminase mutants obtained from directed evolution experiments.[8] In the CAMM approach, reactant–amino acid
side-chain interactions were calculated by employing cumulative atomic
multipole moment expansion. In the GB approach, corresponding ESP
atomic charges have been used, including solvent effects estimated
from the Generalized Born method. MMPBSA denote mean interaction energies
using MD trajectories where solvent effects were obrained from the
Poisson–Boltzmann model.[51]
Comparison of theoretical DTSS and TSS results
with experimental
data for Kemp eliminase mutants obtained from directed evolution experiments.[8] In the CAMM approach, reactant–amino acid
side-chain interactions were calculated by employing cumulative atomic
multipole moment expansion. In the GB approach, corresponding ESP
atomic charges have been used, including solvent effects estimated
from the Generalized Born method. MMPBSA denote mean interaction energies
using MD trajectories where solvent effects were obrained from the
Poisson–Boltzmann model.[51]Values given in units of kcal/mol.
For noncharged residues, DTSS has been assumed to be zero. The residue
charge code: red, positive; blue, negative; black, neutral.Values given in units of kcal/mol.For
noncharged residues, DTSS has been assumed to be zero. The residue
charge code: red, positive; blue, negative; black, neutral.Because of the approximations involved,
the correspondence is not
linear; however, this does not disqualify the overall approach, which
is intended to determine the ranking or narrow the large set of possible
mutations, so it would be manageable for more sophisticated methods.
Thus, it seems to be sufficient to indicate a general qualitative
trend, favoring the DTSS model over the TSS model.
Nonadditive
Effects of Mutations
Activation barrier changes (DTSS) originating
from specific residue
could sometimes vary as much as 1.25 kcal/mol, indicating possible
side-chain conformation changes, resulting from interactions between
other residues (see Table S1 in the Supporting
Information). Such variations responsible for nonadditivity of mutations
could be examined in detail using the multiscan technique presented
here. Let us discuss in detail ASP7 catalytic activity in P3, P4,
P5, P6, and P7 mutants. Figure a–e presents schematic side-chain orientations obtained
using a multidimensional scaling technique, preserving Euclidean distances
in the space of lower dimensionality. Mutation GLU146LYS in P4 forces
change of ASP7 conformation in P3 and, as a result, a loss of its
activity measured by DTSS from −1.12 kcal/mol to +0.02 kcal/mol.
Then, mutations LYS19GLU and LYS146XX (where XX rerpesents a neutral
amino acid) introduced into P4 restore ASP7 activity in P5. In P6,
the previous mutation is reversed, resulting in the same activity
loss as that observed in P4. Finally, the LYS19XX mutation in P6 restores
ASP7 activity in P7 again. All of these results indicate the essential
role of molecular context, which can be explored by the multiscan
technique presented in the previous chapter. Additional analysis of
nonadditive effects of LYS146 and Arg202 is presented in section S4 of the Supporting Information.
Figure 5
Optimal side-chain
orientations in Kemp eliminase active site determined
for P3–P7 mutants. DTSS values are given in units of kcal/mol.
Optimal side-chain
orientations in Kemp eliminase active site determined
for P3–P7 mutants. DTSS values are given in units of kcal/mol.
Discussion
The presented scheme
correctly explains the effect of mutations
leading from KE07 to R7–R10/11G, providing an atomistic insight
into the possible role of each residue contribution to the activation
barrier lowering, emerging from first principles.The most significant
result of this analysis is an effective approximate
solution of the “inverse catalysis problem” confronted
and confirmed for the first time by experimental results. We have
shown that, based on the information provided by the catalytic field,
the search space could be narrowed down to a limited number of computational
steps. The underlying electrostatic differential transition state
stabilization DTSS concept[25,26] has been confirmed
later in recent independent studies.[18,23,31] Jindal and co-workers analyzed the differences between
evolutionary pathways of two Kemp eliminase families: descendants
of KE07 and HG3.[23] They noted that, in
the first group, the improvement arises from ground-state destabilization,
whereas, in the second one, te improvement results from an increase
in transition-state stabilization (TSS). This is consistent with our
more general DTSS and catalytic field approach, which covers both
effects simultaneously.[25,26] Moreover, some elements
of differential stabilization of the transition state approach appeared
later in two other works,[18,22] which provides further
reinforcement for this idea.In summary, the role of residues
7, 19, 123, 146, 202, and 224
modified in laboratory-directed evolution experiments, which are not
in direct contact with reactants, seems to be modulation of the catalytic
field to minimize activation barrier, without changing the reactant
binding pose determined primarily by the amino acids constituting
the first coordination sphere.The essential advantage of the
present approach is the use of cumulative
atomic multipole moments to describe molecular charge redistribution
during the progress of the reaction, capturing the anisotropic character
of molecular electrostatic potentials,[64] which may be more significant in neutral systems.[43] It may be still crucial in proper description of the catalytic
field, since Δ has at least dipole
character, because of the charge conservation principle. Another important
feature of the catalytic field is that, despite special emphasis placed
on charged residues in the present analysis, it can provide more general
information; by considering its gradient Δ⃗, one gains insight into the optimal distribution
of catalytic dipoles in a way analogous to that presented above for
charges (see Figures S1 and S2 in the Supporting
Information). Finally, as mentioned earlier, whenever necessary, the
DTSS energy can always be generalized to include remaining interaction
energy terms defined in interaction energy decomposition schemes,
such as SAPT[39] or HVPT.[40] For example, exchange repulsion and dispersion terms could
be estimated from the corresponding nonempirical atom–atom
functions approximating accurate SAPT results.[65,66] An important issue is a nonadditivity of mutations reported recently.[23] Within the presented framework, it could be
included by exploration of the effect of mutation on rotamer distribution
of other residues (and corresponding DTSS energy changes). An example
of such analysis is presented in Figure , where we found that considering the mutual
interactions of residues Asp7, Lys19, Lys146, Arg202, and Arg2246
in P3, P4, P5, P6, and P7 mutants changes the optimal DTSS contribution
of Asp7 several times from −1.1 (P3), 0.0 (P4), −1.1
(P5), 0.0 (P6), and −1.1 (P7) kcal/mol.Such an approach
can be effectively used to obtain rapid estimates
of nonadditive effects in multiple mutants. The general idea presented
in this work may be easily introduced into other existing protocols,
such as an ensemble of short MD[67] or Monte
Carlo simulations.[36] Application of static
catalytic field gradients (dynamic electrostatic catalytic fields
DECF)[47] allows one to predict the catalytic
role of proton transfer in enzyme hydrogen-bond networks.[48]
Authors: Sagar D Khare; Yakov Kipnis; Per Greisen; Ryo Takeuchi; Yacov Ashani; Moshe Goldsmith; Yifan Song; Jasmine L Gallaher; Israel Silman; Haim Leader; Joel L Sussman; Barry L Stoddard; Dan S Tawfik; David Baker Journal: Nat Chem Biol Date: 2012-02-05 Impact factor: 15.040
Authors: Wiktor Beker; Karol M Langner; Edyta Dyguda-Kazimierowicz; Mikołaj Feliks; W Andrzej Sokalski Journal: J Comput Chem Date: 2013-05-21 Impact factor: 3.376
Authors: Anastassia N Alexandrova; Daniela Röthlisberger; David Baker; William L Jorgensen Journal: J Am Chem Soc Date: 2008-11-26 Impact factor: 15.419