Kay S Schaller1,2, Jeppe Kari1, Gustavo A Molina1, Kasper D Tidemand3, Kim Borch3, Günther H J Peters2, Peter Westh1. 1. Department of Biotechnology and Biomedicine, Technical University of Denmark, Søltofts Plads, DK-2800 Kgs. Lyngby, Denmark. 2. Department of Chemistry, Technical University of Denmark, Kemitorvet, DK-2800 Kgs. Lyngby, Denmark. 3. Novozymes A/S, Biologiens Vej 2, DK-2800 Kgs. Lyngby, Denmark.
Abstract
While heterogeneous enzyme reactions play an essential role in both nature and green industries, computational predictions of their catalytic properties remain scarce. Recent experimental work demonstrated the applicability of the Sabatier principle for heterogeneous biocatalysis. This provides a simple relationship between binding strength and the catalytic rate and potentially opens a new way for inexpensive computational determination of kinetic parameters. However, broader implementation of this approach will require fast and reliable prediction of binding free energies of complex two-phase systems, and computational procedures for this are still elusive. Here, we propose a new framework for the assessment of the binding strengths of multidomain proteins, in general, and interfacial enzymes, in particular, based on an extended linear interaction energy (LIE) method. This two-domain LIE (2D-LIE) approach was successfully applied to predict binding and activation free energies of a diverse set of cellulases and resulted in robust models with high accuracy. Overall, our method provides a fast computational screening tool for cellulases that have not been experimentally characterized, and we posit that it may also be applicable to other heterogeneously acting biocatalysts.
While heterogeneous enzyme reactions play an essential role in both nature and green industries, computational predictions of their catalytic properties remain scarce. Recent experimental work demonstrated the applicability of the Sabatier principle for heterogeneous biocatalysis. This provides a simple relationship between binding strength and the catalytic rate and potentially opens a new way for inexpensive computational determination of kinetic parameters. However, broader implementation of this approach will require fast and reliable prediction of binding free energies of complex two-phase systems, and computational procedures for this are still elusive. Here, we propose a new framework for the assessment of the binding strengths of multidomain proteins, in general, and interfacial enzymes, in particular, based on an extended linear interaction energy (LIE) method. This two-domain LIE (2D-LIE) approach was successfully applied to predict binding and activation free energies of a diverse set of cellulases and resulted in robust models with high accuracy. Overall, our method provides a fast computational screening tool for cellulases that have not been experimentally characterized, and we posit that it may also be applicable to other heterogeneously acting biocatalysts.
Most
enzymatic reactions, both in vivo[1] and
in industrial applications,[2] occur
at an interface and hence represent heterogeneous (bio)catalysis.
One important industrial example is so-called saccharification, where
plant biomass is enzymatically deconstructed into small sugars for
subsequent fermentation to biofuels and green alternatives to petrochemicals.[3] Several other technical enzyme applications also
entail modification of an insoluble substrate by soluble enzymes.[2,4] Nevertheless, general kinetic descriptions of heterogeneous biocatalysis
remain scarce and incomplete, and this restricts both mechanistic
understanding and rational design of industrial enzymes.[5] Recent work has suggested that the efficacy of
heterogeneous enzymes may be rationalized along the lines of the Sabatier
principle.[6] This concept[7] states that efficient catalysis occurs at an intermediate
strength of substrate–catalyst interactions, and it is well
established within (nonbiochemical) heterogeneous catalysis.[8] In this field, it makes up a valuable framework
for analysis of microkinetic models and it provides guidance for the
computer-assisted design of inorganic catalysts.[9] In the current work, we explore related applications for
heterogeneous biocatalysts using cellulases as an example. Our starting
point is the experimental observation[10] of a linear free-energy relationship (LFER) between binding energy
and the activation barrier for a wide range of cellulases (see Figure ). This scaling has
the important corollary that it links the two customary enzyme kinetic
parameters KM (which is related to substrate
binding strength,[10,11] see eq ) and kcat = Vmax/E0 (which is
related to the activation energy of the rate-limiting step at a steady
state). In practice, this means that if the linkage of KM and kcat can be specified
from a workable number of experiments, one could obtain detailed insights
into the function of uncharacterized enzymes if just one of the kinetic
parameters can be determined in silico. Then, the other could subsequently
be estimated from the LFER. The computed kinetic parameters will pertain
specifically to the conditions (temperature, pH, type of substrate,
etc.) of the empirical LFER. Nevertheless, the approach could open
up for in silico comparative biochemistry and hence contribute to
the elucidation of the sequence–function relationship for heterogeneous
enzyme reactions. One application of such a tool would be to predict
and rationalize catalytic properties of isoenzymes from different
organisms. Another would be computer-aided enzyme design, which strives
to find variants with desired properties for technical applications.
Figure 1
Simplified
energy diagram of the processive cycle of cellulases
and the schematic representation of the LFER for different cellulases
acting on the same substrate.[12] Desorption
has been found to be the rate-limiting step of the processive cycle.[13−16] The experimental LFER provides a simple correlation between the
strength of the enzyme–substrate interaction and the maximal
turnover. This, in turn, opens for fast computational assessment of
enzyme kinetics because the turnover can be predicted from the LFER
if accurate estimates of binding strength can be determined in silico.
Simplified
energy diagram of the processive cycle of cellulases
and the schematic representation of the LFER for different cellulases
acting on the same substrate.[12] Desorption
has been found to be the rate-limiting step of the processive cycle.[13−16] The experimental LFER provides a simple correlation between the
strength of the enzyme–substrate interaction and the maximal
turnover. This, in turn, opens for fast computational assessment of
enzyme kinetics because the turnover can be predicted from the LFER
if accurate estimates of binding strength can be determined in silico.In silico determination of binding free energies
is, in general,
much easier than computational assessment of activation free energies
as the former can be done within the framework of molecular dynamics
(MD)/molecular mechanics (MM) and the latter needs, at least partly,
expensive quantum mechanical (QM) calculations. While QM/MM simulations
became common for mechanistic studies in enzymology, they still remain
computationally expensive.[17−24] It follows that implementation of the approach discussed above should
rely on computed binding free energies and subsequent estimate of
the activation energy based on an experimental LFER. However, the
computational description of complex formation for an interfacial
enzyme reaction is not trivial. In the case of cellulases, there are
still many open questions regarding their interplay with the surface
of the solid substrate.[25] Computational
methods have previously been used to probe these interfacial interactions
on a molecular level.[26−41] This type of work has provided important mechanistic insights, but
the setup of large enzyme–substrate systems with multiple domains
and phases remains cumbersome and their study computationally expensive.
While binding and QM/MM calculation have been reported for single
domain cellulases on a shorter substrate,[26,30,33,35,37,38] the same has not yet,
to the best of our knowledge, been feasible for multidomain cellulases
in complex with their native, insoluble substrate.The exploitation
of experimental LFERs in computational analysis
of enzymes will require robust methods to estimate binding energies
of multidomain, interfacial enzymes with low computational cost. One
established method is the so-called linear interaction energy (LIE)
approach.[42−51] LIE approximates the free energy of binding from the end states
via linear-response approximations. It utilizes the average electrostatic
and van der Waals (vdW) interaction energies from MD simulations.
A common form of the classical LIE equation iswhere α is the scaling
factor for the
van der Waals interactions (vdW), β is the scaling factor for
the electrostatic interaction (Coul), γ is an offset parameter,
and ΔUi is the change in internal
energy for the energy term i compared to a reference.[49] LIE parameters are obtained empirically—either from
the literature or through fitting to experimental data.[52] LIE has successfully predicted binding free
energies of small molecules to proteins,[42,44,45,47−49] but we are unaware of earlier applications to multidomain proteins
and surface binding.Many cellulases have a modular structure
(see Figure ) consisting
of a catalytic
domain (CD) and a noncatalytic carbohydrate-binding module (CBM) connected
by a flexible linker.[55−59] CBMs adsorbs to the cellulose surface and hence promotes the proximity
of the enzyme and the substrate. Once adsorbed, the CD abstracts a
single cellulose chain from the cellulose crystal and binds it in
an extended cleft before it hydrolyzes a glycosidic bond. Depending
on the type of cellulase, the catalytic cleft has a different degree
of openness[60] and a varying number of glucopyranose
subsites to accommodate multiple monomers of the polymeric substrate.[25]
Figure 2
Illustration of cellobiohydrolase (CBH) I from glycoside
hydrolase
(GH) family 7 from Trichoderma reesei (Tr), abbreviated as TrCel7A,
bound to a cellulose fibril[53] (black).
The enzyme consists of a catalytic domain (CD, red, PDB 4C4C [29]), a linker (orange), and a carbohydrate-binding
module (CBM, yellow, PDB 2MWK [54]). The ligand
is a single cellulose strand, abstracted from the fibril surface and
threaded into the CD and is highlighted in gray.
Illustration of cellobiohydrolase (CBH) I from glycoside
hydrolase
(GH) family 7 from Trichoderma reesei (Tr), abbreviated as TrCel7A,
bound to a cellulose fibril[53] (black).
The enzyme consists of a catalytic domain (CD, red, PDB 4C4C [29]), a linker (orange), and a carbohydrate-binding
module (CBM, yellow, PDB 2MWK [54]). The ligand
is a single cellulose strand, abstracted from the fibril surface and
threaded into the CD and is highlighted in gray.In this study, we propose a two-domain LIE method (2D-LIE), which
accounts for substrate interactions of both CD and CBM. This method
may be used as a flexible and computationally inexpensive approach
to assess cellulase binding strength. We demonstrate that the combination
of 2D-LIE and an experimental LFER can be effective for in silico
prediction of enzyme kinetic parameters. We envision that the approach
can be more widely applicable both in attempts to assess the kinetics
of uncharacterized isoenzymes identified, for example, through metagenomics
and in computer-aided design of enzyme variants with improved kinetic
properties.
Materials and Methods
2D-LIE
Model
The classical LIE equation
(eq ) was expanded to
include one electrostatic and one van der Waals term per protein domain.
In the case of cellulases with two domains, this leads to a 2D-LIE
model withwhere αi (i = CD, CBM) is
the van der Waals interaction scaling parameter of the domain i, βi is the electrostatic interaction scaling parameter, and Δ⟨Uij ⟩ is the change of the average internal energy of term j
(j = Coul, vdW) of the domain i compared to the reference. In this
work, a common reference within the data set was used, resulting in
ΔΔG values relative to the reference
(see the Supporting Information for further
detail). LIE is usually applied for the assessment of the binding
of a series of small molecules toward a common receptor. For cellulases,
the substrate (cellulose) remains the same, while different enzymes
are evaluated. The use of a common ligand alleviates possible systematic
errors in the modeling of the still challenging carbohydrate–protein
binding.[61−63] Additionally, both simulations—CD in complex
with polymeric cellononaose and CBM bound to a cellulose crystal—differ
from typical small-molecule binding studies. Therefore, no empirical
parameters from the literature could be used for the 2D-LIE model.
Instead, the LIE parameters were obtained through fitting to experimental
values. In the development of the 2D-LIE model, we made several assumptions.
First, the 2D-LIE model does not include the linker (see Figure ). The role of the
linker has been found to be more complex than simple spacing between
the CBM and CD, and its function is still not fully understood.[64−66] Therefore, the linker is disregarded for simplicity. Second, the
assumption is made that the unspecific binding of the CD toward the
crystal surface is negligible compared to the productive binding of
the threaded cellulose chain in the binding tunnel. Third, we do not
include any glycosylation in our setup. Most of the glycosylation
lies on the neglected linker and the glycosylation sites on the CD
are located away from the binding tunnel.[25]
Data Set Preparation
For the development
of the method, we used the recent kinetic data set from Kari et al.[10] for cellulases. As the method does not account
for any linker interaction, all variants with modifications in the
linker from this set were disregarded. GH family 12 was disregarded
due to the high outlier ratio in the experimental data set. This resulted
in a data set of 65 cellulases (see Table S1 in the Supporting Information for a comprehensive list), which contained
a diverse range of cellulases with different folds, catalytic mechanism
(inverting/retaining), substrate preferences (reducing/nonreducing
end), mode of attack (exo/endo acting), domain composition (with/without
CBM), and organisms of origin (see Figure ). All enzymes were of fungal origin and
about half were wildtypes. The remainder were different types of variants
made with the overall purpose of adjusting the substrate binding strength.
The reported Michaelis constant KM and
maximal turnover number kcat = Vmax/E0 were converted
to the corresponding binding or activation free energies using the
standard approach[10,11,67]where κ is either KM or kcat. As a reference
(κref), we used the parameters KM,ref and kcat,ref from Cel6A
of T. reesei. TrCel6A
is a well-characterized cellulase and its binding strength lies in
the middle of the investigated affinity range, making its experimental
measurement easier and propagated errors smaller. It follows that
all ΔΔG values reported below specify
the difference with respect to TrCel6A.
Figure 3
Illustration
of representatives from all GH families studied in
this work. CDs bound to cellononaose are shown on the left, CBMs bound
to a cellulose crystal on the right (all from CBM family 1[25]). TrCel7A and TrCel6A are processive cellobiohydrolases (CBHs) with a closed binding
tunnel, while TrCel7B, HiCel45A,
and TrCel5A are endoglucanases (EGs) with a more
open cleft.
Illustration
of representatives from all GH families studied in
this work. CDs bound to cellononaose are shown on the left, CBMs bound
to a cellulose crystal on the right (all from CBM family 1[25]). TrCel7A and TrCel6A are processive cellobiohydrolases (CBHs) with a closed binding
tunnel, while TrCel7B, HiCel45A,
and TrCel5A are endoglucanases (EGs) with a more
open cleft.
Structure
Preparation
For all enzymes,
UniProt[68] or GenBank[69] entries were gathered and simulations for the CDs were
set up. All sequences were run through InterProScan (5.39–77.0)[70,71] and searched against the Superfamily
database.[72] If a CBM was found, an additional
simulation was set up for this domain. If available, a crystal structure
(CS) was taken from the Protein Data Base (PDB).[73] For the majority of enzymes, no crystal structure was available
(47 out of 65 CDs, 25 out of 42 CBMs), and the partial sequences annotated
to contain the domains were used for homology modeling (HM) with MODELLER (2.2.00).[74−77] All CD systems were simulated bound to a cellononaose
ligand. CBM systems were initialized unbound above the cellulose surface. Figure provides a processing
scheme, and further details concerning the structure preparation can
be found in the Supporting Information.
Figure 4
Schematic
illustration of the preparation process for a two-domain
protein (here TrCel7A) for the 2D-LIE approach.
Schematic
illustration of the preparation process for a two-domain
protein (here TrCel7A) for the 2D-LIE approach.
General MD Settings
The CHARMM36
force field was used to describe all systems.[78−81] The topologies for the nonaose
ligand and the cellodecaose fibers of the crystal were prepared with
the CHARMM GUI.[82] All simulations were run in GROMACS (2018.6).[83−89] Boxes with minimal edge distances of 1.4 nm were constructed and
solvated with TIP3P water (see the Supporting Information for structure preparation).[90] To neutralize the net charge of the system, random water
molecules were exchanged with ions. All minimization steps were done
in a steepest-descent over 10 000 iterations. A time step of
2 fs was used. The long-range electrostatics were treated with the
particle-mesh Ewald method with a cubic interpolation and a cutoff
of 12 Å.[91] Van der Waals interactions
were treated in a Verlet scheme with a cutoff distance of 12 Å
and a switching function for the forces starting at 10 Å.[92] Hydrogen bonds were restrained using the LINCS
algorithm.[93] The solutes and the solvent
were coupled to individual heat baths with a Berendsen thermostat.[94] Pressure coupling was done with a Parrinello–Rahman
barostat.[95] Analysis of the trajectories
was performed with GROMACS. The trajectories
were visualized in PyMOL (2.3.3).[96]
Simulations of the CD Systems
Minimization
was conducted in three steps: First, keeping all solutes restrained.
Second, keeping only the protein restrained. Lastly, allowing the
complete system to move freely. Afterward, NVT simulations
with incremental temperature (100–300 K in 50 K steps) were
performed in succession for 20 ps each with restrains on the solute.
Thereafter, NPT simulations with and without restraints
on the solute were performed for 100 ps in series. The production
was run in the NPT ensemble at 300 K for 10 ns without
any restrains applied and energies were recorded every 10 ps. This
simulation length was chosen, as the increase of performance of the
method levels out at this time scale (see the Supporting Information).
Simulations
of the CBM Systems
The
CBM systems were treated in a similar fashion than the CD systems
but with a few differences. During the second minimization, only the
crystal was restrained. During all simulations, restrains were applied
to the crystal. After the NPT equilibration with
restrains on the solute, a soft-docking step was inserted by performing
a steered MD (SMD) simulation. The pull rate was set to −0.01
nm/ps along the direct connection of the center of masses of the crystal
and the CBM in all dimensions. The SMD was performed over 200 ps.
Afterward, the same workflow as for the CD was resumed.
Analysis
The 2D-LIE fit and subsequent
analysis were performed with Python (3.7.3).[97] The first 100 ps were disregarded and the average
energy terms over the rest of the production simulation were extracted
from the GROMACS output files. For the CD,
the electrostatic and van der Waals interaction energy terms between
the ligand and its surroundings as well as the ligand with itself
were extracted. For the CBM simulation, the energy terms between the
protein and its surroundings were taken. The usage of different LIE
terms per domain has several advantages. It equalizes the different
energy terms taken per domain and allows individual weighing of the
binding contribution for each domain. The sign for UCBMvdW changes
as compared to the corresponding CD term, which is captured by the
scaling parameter. The CD simulations and CBM simulations were normalized
by subtracting the energy values obtained from the TrCel6A CD simulation and TrCel6A CBM simulation,
respectively. The multidimensional fit was performed with SciPy (1.5.2).[98]
Results and Discussion
The energies obtained from the
simulations were used to fit 2D-LIE
models to the binding free energies derived from KM and to the activation free energies derived from kcat according to eq . For fitting, three different approaches
were employed—first, a global fit to all investigated cellulases;
second, a 5-fold cross-validation (CV) over the data set; and lastly, a fit toward
a small representative subset of well-known enzymes from model organisms.[25] The subset consisted of seven enzymes from the
industrially relevant fungi T. reesei and Humicola insolens (Hi), specifically TrCel6A (WT and a CBM-less variant), TrCel7A (WT and a CBM-less variant), TrCel7B, TrCel5A, and HiCel45A.[25] The three types of approaches (subset/CV/global)
were applied using two different sets of experimental values (KM and kcat). This
resulted in six sets of 2D-LIE parameters (see Table ). The parameters obtained by the fittings
were used to predict the binding of the remaining cellulases.
Table 1
2D-LIE Parameters (Equation ), Root-Mean-Square Errors
(RMSEs), and Pearson’s Correlation Coefficients (r2) Obtained through Fits to the Complete Data Set (Global),
5-Fold Cross-Validation (CV), or a Small Representative Subseta
exp.
fit
αCD
βCD
αCBM
βCBM
γ
RMSE [kJ/mol]
r2
KM
subset
0.286 ± 0.042
0.094 ± 0.063
0.054 ± 0.015
–0.008 ± 0.003
–0.119 ± 0.946
1.914
0.76
global
0.298 ± 0.031
0.131 ± 0.018
0.047 ± 0.011
–0.006 ± 0.002
–0.686 ± 0.339
1.827
0.78
CV
0.30(11)
0.131(11)
0.047(3)
–0.006(4)
–0.68(12)
1.85(11)
0.776(24)
kcat
subset
0.226 ± 0.054
0.078 ± 0.082
0.034 ± 0.019
–0.006 ± 0.004
–0.016 ± 1.229
1.582
0.75
global
0.233 ± 0.024
0.105 ± 0.014
0.032 ± 0.009
–0.004 ± 0.002
–0.891 ± 0.266
1.433
0.77
CV
0.23(1)
0.11(1)
0.032(3)
–0.004(8)
–0.89(11)
1.45(1)
0.767(3)
For the global
and subset cases,
error estimation of the fitting parameters are provided. For the 5-fold
cross-validation, the standard deviation is reported. For the subset
fitting and cross-validation, RMSEs and correlation coefficients were
derived from nonfitted entries.
For the global
and subset cases,
error estimation of the fitting parameters are provided. For the 5-fold
cross-validation, the standard deviation is reported. For the subset
fitting and cross-validation, RMSEs and correlation coefficients were
derived from nonfitted entries.
Binding Energy Prediction
Binding
strengths were derived from the subset scaled with experimental values
(eq ) with the Pearson’s
correlation coefficient of r2 = 0.76 and
a root-mean-square error (RMSE) of 1.91 kJ/mol for the binding energy
(see Figure a). The
5-fold CV resulted in r2 = 0.78 and RMSE
of 1.85 kJ/mol. A global fit over all 65 cellulases resulted in r2 = 0.78 and RMSE of 1.85 kJ/mol. For all three
cases, the RMSE was around the typical accuracy of LIE models.[52] The obtained 2D-LIE parameters and performance
indicators can be found in Table and the corresponding binding energy results in Table S2 (Supporting Information).
Figure 5
(a) Predicted
binding free energies and (b) predicted free activation
energies from the 2D-LIE models versus the experimental values. A
small representative subset was used to derive the fitting parameters.
(a) Predicted
binding free energies and (b) predicted free activation
energies from the 2D-LIE models versus the experimental values. A
small representative subset was used to derive the fitting parameters.
Activation Energy Prediction
The
prediction of activation free energies derived from the subset resulted
in r2 = 0.75 and RMSE of 1.58 kJ/mol (see Figure b). The 5-fold CV
resulted in r2 = 0.77 and RMSE of 1.45
kJ/mol. The model derived from the global fit yielded r2 = 0.77 and RMSE of 1.43 kJ/mol. As shown previously,
the RMSE is around the typical accuracy of LIE models for all three
cases.[52] Overall, the parameters and the
results show the same trend as for the prediction of the binding energy.
The performances of the models are comparable to the ones of the previous
models for the binding free energies, even though the underlying experimental
measure is more error prone.
Parameters and Performance
All three
approaches (subset/CV/global) yielded very similar fitting parameters
and performances for both target values (KM/kcat). Especially, the low standard
deviation of the parameters and performance indicators for the CV
indicate that the fit is very robust. The performance of the subset
fitting cases was quite similar to the global/CV approach, indicating
that it is possible to build a rather robust 2D-LIE model from an
experimental data set of a manageable size.The van der Waals
scaling factors (αCD) seem reasonable as similar
values have been reported for other systems.[49,99] For the β electrostatic scaling factor, Hansson et al.[42] observed that hydroxyl groups lower the parameter.
They found β = 0.43 for apolar compounds, β = 0.37 for
compounds bearing a single hydroxyl group, and β = 0.33 for
compounds bearing multiple hydroxyl groups. Therefore, low βCD values for cellononaose with a multitude of hydroxyl groups
seem reasonable. The CBM simulations (small proteins bound to a crystalline
surface) are a further stretch from the known application of LIE for
small-molecule drugs and cannot be readily compared to literature
values. The CBM scaling factors αCBM and βCBM are smaller than the CD values, which is in line with the
experimental observation that CBM contributes less to the binding
free energy compared to the CD.[12,15,100] The magnitude of βCBM is smaller compared to αCBM, which is in line with results in the literature[38,101,102] and reflects that the crystal
face pointing toward the CBM is hydrophobic and, therefore, the van
der Waals terms of binding are more important than the electrostatic
interactions in contrast to the CD.
Robustness
and Transferability
To
test the workflow, domains with available crystal structures (CS)
were remodeled using the same approach and the obtained values were
compared to those obtained from simulations with their crystal structures
(see the Supporting Information). On average,
a mean-absolute error MAECS↔HMbind,pred = 0.13 kJ/mol and an MAECS↔HM‡,pred = 0.08 kJ/mol was observed between the prediction using the crystal
structure and remodeling of the same. These changes are well below
the general precision of the method and illustrate the robustness
of the workflow as well as the ease of modeling of these well-studied
enzyme families.The built 2D-LIE models with their parameters
and their predictive capability should work for all enzymes for which
the underlying experimental LFER holds. As Kari et al.[10] state, this should be the case for all cellulases.
Predictions for values at different experimental conditions would
require a new fitting to experimental values at these conditions or
even new simulations (e.g., different temperature). While this is
admittedly a narrow use case, our intention within this work is to
present a broader, more general applicable approach to computationally
predict kinetic parameters of heterogeneous enzymes as a whole, rather
than the direct usage of the specific obtained parameters. Still,
the prediction of kinetic properties of cellulases, in particular,
is interesting for the industry. Furthermore, Kari et al.[10] claim a general occurrence of LFERs for heterogeneous
enzyme–substrate systems and, therefore, a similar approach
to in silico to predict their kinetics should be applicable. Beyond
the prediction of catalytic rates by assessment of binding strength,
the 2D-LIE approach itself can potentially be used to model the binding
of multidomain proteins, which constitute 65% of all eukaryotic proteins.[103] Naturally, the basic split approach of the
method is more sensible for proteins with loosely connected domains
acting on different parts of the substrate.
In Silico Prediction of
Kinetic Parameters
To illustrate the application of this
model, the two major cellulases
from GH family 7 of the industrial workhorse Aspergillus
niger [104] were investigated,
as detailed in the Supporting Information. The 2D-LIE parameters obtained from the global fit were used to
predict KM and kcat for both enzymes. Even though they differ in modularity
(±CBM), we found quite similar values for the kinetic parameters
(see Figure ). No
experimental kinetic parameters are available for these enzymes and
we await such data for further assessment of the approach. Prediction
of parameters at experimental conditions different from those used
in the underlying experiments[10] would require
new experiments to establish the relevant scaling relation.
Figure 6
Homology models
of the investigated CBHs from A.
niger with the predicted catalytic rates and Michaelis–Menten
constants.
Homology models
of the investigated CBHs from A.
niger with the predicted catalytic rates and Michaelis–Menten
constants.
Conclusions
and Outlook
Heterogeneous enzyme reactions are widespread
in the nature and
industry, but compared to bulk processes, they are generally poorly
understood on the molecular level. This deficiency is linked to the
complexity of interfacial reaction mechanisms[29,105] but also reflects a shortage of rigorous and comparative kinetic
data. This shortage results from limitations of both assays technologies
and kinetic theory, and there is currently no generally accepted rate
equation for enzyme reactions at interfaces. As a result, structure–function
relationships remain poorly developed compared to bulk enzymology.
Here, we addressed this by proposing an in silico method for the assessment
of kinetic parameters of cellulases acting on their insoluble substrate.
The approach relies on the recent experimental observation of an LFER[10] for a wide group of cellulases with different
structures and mechanisms. When this type of scaling indeed occurs,
the computationally challenging task of determining the activation
free energies for different enzymes can be replaced by more tractable
assessments of binding energy. Subsequently, calculated binding free
energies are readily linked to activation free energies through the
LFER (see Figure ),
and estimated values of KM and kcat can be derived. However, the practical importance
of this strategy relies on the availability of robust and computationally
cheap methods to quantify binding strength in silico. Such (fast)
methods have the potential to cover reasonable sequence spaces and
hence assess the function of many wildtypes or potential enzyme variants.
Here, we investigated the potential of an LIE-based approach in this
regard. We found that LIE principles could be expanded to compute
binding strengths of two-domain cellulases on the surface of their
natural, insoluble substrate (2D-LIE). The proposed method is computationally
relatively inexpensive,[52] and in combination
with homology modeling and the experimental LFER, it allows rapid
in silico prediction of KM and kcat for cellulases belonging to different structural
and mechanistic classes. Computational convenience was obtained through
a number of simplifying assumptions including negligible nonspecific
adsorption to the solid surface of both linker and CD. Extensive testing
against a large data set suggested that the 2D-LIE method was able
to reproduce experimental binding strengths, and this supported the
validity of the underlying assumptions. Henceforth, the method could
be used for exploratory screening of interfacial enzymes with known
substrate specificity and with unknown kinetic parameters, provided
an LFER can be established. We demonstrated this aspect by computationally
predicting the kinetic parameters of the GH family 7 enzymes of the
industrially relevant fungus A. niger. The principle of using LFER as a means to simplify computational
methods for catalyst design came from the field of inorganic heterogeneous
catalysis.[9] While many differences between
this field and heterogeneous biocatalysis may be identified, we propose
that the common occurrence of LFER reflects general properties and
limitations of interfacial catalysis. This generality may call for
optimism regarding the implementation of principles from heterogeneous
catalysis within interfacial enzymology. Moreover, it may infer that
linear scaling relations are common and hence exist for other interfacial
enzymes than cellulases. If indeed so, the approach sketched out here
could have broader importance in virtual screening of isoenzymes acting
on insoluble substrates and as a tool for the design of efficient
industrial enzymes. This approach is particularly attractive, if a
rapid computational method for enzyme–substrate binding free
energies can be developed, as this would open up for phenotypical
assessments of larger sequence spaces and benefit both engineering
and discovery of interfacial enzymes. The ability to assess biochemical
parameters in silico also appears timely in light of the rapidly expanding
gap between the wealth of protein sequence data on the one hand and
limited records of biochemical characterization on the other.
Authors: Olgun Guvench; Sairam S Mallajosyula; E Prabhu Raman; Elizabeth Hatcher; Kenno Vanommeslaeghe; Theresa J Foster; Francis W Jamison; Alexander D Mackerell Journal: J Chem Theory Comput Date: 2011-10-11 Impact factor: 6.006
Authors: David Van Der Spoel; Erik Lindahl; Berk Hess; Gerrit Groenhof; Alan E Mark; Herman J C Berendsen Journal: J Comput Chem Date: 2005-12 Impact factor: 3.376
Authors: Silke Flindt Badino; Jenny Kim Bathke; Trine Holst Sørensen; Michael Skovbo Windahl; Kenneth Jensen; Günther H J Peters; Kim Borch; Peter Westh Journal: Protein Eng Des Sel Date: 2017-07-01 Impact factor: 1.650
Authors: Mark R Nimlos; Gregg T Beckham; James F Matthews; Lintao Bu; Michael E Himmel; Michael F Crowley Journal: J Biol Chem Date: 2012-04-10 Impact factor: 5.157
Authors: Anna S Borisova; Elena V Eneyskaya; Suvamay Jana; Silke F Badino; Jeppe Kari; Antonella Amore; Magnus Karlsson; Henrik Hansson; Mats Sandgren; Michael E Himmel; Peter Westh; Christina M Payne; Anna A Kulminskaya; Jerry Ståhlberg Journal: Biotechnol Biofuels Date: 2018-01-13 Impact factor: 6.040