Morris E Sharp1, Francisco X Vázquez2, Jacob W Wagner1, Thomas Dannenhoffer-Lafage1, Gregory A Voth1. 1. Department of Chemistry, James Franck Institute, and Institute for Biophysical Dynamics , The University of Chicago , Chicago , Illinois 60637 , United States. 2. Department of Chemistry , St. John's University , Queens , New York 11439 , United States.
Abstract
Standard low resolution coarse-grained modeling techniques have difficulty capturing multiple configurations of protein systems. Here, we present a method for creating accurate coarse-grained (CG) models with multiple configurations using a linear combination of functions or "states". Individual CG models are created to capture the individual states, and the approximate coupling between the two states is determined from an all-atom potential of mean force. We show that the resulting multiconfiguration coarse-graining (MCCG) method accurately captures the transition state as well as the free energy between the two states. We have tested this method on the folding of dodecaalanine, as well as the amphipathic helix of endophilin.
Standard low resolution coarse-grained modeling techniques have difficulty capturing multiple configurations of protein systems. Here, we present a method for creating accurate coarse-grained (CG) models with multiple configurations using a linear combination of functions or "states". Individual CG models are created to capture the individual states, and the approximate coupling between the two states is determined from an all-atom potential of mean force. We show that the resulting multiconfiguration coarse-graining (MCCG) method accurately captures the transition state as well as the free energy between the two states. We have tested this method on the folding of dodecaalanine, as well as the amphipathic helix of endophilin.
Biomolecular
systems are inherently multiscale, involving the interplay of many
length and time scales working in concert. As a result, the theoretical
methods used to understand these systems must be able to take into
account this multiscale nature. Although molecular dynamics (MD) simulations
can be used to successfully probe biomolecular processes and interactions
in ways that can complement experimental results,[1] standard MD methods are limited to length and time scales
that may be unable to accurately characterize many biomolecular phenomena.
These systems require the use of multiscale MD methods,[2−4] where low resolution, or coarse-grained (CG), models of biological
molecules are developed to reduce the computational cost of such simulations,
all while attempting to maintain the fundamental physics and chemistry
underlying the biomolecular processes.[4,5] The use of
low-resolution CG models can be used to probe important processes
such as protein–protein interactions,[6−8] membrane remodeling
by proteins,[9−16] and large-scale protein conformational changes.[17]Current CG models are often unable to accurately
reproduce large conformational changes in proteins[2,3] and
are thus typically best suited to model a single protein conformation.
However, even CG models that may be able to capture multiple protein
conformations are still unable to reproduce the free energy difference
between the two states and the transition barrier.[18,19] This has been seen for CG models that were parametrized with a top-down
approach, incorporating information from experiment, or a bottom-up
approach, where high frequency motions in higher resolution all-atom
(AA) simulations are integrated out to generate smoother, but less
detailed interaction potentials. The popular MARTINI protein force
field,[20,21] a top-down CG model, requires an elastic
network on the peptide backbone to maintain secondary structure and
is therefore restrained to the native state.[3,22] Top-down
models, including OPEP[23] and MARTINI, in
general are not guaranteed to generate the correct kinetics or sample
the correct distribution of states. Bottom-up CG approaches, including
relative entropy minimization[24] (REM) and
multiscale coarse-graining[25−27] (MS-CG), can at best capture
portions of multiple conformations but less accurately capture the
free energy difference between the states and the transition barrier.[28−30] This limitation of MS-CG and REM is not inherent to the methods
themselves but instead to the practical considerations when building
the models, including limiting the models to pair interactions and
not having unique interactions between all CG sites. This deficiency
of current CG modeling techniques impedes the study of complex, multiscale
biophysical phenomena, including, for example, actin polymerization,
where the ATP hydrolysis within actin as well as actin conformational
change drive polymerization and depolymerization,[31] or molecular motors such as myosin where ATP hydrolysis
and other effects drive large scale conformational change.Much
of the previous work on multiconfiguration CG modeling has been focused
on developing fine-grained, Cα elastic network models (ENMs)
of proteins that can switch between two or more equilibrium states.[32−36] These methods treat the individual equilibrium configurations as
discrete states and use a two-state-like mixing approach to couple
the discrete states while also providing the smooth saddle point between
the states. The benefit of such an approach is that the CG model can
sample configurations, or states, corresponding to very different
protein structures, but switching between structures in these simulations
is not parametrized in a multiscale manner that accurately reproduces
key features in the potential of mean force (PMF) for the conformational
transition, such as the transition barrier and the free energy difference
between the conformational states. A recent study by Bereau and Rudzinski[37] similarly separates the system into distinct
conformational basins and uses a surface hopping-like scheme to switch
between the states describing those basins. However, constraints need
to be placed on the system to reproduce the correct probability distribution
of states and the algorithm also does not conserve energy on an overall
continuous CG potential.In this work, we introduce a new multiconfiguration
coarse-graining (MCCG) method that can both switch between CG models
that describe different parts of the molecular conformational during
the course of a MD simulation and directly take into account key features
of the AA PMF to couple the individual CG models. (Hence the term
“multiconfigurational” is used for the method.) This
multiscale approach results in CG simulations that can reproduce the
transition barrier and the free energy difference between the states
of the AA PMF but with much fewer degrees of freedom than the AA simulation.
In the MCCG method presented here, a two-state mixing approach is
used, where the diagonal elements of a two state Hamiltonian-like
matrix represent the free energy surface for each of the independently
developed CG models. MCCG thus at the outset takes advantage of the
strength of CG models to accurately reproduce protein conformations
in individual free energy basins. Each of these states can represent
the system in a different basin, e.g., folded or unfolded, open or
closed, in membrane or in solution. Importantly, off-diagonal elements
in the 2 × 2 quantum-like matrix represent the thermodynamic
coupling between the CG states. However, that coupling is assumed
to be only a function of a few key CG collective variables (CVs) that
can well describe the transition between states. In turn, this coupling
can be directly calculated from the AA PMF in a way that incorporates
the differences between the CG potential energy surfaces and the full
AA PMF. The MCCG methodology thus results in a multiconfigurational
CG simulation on a continuous CG free energy surface that can reproduce
the features of the AA PMF, including the saddle point barrier, in
terms of CG CVs that describe the transition between the CG structural
states. The resulting model can also reasonably reproduce PMFs along
other CVs that were not a part of the original model development,
with some caveats.
Theory and Methods
Theory
We demonstrate
here a method to build CG models that can accurately reproduce an
AA PMF for conformational change. Our strategy is as follows: a combined
effective free energy surface in the CG variables is defined using
a two-state Hamiltonian-like matrix, whose state elements encompass
two different configurations of a protein, as well as a coupling term
between those two states. The combined energy surface is then obtained
by diagonalizing this Hamiltonian, and the coupling term between the
two configurations is next defined in terms of a projection on one
or more collective variables of the AA PMF.To develop such
CG models that can describe multiple conformational basins and transitions
between them in terms of the many-dimensional CG coordinates , we first
treat the individual structures as discrete “states”
with separate CG force fields and hence diabatic-like energy surfaces,
labeled V11() and V22(). Here,
we are limiting ourselves to the case where only two CG structural
models (states) are considered, but in principle, more than two can
be treated (albeit with added complication). Although our system is
not a quantum mechanical system (it is purely a linear combination
of CG energy surfaces), treating the individual CG structures as discrete
thermodynamic states within an effective quantum-like two-state coupling
picture allows us to conveniently describe the system using a 2 ×
2 matrix Hamiltonian, as in the case of electron transfer[38]where V11() and V22() are the CG energy surfaces
of the discrete (diabatic) thermodynamic CG states, and V12() is a coupling
between the two CG states. In the MCCG model, the V11() and V22() terms are determined by the CG effective force fields describing
the individual “states”, i.e., particular conformational
basin regions in the overall energy landscape of the CG variables.
By using the following characteristic equation, the CG energy surface, E, can be created that includes a smooth saddle point between
the free energy surfaces of the individual CG models:We note
that this approach is similar in spirit to the Empirical Valence Bond
(EVB) approach[39] but that these present
equations are defined at the CG, rather than AA, level as in EVB.If there is a nontrivial solution to the characteristic equation
above, then the secular equation can be solved for the lowest energy
eigenvalue for every value of the CG coordinate :resulting in the following
standard 2 × 2 quantum-like expression for the coupled energy
surface in the CG variables:We note that previously developed
multistate CG ENM methods have either treated the off-diagonal coupling
term, V12(), as a constant[32,33,36] or have used a second-order expansion around the saddle point to
approximate the coupling.[34,40]A key goal in
this work is to develop a method for determining V12() using
information obtained from AA simulations, which allows the CG system
to reproduce one or more AA PMFs in certain collective variables (CVs)
related to the conformational transition. To this end, we can first
solve the solution in eq for the off-diagonal coupling term, V12(), in terms of both the
individual states and the total coupled potential energy surface:Again, we stress that we
are simply using a linear combination function to describe the overall
coarse-grained (CG) system much like a two-state Hamiltonian matrix
in the theory of electron transfer or the empirical valence bond method.
Therefore, the lowest eigenvalue of eq E() should be thought of as the “correct” overall
energy surface for the CG variables that includes the proper transitions
between conformations. Given this, eq should then be considered as the definition of V12().It must be stressed here that the most general form of the
coupling element above, V12(), is exceptionally difficult to know.
In a bottom-up CG model, if one has perfect sampling as well as a
complete physically accurate basis set to represent the CG interactions,
then one will simply have E() and there is no need for the MCCG approach at all.
However, this is rarely if ever going to be the case. It is more likely
the case that any CG approach (bottom-up or top-down or something
in-between) will provide reasonable models for the conformational
basins in the CG energy landscape, and one may even obtain these models
by using different CG methodologies in each basin. As such, in order
to then derive an approximate expression for V12() in which we
can more easily calculate the relevant terms, we make two assumptions.
First, we make the assumption that the CG energy function in the CG
variables can approximate the true many-body AA PMF projected onto
the CG variables. Here, we will thus treat the AA PMF projected onto
the CG variables as the “true” system PMF, but we note
that this function may also be estimated using force-pulling experiments
or constructed “top-down” in some ad hoc (or other)
manner of modeling. In other words, it should not be misunderstood
here that only an AA PMF can be used in the approach about to be presented.
Second, and most importantly, we assume that we can describe the process
of transitioning from energy basin V11() to V22() with a
single, or small collection of, collective variables (CVs) defined
here as . These collective variables
are a low dimensional projection defined from the full set of CG variables, .Therefore, with these
key assumptions stated above in hand, the off-diagonal coupling can
be rewritten in terms of the reduced dimensional PMFs of the CG diabatic
states in terms of the CVs, defined as F11() and F22(), as well as the AA PMF similarly
projected onto the CVs, such thatFor F(), we also
define that there is a function that maps the AA coordinates onto
the CG CV, . Using only the collective
variables also simplifies the calculation of the AA PMF, which can
only be tractably calculated for a small collection of variables.
By including the off-diagonal coupling between states in a way that
is determined directly from the AA PMF, the free energy surface of
the MCCG model can reproduce the AA PMF in terms of CG CVs.We note that, if the components of the coupling term are calculated
perfectly, the MCCG PMF will then match the AA PMF for the chosen
CVs when the MCCG simulation is run for the full set of CG variables . However, entropic and other
contributions from the AA system may not have been fully accounted
for in the CG potential energy terms nor in its projection along the
CVs, . To account for this at least
in part, V12() may be iteratively updated to achieve better agreement between
the AA and CG PMFs along the CVs, .
The coupling term V12() is iteratively updated to improve the resulting MCCG PMF
by adding the difference between the fine-grained (e.g., AA) PMF F() and the MCCG PMF iteration F() to the coupling
term:where α is a tuning parameter to ensure convergence.
Successive updates to the off-diagonal coupling term will then drive
the MCCG PMF to approach the AA PMF along .
Simulation Details
AA Simulations
All-atom (AA) simulations
were performed with the GROMACS (version 5) simulation suite[41] using the CHARMM36 force field.[42] Simulations were integrated with a 2 fs time step. Nonbonded
van der Waals interactions were switched to zero between 1.0 and 1.2
nm. Electrostatic interactions were evaluated using Particle Mesh
Ewald.[43] Bonds to hydrogen were constrained
using the LINCS algorithm.[44] Temperature
was maintained at 310 K using the stochastic velocity rescaling thermostat[45] with a coupling time constant of 0.1 ps. The
pressure was maintained isotropically in constant pressure simulations
using the Parrinello–Rahman barostat[46] at a pressure of 1.0 bar, a compressibility of 4.5 × 10–5, and a coupling time constant of 2.0 ps. Coordinates
were saved every 1 ps.
AA System Initialization and Equilibration
The endophilin amphipathic helix H0 (Sequence: MSVAGLKKQFHKATQKVSEKVG)
and the capped alanine 12-mer were each initialized in a linear configuration
using the Molefacture plugin in VMD.[47] H0
and dodecaalanine were solvated using VMD in a TIP3P water box with
a side length of 6.0 and 4.8 nm, respectively, and ionized with 150
mM NaCl. The two peptides were then energy-minimized using the steepest
descent algorithm to a maximum force of 1000 kJ mol–1 nm–1; the systems converged within 500 steps.
The peptides were then equilibrated for 1 ns in the isothermal–isobaric
ensemble (NPT).
AA Production Simulations
Dodecaalanine
(ALA-AA) was simulated for 1 μs in the canonical ensemble (constant
NVT) to adequately sample the folding/unfolding process. Free energy
surfaces and CG models were obtained from the single, 1 μs simulation.
H0 was simulated (H0-AA) using temperature replica-exchange molecular
dynamics[48] (T-REMD) with 72 temperature
windows exponentially distributed from 310 to 565 K. Each temperature
replica was first equilibrated at the respective temperature for 100
ps. Each temperature replica was then simulated for 100 ns, with exchanges
occurring every 100 steps. The average exchange probability across
all replicas was 25%. Free energy surfaces and CG models were obtained
from the lowest temperature replica.
CG Simulations
Coarse-grained (CG) simulations were carried out using the LAMMPS
simulation package.[49] Simulations were
integrated with a 5.0 fs time step. Temperature was maintained using
the Langevin thermostat[50] with a damping
coefficient of 500 fs. Coordinates were saved every 200 time steps.
MCCG simulations were performed using a newly designed plugin for
the LAMMPS simulation package (https://github.com/mocohen/USER-MCCG), which interfaces with the PLUMED2 plugin.[51] MCCG simulation parameters were the same as the CG simulations.
CG Models
Dodecaalanine
Four solvent free CG models of dodecaalanine
were constructed from the 1 μs AA simulation. Each alanine residue
was represented by a single bead corresponding to the α carbon,
for a total of 12 CG beads for the whole peptide. Two methods for
creating CG models were used: heterogeneous elastic networks[52] (HENM) and Boltzmann Inversion (BI). HENM models
of polyalanine were created using the full trajectory (ALA-HENM) as
well as for the folded configurations (ALA-HENM-F). HENM models were
constructed using a cutoff of 1.5 nm. BI models of polyalanine were
created using the full trajectory (ALA-BI) as well as for only the
unfolded configurations (ALA-BI-U), configurations with a helicity
(Q) less than 0.2.
Each bead in the BI model was treated equivalently; models were constructed
with a single bond, angle, dihedral, and nonbonded interaction. Each
CG model of dodecaalanine was simulated for 4 billion time steps to
obtain the free energy surfaces.
H0
Two solvent
free CG models of H0 were constructed from the lowest temperature
replica of the AA T-REMD simulation. Each residue was represented
by a single bead corresponding to the Cα carbon for a total
of 22 CG beads for the whole peptide. An HENM model of H0 was constructed
using only the folded configurations (H0-HENM-F). The HENM model was
constructed using a cutoff of 1.5 nm. A BI model of H0 was created
using only the unfolded configurations (H0-BI-U). Each bead in the
BI model was treated equivalently; models were constructed with a
single bonded, angular, dihedral, and nonbonded interaction. Each
CG model of H0 was simulated for 4 billion time steps to obtain the
free energy surfaces.
MCCG Models
MCCG models were constructed
for dodecaalanine (ALA-MCCG) and H0 (H0-MCCG) using the folded HENM
and unfolded BI CG models for the two states. For the iterative procedure,
α was set to be 1.
Results and Discussion
In this section, we demonstrate the ability of a coupled two-state
MCCG system to capture the folding and unfolding of two different
peptides: an alanine 12-mer and endophilin H0.
Dodecaalanine
Dodecaalanine was chosen to test our MCCG model because it has been
used in previous studies that attempted to capture the folding and
unfolding process of peptides.[29,30] We first simulated
dodecaalanine for 1 μs using AA MD. This fine-grained simulation
sampled folded, partially folded, globular, and extended configurations.
The free energy surface (Figure ) along the helicity (Q) and radius of gyration () CVs shows a large free energy
minimum in the unfolded configuration (Q < 0.2) as well as a few small free energy minima
in the helical configurations (Q > 0.8). These two CVs have been previously[30] used to characterize the different conformations explored
by dodecaalanine.
Figure 1
2D potential of mean force for the AA dodecaalanine (ALA-AA)
system with mapped coordinates as a function of helicity (Q) and radius of gyration
(R). The folded state
is denoted in the red square (F) and the unfolded state (U), in purple.
2D potential of mean force for the AA dodecaalanine (ALA-AA)
system with mapped coordinates as a function of helicity (Q) and radius of gyration
(R). The folded state
is denoted in the red square (F) and the unfolded state (U), in purple.We chose a Cα mapping scheme,
one CG site per residue at the Cα position, for each CG state
model. A Cα CG mapping is able to characterize the difference
between an unfolded and folded helix.[53] Using the Cα CG mapping, we developed two simple CG models
to capture the folding and unfolding of the peptide using the AA trajectories.
Both the Boltzmann Inversion (ALA-BI) and Heterogeneous Elastic Network
(ALA-HENM) models developed from the AA trajectory of dodecaalanine
failed to capture this transition, as seen in the 2D PMF (Figure ). ALA-BI reproduces
only portions of the folded configuration and is quite sticky; it
stabilizes structures with small radius of gyration (R < 0.4), configurations that are
not sampled in the AA simulation. ALA-HENM samples the unfolded configuration
as well as a large number of high-energy states that are not sampled
in the AA simulation. Previous studies by Carmichael and Shell,[29] as well as Rudzinski and Noid,[30] have also demonstrated the inability of more complicated
CG methods to accurately capture both the folded and unfolded configurations
of alanine polymers using the REM, iterative generalized Yvon-Born-Green,[54,55] and MS-CG techniques.
Figure 2
2D potentials of the mean force of dodecalalanine
as a function of helicity (Q) and radius of gyration (R) for the four CG models: (A) Boltzmann inversion (ALA-BI);
(B) heterogeneous elastic network model (ALA-HENM); Boltzmann inversion
for the unfolded configurations (ALA-BI-U); (D) heterogeneous elastic
network model for the folded configurations (ALA-HENM-F). ALA-BI and
ALA-HENM both cannot reproduce the AA mapped PMF and do not sample
portions of the AA mapped PMF well. ALA-BI-U reproduces the unfolded
portion of the AA PMF, and ALA-HENM reproduces the folded portion
of the AA PMF.
2D potentials of the mean force of dodecalalanine
as a function of helicity (Q) and radius of gyration (R) for the four CG models: (A) Boltzmann inversion (ALA-BI);
(B) heterogeneous elastic network model (ALA-HENM); Boltzmann inversion
for the unfolded configurations (ALA-BI-U); (D) heterogeneous elastic
network model for the folded configurations (ALA-HENM-F). ALA-BI and
ALA-HENM both cannot reproduce the AA mapped PMF and do not sample
portions of the AA mapped PMF well. ALA-BI-U reproduces the unfolded
portion of the AA PMF, and ALA-HENM reproduces the folded portion
of the AA PMF.We next parametrized
separate CG models to model the folded and unfolded configurations.
A BI CG model parametrized solely from AA coordinates of the unfolded
configuration (ALA-BI-U) reproduced the unfolded portion of the 2D
PMF, and a heterogeneous Elastic Network CG model parametrized solely
from AA coordinates of the folded configuration (ALA-HENM-F) reproduced
the folded portion of the 2D PMF (Figure ). While these simple CG models reproduce
the general shape of the unfolded and folded areas of the PMF, they
do not contain some of the local minima observed in the AA PMF. This
is to be expected, as CG models generally smooth out the PMF. However,
more features could be captured in the individual CG models for the
folded and unfolded states if more sophisticated CG modeling techniques,
such as MS-CG and REM, are used to build the diabatic states.Using these CG models as the diabatic states and the AA and CG 2D
PMFs to calculate the coupling, we constructed an MCCG model of dodecalanine
(ALA-MCCG). The coupling term V12() was mostly localized to the transition
region between the two states (Figure ). Due to the large transition region, the coupling
is required to operate over a reasonably large portion of conformational
space. The resulting 2D PMF from the MCCG simulations of ALA-MCCG
show sampling in both the folded and unfolded configurations, with
transitions between the two states (Figure ).
Figure 3
(Top) Tabulated coupling term (V12()), used for the dodecaalanine
MCCG model, as a function of helicity (Q) and radius of gyration (R). (Bottom) Resulting MCCG PMF for dodecaalanine
(ALA-MCCG). ALA-MCCG reproduces two-state behavior but does not reproduce
the barrier height or well depths.
(Top) Tabulated coupling term (V12()), used for the dodecaalanine
MCCG model, as a function of helicity (Q) and radius of gyration (R). (Bottom) Resulting MCCG PMF for dodecaalanine
(ALA-MCCG). ALA-MCCG reproduces two-state behavior but does not reproduce
the barrier height or well depths.However, while the MCCG model was able to describe the two-state
behavior of dodecaalanine, it was not able to accurately reproduce
the barrier height nor the free energy difference between the two
configurations. The high PMF barrier, 25 kJ/mol, in the MCCG model
versus 15 kJ/mol in the AA model suggested that the coupling in the
transition region is too low. We therefore refined the MCCG model
by using the iterative procedure described in eq . Since the MCCG PMF and AA PMF are most different
in the transition region, this slowly increases the coupling in the
transition region, allowing the MCCG PMF barrier to reproduce the
AA PMF barrier. As seen in Figure , the iterative MCCG procedure reduces the free energy
barrier to 20 kJ/mol by the third iteration and 17 kJ/mol (within
an order of kT) by the seventh iteration. Additionally, the PMF well
depths, at 0 kJ/mol for both states in the AA PMF, are matched by
the seventh iteration. It should again be noted that the AA PMF contains
at least 10 local minima, which are smoothed out in the CG PMF. While
MCCG substantially improves the transition region, it should not be
expected to improve all other aspects of the CG model, especially
features in the diabatic wells. The primary motions of the system
are driven by the diabatic potentials; it is only in the transition
region that the coupling term has its largest effect. Nevertheless,
even with the diabatic states derived from simple CG models, MCCG
can more accurately reproduce the overall AA PMF along the chosen
CVs compared to more sophisticated (and as of yet unknown) CG modeling
techniques.
Figure 4
2D PMFs as a function of helicity (Q) and radius of gyration (R) comparing the mapped AA system (ALA-AA)
with various iterations (Iter X) of MCCG. The iterative procedure
improves the difference in free energy difference between the two
states, as well as the free energy barrier to match the AA system.
2D PMFs as a function of helicity (Q) and radius of gyration (R) comparing the mapped AA system (ALA-AA)
with various iterations (Iter X) of MCCG. The iterative procedure
improves the difference in free energy difference between the two
states, as well as the free energy barrier to match the AA system.We further analyzed the MCCG trajectories
to determine whether ALA-MCCG can accurately capture additional CVs
that were not included in the development of the coupling term (Figure ). For each additional
CV, MCCG produces a PMF that is a combination of the ALA-HENM-F and
ALA-BI-U models. The two collective variables most correlated with
the coupled CVs, R22 (the second component of the
gyration tensor, which describes the width of the polymer) and RMSD,
improve the most compared with the 1D AA PMFs. However, end-to-end
distance, which is least correlated with the coupled CVs, is worse
at reproducing the reference AA PMF. This demonstrates the importance
of choosing good (or possibly additional) CVs for the MCCG coupling
term. CVs not as correlated with the coupled CVs (or anticorrelated)
may potentially be adversely affected by the MCCG procedure. By reducing
the many-body PMF to a PMF dependent only upon a few CVs, some information
is inevitably lost. Choosing CVs that can best represent the many-body
PMF in the full set of CG variables will produce the best MCCG models, but there
is obviously a limit on how many CVs can be chosen in terms of one’s
ability to calculate the AA multidimensional reduced PMFs as a function
of them (here, we have used 2D PMFs). Thus, when building MCCG models,
it is important to check whether other PMF properties not correlated
with the coupled CVs are affected.
Figure 5
1D PMFs for three CVs comparing the ALA
MCCG model with all-atom PMFs, as well as the two CG models comprising
the MCCG model. The MCCG model more accurately represents CVs that
are coupled (RMSD) or correlated with the coupled CVs (R22). CVs that are not related to the coupled CVs (end-to-end
distance) are less accurately captured.
1D PMFs for three CVs comparing the ALA
MCCG model with all-atom PMFs, as well as the two CG models comprising
the MCCG model. The MCCG model more accurately represents CVs that
are coupled (RMSD) or correlated with the coupled CVs (R22). CVs that are not related to the coupled CVs (end-to-end
distance) are less accurately captured.
Endophilin H0
The folding process of endophilin H0, a 22-residue
amphipathic helix that targets curved membranes, has been previously
studied by Cui et al.[56] The folding process
was characterized using the α–β similarity and
the number of backbone hydrogen bonds CVs. These CVs can only be described
using atomistic detail. Here, we chose to characterize the folding
process using the root mean squared deviation (RMSD) from the idealized
folded configuration and the second component of the gyration tensor[57] (R22) of the Cα coordinates
(Figure a). These
CVs adequately separate the two states and contain a distinct transition
region. The folded configuration of H0 is characterized by R22 and RMSD values less than 0.4 nm, while the
unfolded configuration is characterized by RMSD > 0.5 nm.
Figure 6
2D PMFs as
a function of root mean squared deviation (RMSD) and the second component
of the gyration tensor (R22) for H0. For the mapped
AA system (H0-AA), the folded state is denoted in the red square (F)
and the unfolded state (U), in purple. The Boltzmann inversion CG
model of the unfolded state (H0-BI-U) captures the unfolded portion
of the AA PMF, and the hENM CG model of the folded state (H0-BI-F)
captures the folded portion of the AA PMF. The coupling term (V12()) is at a
maximum in the transition region (RMSD = 0.45) as well as the portions
of the PMF unsampled by H0-BI-U and H0-HENM-F.
2D PMFs as
a function of root mean squared deviation (RMSD) and the second component
of the gyration tensor (R22) for H0. For the mapped
AA system (H0-AA), the folded state is denoted in the red square (F)
and the unfolded state (U), in purple. The Boltzmann inversion CG
model of the unfolded state (H0-BI-U) captures the unfolded portion
of the AA PMF, and the hENMCG model of the folded state (H0-BI-F)
captures the folded portion of the AA PMF. The coupling term (V12()) is at a
maximum in the transition region (RMSD = 0.45) as well as the portions
of the PMF unsampled by H0-BI-U and H0-HENM-F.A BI CG model of the unfolded configuration (H0-BI-U) and
a HENM model of folded configuration (H0-HENM-F) accurately reproduced,
respectively, the unfolded and folded portions of the PMF, along the
CVs R22 and RMSD, but with smoothed PMF minima
compared to the AA PMF. As noted earlier, CG models typically smooth
out portions of the free energy surface because the high frequency
motions are integrated out of the model. We then constructed an MCCG
model of H0 (H0-MCCG). As in the case of dodecaalanine, the coupling
is highest in the transition region (Figure ), as well as regions not sampled by the
diabatic states (RMSD > 1.0 nm and RMSD < 0.2 nm).H0-MCCG
exhibits two-state behavior, sampling both the unfolded and folded
configurations, with transitions between the two states (Figure ). H0-MCCG has a
barrier height of 10 kJ/mol, within an order of of the barrier
height of 7.5 kJ/mol in the AA PMF. As in the case of dodecalanine,
the initial MCCG barrier height is incorrect. Applying the iterative
MCCG method for only 2 iterations for H0-MCCG, however, reproduces
the 7.5 kJ/mol barrier height at RMSD = 0.5 nm.
Figure 7
2D PMFs as a function
of root mean squared deviation (RMSD) and the second component of
the gyration tensor (R22) comparing the mapped
AA system (H0-AA) with various iterations (Iter X) of MCCG. The iterative
procedure improves the difference in free energy difference between
the two states and converges to H0-AA within 2 iterations.
2D PMFs as a function
of root mean squared deviation (RMSD) and the second component of
the gyration tensor (R22) comparing the mapped
AA system (H0-AA) with various iterations (Iter X) of MCCG. The iterative
procedure improves the difference in free energy difference between
the two states and converges to H0-AA within 2 iterations.The H0-MCCG PMF is in much better agreement with
the AA PMF compared to ALA-MCCG. This is likely caused by the more
accurate diabatic states in the case of the H0-MCCG models. The coupling
term is therefore required to do less work to reproduce the AA PMF,
as it is already more accurately represented by the individual CG
models. Accordingly, the number of iterations required to converge
the barrier height and PMF minima is significantly reduced due to
the improved CG diabatic states.
Conclusions
Proteins
can undergo major conformational changes that cannot always be captured
by a single CG model having an imperfect description of the interactions
between the CG sites. The MCCG method introduced in this work is thus
designed to capture the transitions between multiple conformational
basins in an approximate CG model in a way that can reasonably reproduce
the overall PMF for the conformational change process. Although previous
methods have been developed that can simulate a CG model that transitions
between multiple conformational states, these methods are not guaranteed
to reproduce key features of the AA PMF, especially in the transition
barrier region. In contrast, the new MCCG method utilizes an AA-based
off-diagonal coupling, the V12() term in eq , to reproduce the correct distribution of states as well
as the transition barrier.An important strength of the MCCG
method is the fact that the off-diagonal coupling term is defined
to be a function of the CVs that describe the transition process.
When a constant off-diagonal coupling term is used in contrast, the
system may be unable to capture the correct location of the transition
barrier, in addition to its height. Using a coupling term that is
a function of the transition CVs not only allows the system to reproduce
the transition barrier but also leads to a more realistic description
of the transition process. Additionally, as the method is applied
to other systems, more complicated CVs can be used to incorporate
other influences on structural transformations, for example, environmental
interactions that affect protein structure. As long as a PMF can be
calculated, or at least estimated, to describe the transition between
states, whether from AA simulations, from experiment, or phenomenologically,
an MCCG model can be parametrized for the system under study.Another advantage of the MCCG method is the ability to use simple
CG models to capture complex, multistate (i.e., multiconformation)
behavior in a sort of “divide and conquer” fashion instead
of using more complex CG modeling methods. If CG models are constructed
and adequate for each of the individual protein conformations, and
if the AA PMF (or a good estimate of it) is known for the structural
transition, then an MCCG model can be created for the system. An accurate
MCCG model may ideally utilize the calculation of the AA PMF, which
can be computationally expensive, but for which many methods have
been developed.[58−62] The major reduction in computational effort afforded by the resulting
MCCG model can then be taken advantage of to investigate new emergent
behavior. For example, when many proteins are allowed to interact
with each other, such as in cellular environments, the MCCG model
will be influenced not only by the PMF for a single protein but also
by the interactions between proteins on the CG level. This will lead
to simulations that will be more representative of actual cellular
environments, where protein conformations are influenced both by smaller
scale atomic level interactions and by the larger scale protein motions
and protein–protein interactions.
Authors: Jaehyeok Jin; Alexander J Pak; Aleksander E P Durumeric; Timothy D Loose; Gregory A Voth Journal: J Chem Theory Comput Date: 2022-09-07 Impact factor: 6.578