Gaussian accelerated molecular dynamics (GaMD) is a recently developed enhanced sampling technique that provides efficient free energy calculations of biomolecules. Like the previous accelerated molecular dynamics (aMD), GaMD allows for "unconstrained" enhanced sampling without the need to set predefined collective variables and so is useful for studying complex biomolecular conformational changes such as protein folding and ligand binding. Furthermore, because the boost potential is constructed using a harmonic function that follows Gaussian distribution in GaMD, cumulant expansion to the second order can be applied to recover the original free energy profiles of proteins and other large biomolecules, which solves a long-standing energetic reweighting problem of the previous aMD method. Taken together, GaMD offers major advantages for both unconstrained enhanced sampling and free energy calculations of large biomolecules. Here, we have implemented GaMD in the NAMD package on top of the existing aMD feature and validated it on three model systems: alanine dipeptide, the chignolin fast-folding protein, and the M3 muscarinic G protein-coupled receptor (GPCR). For alanine dipeptide, while conventional molecular dynamics (cMD) simulations performed for 30 ns are poorly converged, GaMD simulations of the same length yield free energy profiles that agree quantitatively with those of 1000 ns cMD simulation. Further GaMD simulations have captured folding of the chignolin and binding of the acetylcholine (ACh) endogenous agonist to the M3 muscarinic receptor. The reweighted free energy profiles are used to characterize the protein folding and ligand binding pathways quantitatively. GaMD implemented in the scalable NAMD is widely applicable to enhanced sampling and free energy calculations of large biomolecules.
Gaussian accelerated molecular dynamics (GaMD) is a recently developed enhanced sampling technique that provides efficient free energy calculations of biomolecules. Like the previous accelerated molecular dynamics (aMD), GaMD allows for "unconstrained" enhanced sampling without the need to set predefined collective variables and so is useful for studying complex biomolecular conformational changes such as protein folding and ligand binding. Furthermore, because the boost potential is constructed using a harmonic function that follows Gaussian distribution in GaMD, cumulant expansion to the second order can be applied to recover the original free energy profiles of proteins and other large biomolecules, which solves a long-standing energetic reweighting problem of the previous aMD method. Taken together, GaMD offers major advantages for both unconstrained enhanced sampling and free energy calculations of large biomolecules. Here, we have implemented GaMD in the NAMD package on top of the existing aMD feature and validated it on three model systems: alanine dipeptide, the chignolin fast-folding protein, and the M3 muscarinic G protein-coupled receptor (GPCR). For alanine dipeptide, while conventional molecular dynamics (cMD) simulations performed for 30 ns are poorly converged, GaMD simulations of the same length yield free energy profiles that agree quantitatively with those of 1000 ns cMD simulation. Further GaMD simulations have captured folding of the chignolin and binding of the acetylcholine (ACh) endogenous agonist to the M3 muscarinic receptor. The reweighted free energy profiles are used to characterize the protein folding and ligand binding pathways quantitatively. GaMD implemented in the scalable NAMD is widely applicable to enhanced sampling and free energy calculations of large biomolecules.
Accelerated molecular
dynamics (aMD) is an enhanced sampling technique
that works by smoothing the potential energy surface to lower energy
barriers and thus accelerate conformational transitions of biomolecules.[1] Without the need to set predefined collective
variables (CVs), aMD allows “unconstrained” enhanced
sampling of many complex biomolecules.[2] For example, aMD simulations provide significant speed-up of peptide
and protein conformational transitions,[1,3] lipid diffusion
and mixing,[4] protein folding,[5] and protein–ligand binding.[6] Hundreds-of-nanosecond aMD simulations are able
to capture millisecond time scale events in both globular and membrane
proteins.[7]While aMD is powerful
for enhanced conformational sampling, its
accuracy for free energy calculations has attracted lots of attention.[8] In theory, frames of aMD simulations can be reweighted
by the Boltzmann factors of the corresponding boost potential (i.e., eΔ) and averaged over each bin of selected
CV(s) to obtain the canonical ensemble. However, the exponential reweighting
is known to suffer from large statistical noise in practical calculations[2,7c,8,9] because
the Boltzmann reweighting factors are often dominated by a very few
frames with high boost potential. The boost potential in aMD simulations
of proteins is typically on the order of tens to hundreds of kilocalories
per mole, which is much greater in magnitude and wider in distribution
than that of CV-biasing simulations (e.g., several kilocalories per
mole in metadynamics). It has been a long-standing problem to accurately
reweight aMD simulations and recover the original free energy landscapes,
especially for large proteins.[6,7c] Notably, when the boost
potential follows near-Gaussian distribution, cumulant expansion to
the second order provides improved reweighting of aMD simulations
compared with the previously used exponential average and Maclaurin
series expansion reweighting methods.[10] The reweighted free energy profiles are in good agreement with the
long-time scale conventional molecular dynamics (cMD) simulations
as demonstrated on alanine dipeptide and fast-folding proteins.[5b] However, such improvement is limited to rather
small systems (e.g., proteins with less than ∼35 amino acid
residues).[5b] In simulations of larger systems,
the boost potential exhibits significantly wider distribution and
does not allow for accurate reweighting.In order to achieve
both unconstrained enhanced sampling and accurate
energetic reweighting for free energy calculations of large biomolecules
like proteins, Gaussian accelerated molecular dynamics (GaMD) has
been developed by applying a harmonic boost potential to smooth the
biomolecular potential energy surface.[11] GaMD greatly reduces the energy barriers and accelerates conformational
transitions and ligand binding by orders of magnitude.[11] Moreover, because the boost potential follows
Gaussian distribution, the original free energy profiles of biomolecules
can be recovered through cumulant expansion to the second order.[11] GaMD solves the long-standing energetic reweighting
problem as encountered in the previous aMD method[8] and allows us to characterize complex biomolecular conformational
changes quantitatively.[11] Compared with
other enhanced sampling methods such as the metadynamics[12] and adaptive biasing force (ABF),[13] GaMD does not require predefined CVs, which
is advantageous for studying “free” protein folding
and ligand binding processes[11] and efficient
free energy calculations of large biomolecules.[11]Here, we have implemented GaMD in the NAMD package
on top of its
existing aMD feature. The potential statistics of simulated biomolecules
are collected from short cMD and used to construct the harmonic boost
potential. The implemented GaMD in NAMD will be demonstrated on three
model systems that have been extensively studied earlier: alaninedipeptide,[10,11,14] the chignolin fast-folding protein,[5b,11] and the M3 muscarinic G protein-coupled receptor (GPCR) bound by the
acetylcholine (ACh) endogenous agonist.[6,15]
Methods
Theory
GaMD enhances the conformational sampling of
biomolecules by adding a harmonic boost potential to smooth the system
potential energy surface when the system potential is lower than a reference energy E:[11]where k is
the harmonic force constant. The two adjustable parameters E and k are automatically determined by
applying the following three criteria. First, for any two arbitrary
potential values and found on the original energy surface, if , ΔV should be a
monotonic function that does not change the relative order of the
biased potential values, i.e., . Second,
if , the potential difference
observed on the
smoothed energy surface should be smaller than that of the original,
i.e., . By combining
the first two criteria and
plugging in the formula of and ΔV, we obtainwhere Vmin and Vmax are the system minimum
and maximum potential energies. To ensure that eq is valid, k has to satisfy: . Let us define , then 0 < k0 ≤ 1. Third, the
standard deviation of ΔV needs to be small
enough (i.e., narrow distribution) to ensure accurate
reweighting using cumulant expansion to the second order:[10] σΔ = k(E – Vavg)σ ≤ σ0, where Vavg and σ are the average and standard deviation of the system
potential energies, σΔ is
the standard deviation of ΔV with σ0 as a user-specified upper limit (e.g., 10kBT) for accurate reweighting. When E is set to the lower bound E = Vmax according to eq , k0 can be calculated
asAlternatively,
when the threshold energy E is set to its upper bound E = Vmin + 1/k, k0 is set toif k0″ is found to be between
0 and 1. Otherwise, k0 is calculated using eq .For energetic reweighting
of GaMD simulations, the probability distribution along a selected
reaction coordinate A() is written as p*(A), where denotes the atomic positions {1, ..., }.Given the boost potential ΔV() of each frame, p*(A) can be reweighted to recover the
canonical ensemble distribution, p(A), aswhere M is
the number of bins, β = 1/kBT and ⟨eβΔV(⟩ is the ensemble-averaged Boltzmann factor of ΔV() for simulation frames found in
the jth bin. In order to reduce the energetic noise,
the ensemble-averaged reweighting factor can be approximated using
cumulant expansion:[16]where the first three cumulants are
given
byAs shown earlier, when the boost potential
follows near-Gaussian distribution, cumulant expansion to the second
order provides the more accurate reweighting than the exponential
average and Maclaurin series expansion methods.[10] Finally, the reweighted free energy is calculated as .
Implementation
Similar to the aMD
implemented in NAMD,[14] three modes are
available for applying boost
potential to biomolecules in GaMD: (1) boosting the dihedral energetic
term only, (2) boosting the total potential energy only, and (3) boosting
both the dihedral and total potential energetic terms (i.e., “dual-boost”).
The major code modification is to extend the aMD function in NAMD
2.11 to include the boost potential calculation used in GaMD (Appendix A). As described in the previous section,
GaMD boost potential is computed based on statistics of the system
potential such as the minimum, maximum, average and standard deviation.
Therefore, three stages of simulation are needed to collect the potential
statistics. They include the (i) cMD, (ii) equilibration, and (iii)
production stages. The program first collects potential statistics
from a short cMD run. Subsequently, a boost potential is added to
the system in the equilibration stage while update of the potential
statistics continues. During this stage, the boost potential applied
in each step is computed based on the energetic statistics collected
up to that particular step. After the equilibration stage, the statistics
collected is assumed to be sufficient to represent the potential energy
landscape of interest. Hence, the potential statistics are fixed to
calculate the boost potential for running the production simulation.
Note that in both the cMD and equilibration stages, there are a small
number of steps at the beginning of each stage during which we do
not collect statistics. These steps, named preparation steps, are
performed to allow the system to adapt to the simulation environment.
The program starts collecting statistics of the potential energies
after the preparation steps.
Simulation Protocols and Benchmarks
For alanine dipeptide
and chignolin, the AMBER ff99SB force field was used and the simulation
systems were built using the Xleap module in the AMBER package[17] as described previously.[9,11] By
solvating the structures in a TIP3P[18] water
box that extends 8–10 Å from the solute surface, the alaninedipeptide system contained 630 water molecules and 2211 waters for
chignolin. The total number of atoms in the two systems is 1912 and
6773 for alanine dipeptide and chignolin, respectively. Periodic boundary
conditions were applied for the simulation systems. Bonds containing
hydrogen atoms were restrained with the SHAKE algorithm[19] and a 2 fs time step was used. Weak coupling
to an external temperature and pressure bath was used to control both
temperature and pressure.[20] The electrostatic
interactions were calculated using the particle mesh Ewald (PME) summation[21] with a cutoff of 8.0 Å for long-range interactions.
After the initial energy minimization and thermalization as described
earlier,[11] dual-boost GaMD was applied
to simulate the two systems. The system threshold energy E for applying the boost potential was set to Vmax. The default parameter values were used for the GaMD simulations
except stated otherwise. For alanine dipeptide, statistics of the
system potential were first collected from an initial 2 ns cMD run,
followed by a 6 ns equilibration run. Based on the statistics collected,
the threshold energy E and harmonic force constant k were computed automatically according to eq . Finally, three independent 30
ns production runs were performed with different randomized initial
atomic velocities. For chignolin, the dual-boost GaMD simulation includes
2 ns cMD, 50 ns equilibration after adding the boost potential, and
then three independent 300 ns production runs.For the M3 muscarinic GPCR, the system was set up following the same
protocol as used previously.[6,7d] The tiotropium (TTP)-bound
X-ray structure (PDB: 4DAJ) of the M3 receptor[22] was used. After removal of TTP, the M3 receptor
was inserted into a palmitoyl-oleoyl-phosphatidyl-choline (POPC) bilayer
with all overlapping lipid molecules removed using the Membrane plugin and solvated in a water box using the Solvate plugin in VMD.[23] Four ACh ligand molecules
were placed at least 40 Å away from the receptor orthosteric
site in the bulk solvent. The system charges were then neutralized
with 18 Cl– ions. The simulation systems of the M3 receptor
initially measured about 80 Å × 87 Å × 97 Å
with 130 lipid molecules, ∼11 200 water molecules and
a total of ∼55 500 atoms. Initial energy minimization
and thermalization of the M3 receptor system follow the
same protocol as used in the previous study.[6] GaMD simulation was then performed using the dual-boost scheme with
the threshold energy E set to Vmax. The simulation included 2 ns cMD, 50 ns equilibration
after adding the boost potential and then three independent production
runs with randomized atomic velocities (one for 400 ns and another
two for 300 ns).The GaMD simulations were carried out using
NAMD 2.11 on the Gordon
supercomputer at the San Diego Supercomputing Center. Excellent scalability
was obtained as shown in Figure S1. For
the M3 muscarinic GPCR system, benchmark simulations showed
that GaMD ran at ∼10 ns/day with 64 CPUs and up to ∼61
ns/day with 640 CPUs, which were ∼8–11% slower than
the corresponding cMD runs. This performance is very similar to that
of the conventional aMD implemented in NAMD.[14] GaMD production frames were saved every 0.1 ps. The VMD[23] and CPPTRAJ[24] tools
were used for trajectory analysis. For chignolin, the root-mean square
deviations (RMSD) of the Cα atoms relative to the
native NMR structure (PDB: 1UAO) and the protein radius of gyration, Rg, were calculated. For the M3 muscarnic receptor,
the density based spatial clustering of applications with noise (DBSCAN)
algorithm[25] was applied to cluster the
diffusing ligand molecules for identifying the highly populated binding
sites. Finally, the PyReweighting toolkit[10] was applied to compute the potential of mean force (PMF) profiles
of the backbone dihedrals Φ and Ψ in the alanine dipeptide,
the (RMSD, Rg) of chignolin and structural
clusters of the diffusing ACh in the M3 receptor system.
Results
Free Energy Profiles of the Alanine Dipeptide
With
the three independent 30 ns GaMD production trajectories of the alaninedipeptide (Figure A), energetic reweighting was applied to calculate the free energy
profiles of the backbone dihedrals Φ and Ψ. Analysis of
the system boost potential showed that it followed Gaussian distribution
with low anharmonicity (7.18 × 10–3; Figure B). The boost potential
average is 6.93 and 1.87 kcal/mol for the standard deviation. With
this, the cumulant expansion to the second order was applied for the
reweighting.
Figure 1
(A) Schematic representation of backbone dihedrals Φ
and
Ψ in alanine dipeptide. (B) Distribution of the boost potential
ΔV applied in the GaMD simulations with anharmonicity
equal to 7.18 × 10–3. (C–D) Potential
of mean force (PMF) profiles of the (C) Φ and (D) Ψ dihedrals
calculated from three 30 ns GaMD simulations combined using cumulant
expansion to the second order. (E) 2D PMF profile of backbone dihedrals
(Φ, Ψ). The low energy wells are labeled corresponding
to the right-handed α helix (αR), left-handed
α helix (αL), β-sheet (β), and
polyproline II (PII) conformations. (F) Distribution anharmonicity
of ΔV of frames found in each bin of the PMF
profile.
(A) Schematic representation of backbone dihedrals Φ
and
Ψ in alanine dipeptide. (B) Distribution of the boost potential
ΔV applied in the GaMD simulations with anharmonicity
equal to 7.18 × 10–3. (C–D) Potential
of mean force (PMF) profiles of the (C) Φ and (D) Ψ dihedrals
calculated from three 30 ns GaMD simulations combined using cumulant
expansion to the second order. (E) 2D PMF profile of backbone dihedrals
(Φ, Ψ). The low energy wells are labeled corresponding
to the right-handed α helix (αR), left-handed
α helix (αL), β-sheet (β), and
polyproline II (PII) conformations. (F) Distribution anharmonicity
of ΔV of frames found in each bin of the PMF
profile.In comparison, the reweighted
PMF profiles obtained from 30 ns
GaMD trajectories agree quantitatively with the original profiles
from much longer 1000 ns cMD simulation. Although the GaMD derived
PMF profile of Φ exhibits moderate fluctuations near the energy
barrier at 0° and slightly elevated free energy well centered
at ∼50° (Figure C), it essentially overlaps with the original profile in the
other regions, similar for the entire PMF profile of Ψ (Figure D). In contrast,
three cMD simulations did not properly sample the energy barriers
of Φ at 0° and Ψ at 120° as shown in Figures S2A and S2B, respectively. For Φ,
the energy well centered at 60° obtained from the 30 ns cMD simulations
was higher than that from 1000 ns cMD simulation. Therefore, while
cMD simulations performed for 30 ns are poorly converged for alaninedipeptide, GaMD simulations of the same length yielded significantly
improved free energy profiles that agree quantitatively with those
of the 1000 ns cMD simulation.In addition, we calculated a
2D PMF of (Φ, Ψ) in alaninedipeptide by reweighting the three 30 ns GaMD trajectories combined.
As shown in Figure E, five free energy wells were identified in the reweighted PMF profile
of (Φ, Ψ), which are centered around (−144°,
0°) and (−72°, −18°) for the right-handed
α helix (αR), (48°, −6°) for
the left-handed α helix (αL), (−150°,
156°) for the β-sheet, and (−72°, 162°)
for the polyproline II (PII) conformation (Figure E). Their corresponding minimum
free energies are estimated as 0, 0.47, 1.82, 1.44, and 2.35 kcal/mol,
respectively. In addition, the distribution anharmonicity of ΔV of frames clustered in each bin of the 2D PMF is smaller
than 0.10 in all low-energy regions (Figure F), suggesting that reweighting using second
order cumulant expansion is a reasonable approximation. Indeed, the
reweighted 2D PMF profile obtained from three 30 ns GaMD trajectories
(Figure E) is very
similar to that obtained from 1000 ns cMD (Figure S2D), but not for the 30 ns cMD simulations (Figure S2C). Therefore, GaMD accurately samples the free energy
profiles of alanine dipeptide within much shorter simulation time
compared with cMD, demonstrating the efficiency and accuracy of GaMD
for the biomolecular free energy calculations.
Folding of Chignolin
Started from an extended conformation
of chignolin, simulations using GaMD implemented in NAMD 2.11 were
able to capture complete folding of the protein into its native structure
within 300 ns. The RMSD obtained between the simulation-folded chignolin
and NMR experimental native structure (PDB: 1UAO) reaches a minimum
of 0.2 Å (Figure A). The system boost potential applied in the GaMD simulations followed
Gaussian distribution with the anharmonicity equal to 9.66 ×
10–3 (Figure B). The average and standard deviation of the boost potential
are 11.2 and 2.8 kcal/mol, respectively. During the three independent
300 ns GaMD simulations, chignolin folded into the native conformational
state with RMSD < 2 Å and unfolded repeatedly in two of the
simulations. It remained in the folded state after rapid folding within
∼20 ns in the third simulation (Figure C). Upon folding, the chignolin showed decrease
of the radius of gyration, Rg, to 4.2
Å (Figure D).
The average folding time of chignolin obtained from the GaMD simulation
is ∼28 ns, which is significantly shorter than the 600 ns folding
time obtained from the previous long-time scale cMD simulations[26] (i.e., ∼30 times speedup).
Figure 2
Folding of
chignolin captured in the GaMD simulations: (A) comparison
of simulation-folded chignolin (blue) with the PDB (1UAO) native structure
(red) that exhibits 0.2 Å RMSD, (B) distribution of the boost
potential ΔV with anharmonicity equal to 9.66
× 10–3, (C) the RMSD of chignolin between simulation
snapshots and the PDB native structure, and (D) the radius of gyration, Rg, of chignolin calculated from the three independent
300 ns GaMD simulations. (E) 2D (RMSD, Rg) PMF calculated by reweighting the three 300 ns GaMD simulations
combined. (F) Distribution anharmonicity of ΔV of frames found in each bin of the PMF profile.
Folding of
chignolin captured in the GaMD simulations: (A) comparison
of simulation-folded chignolin (blue) with the PDB (1UAO) native structure
(red) that exhibits 0.2 Å RMSD, (B) distribution of the boost
potential ΔV with anharmonicity equal to 9.66
× 10–3, (C) the RMSD of chignolin between simulation
snapshots and the PDB native structure, and (D) the radius of gyration, Rg, of chignolin calculated from the three independent
300 ns GaMD simulations. (E) 2D (RMSD, Rg) PMF calculated by reweighting the three 300 ns GaMD simulations
combined. (F) Distribution anharmonicity of ΔV of frames found in each bin of the PMF profile.Based on Gaussian distribution of the boost potential, cumulant
expansion to the second order was applied to reweight the three 300
ns GaMD simulations of chignolin combined. A 2D PMF profile was calculated
for the protein RMSD relative to the PDB native structure and the
radius of gyration, (RMSD, Rg) as shown
in Figure E. The reweighted
PMF allowed us to identify the folded (“F”) and intermediate
(“I”) conformational states, which correspond to the
global energy minimum at (1.0 and 4.0 Å) and a low-energy well
centered at (4.5 and 5.5 Å), respectively. Figure F plots the distribution anharmonicity of
ΔV for frames found in each bin of the 2D PMF
as shown in Figure E. The anharmonicity exhibits values smaller than 0.05 in the simulation
sampled conformational space, suggesting that the boost potential
follows Gaussian distribution for proper reweighting using cumulant
expansion to the second order. In summary, GaMD enables efficient
enhanced sampling and free energy calculations of protein folding
as demonstrated on the chignolin.
Ligand Binding to a Muscarinic
G Protein-Coupled Receptor
Finally, the GaMD implemented
in NAMD 2.11 was demonstrated on
binding of the ACh endogenous agonist to the M3 muscarinic
GPCR (Figure A). The
M3 muscarinic receptor is widely expressed in human tissues
and a key seven-transmembrane (TM) GPCR that has been targeted for
treating various human diseases, including cancer,[27] diabetes,[28] and obesity.[29] Three independent GaMD simulations were performed
on the M3 receptor, one for 400 ns and another two for
300 ns. As shown in Figure B, the system boost potential follows Gaussian distribution
with anharmonicity equal to 1.33 × 10–2. The
average and standard deviation of the boost potential are 10.9 and
3.0 kcal/mol, respectively. Such narrow distribution will ensure accurate
reweighting for free energy calculation using cumulant expansion to
the second order.
Figure 3
Binding of the acetylcholine (ACh) endogenous agonist
to the M3 muscarinic G-protein-coupled receptor simulated
via GaMD:
(A) Schematic representation of the computational model, in which
the receptor is shown in ribbons (orange), lipid in sticks, ions in
small spheres, and four ligand molecules in large spheres. (B) Distribution
of the boost potential ΔV with anharmonicity
equal to 1.33 × 10–2. (C) Probability distribution
of the ACh (the N atom in blue dots) diffusing in the bulk solvent
and bound to the M3 receptor (orange ribbons), in which
the Glide docking pose of ACh is shown in green sticks. (D) RMSD of
the diffusing ACh relative to the Glide docking pose calculated from
the 400 ns GaMD simulation. (E) Ten lowest energy structural clusters
of ACh that are labeled and colored in a green–white–red
(GWR) scale according to the PMF values obtained from reweighting
of the GaMD simulation.
Binding of the acetylcholine (ACh) endogenous agonist
to the M3 muscarinic G-protein-coupled receptor simulated
via GaMD:
(A) Schematic representation of the computational model, in which
the receptor is shown in ribbons (orange), lipid in sticks, ions in
small spheres, and four ligand molecules in large spheres. (B) Distribution
of the boost potential ΔV with anharmonicity
equal to 1.33 × 10–2. (C) Probability distribution
of the ACh (the N atom in blue dots) diffusing in the bulk solvent
and bound to the M3 receptor (orange ribbons), in which
the Glide docking pose of ACh is shown in green sticks. (D) RMSD of
the diffusing ACh relative to the Glide docking pose calculated from
the 400 ns GaMD simulation. (E) Ten lowest energy structural clusters
of ACh that are labeled and colored in a green–white–red
(GWR) scale according to the PMF values obtained from reweighting
of the GaMD simulation.During the 400 ns GaMD simulation of the M3 muscarinic
receptor, ACh was observed to enter the receptor and then bind to
the receptor endogenous ligand-binding (“orthosteric”)
site (Figure C). Highly
populated clusters were identified for the ligand in the extracellular
vestibule and orthosteric site of the receptor, while the ligand diffuses
nearly homogeneously in the bulk solvent. Note that periodic boundary
conditions were applied on the simulation system and thus ACh diffused
to the cytoplasmic side of the lipid membrane, which may not occur
in the real cells. Nonetheless, ACh entered the receptor from only
the extracellular side, recapitulating the first step of GPCR-mediated
cellular signaling machinery. Figure D plots the RMSD of the four diffusing ACh molecules
relative to the ligand binding pose predicted from Glide docking[30] in the orthosteric site. The ACh-3 molecule
was observed to bind the extracellular vestibule with ∼10 Å
RMSD, dissociate completely from the receptor, rebind to the extracellular
vestibule at ∼200 ns and then enter the receptor to the orthosteric
pocket at ∼270 ns. It finally rearranged its conformation in
the orthosteric pocket, reached a minimum RMSD of 2.0 Å at ∼340
ns, and stayed bound in the orthosteric site until the end of the
400 ns GaMD simulation. Moreover, during the dissociation of the ACh-3,
another ligand molecule (ACh-2) bound briefly to the receptor extracellular
vestibule during ∼125–180 ns. Similar observations were
obtained in the other 300 ns GaMD simulations of the M3 receptor as shown in Figures S3 and S4, during which different ACh molecules were able to bind the extracellular
vestibule but could not reach the orthosteric site within the limited
simulation time.In order to obtain a quantitative picture of
the ligand binding
pathway, the DBSCAN algorithm[25] was applied
to cluster trajectory snapshots of four diffusing ligand molecules
from the 400 ns GaMD simulation. Energetic reweighting[10,11] was then applied to each of the ligand structural clusters to recover
the original free energy. Ten structural clusters with the lowest
free energies are shown in Figure E. Global free energy minimum (0.0 kcal/mol) was found
for cluster “C1” in the orthosteric site. The second
lowest energy minimum was identified for cluster “C2”
(0.12 kcal/mol) located in the extracellular vestibule formed between
ECL2/ECL3. Moreover, cluster “C3” that exhibits a different
conformation compared with cluster “C1” and higher free
energy (0.33 kcal/mol) was also identified in the orthosteric pocket.
In addition to cluster “C2”, clusters “C4”
with 0.45 kcal/mol, “C6” with 0.51 kcal/mol, “C8”
with 1.23 kcal/mol, and “C10” with 1.96 kcal/mol were
also identified in the extracellular vestibule, in which the positively
charged N atom of the ligand interacts with residue Trp5257.35 through cation−π interactions. The residue superscripts
denote the Ballesteros–Weinstein (BW) numbering of GPCRs.[31] Three clusters of higher free energies, “C5”
with 0.50 kcal/mol, “C7” with 0.94 kcal/mol, and “C9”
with 1.50 kcal/mol, appear to connect “C1” in the orthosteric
pocket and “C2” in the extracellular vestibule. Therefore,
structural clusters “C1”, “C3” ↔
“C7”, “C5”, “C9” ↔
“C2”, “C4”, “C6”, “C8”,
and “C10” appear to represent an energetically preferred
pathway for the endogenous agonist binding to the M3 muscarinic
receptor.
Discussion
By adding a harmonic
boost potential to the potential energy surface,
GaMD provides both unconstrained enhanced sampling and free energy
calculation of biomolecules. Important statistical properties of the
system potential, such as the average, maximum, minimum, and standard
deviation values, are used to calculate the simulation acceleration
parameters, particularly the threshold energy E and
force constant k for applying the boost potential.
In this study, we have implemented GaMD in the NAMD software version
2.11. GaMD computes the potential boost according to statistics of
the system potential collected during the cMD and equilibration stages.
It is worth noting that the statistics collection in GaMD-NAMD slightly
differs from the previous version of GaMD implemented in AMBER.[11] In the AMBER version, the average and standard
deviation of potential energies are calculated in every “ntave” steps, a native function available in the
AMBER program.[32] In the NAMD version of
GaMD, the average and standard deviation of the potential are calculated
using the potential values collected up to the current step (see details
in Appendix A).[33] The two variables are updated every step until the end of the cMD
and equilibration stages. Moreover, similar to the implementation
of conventional aMD in NAMD, the LJ correction term is not included
in the total potential energy calculation in GaMD.With the
implementation of GaMD in NAMD, we have demonstrated the
code on three model systems: the alanine dipeptide, chignolin (a globular
protein) and the M3 muscarinic GPCR (a membrane protein).
For the alanine dipeptide biomolecular model system, short GaMD simulations
performed for only 30 ns were able to reproduce highly accurate free
energy profiles of the backbone dihedrals that may need as long as
1000 ns cMD simulation to converge. The free energy errors were almost
negligible except the elevated free energy well of Φ near 50°
by ∼0.5 kcal/mol and slight fluctuations in the energy barriers
(particularly Φ at 0° and Ψ at −120°; Figure ). In contrast, cMD
simulations lasting 30 ns hardly sample these free energy barriers
and exhibit poor convergence. Therefore, GaMD-NAMD greatly accelerates
the conformational sampling and accurate free energy calculation of
the alanine dipeptide.For the chignolin, the GaMD-NAMD simulations
were able to fold
the protein rapidly. In two of the three 300 ns GaMD simulations,
chignolin undergoes both folding and unfolding repeatedly (Figure C). Compared with
the average folding time obtained from long-time scale cMD simulations
(600 ns),[26] GaMD folds the protein within
∼28 ns, i.e., ∼30 times faster. Unlike the previous
GaMD-AMBER simulations,[11] the fully unfolded
state of chignolin does not appear as a low-energy well in the reweighted
free energy profile obtained from the present GaMD-NAMD simulations.
This behavior will be subject to further investigation in future GaMD
studies. Nonetheless, in addition to sampling the folded state in
the global free energy minimum, the GaMD-NAMD simulations also captured
the intermediate state during the folding of the protein. This is
consistent with the previous long-time scale cMD[26] and aMD[5b] simulations.Finally, we have demonstrated GaMD-NAMD on ligand binding to the
M3 muscarinic GPCR as a model membrane protein system.
While the ACh endogenous agonist binds only transiently to the receptor
extracellular vestibule in two 300 ns GaMD simulations, the ligand
enters the receptor and binds to the target orthosteric site in a
400 ns GaMD simulation. Although in principle multiple binding and
unbinding events may be needed in order to compute converged ligand
binding free energy, structural clustering and reweighting of the
GaMD simulation allows us to identify energetically preferred binding
sites and pathway of the diffusing ligand. Particularly, the lowest
energy cluster of ACh is identified in the orthosteric site, in excellent
agreement with the Glide docking pose. The second lowest energy cluster
is located in the extracellular vestibule, with the positively charged
N atom of ACh forming cation−π interaction with the receptor
residue Trp5257.35. This is consistent with previous extensive
experimental and computational studies that the extracellular vestibule
of class A GPCRs acts as a metastable intermediate site during binding
of orthosteric ligands.[6,15] The energetically preferred pathway
of agonist binding to the M3 receptor identified from the
current GaMD-NAMD simulation is similar to that found in previous
long-time scale cMD,[34] aMD,[6] and GaMD simulations[35] of class
A GPCRs. GaMD is thus promising to predict low-energy conformations
of ligand binding and serve as a useful tool in structural biology
and drug discovery. Moreover, GaMD mainly accelerates transitions
across enthalpic energy barriers because it is based on boosting the
system potential energy surface. However, GaMD can be potentially
combined with the parallel tempering[36] and
replica exchange[37] algorithms for further
enhanced sampling over entropic barriers as discussed earlier.[11]In summary, we have implemented GaMD in
NAMD 2.11 that shows excellent
scalability for supercomputer simulations of large biomolecules.[38] The updated source code files of NAMD 2.11 for
implementing GaMD are now publicly available at http://gamd.ucsd.edu. The implementation
shall be released in the upcoming version 2.12 of NAMD. It is complementary
to the original implementation of GaMD in the graphics processing
unit (GPU) version of the AMBER software[11] that runs extremely fast simulations with one or a small number
of GPU cards.[17c,32] As demonstrated on selected model
systems, results of the current work shall facilitate the applications
of GaMD in enhanced sampling and free energy calculations of a wide
range of large biomolecules, such as proteins, lipid membrane, nucleic
acids, virus particles, and cellular structures.
Authors: Thomas A Halgren; Robert B Murphy; Richard A Friesner; Hege S Beard; Leah L Frye; W Thomas Pollard; Jay L Banks Journal: J Med Chem Date: 2004-03-25 Impact factor: 7.446
Authors: Richard A Friesner; Jay L Banks; Robert B Murphy; Thomas A Halgren; Jasna J Klicic; Daniel T Mainz; Matthew P Repasky; Eric H Knoll; Mee Shelley; Jason K Perry; David E Shaw; Perry Francis; Peter S Shenkin Journal: J Med Chem Date: 2004-03-25 Impact factor: 7.446
Authors: Yinglong Miao; Sara E Nichols; Paul M Gasper; Vincent T Metzger; J Andrew McCammon Journal: Proc Natl Acad Sci U S A Date: 2013-06-18 Impact factor: 11.205
Authors: Andrew C Kruse; Jianxin Hu; Albert C Pan; Daniel H Arlow; Daniel M Rosenbaum; Erica Rosemond; Hillary F Green; Tong Liu; Pil Seok Chae; Ron O Dror; David E Shaw; William I Weis; Jürgen Wess; Brian K Kobilka Journal: Nature Date: 2012-02-22 Impact factor: 49.962
Authors: James C Phillips; David J Hardy; Julio D C Maia; John E Stone; João V Ribeiro; Rafael C Bernardi; Ronak Buch; Giacomo Fiorin; Jérôme Hénin; Wei Jiang; Ryan McGreevy; Marcelo C R Melo; Brian K Radak; Robert D Skeel; Abhishek Singharoy; Yi Wang; Benoît Roux; Aleksei Aksimentiev; Zaida Luthey-Schulten; Laxmikant V Kalé; Klaus Schulten; Christophe Chipot; Emad Tajkhorshid Journal: J Chem Phys Date: 2020-07-28 Impact factor: 3.488
Authors: Jingbo Zhang; Nuo Wang; Yinglong Miao; Felix Hauser; J Andrew McCammon; Wouter-Jan Rappel; Julian I Schroeder Journal: Proc Natl Acad Sci U S A Date: 2018-10-09 Impact factor: 11.205