Jan Neumann1, Nadine Schwierz2. 1. Allianz Global Investors GmbH, Bockenheimer Landstrasse 42, 60323 Frankfurt am Main, Germany. 2. Department of Theoretical Biophysics, Max-Planck-Institute of Biophysics, 60438 Frankfurt am Main, Germany.
Abstract
Magnesium is an indispensable cofactor in countless vital processes. In order to understand its functional role, the characterization of the binding pathways to biomolecules such as RNA is crucial. Despite the importance, a molecular description is still lacking since the transition from the water-mediated outer-sphere to the direct inner-sphere coordination is on the millisecond time scale and therefore out of reach for conventional simulation techniques. To fill this gap, we use transition path sampling to resolve the binding pathways and to elucidate the role of the solvent in the binding process. The results reveal that the molecular void provoked by the leaving phosphate oxygen of the RNA is immediately filled by an entering water molecule. In addition, water molecules from the first and second hydration shell couple to the concerted exchange. To capture the intimate solute-solvent coupling, we perform a committor analysis as the basis for a machine learning algorithm that derives the optimal deep learning model from thousands of scanned architectures using hyperparameter tuning. The results reveal that the properly optimized deep network architecture recognizes the important solvent structures, extracts the relevant information, and predicts the commitment probability with high accuracy. Our results provide detailed insights into the solute-solvent coupling which is ubiquitous for kosmotropic ions and governs a large variety of biochemical reactions in aqueous solutions.
Magnesium is an indispensable cofactor in countless vital processes. In order to understand its functional role, the characterization of the binding pathways to biomolecules such as RNA is crucial. Despite the importance, a molecular description is still lacking since the transition from the water-mediated outer-sphere to the direct inner-sphere coordination is on the millisecond time scale and therefore out of reach for conventional simulation techniques. To fill this gap, we use transition path sampling to resolve the binding pathways and to elucidate the role of the solvent in the binding process. The results reveal that the molecular void provoked by the leaving phosphate oxygen of the RNA is immediately filled by an entering water molecule. In addition, water molecules from the first and second hydration shell couple to the concerted exchange. To capture the intimate solute-solvent coupling, we perform a committor analysis as the basis for a machine learning algorithm that derives the optimal deep learning model from thousands of scanned architectures using hyperparameter tuning. The results reveal that the properly optimized deep network architecture recognizes the important solvent structures, extracts the relevant information, and predicts the commitment probability with high accuracy. Our results provide detailed insights into the solute-solvent coupling which is ubiquitous for kosmotropic ions and governs a large variety of biochemical reactions in aqueous solutions.
Magnesium
plays a vital role in almost every biological process.
By now more than 800 different biochemical roles of Mg2+ have been identified in physiological processes ranging from the
creation of cellular energy or the synthesis of biomolecules to the
activation of enzymes and ribozymes.[1−5] The specific requirement for Mg2+ as a cofactor is particularly
pronounced in nucleic acid systems where Mg2+ plays structural
roles by complexing negatively charged groups or catalytic roles by
accelerating or inhibiting chemical reactions in ribozymes.[3,6−8]In RNA systems, Mg2+ ions are essential
for two reasons:
They screen the electrostatic repulsion, thus allowing RNA to fold
into compact and functional structures.[9] In addition, a smaller fraction of Mg2+ ions interacts
directly with the functional atom groups of the RNA. These site-specific
ions stabilize the three-dimensional structure further and are involved
either in a direct contact (inner-sphere) or are mediated through
the hydrogen bond of a coordinating water molecule (solvent-shared).[7,9] The second reason why cations are essential is that binding of Mg2+ to active binding sites allows ribozymes to perform chemical
reactions that would not be possible from the basic RNA building blocks
alone.[3,6]Despite the importance of Mg2+ in RNA biology, a detailed
understanding of the Mg2+-RNA interactions at the atomic
level is still lacking. As a first step in providing a thorough understanding,
we here resolve the binding pathways of Mg2+ to RNA and
elucidate the dynamic interplay of direct ion-RNA and indirect water-mediated
interactions.In aqueous solutions, the first hydration shell
of Mg2+ consists of six water molecules arranged in octahedral
symmetry.[10] Water molecules from the first
hydration shell
exchange with the second, more loosely bound hydration shell on the
microsecond time scale.[11−13] This dynamic equilibrium allows
for ligand exchange. Hereby, oxygen atoms are the preferred binding
partners, in particular the nonbridging oxygens of the phosphate group
on the backbone of RNA.[3,14] Interestingly, water exchange
around Mg2+ is orders of magnitude slower compared to other
metal ions.[11,15] The long lifetimes of water molecules
in the first hydration shell facilitate two distinct binding arrangements:
inner-sphere and outer-sphere. In inner-sphere binding, one water
molecule is removed from the first hydration shell, and the ion is
in direct contact with the RNA atoms. In outer-sphere binding, the
interactions with the RNA are mediated by water, while Mg2+ remains coordinated by six water molecules.[3]Does Mg2+ bind in inner- or outer-sphere coordination
and how can it transition from the one binding mode to the other?
Discerning the nature of the exact interactions is an important and
controversially discussed question in the field.[14,16,17] Typically, structural knowledge on the binding
mode of Mg2+ is obtained from crystallographic experiments.
However, the correct assignment in the electron density maps remains
notoriously difficult. Yet, with proper stereochemical guidelines
a consistent picture emerges: Nucleobase nitrogen and carbonyls are
poor inner-sphere Mg2+ binders,[16,17] while the nonbridging oxygens of the phosphate group are the primary
nucleic acid binding location. In large RNA structures, Mg2+ ions can, depending on the exact environment, bind in inner- or
outer-sphere coordination,[3,18] whereas the inner-sphere
coordination is thermodynamically favored in simple mononucleotide
systems.[19]In addition to the structural
information from crystallography,
NMR can provide valuable insight into the exchange kinetics.[11,20,21] Depending on the number of direct
Mg2+-RNA contacts, the time scale for ligand exchange ranges
from milliseconds[20,22,23] to hundreds of seconds.[21] However, the
structural changes at the microscopic level during an exchange process
are not accessible from experiments. Here, simulations can contribute
important insights by characterizing the solvent behavior and by providing
a unique atomistic description of the dynamics. Yet, simulating the
transition from outer-to-inner sphere binding is tremendously challenging
for two reasons: According to experiments, the transition is on the
millisecond to second time scale and therefore out of reach for conventional
all-atom simulations. Alternatively, enhanced sampling techniques
such as metadynamics, replica exchange, and umbrella sampling can
be applied[24,25] but do not prove any information
on the exchange dynamics and reaction mechanism.The second
challenge is the coupling of the solvent to the exchange
dynamics. Kosmotropic ions, such as Mg2+, provoke strong
structural ordering of the first hydration shells. Therefore, any
process in aqueous solutions is expected to be governed by a complex
interplay of structural, orientational, and hydrogen bonding effects
extending over several hydration shells. It is therefore not surprising
that the solvent not only is a spectator of the chemical process but
also plays an active role in the evolution. Similarly, the dynamics
of seemingly simple processes such as dipeptide isomerization, ion
pair formation, water exchange between hydration shells, or proton
and electron transfer is governed by solute–solvent coupling.[26−33] Therefore, attempts in describing the dynamics of such processes
in terms of simplified reaction coordinates that do not include the
solute–solvent coupling are likely to fail for several reasons.
(i) Solvent reorganization is orthogonal to such simplified reaction
coordinates giving rise to pronounced non-Markovian behavior or memory
effects.[34] (ii) The metastable states are
not uniquely separated leading to the violation of the no-recrossing
assumption and failure of transition state theory.[35] (iii) Enhanced sampling techniques that rely on biasing
the slow degrees of freedom become inefficient as the solvent reorganization
becomes slower than the reactive motion itself. Consequently, quantifying
the contribution of the solvent to the reaction coordinate is one
of the long-standing problems in chemical reaction kinetics.In order to address both challenges, we apply transition path sampling
as a particularly powerful sampling strategy to provide unbiased microscopic
insight into the dynamics of Mg2+ binding to the phosphate
oxygen of RNA. Subsequently, we perform a committor analysis comprising
more than 28,600 configurations as the basis for a machine learning
algorithm which automatically selects the optimal deep learning model
from thousands of scanned architectures in a robust and efficient
manner. The resulting optimized deep neural network is shown to capture
the intimate solute–solvent coupling and to provide an accurate
description of the complete dynamical process.
Methods
Atomistic
Model and Simulation Setup
Our model system
consists of an RNA dinucleotide with two guanine nucleobases (Figure ). The RNA dinucleotide
is an ideal model system to investigate ion-RNA interactions since
it contains the three most important metal cation binding sites: The
nonbridging phosphate oxygens of the backbone and the N7 and O6 metal
binding site on the nucleobase.[19] For Mg2+, there is clear experimental evidence that the nonbridging
phosphate oxygens are the primary nucleic acid binding site for Mg2+.[3,17] Since both oxygen atoms (O1P and O2P) have
identical force field parameters, we focus on the interactions with
O1P which has a partial charge of −0.776e.
Our selection of the phosphate oxygens is further justified by a large
number of unbiased simulations that reveal exclusive binding of Mg2+ to the phosphate oxygen and not to the nucleobases in agreement
with previous results.[25] A single Mg2+ ion, one Cl– ion, and 2150 water molecules
are added to the cubic simulation box (L = 40 Å).
Since the dinucleotide has one negative charge, the simulation box
is neutral. For Mg2+ and Cl–, recently
optimized force field parameters were used[36] in combination with the TIP3P water model.[37] The TIP3P water model assigns partial charges of −0.834e and 0.417e to oxygen and hydrogen. Note
that to date a large variety of water models exist with some reproducing
the physical properties of water better than TIP3P.[38] Our main motivation to use TIP3P water is that it is frequently
used in biomolecular simulations since protein and nucleic acid force
fields were optimized with TIP3P. The force field parameters of the
RNA are taken from Amber99sb-ildn*[39] with
parmbsc0[40] and χ0L3 corrections.[41]
Figure 1
Simulation snapshot of the RNA dinucleotide consisting
of two guanine
nucleobases with a Mg2+ ion in inner- or outer-sphere coordination.
The backbone binding sites (atoms O1P and O2P) and the N7 and the
O6 binding site of guanine are shown. The exchanging water molecule
is shown in blue.
Simulation snapshot of the RNA dinucleotide consisting
of two guanine
nucleobases with a Mg2+ ion in inner- or outer-sphere coordination.
The backbone binding sites (atoms O1P and O2P) and the N7 and the
O6 binding site of guanine are shown. The exchanging water molecule
is shown in blue.The Mg2+ parameters
were optimized in our previous work[36] in
combination with TIP3P and are particularly
suited to investigate ion binding since they reproduce experimental
activity coefficients and the hydration free energy. In addition,
the Mg2+ parameters by Mamatkulov and Schwierz[36] predict an interchange dissociative exchange
mechanism in water in agreement with experiments, while polarizable
force fields[42−44] erroneously yield an associative mechanism.[45] Note that models with variable polarizability[46] or scaled charge force fields[47,48] might yield improvement but are beyond the scope of our current
work.A detailed comparison of the Mamtkulov Mg2+ force field
parameters with other parameters from the literature can be found
in refs (36 and 49).All simulations were performed using GROMACS[50] with periodic boundary conditions. Particle mesh Ewald
summation was used along with a Fourier spacing of 0.12 nm and a grid
interpolation up to order 4 to handle long-range electrostatic forces.
Close Coulomb real space interactions were cut off at 1.2 nm, and
Lennard-Jones (LJ) interactions were cut off after 1.2 nm, respectively.
Long-range dispersion corrections for energy and pressure were applied
to account for errors stemming from truncated LJ interactions.The initial energy minimization was performed with the steepest
descent algorithm. For each simulation, an NVT and a subsequent NPT
equilibration were done for 1 ns, controlling the temperature at 300
K and at a pressure of 1 bar with the Berendsen thermostat and barostat.[51] All production runs and the transition path
sampling were done in the NVT ensemble at a temperature of 300 K using
the velocity rescaling thermostat with stochastic term[52] and a time step of 2 fs. Here, the velocity
rescaling thermostat was used since it generates the canonical ensemble
and ensures detailed balance in the canonical ensemble of transition
paths.[53]
Free Energy Profiles
The one-dimensional free energy
profile as a function of the distance r between the Mg2+ ion and the oxygen atom
O1P was calculated from umbrella sampling using PLUMED.[54] A force constant k = 600,000 kJ/(mol nm2) and a window spacing
of 0.005 nm were used for r < 0.35 nm. A force constant k = 60,000 kJ/(mol nm2) and a window spacing
of 0.01 nm were used for r > 0.35 nm. To ensure convergence, we computed the reverse
pathway and performed a block analysis by dividing the 100 ns long
simulation into eight individual blocks (Figure S1 in the Supporting Information.)The two-dimensional
free energy profiles as a function of r and the effective hydration distance s6 were calculated from umbrella sampling using PLUMED.[54] Hereby, s6 was defined
as the sum of six Mg2+-oxygen distanceswhere ri are the
distances between Mg2+ and the five closest water molecules,
and rex is the distance between Mg2+ and the exchanging water molecule. A force constant k = 100,000 kJ/(mol nm2) and a window spacing of 0.01 nm were used. Without further
restraints, the system shows three stable states in the two-dimensional
projection (see Figure S2 in the Supporting Information). In the umbrella sampling, we focus only on transitions between
the two relevant states (corresponding to the inner-sphere configuration
with five coordinating water molecules and the outer-sphere coordination
in which O1P is replaced by the selected water molecule for which
the umbrella potential is applied). This is justified further by the
results from the unbiased simulations (TPS and fleeting trajectory
setup with 3.5 μs total simulation time) which clearly demonstrate
that Mg2+ goes directly to the phosphate oxygen and never
to any of the nucleobase binding sites. In the umbrella sampling,
this is achieved by an additional biasing potential on the hydration
number (see the Supporting Information for
further details). Position restraints were applied on all heavy atoms
of the dinucleotide with the exception of the nonbridging oxygen atoms
of the phosphate group. Each umbrella simulation was performed for
2 ns discarding 500 ps for equilibration. The free energy profiles
were calculated using the weighted histogram analysis method (WHAM).[55]
Transition State Theory
TST is the
most popular theory
to calculate reaction rates. In simple systems for which the reaction
coordinate is exactly known, TST gives an accurate estimate of the
rate. However, in complex many-body systems such as the one presented
here, TST could fail due to the violation of the nonrecrossing hypothesis
which forms the cornerstone of the theory. Therefore, TST can be used
only to provide an upper estimate for the rate constant. For a more
accurate estimate, additional corrections as implemented in the reactive
flux method are required[35,56] but are beyond the
scope of our current work. Following conventional transition state
theory (TST), the upper estimate of the rate constant is calculated
from[57,58]where λ* is the position
of the barrier top, θ(λ̇) is the Heaviside step
function, and ⟨...⟩ denotes
the average over the restrained ensemble of trajectories initiated
from an equilibrium ensemble of phase points on the dividing surface.
Transition Path Sampling
Transition path sampling[53,59] was used to harvest an ensemble of rare trajectories that connect
the two stable states. Starting from an initial reactive pathway generated
at high temperature, new trial trajectories were created by randomly
selecting a time slice, randomizing the velocities, and integrating
the equations of motion forward and backward in time. Using standard
two-way shooting moves with a fixed length of 1.6 ps, trial moves
are accepted if they connected the two stable states and rejected
otherwise.
Committor Analysis and Transition States
For a simulation
snapshot taken from a reactive transition pathway, the committor pA is defined as the probability of the configuration
initiated with randomized velocities drawn from a Maxwell–Boltzmann
distribution to be committed to state A (inner-sphere coordination).
Configurations from basin A have pA =
1, configurations from basin B have pA = 0, and transition states have pA =
0.5. The commitment probability pA was
calculated from the fraction of trajectories initiated with randomized
velocities that reach state A. 29,112 configurations along more than
2500 independent pathways obtained from transition path sampling were
used as shooting points. 28,612 shooting points were chosen in the
transition region (with 0.6 < λ < 1.0 based on the coordinate
λ defined in eq ). In addition, 500 shooting points were selected in the regions
of the stable states (λ < 0.6 and λ > 1.0). For
each
shooting point, 100 trajectories were initiated with random velocities
and run forward and backward for 2 ps. A configuration was identified
as a transition state if half of the trajectories relaxed into each
stable state (0.45 < pA < 0.55).
From the ensemble of shooting point, two subsets with r = 0.28 ± 0.02 nm or λ =
0.8 ± 0.05 with 3,500 randomly drawn data points were selected,
and the distribution p(pA) was calculated.In total, 3.5 μs of unbiased simulation
data were used to test the different putative reaction coordinates.
Commitment Probability from Deep Neural Networks
Deducing
an appropriate description by visual inspection is virtually impossible
if complex configurational rearrangements are considered as in the
system presented here. Neural networks, on the other hand, are particularly
suited for this task. For example, in the pioneering work by Ma and
Dinner,[28] neural networks were used to
predict the committor based on a set of coordinates for the isomerization
of alanine dipeptide. More recently, Jung et al. proposed and implemented
a combination of path sampling and deep neural networks to guide the
sampling and to extract the reaction coordinate.[26,27,60] Following the work by Jung et al.,[26,27] we used a deep neural network to learn the reaction coordinate from
the outcome of the committor analysis by minimizing the likelihood
loss function.[61,62] Note, however, that we did not
guide the sampling as described in refs (26 and 27). Instead, we performed a preceding
committor analysis (as described above) and used the committor to
map the molecular configurations onto the reaction coordinate qpred(X). Each configuration was
described by a set of 83 physical properties (referred to as features X in the following). The committor was parametrized as in
refs (26 and 27)where the predicted
committor ppred is a nonlinear function
of all the features X of the system. Note that by rescaling
the reaction coordinate
via q = q̃/2 the frequently
used expression for the committor by Peters and Trout[62] is retrieved:Therefore, eqs and 4 are expected to yield
identical results. The likelihood that a model can reproduce the observed
data is given by[26,27,61,62]where N is
the number of shooting points, and n and n are
the number of trajectories that reach state A and B from shooting
point i, respectively. Following the work by Jung
et al.,[26,27] we modeled the committor with a deep neural
network. Hereby, the weight matrix that defines the connection between
the nodes of the deep neural network was optimized by minimizing the
negative log likelihood loss l(26,27)The overall accuracy and generalization
of
the machine learning model (i.e., the deviation of the predicted values pApred from the simulated ones pAtrue) was estimated by dividing the available
data into a training set, a validation set, and a test set. The first
was used to train the model, the second was used to validate the training
result, and the third was used to check whether the final model is
able to correctly predict the values for structures that were not
used for training.
Training a Deep Learning Regression Model
with Hyperparameter
Optimization
One challenge in the application of neural networks
is to choose the model architecture. As illustrated in Figure A, the accuracy of the network
to predict the commitment probability of a configuration and to autonomously
select the transition state strongly depends on the underlying model
architecture. However, there is no generic way to determine all the
model parameters such as the number of hidden layers, the number of
neurons, and all hyperparameters a priori. Moreover, a network architecture
that works well for one system might fail to capture other systems
or dissimilar problems. Therefore, building an optimal deep neural
network manually by trial and error is a time-consuming problem which
highly depends on human expertise and intuition. In order to make
this progress more efficient, we developed an algorithm that automatically
finds the optimal network architecture. First, we specified a class
of multilayer perceptron (MLP) regression models that are described
by a set of hyperparameters.[63] Subsequently,
we selected the optimal model by running a Keras Tuner random search
hyperparameter optimization.[64] Hereby,
the generic MLP was defined by the following set of hyperparameters:
number of hidden dense layers (1–6), number of neurons per
hidden layer (32–256), activation function (Rectified Linear
Units (ReLU) or Scaled Exponential Linear Unit (SELU)), position of
a single dropout layer (none/after feature input/middle layer/before
output layer), dropout rate (0–40%), and learning rate for
the Adam Optimizer (logarithmic sampling from 10–4–10–2). For ReLU activation functions, a
Glorot uniform initialization and a normal dropout layer were used.
For SELU activation functions, a Lecun normal initialization and an
alpha dropout layer were used. ReLU and SELU are the standard choices
for regression models in Keras and were therefore used. The custom
loss function by Jung et al.[26,27] according to eq was used. All models were
trained via back-propagation using the stochastic gradient algorithm.
Figure 2
Dependence
of the performance on the model architecture. (A) Comparison
of the committor distribution p(pAtrue) for
transition states selected from deep learning algorithms with different
model architectures (see the Supporting Information for further details of the models). (B) Validation likelihood loss l for a random subset of model architectures. One parameter
of the architecture, the total number of neurons, is shown in color.
In total, several thousand different model architectures were explored
by running a Keras Tuner random search in the space of model parameters.
Dependence
of the performance on the model architecture. (A) Comparison
of the committor distribution p(pAtrue) for
transition states selected from deep learning algorithms with different
model architectures (see the Supporting Information for further details of the models). (B) Validation likelihood loss l for a random subset of model architectures. One parameter
of the architecture, the total number of neurons, is shown in color.
In total, several thousand different model architectures were explored
by running a Keras Tuner random search in the space of model parameters.The data was split into a 20% test set, a 72% training
set, and
an 8% validation set. Different splitting of the data yielded similar
results (see the Supporting Information). All features were normalized using the Keras standard scaler with
mean μ = 0 and standard deviation σ = 1. Each model was
set up for 50 epochs with a training batch size of 128. Early stopping
with patience 10 was used on the validation loss. Multiple thousand
different model architectures were explored, and the likelihood loss
of a random subset of 2,160 models is shown in Figure B.For further optimization, a subset
of best models based on minimal
validation loss was chosen. The training of the best model subset
was continued for up to 200 epochs with the previous batch size and
early stopping. In addition, a stepwise reduction of the learning
rate by a factor of 5 down to 10–5 was used if the
validation loss reached a plateau for 7 training steps. Finally, the
best model was chosen based on the highest maximum in the p(pApred) distribution.In summary, the optimization
algorithm allows us to find the optimal
model architecture in a quick and reliable fashion. The optimal deep
learning model has five hidden layers with 256 neurons, ReLU activation,
and an initial learning rate of 1.9 * 10–4. A dropout
layer with 30% dropout is placed directly before the output layer.
All code was developed with Keras,[65] scikit-learn,[66] and TensorFlow.[67]
Features
For each shooting point, we calculated 83
features that reflect different structural properties of the molecular
configuration. Focusing only on the dominant indirect exchange, the
data set has 17,477 entries. Each entry consists of 83 features, and
the label pAtrue was calculated for each shooting point
using the committor analysis. Specifically, the features include all
distances r between
Mg2+ and the 20 closest water molecules, distances between
Mg2+ and different RNA atoms and the Cl– ion, all angles a formed
between Mg2+, O1P, and the 20 closest water molecules,
the Steinhardt-Nelson order parameters q3, q4, and q6(68) of the first and second hydration shell,
tetrahedral order parameters,[54] and the
number of hydrogen bonds in the first and second hydration shell and
between RNA and water (Table S1).Transition path sampling, performed in previous work,[31,45] yielded detailed molecular insights into the mechanism of water
exchange in the first hydration shell and showed that the coordinates
of the exchanging ligands need to be included to provide a reasonable
description. Similarly, in the present case, a putative, knowledge-based
reaction coordinate λ was defined, which combines the Mg2+-RNA distance r and the effective hydration distance s6 via a trigonometric functionThe atan2
function is an extension of the
usual atan function with two real numbers as argument. Therewith,
the atan2 function comprises enough information to yield function
values in the range of 0–2π and allows us to cover all
possible values for s6 and r. Note that the parameters s60 = 1.18 nm
and r0 = 0.18 nm correspond to the minima in
the two-dimensional free energy profile. With this choice, λ
= π/2 corresponds to the inner-sphere coordination, and λ
= 0 corresponds to the outer-sphere coordination (Figure B).
Figure 3
Free energy landscape
of Mg2+-RNA interactions. (A)
Free energy profile F(r) as a function of the distance between Mg2+ and the phosphate oxygen. Simulation snapshots of the two stable
states are shown. (B) Two-dimensional free energy landscape F(r, s6) as a function of r and s6 where s6 is the sum over the distances between Mg2+ and the five closest molecules and the exchanging water
molecule. The dashed line indicates configurations from the top of
the free energy profile with r* = 0.28 nm shown
in (A). Trajectories, initiated from the upper green stripe, relax
back into state A, and trajectories from the lower blue stripe relax
back into state B. The diagonal dashed line for λ* = 0.8 approximates
the saddle between states A and B. (C) Committor distributions pSim(pA) for trajectories
initiated with r* = 0.28 nm (top of the one-dimensional
free energy profile shown in A) and with λ* = 0.8 (saddle of
two-dimensional free energy profile shown in B). pid(pA) is the distribution
expected for an ideal reaction coordinate with 3500 shooting points.
Free energy landscape
of Mg2+-RNA interactions. (A)
Free energy profile F(r) as a function of the distance between Mg2+ and the phosphate oxygen. Simulation snapshots of the two stable
states are shown. (B) Two-dimensional free energy landscape F(r, s6) as a function of r and s6 where s6 is the sum over the distances between Mg2+ and the five closest molecules and the exchanging water
molecule. The dashed line indicates configurations from the top of
the free energy profile with r* = 0.28 nm shown
in (A). Trajectories, initiated from the upper green stripe, relax
back into state A, and trajectories from the lower blue stripe relax
back into state B. The diagonal dashed line for λ* = 0.8 approximates
the saddle between states A and B. (C) Committor distributions pSim(pA) for trajectories
initiated with r* = 0.28 nm (top of the one-dimensional
free energy profile shown in A) and with λ* = 0.8 (saddle of
two-dimensional free energy profile shown in B). pid(pA) is the distribution
expected for an ideal reaction coordinate with 3500 shooting points.
Feature Relevance
To this end, the
machine learning
model employs a large number of features to model the reaction coordinate
and to predict the commitment probability pApred of new structures.
In the following, we aim to determine the importance of each input
feature. Since the deep learning network consists of a large number
of nodes, weighted sums, and nonlinear transformations, the feature
relevance was calculated numerically by selectively replacing single
features or combinations of features by noise and resampling the model.For the single feature relevance, we generated for every input
feature i a data set in which the values X were replaced by randomly permuted
values X. The random permutation
approach was chosen as it exactly conserves the original distribution.
The normalized relevance of the i-th feature r was defined aswhere lopt is
the converged loss of the optimal model, and lmax is the largest value upon permutation. The relevance corresponds
to the loss of information (i.e., the negative log likelihood increase)
upon effectively removing a single feature from the data set while
leaving all other features unchanged. Accordingly, the single feature
rank ranges from low (unimportant) to high (important).For
single feature permutation importance, standard libraries can
be used.[66,69] For this work, we implemented a custom algorithm
that extends the standard permutation importance algorithms to be
able to measure the importance of combined feature sets (see below).
Note that an alternative approach for the single feature analysis
was proposed in ref (26) where uniform random noise was used instead of random permutations.For the relevance of combined features, N(N + 1) /2 = 3486 permuted data sets were generated as follows.
Starting from the optimal model, the feature which yielded the smallest
loss upon permutation was removed (rank 1). This feature contained
the smallest amount of information and was consequently least important
for the process. Maintaining the permutation of rank 1, we selected
the second feature that yielded the smallest loss upon permutation
(rank 2). The procedure was repeated until all features are permuted.
The normalized relevance for effectively removing the information
on n features is given bywhere lopt is
the converged loss of the optimal model, and lrand is loss obtained by permuting all features. Here, the
relevance corresponds to the increase of the negative log likelihood
upon removing the combination of n features that
contain the least amount of information. Accordingly, the feature
combination rank ranges from low (least important combination) to
high (most important combination).
Results
and Discussion
Free Energy Landscape of Mg2+-RNA
Interactions
During association, one of magnesium’s
six strongly bound
hydration waters is removed to facilitate a direct contact between
Mg2+ and the phosphate oxygen. In the simplest case, this
ligand exchange can be described by the distance r between Mg2+ and the phosphate
oxygen, while all other degrees of freedom are integrated out. The
corresponding one-dimensional free energy profile is shown in Figure A. The free energy profile has two stable states. State
A corresponds to the inner-sphere coordination, and state B corresponds
to the outer-sphere coordination. The free energy barrier from outer-sphere
to inner-sphere is about 21 kBT and exactly matches the value for water exchange.[31] Consequently, the barrier corresponds to the
free energy necessary to remove one water molecule from the first
hydration shell in order to facilitate a direct contact with the phosphate
oxygen. The inner-sphere coordination is thermodynamically more stable
in agreement with experimental findings[19] and Collins’ empirical rule like-seeks-like:[70] Due to the high charge density of the phosphate oxygens,
direct ion pairing with a high binding affinity is expected.Using transition state theory TST (eq ), we can provide an upper estimate of the exchange
rate or conversely a lower estimate of the lifetime. Here, it is essential
to mention that the true rate could be significantly smaller due to
the violation of the nonrecrossing hypothesis. As a lower limit, Mg2+ is estimated to remain about 300 s in the inner-sphere coordination
before transitioning to outer-sphere. Similarly, Mg2+ is
estimated to remain about 0.2 ms in the outer-sphere coordination
before transitioning back to inner-sphere. Note that the exchange
between the outer-sphere cordination and bulk is much faster since
the free energy barrier is significantly smaller (Figure A). The lifetime of the outer-sphere
coordination is in agreement with the millisecond time scale observed
experimentally.[20,22,23] On the other hand, the computed lifetime of the inner-sphere coordination
is likely too high reflecting the shortcoming of current Mg2+ force fields in reproducing experimental binding affinities at nucleic
acid binding sites.[25,49,71]The quality of a reaction coordinate can be assessed by a
committor
analysis. For an ideal reaction coordinate, about half of the trajectories
initiated from the barrier top are expected to relax back to either
stable state. Consequently, the distribution p(pA) of the probability to relax back to state
A should have a sharp peak at pA ≈
1/2 (binomial committor distribution). While r provides a simplified description of the
process, it is not an adequate reaction coordinate by itself. The
committor analysis (Figure C) shows that most configurations, initiated with r = 0.28 nm, relax back to
the inner-sphere coordination. Therefore, the Mg2+-oxygen
distance alone is insufficient to describe the dynamics of the transition.
To provide a more realistic picture, the water molecules from the
first hydration shells have to be included. Figure B shows the two-dimensional free energy landscape
as a function of r and
the hydration parameter s6, which includes
the distances of the five closest waters and the exchanging water
(eq ). From the two-dimensional
representation, the failure of r as the reaction coordinate can be rationalized: Trajectories
starting from the upper panel (s6 >
1.24
nm) are committed to state A, while trajectories starting from the
lower panel (s6 < 1.24 nm) are committed
to state B. r and s6 can be combined into a putative, knowledge-based
reaction coordinate λ (eq ). Based on the committor distribution (Figure C), λ provides significant improvement
compared to r. Still,
configurations with λ* = 0.8 lead to a much broader distribution
compared to the ideal binomial distribution.These results show
that free energy profiles along a few simple
coordinates provide valuable initial insight. Yet, the committor analysis
reveals that the exchange dynamics is more complex than the free energy
profiles might suggest.
Kinetic Pathways from Transition Path Sampling
To gain
insight into the kinetic pathways of Mg2+ association and
dissociation, transition path sampling is applied to sample a large
number of inner-to-outer sphere transitions. Four representative transition
paths that connect the two stable states are shown in Figure A. Figure C shows the distribution of transition times.
Typically, the exchanging water molecules spend less than 0.5 ps in
transition. This time is considerably smaller compared to the millisecond
time scale of the stable states. The clear separation of time scales
highlights the path sampling method as a particularly powerful sampling
strategy for these systems.
Figure 4
Kinetic pathways from transition path sampling.
(A) Four representative
pathways connecting the two stable states. Blue/gray pathways correspond
to the indirect/direct exchange mechanism. (B) Probability distribution
along transition pathways for the distance r between Mg2+ and the phosphate oxygen and the exchange
angle αex between phosphate oxygen, Mg2+, and exchanging water oxygen. (C) Distribution of transition times
for the indirect and the direct exchange mechanisms. (D) Indirect
Mg2+ exchange mechanism in which the leaving phosphate
oxygen and the incoming water molecule occupy different positions
on the water octahedron. (F) Direct Mg2+ exchange mechanism
in which the leaving phosphate oxygen and the incoming water molecule
occupy the same position on the water octahedron. In (D) and (E),
the five closest water molecules are shown. The incoming water molecule
is highlighted in blue.
Kinetic pathways from transition path sampling.
(A) Four representative
pathways connecting the two stable states. Blue/gray pathways correspond
to the indirect/direct exchange mechanism. (B) Probability distribution
along transition pathways for the distance r between Mg2+ and the phosphate oxygen and the exchange
angle αex between phosphate oxygen, Mg2+, and exchanging water oxygen. (C) Distribution of transition times
for the indirect and the direct exchange mechanisms. (D) Indirect
Mg2+ exchange mechanism in which the leaving phosphate
oxygen and the incoming water molecule occupy different positions
on the water octahedron. (F) Direct Mg2+ exchange mechanism
in which the leaving phosphate oxygen and the incoming water molecule
occupy the same position on the water octahedron. In (D) and (E),
the five closest water molecules are shown. The incoming water molecule
is highlighted in blue.The distribution of the
exchange angles (Figure B) along reactive pathways indicates that
two alternative exchange pathways exist: In the indirect exchange
mechanism, the leaving oxygen ligand and the entering water
molecule occupy different positions on the water octahedron (Figure D). During activation,
one water molecule from the second hydration shell enters the molecular
void between the hydration shells. This motion leads to a concerted
motion of the phosphate oxygen out of the first hydration shell. The
distances of the leaving phosphate oxygen and the entering water are
elongated compared to their equilibrium values (Table ). Reflecting balanced electrostatic interactions,
the Mg2+-phosphate oxygen distance is smaller than the
Mg2+-water oxygen distance due to the smaller partial charge
on the phosphate oxygen compared to the water oxygen. The distances
of the five closest molecules remain relatively unchanged (Table ). Yet, they rearrange
such that the transition state has an approximate mirror symmetry.
The mirror plane is perpendicular to the plane composed of Mg2+, the phosphate, and water oxygen and contains three water
molecules (Figure D).
Table 1
Properties of the Transition State
Ensemblea
mechanism
rI [Å]
rex [Å]
rs [Å]
αex [deg]
τ [ps]
indirect
3.17 ± 0.03
3.51 ± 0.03
1.928 ± 0.003
122.9 ± 14.7
0.48 ± 0.01
direct
3.28 ± 0.03
3.46 ± 0.03
1.928 ± 0.003
47.7 ± 5.1
0.56 ± 0.01
Mg2+-O1P distance r, Mg2+-oxygen distance
of exchanging water rex, Mg2+-oxygen distance of the five closest water molecules r, angle between O1P, Mg2+, and the exchanging water molecule αex, average
transition time τ. Standard deviations are indicated. A sample
size of 1759 or 754 transition states was used for the indirect and
direct exchange mechanisms, respectively.
Mg2+-O1P distance r, Mg2+-oxygen distance
of exchanging water rex, Mg2+-oxygen distance of the five closest water molecules r, angle between O1P, Mg2+, and the exchanging water molecule αex, average
transition time τ. Standard deviations are indicated. A sample
size of 1759 or 754 transition states was used for the indirect and
direct exchange mechanisms, respectively.In the direct exchange mechanism,
the leaving
oxygen ligand and the entering water molecule occupy the same positions
on the water octahedron (Figure E). The exchange arises via the attack of the incoming
water onto the edge of the octahedron formed by the oxygen ligands.
Similar as in the indirect pathway, the Mg2+-phosphate
oxygen distance is slightly smaller than the Mg2+-water
oxygen distance, while the distances of the five closest water molecules
remain relatively unchanged (Table ). In the transition state, the five closest water
molecules form a square pyramidal coordination, and the transition
state has a distorted C2 symmetry (Figure E).Based on
the change of bond length during activation, both pathways
correspond to an interchange dissociative (I) process and are akin to the pathways of
water exchange.[31] In equilibrium, the indirect
exchange is observed much more frequently (92%) compared to the direct
mechanism (8%) since configurations with cis positions of exchanging
ligands (direct pathways) are energetically less favorable compared
to trans positions (indirect pathways).[72]The results reveal that phosphate oxygen, exchanging water,
and
the five closest water molecules play a decisive role in the exchange
mechanism. However, additional simulations in which those coordinates
were fixed, while the solvent outside the first hydration shell was
relaxed, show that they are yet insufficient to predict p. Consequently, solvent molecules beyond
the first hydration shell are crucial for the exchange and need to
be considered explicitly.
Optimized Deep Neural Networks for Quantitative
Predictions
During the transition from outer-to-inner sphere
binding, water
molecules from the first two hydration shells couple to the exchange.
The reordering of close and distant water molecules determines whether
the process can proceed or not. This behavior might be expected since
the kosmotropic Mg2+ ion causes strong orientational ordering
in the first hydration shells leading to long-range and collective
interactions. However, resolving the subtle rearrangements and providing
a quantitative description of the solute–solvent coupling is
a demanding problem that is impossible to solve by visual inspection.
In order to make progress in quantifying the solvent participation,
we use a deep neural network to model the outer-to-inner sphere exchange.
Here, we focus the machine learning on the dominant indirect exchange.
In particular, we use 17,477 structures along independent transition
paths and their commitment probabilities pA (Figure A) to learn
the functional relation between pA and
the features describing the structure of Mg2+, RNA, and
the first two hydration shells.
Figure 5
Reaction coordinate from artificial intelligence.
(A) Data set
used for machine learning. The committor probability pAtrue and free
energy contour are shown as a function of the features r and s6 for
the full data set consisting of 17,477 entries for indirect exchange
pathways (72% training set, 8% validation set, 20% test set). (B)
Committor values pApred predicted by the optimized deep neural
network correlated with the values pAtrue obtained from
the committor simulations. The RMS error of the prediction is 6.5%.
(C) Comparison of the committor distribution p(pAtrue) for transition states selected from the deep learning algorithm,
from linear regression over all features, and from the expert knowledge
using eq and the binomial
distribution expected for an ideal reaction coordinate.[61] (D) Ranking of the features according to their
relevance in the machine learned reaction coordinate: single feature
relevance r (top) and
combined feature relevance r (bottom). The color indicates the rank of the features according
to the combined feature relevance. The corresponding features are
shown in Table and Table S2.
Reaction coordinate from artificial intelligence.
(A) Data set
used for machine learning. The committor probability pAtrue and free
energy contour are shown as a function of the features r and s6 for
the full data set consisting of 17,477 entries for indirect exchange
pathways (72% training set, 8% validation set, 20% test set). (B)
Committor values pApred predicted by the optimized deep neural
network correlated with the values pAtrue obtained from
the committor simulations. The RMS error of the prediction is 6.5%.
(C) Comparison of the committor distribution p(pAtrue) for transition states selected from the deep learning algorithm,
from linear regression over all features, and from the expert knowledge
using eq and the binomial
distribution expected for an ideal reaction coordinate.[61] (D) Ranking of the features according to their
relevance in the machine learned reaction coordinate: single feature
relevance r (top) and
combined feature relevance r (bottom). The color indicates the rank of the features according
to the combined feature relevance. The corresponding features are
shown in Table and Table S2.
Table 2
Ranking
of the Eight Most Important
Features in the Machine Learned Reaction Coordinate According to Their
Normalized Relevance (Eqs and 9)a
c-rank
feature
s-rank
feature
84
rI – rex
84
rI – rex
83
rI
83
rI
82
q40
82
q40
81
rP
81
rP
80
hO1P1
80
hex1
79
hex1
79
q41
78
a1
78
rCl
77
s6
77
a7
The combined
feature rank (c-rank)
corresponds to the combination of N – n features that contains the most information, i.e., yields
the smallest log likelihood loss upon permuting n features. The single feature rank (s-rank) corresponds to the feature
which yields the highest loss upon permuting the feature while leaving
all other features unchanged. The ranking of all N = 83 features is given in Table S1.
The choice of the architecture of the deep learning model is essential
for its performance. To derive the optimal deep learning model, we
have defined an algorithm that systematically selects the optimal
architecture by scanning through thousands of individual models using
hyperparameter optimization. In order to make robust predictions and
prevent overfitting, we optimize the neural network on the training
set and select the model that performs best on the validation set.
Finally, using the test set, we illustrate the performance of the
optimal machine learning model in predicting the commitment probability
of unknown structures (Figure B). The results demonstrate that the optimized deep neural
network is capable of predicting pA with
an RMS error of 6.5%. The high accuracy in predicting the progress
of the binding process clearly shows that the network extracts all
relevant information, combines it in a weighted nonlinear fashion
to a scalar reaction coordinate, and captures the details of the solute–solvent
coupling precisely.Further insight into the quality of the
predictions is obtained
from the distribution of commitment probability p(pA) of the transition states selected
autonomously by the neural network (Figure C). The distribution is unimodal and sharply
peaked at pA ≈ 0.5 and closely
resembles the binomial distribution expected for an ideal reaction
coordinate.[61] Consequently, the information
contained in the features is sufficient for the network to single
out the transition state. The results from deep learning provide significant
improvement compared to the reaction coordinate based on expert knowledge
and highlight the machine learning ansatz as a particularly useful
strategy in recognizing the relevant patterns. In addition, the results
from deep learning outperform the results from linear regression based
on the exact same features (Figure C). Therefore, hidden layers and a nonlinear combination
of features using activation functions are essential to describe the
many-body interactions. At the same time, the complexity of the deep
learning model prohibits further insight into the reaction mechanism.
In this regard, it is particularly useful to evaluate the contribution
of each feature. Two complementary rankings are presented in Figure D. The single feature
ranking quantifies the relevance of a single feature independent of
all others. Since the single feature ranking can contain redundant
information, the complementary combined feature rank is given, which
yields the most relevant combination of features. Taken together,
a consistent picture emerges: All features describing the molecular
structure within the first two hydration shells contain information
that is relevant for the process. Adding this information to the model
leads to a continuous decrease of the likelihood loss and therefore
to a linear increase of the relevance r. Finally, four features give rise to an exponential
increase in the relevance and carry about 40% of the information (Table ). These features reflect the concerted motion of the leaving
phosphate oxygen and the entering water molecule as well as the rearrangement
of the five closest water molecules from octahedral to mirror symmetry.
At the same time, the remaining 60% of the information is distributed
over the remaining 79 features reflecting the importance and many-body
nature of the solvent–solute coupling.The combined
feature rank (c-rank)
corresponds to the combination of N – n features that contains the most information, i.e., yields
the smallest log likelihood loss upon permuting n features. The single feature rank (s-rank) corresponds to the feature
which yields the highest loss upon permuting the feature while leaving
all other features unchanged. The ranking of all N = 83 features is given in Table S1.
Conclusion
Characterizing the kinetic pathways of Mg2+ binding
to biomolecules such as RNA is fundamental in understanding its functional
role in biochemical processes. Yet, the transition from water-mediated
outer-sphere to direct inner-sphere binding is on the millisecond
time scale and therefore out of reach for conventional all-atom simulations.
To fill this gap, we used transition path sampling to resolve the
kinetic pathways of Mg2+ binding to the most important
ion binding site on RNA, namely the phosphate oxygen. The results
reveal a superior indirect pathway and an inferior direct exchange
pathway. In both pathways, the molecular void in the first hydration
shell provoked by the leaving phosphate oxygen is immediately filled
by an entering water. At the same time, the water molecules from the
first and second hydration shell couple to the concerted exchange.
These long-range and collective interactions are a direct consequence
of the high ionic charge density of Mg2+ which provokes
strong ordering in the surrounding solvent and an intimate coupling
between solute and solvent. Consequently, ligand exchange gives rise
to a complex interplay of orientational, packing, and hydrogen bonding
effects in which the collective reorientation and translation of several
solvent molecules become important.A complete understanding
of the dynamics requires knowledge of
the detailed molecular motions involved in the exchange process. However,
quantifying their contribution is exceptionally difficult. Instantaneous
fluctuations, resulting from different solvent configurations, influence
the progress in particular since the time scale of solvent reorientation
(about 10 ps) is slower than the reactive motion itself (0.4 ps).
In order to quantify how the reactant and solvent dynamics are coupled,
the rapidly growing field of machine learning offers exciting possibilities
in extracting complex patterns in large data sets and in finding optimal
reaction coordinates.[26−28,60,73,74] The results presented here reveal
that a properly optimized deep neural network is particularly suited
to recognize the molecular motions that occur during the course of
the binding, to extract the relevant information, and to predict the
commitment probability with high accuracy. About half of the information
on the dynamics is contained in the concerted motion of the leaving
phosphate oxygen and the entering water molecule as well as in the
rearrangement of the five closest water molecules. The other half
is contained in the solvent structure rendering deep neural networks
particularly useful in recognizing the relevant molecular structures.The question how the solvent affects processes in aqueous solutions
is ubiquitous in all areas of chemistry and biology. Here, machine
learning provides a promising perspective to explore the intimate
coupling between charged solutes and the solvent at the molecular
level.[26,27,73] Still, more
work is required to investigate whether a simple one-dimensional reaction
coordinate exists that captures such complex dynamics.[75]
Authors: Alberto Pérez; Iván Marchán; Daniel Svozil; Jiri Sponer; Thomas E Cheatham; Charles A Laughton; Modesto Orozco Journal: Biophys J Date: 2007-03-09 Impact factor: 4.033
Authors: Jay W Ponder; Chuanjie Wu; Pengyu Ren; Vijay S Pande; John D Chodera; Michael J Schnieders; Imran Haque; David L Mobley; Daniel S Lambrecht; Robert A DiStasio; Martin Head-Gordon; Gary N I Clark; Margaret E Johnson; Teresa Head-Gordon Journal: J Phys Chem B Date: 2010-03-04 Impact factor: 2.991