Chunyue Ma1, Daniel J Chung2,3, Dylan Abramson1, David R Langley2,3,4, Kelly M Thayer1,2,3. 1. Department of Mathematics & Computer Science, Wesleyan University, Middletown, Connecticut 06459, United States. 2. Department of Chemistry, Wesleyan University, Middletown, Connecticut 06459, United States. 3. Molecular Biophysics Program, Wesleyan University, Middletown, Connecticut 06459, United States. 4. Arvinas Inc., New Haven, Connecticut 06511, United States.
Abstract
Glutathione peroxidase 4 (GPX4) reduces lipid hydroperoxides in lipid membranes, effectively inhibiting iron-dependent cell death or ferroptosis. The upregulation of the enzyme by the mutations at residues D21 and D23 has been suggested to be associated with higher protein activity, which confers more protection against neurodegenerative diseases such as Alzheimer's, Parkinson's, and Huntington's diseases. Therefore, it has become an attractive target for treating and preventing neurodegenerative diseases. However, identifying means of mimicking the beneficial effects of these mutations distant from the active site constitutes a formidable challenge in moving toward therapeutics. In this study, we explore using molecular dynamics simulations to computationally map the conformational and energetic landscape of the wild-type GPX4 protein and three mutant variants to identify the allosteric networks of the enzyme. We present the conformational dynamic profile providing the desired signature behavior of the enzyme. We also discuss the implications of these findings for drug design efforts.
Glutathione peroxidase 4 (GPX4) reduces lipid hydroperoxides in lipid membranes, effectively inhibiting iron-dependent cell death or ferroptosis. The upregulation of the enzyme by the mutations at residues D21 and D23 has been suggested to be associated with higher protein activity, which confers more protection against neurodegenerative diseases such as Alzheimer's, Parkinson's, and Huntington's diseases. Therefore, it has become an attractive target for treating and preventing neurodegenerative diseases. However, identifying means of mimicking the beneficial effects of these mutations distant from the active site constitutes a formidable challenge in moving toward therapeutics. In this study, we explore using molecular dynamics simulations to computationally map the conformational and energetic landscape of the wild-type GPX4 protein and three mutant variants to identify the allosteric networks of the enzyme. We present the conformational dynamic profile providing the desired signature behavior of the enzyme. We also discuss the implications of these findings for drug design efforts.
A protein's allosteric behavior
is characterized by the modulation
of its binding affinity to its ligand as the result of its interaction
with a third molecule known as the allosteric effector[1−4] or perturbations to the protein introduced by a point mutation.
Allosteric regulation pervades biology as one of the most prominent
and significant regulatory mechanisms,[5−7] playing an essential
role in the regulation of such diverse functions as transcription
activation,[8−10] DNA repair,[11−14] metabolism,[15−19] signal transduction with G-protein coupled receptors,[20−24] neurological function,[25−28] protein kinase regulation,[29−33] regulatory domains of proteins,[34,35] and many more. An allosteric effector may increase or decrease binding
affinity for the ligand, making it an allosteric activator or repressor,
respectively.[36,37]The considerable distance
between the allosteric effector and the
protein active sites raised the question of how the allosteric signal
traverses the protein to affect the active site,[5,34,38,39] and several
ideas have been put forth.[5,8,21] Many early thoughts on the problem involved a domino effect pathway-like
approach in which one residue interacts with the next across a molecule.
While this simplistic approach is a helpful zeroth-order model, experimental
evidence quickly became too difficult to interpret. A comprehensive
description of the allosteric effect may call for characterization
of the dynamic interchange of substates.[2,4,40] Such a view has proven difficult to observe in the
experiment because capturing it in action requires a time resolved
description at the residue level. Ideas of multiple pathways, contributions
from evolution, and many other theories have been explored. Cooper
and Dryden brought forth the concept of allostery through energetic
pathways on a theoretical basis.[41] However,
analysis dominated by examining differences in structures has primarily
been unsuitable for assessing the validity of these ideas.With
the observation of allosteric effects without significant
conformational changes,[42] the idea has
become increasingly revitalized. However, asserting a concrete test
to determine whether the pathway is correct or not has proven difficult
in the absence of a clear observable to measure the allosteric signal
propagation. Recently, research efforts have been focusing on combining
the power of machine learning and data from molecular dynamics (MD)
simulations as a method of studying allosteric effects.[43−45] In addition, network theory is also used to identify the hidden
patterns in complex biological systems.[46] Given that the approaches mentioned above provide valuable insights
elusive to experiments into the allostery phenomenon, our work aims
to describe the dynamics of allostery using network theory and mathematical
modeling based on MD simulations.We selected the glutathione
peroxidase 4 (GPX4) system for our
inquiries into allostery. It is an enzyme encoded by the human GPX4
gene, and it is known to reduce lipid hydroperoxides in lipid membranes,
effectively inhibiting iron-dependent cell death or ferroptosis.[47−49] It (GPX4) serves as an antioxidant enzyme that reduces the hydroperoxide
species, restoring the membrane’s integrity. We study the allosteric
effect in the GPX4 protein as it illustrates the phenomenon of modulated
binding affinity by point mutations. It also serves as a timely target
from a human health perspective. Two single-point mutations at residue
D21 (GPX4 A) and residue D23 (GPX4 B) as well as a double mutation
at both residues (GPX4AB) are all known to activate the GPX4 activity.
Specifically, GPX4 A and AB both significantly enhance the protein
activity, whereas GPX4 B shows less significant activation.[47] The mutation site residues (green sticks) and
active site residues (pink sticks) are shown in Figure (PDB ID: 2OBI).[50]
Figure 1
GPX4 protein
system: GPX4 structure and engineered mutants used
in this study. The wild-type structure (PDB ID: 2OBI) illustrates critical
features of the enzyme. The catalytic triad residues collocate three
loops to form the active site (pink sticks). Residues D21 and D23
mark the locations of the allosteric active mutations (green sticks).
GPX4 protein
system: GPX4 structure and engineered mutants used
in this study. The wild-type structure (PDB ID: 2OBI) illustrates critical
features of the enzyme. The catalytic triad residues collocate three
loops to form the active site (pink sticks). Residues D21 and D23
mark the locations of the allosteric active mutations (green sticks).In this study, we used MD simulations to map the
conformational
and energetic landscape of the wild-type protein and three mutant
variants to identify the allosteric network of the enzyme. We first
attempted to explain the hyperactivity of GPX4 mutant variants through
changes in their conformations. In addition to root-mean-square deviation
(RMSD) and root-mean-square fluctuation (RMSF) analysis, we leveraged
conformational dynamics analysis from MD Markov state models (MD-MSMs)[51] when treating the point mutations as allosteric
activators. Our research indicated structural differences of a single
mutant (GPX4 A) but did not provide sufficient understanding of the
particularly strong activating effect of the double mutant (GPX4 AB).
We then adopted the energetic approach by comparing the respective
energetic networks of various GPX4 mutants. Mapping the network analysis
onto the molecule structures, we successfully theorized the dynamic
profile associated with hyperactivity of GPX4 that provided the desired
signature behavior of the enzyme. Overall, our research demonstrated
an exciting possibility that allosteric effects can be considered
a superposition of conformational and energetic system changes. We
discuss how our insights may be used to develop molecular activators
for GPX4 and the prospects for a broader application of this approach
for developing therapeutic activators.
Results
RMSD Results
To ensure the convergence of the systems
of the simulation, we computed the root-mean square deviation (RMSD)
of the backbone atoms of the four constructs concerning their energy-minimized
and equilibrated starting structure. We monitored the RMSD as a function
of time over the simulation to assess the global stability of the
trajectory. An unmodified crystal starting structure will typically
deviate about 2 Å from the crystal, and values slightly higher
can be expected for engineered systems. We saw that our simulations
exhibit an expected amount of dynamic fluctuation on the global scale,
with most conformations within about 1.0 Å of their reference
when converging over 1 microsecond, as seen in Figure .
Figure 2
RMSD plot: this plot shows the RMSD deviation,
a measure of the
average distance between atoms in a molecular structure compared to
the energy minimized structure. A snapshot of the protein is taken
every 200 ps. This is a 10-point moving average of the RMSD values
taken of the backbone C, N, and O atoms in the 160-amino acid system.
RMSD values show the similarity between the structures across the
course of the simulation.
RMSD plot: this plot shows the RMSD deviation,
a measure of the
average distance between atoms in a molecular structure compared to
the energy minimized structure. A snapshot of the protein is taken
every 200 ps. This is a 10-point moving average of the RMSD values
taken of the backbone C, N, and O atoms in the 160-amino acid system.
RMSD values show the similarity between the structures across the
course of the simulation.It is worth mentioning that the RMSD plot from Figure represents only
the 5th to
the 165th residues of the constructs. This decision was made to remove
the tail effect of the first four residues, which caused irregular
RMSD behavior and thus affected convergence analysis. In summary,
all four systems converged well over the course of the simulation.
With the stability of the simulations and apparent convergence of
global dynamic properties, we turned our attention to sequence-specific
analysis and the detailed characterization of the differences in their
dynamic behavior.
RMSF Results
To assess the differences in dynamics
by residue, we computed the RMSF of the alpha carbons with the average
structure of each simulation as their reference coordinates (Figure ). GPX4 A experienced
larger variations at the residue level with respect to the wild-type
structure. Its RMSF plot diverged from the RMSF plot of the wild-type
mutant (GPX4 WT) at multiple residues. This indicates structural differences
between GPX4 A and GPX4 WT.
Figure 3
RMSF plot: this plot shows the RMSF of the alpha
carbons with the
average structure of each simulation as their reference coordinates.
RMSF values show the similarity between the systems across the course
of the simulation on the residue level.
RMSF plot: this plot shows the RMSF of the alpha
carbons with the
average structure of each simulation as their reference coordinates.
RMSF values show the similarity between the systems across the course
of the simulation on the residue level.Another equally important observation is that the
residue mobility
for GPX4 B and GPX4 AB, unlike GPX4 A, very closely followed that
of GPX4 WT outside the regions of mutation site residues. This lack
of structural difference among the three constructs promoted our further
investigation into the energetic characteristics of the GPX4 mutations.
MD-MSM Analysis
We applied Molecular Dynamics Markov
State Model analysis (MD-MSM)[51,52] to our trajectories
to gain more insights into overall dynamic differences. The trajectories
of the MD simulations were stripped of ions, water molecules, and
side chains to only include the protein backbone atoms. These trajectories
were then concatenated into a single long trajectory clustered using
a k-means method based on pairwise RMSD values. The categorization
of simulation frames into discrete probabilistic states indicates
the protein’s innate predisposition to adopt specific conformations
in the presence or absence of the mutation.Since the number
of clusters was unknown beforehand, the clustering procedure was carried
out for two to five clusters. With the increased number of centroids,
the distance between snapshots and their respective centroid decreased,
as is expected with any parameter fitting calculation. An optimal
number of centroids, corresponding to states in the MSM, reduces the
error function without overfitting. We chose K =
3 for the clustering, confirming the RMSD frequency to centroids distributions.Because we clustered protein structures from all trajectories simultaneously
rather than independently for each trajectory, identified substates
were guaranteed to be identical across trajectories. We then assessed
the populations in each cluster as the states in the MSM, and the
transitions were obtained from the time evolution of the cluster
visited by each trajectory. The populations in each substate and the
transitions can post priori be associated with the trajectory from
which they originated. The MD-MSM provided information about the composition
of each cluster by the protein system and frequency of transitions
from one substate to the next. To compare the populations in each
conformation substate, we showed the frequencies in clusters normalized
by simulation as shown in Figure .
Figure 4
MD-MSM cluster of substates: this histogram shows the
contributions
of the four GPX4 constructs in each of the three clusters of the MD-MSM.
Each of the three clusters (denoted 1–3) is composed of different
percentages from each of the four systems.
MD-MSM cluster of substates: this histogram shows the
contributions
of the four GPX4 constructs in each of the three clusters of the MD-MSM.
Each of the three clusters (denoted 1–3) is composed of different
percentages from each of the four systems.In Figure , we
observed that 30.1% of the GPX4 A mutants populated a single substate
exclusively in cluster 3, indicating the presence of a distinct conformational
state of GPX4 A during the process of simulation. GPX4 WT and GPX4
AB structures populated clusters 1 and 2 with roughly equal proportions.
About 54.4% of frames of the GPX WT and 53.1% of frames of the GPX4
AB mutant populated cluster 1, and 45.6% of frames of the GPX WT mutant
and 46.8% of the frames of the GPX4 AB mutant inhabited cluster 2.
The GPX4 B mutant had an inverse distribution compared to GPX4 AB
and GPX4 WT, with 42.6% of the frames in cluster 1 and 56.4% in cluster
2. A small portion of the frames (1.1%) fell into cluster 3. Therefore,
GPX4 B had similar but also different conformational states than those
of GPX4 AB and GPX4 WT. This hints at a potential explanation for
GPX4 B’s marginal effect on the protein activity compared to
GPX4 AB.A more careful look at the clustering of GPX4 A revealed
that once
GPX4 A adopted a certain conformation in a cluster, it tended to preserve
the conformation, as seen from a very high frequency (>70%) of
self-transitions
(Figure ). In cluster
3, GPX4 A had a very high self-transition frequency of 96.9%, indicating
that it was a very energetically stable structure.
Figure 5
State transition frequency
of GPX4 A: this histogram shows the
transition frequency of GPX4 A among three different clusters. GPX4
A generally tended to a state within the same conformational state.
For cluster 3, GPX4 A had a very high self-transition frequency of
96.9%.
State transition frequency
of GPX4 A: this histogram shows the
transition frequency of GPX4 A among three different clusters. GPX4
A generally tended to a state within the same conformational state.
For cluster 3, GPX4 A had a very high self-transition frequency of
96.9%.
Energetic Network Analysis
We further investigated
the energetic differences of the GPX4 systems by comparing the energetic
networks constructed with MD simulation data. Energetic networks are
created with nodes representing the molecules’ residues and
edges representing the forces between the pair of residues. Specifically,
the edges between each pair of residues are generated as the sum of
all forces between pairs of atoms in each residue. Overall, pairwise
energetic networks capture the energy profile of the molecule over
the course of the simulation, with edge weights representing how much
two residues energetically interact with each other.We built
separate energetic networks using both the electrostatic forces and
Van der Waals forces among the residues. However, the results from
Van der Waals energetic networks among the structures did not produce
noticeable differences among the structures. We hypothesized that
this is due to the short-range nature of such forces given the context
of the long-range allosteric effect. Therefore, we focused on comparing
the electrostatic energetic network in our study.The energetic
networks were visualized using Gephi software.[52] Since the network was constructed by considering
every residue with respect to every other residue, it was a fully
connected network. Therefore, to focus on energetically important
edges, we kept only 3–5% of the total edges based on their
weights. These threshold values were chosen because below 3%, the
networks became mostly disconnected. In contrast, beyond 5%, the networks
became too densely connected. In both cases, the network was either
too disconnected or overconnected for us to record meaningful observations.
Shortest Electrostatic Path Analysis
The shortest electrostatic
path (SEP) is a novel approach our lab proposed to examine allosteric
effects using an energetic network. We use SEP(i,j) to denote the shortest path from residue i to another residue j on the binarized version of
the network.In this project, we focused on the SEP starting
from a mutation site and ending at an active site residue. We hypothesized
that the SEPs could model how electrostatic signals propagate from
the mutation sites (allosteric sites) to the active sites with minimum
signal loss. Following this conjecture, shorter SEPs potentially indicate
a more robust electrostatic signaling mechanism between the source
and destination residues.Figures and 7 show SEPs from
residue 21 to all three active sites
of the protein molecule with 3.5% of total edges kept in the network.
As seen from the figures, SEP(21, 46) and SEP(21, 81) were both shorter
in GPX4 AB, and SEPs(21, 136) were identical in both GPX4 AB and GPX4
WT. Similar observations persisted among all the SEPs we compared.
We observed that all the SEPs of GPX4 AB were shorter than or equal
to the ones identified in GPX4 WT. In total, GPX4 AB had the same
SEPs as GPX4 WT roughly 70% of the time. In the other 30% of the comparisons,
GPX4 AB had shorter SEPs than GPX4 WT (Figure ).
Figure 6
SEPs from D21 of the GPX4 WT network: The SEP
between D21 (yellow)
and the three active site residues (green) are [21-20-22-86-84-83-81], [21-20-22-86-84-82-45-46], [21-102-72-40-73-41-139-136] respectively. This network shows the top 3.5% of most essential
edges.
Figure 7
SEPs from D21A of the GPX4 AB network: the SEPs between
D21A (yellow)
and the three active site residues (green) are [21-104-106-75-82-45-46]/[21-104-106-75-81]/[21-102-72-40-73-41-139-136], respectively. This network shows the top 3.5% of most essential
edges. Note that the first two SEPs identified in the GPX4 AB network
were shorter than the corresponding SEPs from GPX4 WT, and the third
SEP in the GPX4 AB network had the same length as that in the GPC4
WT network.
Figure 8
SEP comparisons between GPXWT and GPX4 AB: in all cases,
the SEP
is either shorter for GPX AB or the same length for both GPX4 AB and
GPX4 WT. We did not observe any case when GPX4 WT had a shorter SEP
than that of GPX4 AB.
SEPs from D21 of the GPX4 WT network: The SEP
between D21 (yellow)
and the three active site residues (green) are [21-20-22-86-84-83-81], [21-20-22-86-84-82-45-46], [21-102-72-40-73-41-139-136] respectively. This network shows the top 3.5% of most essential
edges.SEPs from D21A of the GPX4 AB network: the SEPs between
D21A (yellow)
and the three active site residues (green) are [21-104-106-75-82-45-46]/[21-104-106-75-81]/[21-102-72-40-73-41-139-136], respectively. This network shows the top 3.5% of most essential
edges. Note that the first two SEPs identified in the GPX4 AB network
were shorter than the corresponding SEPs from GPX4 WT, and the third
SEP in the GPX4 AB network had the same length as that in the GPC4
WT network.SEP comparisons between GPXWT and GPX4 AB: in all cases,
the SEP
is either shorter for GPX AB or the same length for both GPX4 AB and
GPX4 WT. We did not observe any case when GPX4 WT had a shorter SEP
than that of GPX4 AB.We also observed that the SEPs of GPX4 AB tended
to have more variations
than those of GPX4 WT. Figures and 10 show SEPs from residue 23 to
all three active sites of the protein molecule with 3.5% of total
edges kept in the network. For SEPs starting from residue 23, three
distinct paths were identified in GPX4 AB, whereas only two distinct
paths could be mapped in GPX WT.
Figure 9
SEPs from D23 of the GPX4 WT network:
the SEP between D21 (yellow)
and the three active site residues (green). Two distinct paths can
be found. SEP(23, 46) and SEP(23, 81) have significant parts of their
pathways overlapped with each other.
Figure 10
SEPs from D23A of the GPX4 AB network: the SEP between
D21 (yellow)
and the three active site residues (green). Three, instead of two,
distinct paths can be found.
SEPs from D23 of the GPX4 WT network:
the SEP between D21 (yellow)
and the three active site residues (green). Two distinct paths can
be found. SEP(23, 46) and SEP(23, 81) have significant parts of their
pathways overlapped with each other.SEPs from D23A of the GPX4 AB network: the SEP between
D21 (yellow)
and the three active site residues (green). Three, instead of two,
distinct paths can be found.To verify the structural implications of the network
analysis,
we mapped all the residues participating in the SEPs back onto the
molecules using PyMol.[53] From Figures and 12 below, we observed that in GPX4 AB, the residues
in the SEPs form three main conduction pathways for electrostatic
signals to propagate from mutation sites to the active sites. However,
in GPX4 WT, we only identified two such conduction pathways.
Figure 11
Conduction
pathways of GPX4 WT: this graph shows the surface representation
of residues participating in the SEPs. In total, the residues identified
form two distinct conduction pathways from D21 and D23 to the active
sites of the protein molecule.
Figure 12
Conduction pathways of GPX4 AB and GPX4 WT: this graph
shows the
surface representation of residues participating in the SEPs. In total,
the residues identified form distinct three conduction pathways from
D21A and D23A to the active sites of the protein molecule.
Conduction
pathways of GPX4 WT: this graph shows the surface representation
of residues participating in the SEPs. In total, the residues identified
form two distinct conduction pathways from D21 and D23 to the active
sites of the protein molecule.Conduction pathways of GPX4 AB and GPX4 WT: this graph
shows the
surface representation of residues participating in the SEPs. In total,
the residues identified form distinct three conduction pathways from
D21A and D23A to the active sites of the protein molecule.The two conduction pathways on the left and right
in both GPX4
AB and GPX WT correspond to almost the same set of residues, which
reflects the 70% of the SEPs of equal length in both systems shown
in Figure . However,
the middle conduction pathway is unique to GPX4 AB, and it is mapped
to the 30% of the SEPs found shorter in GPX4 AB. These residues also
only participate in the GPX4 AB network’s SEPs. From the energetic
point of view, we hypothesize that GPX4 AB has more conduction pathways
with a wider variety of residues involved in its propagation networks.
Discussion
Structural Analysis
The RMSDs were calculated for all
the frames in the simulations against their respective energy minimized
structure where all four systems converged at roughly 1 Å (Figure ). Compared to the
RMSD plot of the wild-type GPX4 (GPX4 WT), GPX4 A experienced the
most significant variation, implying specific structural differences
in its molecule structures. On the other hand, GPX4 AB consistently
showed higher RMSD values, suggesting higher dispersion of its structure
during the simulation. This foreshadows our further investigation
of the energetic differences between GPX4 AB and GPX4 WT.The
RMSFs represent the positional differences by residues between the
structures over time. It calculates the square root of the sum of
the Euclidean distance between each residue in the current snapshot
with respect to a reference state taken as the energy-minimized conformation.
From the RMSF plot of the four systems, we can see the differences
in residue flexibility between GPX4 and the other three GPX4 systems
(Figure ). This suggests
that GPX4 A is structurally different from GPX WT and further supports
the possibility that GPX4 A has a better conformational state, which
corresponds to higher protein activity. However, it must be noted
that GPX4 B, GPX4 AB, and GPX4 WT all have the RMSF plots closely
following each other, indicating a lack of significant structural
differences.Another strategy adopted to examine the structural
differentiation
between GPX4 systems was MD-MSM analysis. It studies the shift in
frequency of the conformational substates of the four GPX4 systems
throughout the simulation. A three-cluster model was chosen, with
each snapshot of the simulation categorized by its likeness to one
of the three centroids. By comparing the snapshot composition of each
cluster to empirical activation studies determined in the previous
mutagenesis experiment, we characterized cluster 3 as an activated
conformational state for GPX4 A. We interpreted that the cluster,
which contained about 30% of the total frames of GPX4 A simulation,
corresponded to the binding competent conformation of GPX4 A. In addition,
the low probability of state transitions from cluster 3 to other clusters
and high self-transitioning rate lend further support to the argument
that the conformation in cluster 3 was energetically favorable. This
analysis shows that the MD-MSM can capture the effect of these “allosteric
mutations” on the conformational landscape of GPX4. The self-transition
probabilities provide thermodynamic information on the stability of
each of the substates. In contrast, the substate transitions allow
for the relative comparisons of the activation barriers across these
states. Overall, the structural analysis on the mutant systems suggests
that GPX4 A increased the protein activity by adopting a more favorable
conformation that tends to be preserved.Given that changes
in mutated enzyme activity reported from the
experiments were relatively small (e.g., from 100 to 126% for GPX4
A), we also considered the level of resolution of the MD-MSMs approach.
In our pilot study of PDZ,[51] we enjoyed
good quantitative agreement with NMR data, suggesting a level of resolution
appropriate for the applications herein. Our ability to achieve qualitative
agreement with the experimental observations validates qualitative
interpretation, which allowed us to gain key structural insight into
the constructs and conformations for the informed generation of experimentally
testable hypotheses. We expect this to be valuable in the iterative
process of drug development toward the goal of modulating the GPX4
activity as well.However, little evidence emerged to construct
a coherent argument
for the empirical results that show that GPX4 AB had the most substantial
activating effect. As we can observe from the similar RMSF plot for
both GPX4 AB and GPX4 WT and the almost identical clustering of these
two systems in the MD-MSM analysis, GPX4 AB had a very similar conformation
with that of the GPX4 WT. To explain the activation of protein without
noticeable structural changes, we turned to the energetic network
analysis of the systems.Our research tried to identify
the differences between mutant AB and the wild-type structure at the
energetic level through energetic network analysis. Energetic networks
are a relevant new concept connecting biological science with computational
science. It has recently been more widely applied to study specific
problems of biological interest due to its ability to represent the
dynamics of the molecule structures.[46] We
constructed the energetics networks of the GPX4 systems to reflect
the electrostatic forces of interactions between residues. Given our
interest in allosteric signaling, such networks are beneficial as
they potentially highlight the conduction pathways on which allosteric
signals can propagate.The most important observation from the
electrostatic networks analysis is the improved SEPs in the mutant
AB compared to those in the wild-type structure. Holding other variables
constant, about one-third of the SEPs we examined were shorter in
mutant AB than in the wild-type structure. It is important to reiterate
that shorter SEPs indicate faster conduction and less loss of electrostatic
signals.Since we normalized each edge based on its degree,
we had what
is known as a transition matrix T such that T(i,j) gives the probability
that a random walker starting at node i will transition
to node j. Therefore, after thresholding the graph,
we only kept the most statistically likely transitions between nodes i and j in a random walk process. Since
a random walk process is related to the vibrational dynamics of a
network,[54] we believe that the shorter
paths could potentially correspond to an improved ability for a perturbation
from the mutation site residues to affect the binding site residue
activity.Bearing this in mind, our results provide insights
into the seemingly
contradictory results between lack of structural differences and empirical
evidence of improved protein activity of mutant AB. Despite the lack
of positional changes in mutant AB, it can potentially enhance the
protein activity by having a better communication channel for the
long-range allosteric interactions that can be detected at the energetic
level.
Structural Implications of Energetic Networks
The observations
from energetic networks can also be corroborated by examining the
structural implications. Without any prior assumption of the structures,
we could nicely map the SEPs onto structurally plausible conduction
pathways for allosteric signals to propagate from the mutation sites
to the active sites of GPX4 protein through the medium of electrostatic
forces of interaction.We only identified two conduction pathways
from residues D21 and D23 to all the three active site residues in
the wild-type structure. In the case of the double-mutant GPX4 AB,
we discovered another communication channel. The left and right channels
in mutant AB were identical to those of the wild-type structures.
However, the middle conduction pathway was unique to mutant AB and
corresponded to the residues participating in one-third of the shorter
SEPs identified for mutant AB. Hence, this supports our hypothesis
that mutant AB improves the protein activity by transmitting electrostatic
signals better, faster, and with less loss of information. It also
suggests that allosteric signaling in molecules can occur in multiple
ways.Overall, the results indicated that the combined effect
of both
positional and energetic pathways contribute to the overall allosteric
development of the proteins. These combined effects could mean that
spatially related allosteric mechanisms can exist alongside energetic
pathways, thus providing redundancy in the signaling. Alternatively,
alternative means could be accentuated or dampened with specific triggers.Exploring these ideas is only just beginning, and the approaches
from this paper suggest a potential pathway forward to gaining deeper
insights into both the coordinate-based and energetics-based allosteric
signaling. This may provide needed insights into understanding how
allosteric therapeutics may operate. In the absence of a theoretical
framework for the operation of allosteric molecules, the design of
allosteric therapeutics has mainly been limited and mostly occurred
through serendipitous discoveries. Having tools to guide the allosteric
design of drugs will open the possibility of designing a new class
of drugs, opening the possibility of treating previously undruggable
targets.
Materials and Methods
Simulation Specifications and Trajectory Analysis
GPX4
starting configurations of the wild-type systems were obtained from
the human GPX4 U41C mutant (PDBID# 2OBI).[50] Three
activated systems were generated from this crystal structure: two
single mutants (D21A and D23A) and one double mutant combining both
mutations, all of which are known to enhance the GPX4 activity.[47] Standard lab procedures were adopted to conduct
the MD simulations. Energy minimization with decreasing constraints
on the protein solute was followed by heating to 300 K, and the temperature
was maintained using the Berendsen algorithm. Using explicit counterions
and the TIP3P water model, an all-atom 1000 ns (1 ms) MD simulation
was executed for each equilibrated GPX4 mutant system in AMBER 16[43,44,55,56] and the CUDA versions of the pmemd routine running on NVIDIA GPUs
were parallelized.[45,57] The parm99SB force field[55] was used for proteins and peptides, while the
TIP3P potential[58] was used for water. To
achieve electroneutrality, minimal salt was added to each simulation
cell to electroneutrality. Stability and convergence of the MD were
monitored by a standard protocol, such as RMSD. The trajectories were
analyzed with AMBER utility CPPTRAJ[59] and
the molecular visualization programs PyMol[53] and visual MD.[60,61] The zinc coordination parameterization
was achieved using the zinc AMBER force field. The simulation system
was treated under particle mesh Ewald periodic boundary conditions
with a 10 Å Lennard-Jones cutoff in a truncated octahedral box.
Na+ counterions were added to the system for electroneutrality, and
SHAKE was applied for hydrogen bond motions.
MD Markov State Models
The MD-MSM study of GPX4 is
a statistically driven illustration of the difference among the GPX4
mutant activities due to changes at the structural level. It is based
on the shift in frequency of the conformational substates of the four
GPX4 systems throughout the simulation. The MD-MSM procedure has been
described in detail. In summary, the trajectories of the MD simulations
were stripped of ions, water molecules, and side chains to only include
the protein backbone atoms. These trajectories were then concatenated
into a single long trajectory clustered using a K-means clustering
method[62] based on pairwise RMSD values
implemented in CPPTRAJ. The categorization of simulation frames into
discrete probabilistic states indicates the protein’s innate
predisposition to adopt specific conformations in the presence or
absence of the mutation. After the k-means clustering, we identified
the substates of the GPX4 mutants and then compared them to analyze
the extent to which the wild type differs structurally from the other
allosterically liganded protein systems with more active protein activities.
MSMs were constructed in terms of the nodes and links of a complex
network, with the nodes obtained by clustering the microstates using
the K-means clustering algorithm. The links were defined from the
Chapman–Kolmogorov-compliant[63] frequency
of direct transitions between nodes. In all calculations, atom-based
quantities obtained from MD were merged to present results by each
residue.
Pairwise Interaction Energy Network
The pairwise interaction
energy network is a method adopted to create representations of protein
systems at the energetic level. Such networks have nodes representing
the residues of the molecules and edges representing the electrostatic/Van
der Waals forces between the pair of residues. Every node in the network
will have the node degree equal to n – 1,
where n denotes the number of residues in the molecule.
The edges between each pair of residues represent the sum of all electrostatic
potentials between pairs of atoms in each residue. We took the absolute
value of each of these interactions. Depending on the strength of
the force computed, different edges will have varying edge weights,
with higher edge weights representing a higher degree of energetic
interactions between residues.Since computing the pairwise
interaction energy for every frame in the trajectory is computationally
intensive, we only added pairwise energies for a selected subset of
frames from each trajectory. For each 1 ms-long trajectory with 5000
frames, we extracted one frame out of every ten frames starting from
the first frame with a regular interval. Collected samples were then
concatenated into a new trajectory that serves as the ’trajectory’s
lightweight representation. We then fed the trajectories through the
energy analysis protocol from CPPTRAJ. We computed the energy for
every 165 residues and chose two combinations of residue pairs by
using a bash script that performed the CPPTRAJ energy command with
each pair of residues. Subsequently, Python scripts were written to
parse the CPPTRAJ outputs into tensors of shape (N, X, D, D) where N was the number of frames in the trajectory, X is the number of types of energy being studied, and D is the residue number. To compute a representative energy network
of the systems, we simply summed over each energy per energy channel.
Finally, the network edges were assigned thresholds by normalizing
the edge weights on a per-node basis and picking all nodes above that
threshold.Using the open-source program Gephi, we created visualizations
of all four networks for electrostatics and van der Waals forces.
We focused on examining the visualization of the electrostatic energy
network due to its long-range nature that is more closely associated
with the allosteric effect happening over a longer distance in the
GPX4 protein system. Since the network created is wholly connected,
we first filtered the edges in the network based on edge weight to
prepare the network for visualization. This step allowed us to only
look at energetically essential edges in the system. To further understand
the importance of residues, we also denoted the nodes with an average
weighted degree. In the images, as shown in Figures and 7, the more prominent
nodes are associated with residues that are more energetically important,
and more opaque edges represent higher weight.In general, the
drawing of a graph will have no relation to the
underlying dynamics of the system as there is no notion of distance
on structured chart data. However, these images were generated using
the ForceAtlas2 layout algorithm in Gephi, in which nodes are represented
as entities repulsing like-charged particles while edges attract their
nodes like springs. Therefore, these images highlight some of the
essential dynamics in allosteric signaling and thus can offer us good
intuition regarding the types of properties on which future experiments
may focus.
Shortest Electrostatic Pathways
The SEP is a novel
approach our lab proposed to examine allosteric effects in the context
of an energetic network. It refers to the shortest path from the mutation
site residues to the active site residues on a given network. We examined
six SEPs in total at every network snapshot, from two mutation site
nodes (allosteric site nodes) to three active site nodes. We selected
network snapshots with 2–5% of the edges kept and compared
the SEPs between mutated and non-mutated system networks. We hypothesized
that the SEPs potentially reflected the most efficient pathways that
electrostatic signals propagate from the mutation sites (allosteric
sites) to the active sites with minimum signal loss. In the context
of the allosteric effect, this would provide clues on how allosteric
products work through electrostatic signaling.
Authors: Abel Garcia-Pino; Sreeram Balasubramanian; Lode Wyns; Ehud Gazit; Henri De Greve; Roy D Magnuson; Daniel Charlier; Nico A J van Nuland; Remy Loris Journal: Cell Date: 2010-07-09 Impact factor: 41.582
Authors: Christopher Pargellis; Liang Tong; Laurie Churchill; Pier F Cirillo; Thomas Gilmore; Anne G Graham; Peter M Grob; Eugene R Hickey; Neil Moss; Susan Pav; John Regan Journal: Nat Struct Biol Date: 2002-04