Giuseppina La Sala1, Sergio Decherchi2,3, Marco De Vivo1,4, Walter Rocchia2. 1. Laboratory of Molecular Modeling and Drug Discovery, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy. 2. CONCEPT Lab, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy. 3. BiKi Technologies s.r.l., via XX Settembre 33, 16121 Genova, Italy. 4. IAS-S/INM-9 Computational Biomedicine Forschungszentrum Jülich, Wilhelm-Johnen-Straße, 52428 Jülich, Germany.
Abstract
The detection and characterization of binding pockets and allosteric communication in proteins is crucial for studying biological regulation and performing drug design. Nowadays, ever-longer molecular dynamics (MD) simulations are routinely used to investigate the spatiotemporal evolution of proteins. Yet, there is no computational tool that can automatically detect all the pockets and potential allosteric communication networks along these extended MD simulations. Here, we use a novel and fully automated algorithm that examines pocket formation, dynamics, and allosteric communication embedded in microsecond-long MD simulations of three pharmaceutically relevant proteins, namely, PNP, A2A, and Abl kinase. This dynamic analysis uses pocket crosstalk, defined as the temporal exchange of atoms between adjacent pockets, along the MD trajectories as a fingerprint of hidden allosteric communication networks. Importantly, this study indicates that dynamic pocket crosstalk analysis provides new mechanistic understandings on allosteric communication networks, enriching the available experimental data. Thus, our results suggest the prospective use of this unprecedented dynamic analysis to characterize transient binding pockets for structure-based drug design.
The detection and characterization of binding pockets and allosteric communication in proteins is crucial for studying biological regulation and performing drug design. Nowadays, ever-longer molecular dynamics (MD) simulations are routinely used to investigate the spatiotemporal evolution of proteins. Yet, there is no computational tool that can automatically detect all the pockets and potential allosteric communication networks along these extended MD simulations. Here, we use a novel and fully automated algorithm that examines pocket formation, dynamics, and allosteric communication embedded in microsecond-long MD simulations of three pharmaceutically relevant proteins, namely, PNP, A2A, and Abl kinase. This dynamic analysis uses pocket crosstalk, defined as the temporal exchange of atoms between adjacent pockets, along the MD trajectories as a fingerprint of hidden allosteric communication networks. Importantly, this study indicates that dynamic pocket crosstalk analysis provides new mechanistic understandings on allosteric communication networks, enriching the available experimental data. Thus, our results suggest the prospective use of this unprecedented dynamic analysis to characterize transient binding pockets for structure-based drug design.
Binding pockets are
often crucial for modulating the function of
biomolecules, such as those in protein enzymes and ion channels. For
example, small molecule drugs exert their beneficial action by binding
to a functional pocket of the protein target(s).[1] Detecting and characterizing these functional binding pockets
is therefore of paramount importance for biochemistry and drug discovery.[2] In this regard, molecular dynamics (MD) is a
useful tool for studying the appearance, evolution, and structural
modifications of binding pockets in large biomolecules, along trajectories
of hundreds of nanoseconds to even a few milliseconds.[3,4] MD can also describe the plasticity of those superficial and shallow
transient cavities,[5] which are often involved
in protein function because they interact with a small substrate or
another partner protein.[6,7]As MD trajectories
of large structural ensembles increase in length,
they create massive data files. These files can be hundreds of gigabytes
in size and are expected to reach tens of terabytes in the near future.[8] Therefore, there is a major need for algorithms
that can automatically extract the embedded information from these
massive data sets and produce intelligible reports on the spatiotemporal
evolution of the targeted protein, including its potentially druggable
binding pockets.[9]There are already
a number of algorithms that can detect protein
binding pockets in static structures.[10,11] Some of these
rely on the Voronoi diagrams[12] (e.g., MolAxis,[13] MOLE[14]), grids[15] (e.g., POCKET,[16] PocketFinder,[17] POVME[18,19]), and molecular surfaces
and probes (e.g., HOLE,[20] SURFNET[21]). Other algorithms analyze ensembles of structures,
but usually require a preliminary structural alignment (e.g., MDpocket,[22] PocketAnalizerPCA,[23] Epock, Trj_cavity,[24] and TRAPP[25]). In this case, the resulting information may
depend on the specific reference structure used for the alignment.
Atom-based algorithms (e.g., PROVAR[26] and
EPOSBP[27]) avoid the alignment
step. Nevertheless, most of these methods are limited to analyzing
a priori defined pocket(s) of interest only. Another key aspect is
that pockets can sometime take part in protein allosteric signaling.[2,28] Indeed, a number of theoretical approaches already exist to investigate
allosteric signaling, such as bioinformatics methods that rely on
the analysis of protein sequences under the assumption that evolutionarily
conserved residues are likely to have a functional role.[29] Vibrational motions of proteins examined through
normal-mode analysis (NMA) can also offer insights into potential
allosteric mechanisms. In this case, low frequency modes define functionally
relevant movements often triggered by the binding of an allosteric
effector.[30,31] Alternatively, allosteric signaling is often
investigated through protein conformational ensembles generated via
molecular dynamics (MD). These conformational ensembles are mapped
into a graph-based representation, which is composed of interconnected
nodes. The degree of a node’s interdependence, which reflects
correlation of motions of distant allosteric parts of the protein,
can be calculated, for example, via a mutual information analysis[32,33] or using the analysis of atomic positional fluctuations.[34−39]Here, we present an original algorithm for efficiently analyzing
extended MD trajectories. Differently from all previous methods, this
algorithm detects the formation and spatiotemporal evolution of all
the protein pockets. In addition, it monitors pocket crosstalk, defined
as the temporal exchange of atoms between adjacent pockets, which
we propose as a means to identify allosteric signaling (see Theory). In particular, our algorithm automatically
executes (a) an alignment-independent identification of all the pockets
on the protein surface; (b) a quantification and visualization of
the volume and surface area of all the pockets found in the protein,
for each structural frame of the MD trajectory; (c) a report of crosstalk
between pockets, described as merging and splitting events; and (d)
a detection of allosteric signal transmission networks across the
protein surface, defined as interconnected pocket motions (see Theory).We have applied this new algorithm
to long MD simulations of a
few selected pharmaceutically relevant targets: (i) the purine nucleoside
phosphorylase (PNP) enzyme, (ii) the adenosinic receptor (A2A) in POPC membrane, and (iii) the Abelson (Abl) kinase. The algorithm
produced a detailed analysis of the structure–dynamics–function
relationships, which it automatically extracted from these MD simulations.
Our results demonstrate the power of this dynamical analysis, particularly
as a prospective tool for characterizing binding pockets in structure-based
drug design.
Theory
Outline of the Method
The algorithm
identifies pockets
and tracks their evolution over time along an MD trajectory. NanoShaper
0.7 (freely available at www.electrostaticszone.eu),[40] is
used as a preliminary utility to detect the pockets on the protein
surface on individual frames, with particular attention given to those
that could be druggable sites. In a single frame of a protein of ∼2000
atoms, the execution time for detecting the binding pockets is in
the range of 2–6 seconds on a standard workstation. The main
part of the method is then dedicated to characterizing the pocket
dynamics along the MD trajectory through the assignment of a unique
and dynamics-consistent identifier for every pocket. The spatiotemporal
evolution of each pocket is thus monitored over the entire MD trajectory.
Both algorithms are described in the following sections. This method
is called Pocketron and is a module implemented in the BiKi Life Sciences
software suite (www.bikitechnologies.com).
Static Pocket Detection
The static pocket detection
algorithm is based on the concept of the solvent excluded surface
(SES),[41] or Connolly–Richards surface,[42] which is defined as the surface obtained by
rolling a spherical probe over the van der Waals surface of the molecular
system. The analytical computation of the SES is done via the Alpha
Shapes theory, as described by Decherchi and Rocchia.[40] Then, pockets are identified by calculating the volumetric
difference between the regions enclosed by the SESs, obtained with
two different probe radii (Supporting Figure 4). The smaller rolling spherical probe has a radius of 1.4 Å,
which corresponds to the spherical approximation of a water molecule.
Conversely, the larger rolling spherical probe has a default radius
of 3 Å. The size values can be modified at will. However, we
found that these specific values, together with a subsequent filter
selecting only pockets with a volume of at least 34.5 Å3 (∼3 water molecules), provide a reasonable identification
of potential binding sites. This information can be used to analyze
the entire protein surface, list the identified pockets, and store
their calculated volume and the list of contributing atoms.
Dynamical
Pocket-Tracking Algorithm and Pocket Crosstalk Detection
The algorithm tracks the atoms forming each pocket along an MD
trajectory. These data constitute the fundamental basis for how the
algorithm monitors the pocket dynamics. The algorithm monitors all
the atom-based events occurring along an MD trajectory, without requiring
any structural alignment or prior knowledge of the region to be searched.
It thus tracks the exchange of atoms between adjacent pockets, which
is our indicator of “pocket crosstalk”. With “exchange
of atoms”, we refer to the fact that the same atom may belong
to different pockets at different times (i.e., frames) during the
simulation. This analysis considers “merging” and “splitting”
events to be significant. These events are calculated via simple operations
on the atom sets that form each pocket. More specifically, a “merge”
event occurs when atoms belonging to the same pocket in the current
frame have belonged to separate pockets in a previous frame. Similarly,
a “split” event occurs when atoms of a single pocket
divide into two or more distinct pockets. To guarantee consistency
in pocket tracking, the algorithm initially stores the list of pockets
found in the first frame, as defined by their constituent atoms, and
assigns a unique pocket identifier (pID) to each stored pocket. Then,
for every additional frame, the new set of pockets is computed and
compared with the stored ones. When there is a mismatch with respect
to the stored pockets, the current pocket is added as a new entry
in the stored pocket’s list and assigned a new pID. The matching
algorithm relies on the Jaccard index to measure the degree of similarity
between two pockets. For example, let A and B be the sets of atoms belonging to two pockets. Then, the
Jaccard index between them is defined aswhere the modulus symbol indicates the cardinality
of the atom sets. If A and B are
identical (i.e., have the same constituent atoms), then the index
reaches its maximum value of 1. If only a fraction of the atoms are
shared between A and B, then the
index provides the number of matching atoms (numerator) divided by
the cardinality of the union of the atom sets. If there are no matching
atoms, the index is null.Pocket identification is therefore
performed with the similarity matrix M. Each entry M(i,k) is the Jaccard index
between the atoms of the ith stored pocket and those
of the kth pocket (detected in the current frame).
For each frame, each pocket is compared to all of the stored ones.
If M(i,k) = 0 ∀i, then the kth pocket is considered to
be a new pocket and is stored. Otherwise, the kth
pocket is connected to the stored entry that maximizes the Jaccard
index. In mathematical terms, p is the ith stored pocket and (t) is the kth detected pocket on the frame at time t; therefore,
the M matrix is the following:where K is the total number
of detected pockets in the current frame and N is
the total number of stored pockets so far. As already mentioned, the
pocket ID assignment is executed by finding the maximum entry for
every column of M:where the index w is
corresponding to the stored pocket to which k is
assigned. If M(w,k) = 0, a new entry is
created and stored in the pocket list with its own new pID.Similarly, to detect merging and splitting events, the algorithm
builds the matrix F, so that F(k,j) is the Jaccard index of the kth pocket at instant t and the jth pocket found in the previous frame at instant t – Δt. Then, the F matrix
iswhere J is now the total
number of pockets at the t – Δt instant (i.e., the previous frame presented to the algorithm).
Traversing this matrix allows the easy detection of merge and split
events. Indeed, if we let “NZ” be the operator
that returns all the nonzero entries of a row/column vector, and let
“(i,:) ” and “(:,i)” be the operators that return all the elements of the ith row/column, respectively, then we havewhere Merge((t)) is the set
containing
all the indexes of the pockets detected at the time (t – Δt) that shared some atoms with
pocket k. If this set is empty, then no merge event
involved (t). Similarly, Split((t – Δt)) is the set containing all the indexes of the pockets
detected at the at time t that shared some atoms
with the pocket labeled as j at time (t – Δt). If this set is empty, then
no split event involved (t – Δt). On a technical note, the entries of the F matrix involve only those pockets detected at the t and t – Δt instants.
However, once the merging and splitting events have been detected,
the M matrix is once again used to connect these pockets
to the stored ones and to make the tracking consistent, as described
above.
Pocket Crosstalk Analysis and Allosteric Signaling
With the above-mentioned ability to track pockets over time, the
volume and surface area of all pockets are stored for each frame.
Thus, the algorithm generates the full history of the volume, area,
splitting, and merging events of each pocket, along the MD trajectory.
It also generates the list of contributing atoms and residues for
all the unique pockets detected during the MD run. This allows the
estimation of the probability (estimated as a relative frequency)
that each residue belongs to a given pocket. Moreover, for each pocket,
all the merging and splitting events with other pockets are stored.
Indeed, one of the final results of the tracking process is a square
matrix called N, whose (i,j)th entry represents the merging probability
between two pockets, expressed as the number of times they merge over
the total number of frames. Similarly, the square matrix N expresses the corresponding statistics
for the splitting events. Empirically, we found that the corresponding
entries of N and N very often coincide in magnitude, indicating
that the splitting and merging events often occur at the same time.
For this reason, we define a matrix of aggregate statistics N, which contains the maximum value of the corresponding entries
in N and N (Figure ). The nature of the information stored in N calls
for effective graphical methods to represent the pocket dynamics,
particularly since N is usually very sparse. The algorithm
translates the merging and splitting frequency matrix N into a 3D network graph, where the nodes represent pockets, and
the edges indicate communication between two pockets or, in other
words, how often two pockets exchange atoms. In detail, the position
of each node is the geometric center of the atoms that form the corresponding
pocket. The color of the node indicates its persistency over the simulation
time. The red color indicates high persistency whereas blue indicates
low persistency. The thickness of the edge is directly proportional
to the frequency of the merging and splitting events between the two
connected pockets, according to the corresponding entry in the N matrix. The dimension of the sphere represents the pocket
volume (Figure and Figure ). Together with
the merging and splitting of adjacent pockets, the N matrix
returns indirectly also the long-range crosstalk network connecting
distant pockets. That is, a crosstalk network can connect distant
pockets through a sequence of neighboring pockets that exchange their
atoms during the MD run. In this way, a 3D representation of the crosstalk
network can be observed, which can be used to identify putative allosteric
signaling network.
Figure 1
(A) Representation of the merging and splitting matrix F, calculated using all the detected pockets. The matrix F allows retrieving information on merging and splitting events.
For
each frame along the MD trajectory, pockets at time t are compared with pockets at time t – Δt, using the Jaccard index. In this example, at time t, the pockets 1, 2, 3, and 4 (in rows) have been detected
and stored. At this point, the Jaccard index is computed with all
pockets detected in the previous frame at times t – Δt, i.e., with pockets 1, 2, and
3 (in columns). Moving from t – Δt to t, this example shows that pocket
1 split into two pockets, forming the new pocket 4. Concomitantly,
pockets 2 and 3 merged, forming a larger pocket that is still identified
as pocket 3, according to its Jaccard index. (B) Schematic example
of the conversion of the aggregate merging/splitting statistics N into an undirected network graph. In the matrix N, the off-diagonal red numbers indicate the frequency of the merging
and splitting events, which is then reflected by the size of the edge
connecting two pockets.
Figure 6
Networks of the most
persistent pockets found in the KDin, T315I-KDin, Myr/KDin, and Myr/T315I-KDin trajectories.
Each pocket (i.e., network node) is represented
as a sphere, with the different colors indicating the pocket’s
persistency. The pockets are connected via black lines (i.e., network
edges). The width of each edge is proportional to the communication
frequency. The networks connect the ATP and the myristate binding
sites in all systems except T315I-KDin. We performed our
analysis considering only pockets having a persistency of at least
20% and above, along the simulation time.
(A) Representation of the merging and splitting matrix F, calculated using all the detected pockets. The matrix F allows retrieving information on merging and splitting events.
For
each frame along the MD trajectory, pockets at time t are compared with pockets at time t – Δt, using the Jaccard index. In this example, at time t, the pockets 1, 2, 3, and 4 (in rows) have been detected
and stored. At this point, the Jaccard index is computed with all
pockets detected in the previous frame at times t – Δt, i.e., with pockets 1, 2, and
3 (in columns). Moving from t – Δt to t, this example shows that pocket
1 split into two pockets, forming the new pocket 4. Concomitantly,
pockets 2 and 3 merged, forming a larger pocket that is still identified
as pocket 3, according to its Jaccard index. (B) Schematic example
of the conversion of the aggregate merging/splitting statistics N into an undirected network graph. In the matrix N, the off-diagonal red numbers indicate the frequency of the merging
and splitting events, which is then reflected by the size of the edge
connecting two pockets.
Results
Protein Binding Pocket Detection and Dynamics
First,
we tested the ability of our new algorithm to identify and monitor
protein pocket dynamics. We used ∼700 ns long MD simulations
of the purine nucleoside phosphorylase (PNP), a homotrimeric enzyme
that is involved in purine metabolism and T-cell function.[43] PNP inhibition is a strategy for treating T-cell-mediated
diseases, such as leukemia and lymphoma.[44,45]The algorithm detected a total of 22 pockets that exist for
more than 30% of the overall simulations describing the dynamic docking
of DADMe-ImmH into PNP. Among the most persistent ones, our algorithm
identified three large pockets, namely, pocket ID (pID) 4, pID 9,
and pID 12. The persistency is ∼79% for pID 4 and ∼96%
for both pID 9 and pID 12. These three pockets lie at the edge between
the adjacent PNP monomers (Figure and Supporting Figure 1). Each of them corresponds to the known orthosteric binding site
that is targeted by the endogenous substrate as well as by known PNP
inhibitors such as Immucillin-H[46] and DADMe-ImmH.[47] These pockets have an average volume of 367,
894, and 977 Å3, respectively. These volumes are larger
than those observed in the holo PNP crystal, where they are 332, 327,
and 409 Å3, respectively. This suggests an elevated
plasticity of the orthosteric pockets. Notably, the reduced volume
of pID 4 is explained by the existence of a nearby stable pocket,
pID 13, which has a persistency of 94% and an average volume of 313
Å3 (Supporting Figure 1). These two pockets are merged for ∼26% of the simulation
time, resulting in a single larger binding pocket with an average
volume of 677 Å3, which is comparable with the other
two orthosteric pockets pID 9 and pID 12. Furthermore, our algorithm
detected two smaller pockets, pID 3 and pID 14, located in proximity
of the orthosteric binding pockets pID 4 and pID 12 (Figure ). These pockets are present
for 71% and 73% of the simulation time and have an average volume
of 133 Å3 and 144 Å3, respectively,
in agreement with their value in the PNP X-ray structure (203 Å3 and 232 Å3, respectively). Interestingly,
these pockets have been shown to constitute a prebinding site where
the inhibitor DADMe-ImmH transiently binds before accessing one of
the orthosteric sites.[48] Notably, this
prebinding site was not found in proximity of the orthosteric pocket
pID 9. Likely, this is because pID 9 already embeds the prebinding
site during our simulations, as suggested by its larger volume compared
to pID 4 and pID 12.
Figure 2
(A) Localization of the main pockets computed for the
PNP X-ray
structure 3K8O. On the right, the orthosteric ligand DADMe-ImmH located in the
orthosteric pocket (in orange), as in the X-ray structure, and in
the prebinding pocket (in yellow), as found in our MD simulations.
(B) Volume over time of the three orthosteric sites pID 4, pID 9,
and pID 12. The volumes have been smoothed employing a Gaussian filter.
(A) Localization of the main pockets computed for the
PNP X-ray
structure 3K8O. On the right, the orthosteric ligand DADMe-ImmH located in the
orthosteric pocket (in orange), as in the X-ray structure, and in
the prebinding pocket (in yellow), as found in our MD simulations.
(B) Volume over time of the three orthosteric sites pID 4, pID 9,
and pID 12. The volumes have been smoothed employing a Gaussian filter.Another pocket, namely, pID 22,
is detected at the center of the
trimerization interface, with an average volume of 398 Å3 (248 Å3 in the X-ray structure) and a time
persistency of 88%. Intriguingly, this pocket shows a marked crosstalk
pattern with the orthosteric binding site of each monomer during the
MD simulations (Supporting Figure 2). This
suggests a possible signaling transmission network between the different
monomers, mediated by this common interface pocket. This may explain
the negative cooperativity between PNP subunits observed by Schramm
and co-workers[49,50] (see Discussion).
Identification of Interaction Patterns between Pockets in Proteins
Our algorithm detects pocket interactions by observing pocket splitting
and merging during an MD simulation (see Theory). Here, we demonstrate this feature on two proteins with multiple
binding pockets, which are well-characterized with biochemical and
structural data (see below). These proteins are (1) the adenosinic
receptor A2A and (2) the Abelson (Abl) tyrosine kinase.
The adenosinic receptor A2A is a G-protein coupled receptor
(GPCR) with a recognized key role in several pathophysiological processes.[51] It is a promising target for pain, depression,[52] and neurological diseases such as Parkinson’s
disease.[53] The Abelson (Abl) tyrosine kinase
is involved in cell growth and survival.[54] It is a validated target for treating several types of cancer.[55]
Adenosinic Receptor A2A
The analysis performed
on 100 ns long A2A trajectories revealed 13 pockets that
were present for more than 30% of the simulations. Of these, pID 15
and pID 22 coincide with the structurally characterized orthosteric
and allosteric binding sites, respectively.[56,57] In particular, pID 15 is located on the extracellular side of the
receptor and is targeted by both agonist and antagonist drugs, such
as adenosine[58] and ZM241385.[59] In contrast, pID 22 is located in the core of
the transmembrane bundle and is normally occupied by a sodium ion,
which can be displaced by allosteric drugs, such as amiloride and
HMA (Figure ).[60,61] Our analysis revealed that, although pID 15 and pID 22 were well-separated
during most of the MD simulations, they sometimes communicated via
merging and splitting events. This means that these pockets share
a set of residues at their interface. These residues are Val 84, Ala
88, Phe 242, and Trp 246 (Figure A). They are involved in a crosstalk between these
pockets. Interestingly, both MD simulations and experimental studies
have demonstrated a ligand-dependent A2A signaling, which
involves an allosteric effect via these two pockets.[60,62,63]
Figure 3
(A) Time persistency of residues that
define the orthosteric pocket
(blue stem, pID 15) and the allosteric pocket (red stem, pID 22).
(B) Volume plot of pID 15 and pID 22 in selected frames and representation
of merging and splitting events.
(A) Time persistency of residues that
define the orthosteric pocket
(blue stem, pID 15) and the allosteric pocket (red stem, pID 22).
(B) Volume plot of pID 15 and pID 22 in selected frames and representation
of merging and splitting events.For example, at frame 844 (out of 1000 frames analyzed),
the larger
orthosteric pocket pID 15 joins the allosteric pID 22 (Figure B). Interestingly, when these
two pockets are merged, they allow a small channel to form (Figure and Figure ). As suggested by thermal
stability studies[60] and MD simulations,[64] this channel permits the passage of a sodium
ion from the extracellular side to the allosteric pocket. The residues
at the edge between the two pockets may therefore act as a gate, mediating
this channel’s opening and closing, and thus modulating the
entrance of the sodium ion. This would also account for the conformation
flexibility of Trp 246, known as the “toggle switch”
residue, which is associated with the activation mechanism of GPCRs.[56,59,65,66] Notably, the conformational flexibility of residues that lie between
two pockets has been shown to be crucial also in other enzymes, such
as those processing lipids (e.g., fatty acid amide hydrolase[67,68] and monoacylglycerol lipase[7]).
Figure 4
Pocket crosstalk
analysis reveals that when pID 15 merges with
the allosteric pID 22, a small channel is formed (Figure ). We used adiabatic biased
simulations to characterize the passage of a sodium ion through the
transient channel detected by our algorithm (see Supporting Information). (A) The green spheres indicate the
pathway followed by the sodium ion along the simulations, toward the
extracellular site. The two conserved residues Trp246 and Asp52 are
shown. In panel B we show the narrowest section of the channel, with
the gating Trp246 partially flipped so as to allow ion passage.
Pocket crosstalk
analysis reveals that when pID 15 merges with
the allosteric pID 22, a small channel is formed (Figure ). We used adiabatic biased
simulations to characterize the passage of a sodium ion through the
transient channel detected by our algorithm (see Supporting Information). (A) The green spheres indicate the
pathway followed by the sodium ion along the simulations, toward the
extracellular site. The two conserved residues Trp246 and Asp52 are
shown. In panel B we show the narrowest section of the channel, with
the gating Trp246 partially flipped so as to allow ion passage.We used adiabatic bias simulations
to test the passage of the sodium
ion through the transiently formed channel detected via pocket crosstalk
analysis (Figure );
although qualitative, our results indicate that this channel can allow
the sodium ion to access the protein through a continuous and sterically
accessible pathway (Figure ). While the calculation of the associated thermodynamics
would require more extended simulations, it is interesting to note
that this pathway well resembles the one shown by Yuan et al., identified
along 9.6 μs of plain MD simulation.[69]
Abelson (Abl) Tyrosine Kinase
Here, we analyzed the
MD simulations of the kinase domain (KD) of Abl, both in the catalytically
active DFG-in (KDin) and in the catalytically inactive
DFG-out (KDout) conformations (see Methods for details).[70] Among the most persistent
pockets in KDin, our algorithm successfully identified
the large ATP site (pID 5, see Supporting Figure 3).[71] In addition to pID 5, we found
another nearby pocket located between the DFG motif and the αC
helix (pID 3, see Supporting Figure 3),
which is known to be an allosteric site located in the KD.[72] Interestingly, pID 5 and pID 3 present a marked
crosstalk in our MD simulations, sharing a large set of residues,
which are listed in Supporting Table 1.
For ∼63% of the simulation, they remain separate, with volumes
of 439 Å3 and 167 Å3, respectively.
However, for ∼35% of the simulations, the two pockets merge
into a single larger pocket with an average volume of 497 Å3 (Figure ).
Figure 5
Representation
of the dynamical behavior of the ATP pockets during
MD simulations of the KDin (a, b) and KDout systems
(c–f). In KDin, the ATP pocket pID 5 coexists with
the nearby pID 3 for 63.0% of the simulations (a), while pID 5 is
the only pocket for 35% of the simulations (b). In KDout, the ATP pocket coexists with pID 3 and pID 28 for 11% of the MD
simulations (c). It is the only emerging pocket for 38% of the simulations
(d), and it coexists with pID 3 for 20% of the simulations (e) and
with pID 28 for 23% of the simulations (f).
Representation
of the dynamical behavior of the ATP pockets during
MD simulations of the KDin (a, b) and KDout systems
(c–f). In KDin, the ATP pocket pID 5 coexists with
the nearby pID 3 for 63.0% of the simulations (a), while pID 5 is
the only pocket for 35% of the simulations (b). In KDout, the ATP pocket coexists with pID 3 and pID 28 for 11% of the MD
simulations (c). It is the only emerging pocket for 38% of the simulations
(d), and it coexists with pID 3 for 20% of the simulations (e) and
with pID 28 for 23% of the simulations (f).In KDout, our algorithm again identified the ATP
pocket
pID 5 and the nearby allosteric pocket pID 3 (Supporting Figure 3), with an average volume of 339 Å3 and 222 Å3, respectively. However, the algorithm
identified also an additional pocket, pID 28, which is located between
the P-loop and the β3-αC-helix loop (Supporting Figure 3), and which has a smaller volume of 150
Å3. Notably, the region containing this new pocket
is targeted by a new class of diphenylamine-derived allosteric inhibitors
of MEK 1 and MEK 2 kinases.[73] Here too,
the three pockets (pID 3, pID 5, and pID 28) share a set of residues
(listed in Supporting Table 1), suggesting
a communication network. Interestingly, in this case, these three
pockets exist as individual entities for only ∼11% of the overall
simulated time. This is because merging events are more frequent than
in the catalytically active KDin system (Figure ).According to our analysis,
the DFG-in conformation limits the crosstalk
between the ATP pocket and the nearby pocket pID 3 in the catalytically
active KDin system. In contrast, in KDout, the
DFG-out conformation favors communication with the surrounding pockets
pID 3 and pID 28. Indeed, the DFG-out conformation has been demonstrated
to increase the flexibility of the ATP binding site, which leads to
a loss of KD activity.[74,75] Thus, our analysis suggests that
the specific DFG conformation (in or out) affects the formation of
transient pockets and their communication network. These, in turn,
modulate the ATP binding site’s overall shape and the resulting
KD activity.
Allosteric Signal Transmission Networks Inferred
from Pocket
Crosstalk Analysis
Here, we tested the capability of our
algorithm to reveal allosteric signal propagation pathways in proteins,
starting from an MD-generated equilibrium ensemble of structures.
As a paradigmatic example, we investigated the Abl kinase, which is
one of the most characterized systems in terms of molecular determinants
for protein allostery.[74−77] We performed a comparative analysis of four different Abl systems:
(1) the wild-type KDin form, (2) KD bound to myristate
(Myr/KDin), (3) the KD apo form of the T315I mutation (T315I-KDin), and (4) system T315I-KDin bound to myristate
(Myr/T315I-KDin).In all systems, our analysis detected
both the orthosteric ATP pocket and the allosteric myristate pocket,
which is located in the C-lobe of the KD.[78] The analysis also returned an intensive crosstalk between pockets
distributed on the protein surface. This was detected by looking at
the off-diagonal elements of the square matrix N (see Theory). Notably, this crosstalk network connects
the distal orthosteric ATP and the allosteric myristate binding sites
(Figure ), which is known to be central for the allosteric
signal propagation in Abl.[79−82] In particular, in both KDin and Myr/KDin, the ATP and the myristate pockets are connected via a long-range
crosstalk network composed of a number (∼3 to 5) of small transient
pockets (average volume ∼90 Å3, and persistency
between 20% and 70%). These pockets are mainly located in the C-lobe
and mostly involve the αG and αI helices (Figure ). Notably, the location of
the pockets corresponds to that of the new allosteric pockets reported
by Shan et al. in their study on binding pathways in Src kinase.[83] Moreover, in the MD simulations of Abl with
the drug-resistant T315I point mutation (T315I-KDin system),
the ATP ↔ myristate long-range crosstalk network is actually
disrupted (Figure ). The perturbation of this communication pathway in the T315I-KDin system might explain the dysregulation of the T315I kinase
form.[84,85] Remarkably, however, the ATP ↔ myristate
pockets’ long-range communication network is restored in our
simulations when the myristate binds to the T315IAbl mutated form
(i.e., the Myr/T315I-KDin system). This is also observed
in the wild-type system and is in line with other experimental and
computational studies showing that myristate mimesis elicits a structural
rearrangement on the ATP pocket and influences the binding affinity
of orthosteric inhibitors (see Supporting Information).[77,79] This allosteric communication network could
also explain the results of HX MS experiments, which further detailed
the presence of an allosteric-signal-propagation pathway from myristate
to the ATP pockets, passing through KD structural elements such as
the αI-helix.[79]Networks of the most
persistent pockets found in the KDin, T315I-KDin, Myr/KDin, and Myr/T315I-KDin trajectories.
Each pocket (i.e., network node) is represented
as a sphere, with the different colors indicating the pocket’s
persistency. The pockets are connected via black lines (i.e., network
edges). The width of each edge is proportional to the communication
frequency. The networks connect the ATP and the myristate binding
sites in all systems except T315I-KDin. We performed our
analysis considering only pockets having a persistency of at least
20% and above, along the simulation time.As a following step, we aimed at connecting such a mechanism
for
allosteric signaling with useful information for ligand design. Toward
this end, we performed enhanced sampling MD simulations to test whether
this network for allosteric signaling affects ligand binding at the
orthosteric site. Thus, we performed a set of scaled-MD simulations[86] on the T315I mutant of Abl, either in complex
with dasatinib bound at the orthosteric site (i.e., Das/T315I-KDin) or in complex with both dasatinib and the myristate, at
the distal myristate binding pocket (i.e., Das-Myr/T315I-KDin). Scaled-MD simulations can be used to accelerate the unbinding
process by reducing the atomic interactions in the potential energy
of the system. While facilitating the unbinding from the pocket during
the MD run, scaled-MD simulations allow also a qualitative and relative
estimation of the ligand residence time.[86,87] Here, we used scaled-MD simulations to estimate the residence time
of dasatinib in Das-Myr/T315I-KDin and Das/T315I-KDin systems, which only differ in the presence of the myristate
(Figure ). In these
simulations, dasatinib showed an average scaled residence time ∼2
times longer when unbinding from Das-Myr/T315I-KDin compared to when
it unbinds from Das/T315I-KDin (see Supporting Table 2). Converted back to nonscaled residence times, (see
ref (86) for further
details) dasatinib in the two systems returns a ratio value of more
than 4. Although only qualitative, this result further indicates that
the presence of the myristate, which coincides with the formation
of the crosstalk signaling network connecting the orthosteric ATP
and the allosteric myristate pockets (Figure ), stabilizes the complex with dasatinib.
Notably, biochemical assays confirm that dasatinib is active against
the T315IAbl form only in the presence of an allosteric binder.[79] Moreover, the residence time of a drug often
correlates with its binding affinity, due to the relationship between
the rate of dissociation (koff) and the
thermodynamic dissociation constant (Kd). Hence, a longer residence time suggests a greater affinity of
the drug for its target. Taken together, these results verify binding
cooperativity effects in the Abl system, in which the presence of
an allosteric binder at the myristate pocket is shown to impact on
ligand binding at the orthosteric site. This example suggests that
the information generated by dynamic pocket crosstalk analysis can
help identify suitable conformational states for ligand binding thermodynamics
obtained through subsequent calculations.
Discussion
In
this work, we provide new mechanistic insights on pocket dynamics
and allosteric communication in three pharmaceutically relevant proteins,
namely, the purine nucleoside phosphorylase (PNP) enzyme, the adenosinic
receptor (A2A), and the Abelson (Abl) kinase. Importantly,
these findings are obtained through the use of a novel algorithm that
analyzes pocket crosstalk events along the dynamics of a given biomolecular
system. In this study, we demonstrate that this method can effectively
reveal allosteric communication networks embedded in the ever-longer
available MD trajectories. In addition, this new method characterizes
all binding pockets through site-centered descriptors, which can be
used in machine-learning-based virtual screening protocols with enhanced
predictivity.[3]First, the algorithm
identified all three experimentally known
orthosteric binding sites in the PNP homotrimeric enzyme (pID 4, pID
9, and pID 12 in Figure ).[44] As expected, these are the largest
and most persistent pockets. The dynamical analysis also identified
smaller pockets in proximity of these three sites. Notably, these
auxiliary pockets were involved in a prebinding state of DADMe-ImmH,
a PNP inhibitor,[47] which was found to transiently
interact with this small pocket before entering the main orthosteric
site.[48] Analysis of the volume variation
over time pointed to an elevated plasticity of the orthosteric binding
sites. In fact, their average volume is higher than the crystallographic
corresponding value. This suggests that the pockets undergo a structural
rearrangement, which likely facilitates the binding of inhibitors
observed during the simulations.[48] Our
analysis also revealed a marked pattern of communication between each
orthosteric pocket and its nearby prebinding site, reflected by a
high frequency of merge and split events. Interestingly, this evidence
is in line with structural studies, which show that two of the binding
site’s structural motifs (the His 257 helix and His 64 loop)
undergo a major conformational alteration upon the binding of the
inhibitor. This alteration affects the binding pocket’s size
and shape.[49] In addition, the high flexibility
of this structural region in PNP was recently used to demonstrate
that MD can be effectively deployed to improve virtual screening results.[3]Our algorithm also identified a long-range
communication pathway
connecting the orthosteric pockets and another pocket at the center
of the trimerization interface (pID 22). This pathway suggests a possible
allosteric connection between the three orthosteric pockets that passes
through pID 22, at the trimerization interface. Intriguingly, this
allosteric signal may help explain the negative cooperation between
PNP subunits, as observed by Schramm and collaborators.[49,50] In H/D exchange experiments, dynamical coupling has also been observed
between the orthosteric sites and the pocket located at the trimerization
interface.[88] This further corroborates
our hypothesis, according to which pID 22 could be the hub of the
allosteric signal transmission between PNP monomers.We then
analyzed the MD trajectory of A2A, simulated
in POPC membrane. This analysis captured the experimentally known
pockets, which are the orthosteric binding site, located close to
the extracellular side, and the underlying sodium allosteric site
(Figure ).[56,57,59] Our analysis also showed that,
during the simulation, the two pockets communicated through a set
of residues located at the interface of the two sites (i.e., Val 84,
Ala 88, Phe 242, and Trp 246). This result is in line with earlier
experimental and computational studies, which hypothesized the presence
of an allosteric signal between the two cavities, which is crucial
for modulating A2A function.[60,64,89] In this regard, we found that the conformational
flexibility of Trp 246, also known as the “toggle switch”,
appears to be crucial for the formation of a small channel during
a two-pocket merging event. Via this channel, the Na+ ion
may access the inner part of the protein from the extracellular space,
as experimentally reported (Figure and Figure ).[56,60,90]Merging and splitting events were also shown in our MD simulations
of the Abl kinase. Here, we focused on the orthosteric ATP site, comparing
the results obtained from analyzing the catalytically active DFG-in
kinase domain and the catalytically inactive DFG-out kinase domain
(KDin and KDout, respectively). Together with
the ATP site, our algorithm correctly detected two additional smaller
adjacent pockets. One is an allosteric pocket often exploited by type
II kinase inhibitors, while the other is present only in KDout, lying between the P-loop and the β3-αC-helix loop (Figure ). Notably, this
region is targeted by a new class of MEK1 and 2 kinase inhibitors.[73] This pocket also accommodates the piperazine–phenyl–pyrimidine
moiety of SCH772984 in both ERK1 and ERK2 kinases.[91] This confirms the ability of our algorithm to identify
small hidden cavities that can be exploited in the design of new selective
inhibitors. Notably, our analysis captured even subtle protein conformational
changes, such as split and merge events between the large ATP site
and the two nearby subpockets. These events are more frequent in the
inactive KDout than in the active KDin form.
This intensive crosstalk reflects the high plasticity of the ATP binding
site in KDout, as suggested by previous computational studies,[74,75] which results in a diminished catalytic activity.Lastly,
the method was able to reveal a long-range connection between
the orthosteric ATP binding site and the allosteric myristate pocket.
This has frequently been reported in the literature as a crucial allosteric
mechanism for kinase function.[79−82] Our analysis reveals a pocket crosstalk network,
which passes through a set of small pockets located close to the αG
and αI helices of the C-lobe, in a region that can be targeted
by allosteric kinase inhibitors, as suggested by a recent computational
study (Figure ).[83] Computational and experimental studies[76,77,79] suggest that the integrity of
this communication network coincides with a proper functioning of
Abl. Targeting this pocket communication network may thus offer a
new way for modulating the activity of the protein, as exemplified
here for the binding of dasatinib to Abl, and its allosteric modulation.[79] Indeed, both HX MS experiments and MD simulations
indicated that the αG and αI helices are involved in the
allosteric signal propagation in Abl.[79] Our analysis detected a perturbation of this pocket communication
network in the dysregulated T315IAbl form only, which occurs at the
level of the αG and αI helices of the C-lobe (Figure ). That is, we found
that the T315I point mutation in the absence of the myristate interrupts
this pocket crosstalk network, which might explain the dysregulation
of the T315IAbl form.[84] Ultimately, within
the sampling limitations of the input trajectories, this new algorithm’s
dynamical analysis detects interaction networks, which involve rearrangements
that may reveal new binding sites on the protein surface.[92] It can also point to possible mechanisms for
allosteric signal propagation, spotting protein configurations that
may be used as target structures for ligand binding thermodynamics
through computations for structure-based drug design.[93]
Conclusions
In this work, we provide new mechanistic
understandings about pocket
formation and allosteric communication in three relevant drug discovery
targets, namely, PNP, A2A, and Abl kinase. Importantly, these findings
are obtained through the use of an original and fully automated method
for analyzing pocket crosstalk along microsecond-long MD simulations,
which are performed to examine the spatiotemporal evolution of proteins.
We demonstrate that this unprecedented dynamical analysis can reveal
otherwise hidden connections between pockets, which may also underlie
allosteric communication networks in proteins, as discussed for the
biomolecular systems here investigated. Ultimately, we propose dynamic
pocket crosstalk analysis for a more detailed understanding of the
structural dynamics of proteins and biological regulation through
allosteric communication, suggesting a prospective use of this method
in structure-based drug design.
Methods
Structural
Models
In the present work, we analyzed
the MD simulations of three different systems, namely, purine nucleoside
phosphorylase (PNP), the adenosinic receptor (A2A), and
the Abelson (Abl) kinase. The homotrimeric construct of PNP was modeled
using 3K8O X-ray
structures and was simulated in the presence of nine DADME-ImmH ligands
and phosphate ions, retrieved from 1RSZ and 1RR6(94) PDB structures,
respectively. The A2A receptor was mostly built using the 3UZC X-ray structure,[95] while the 4EIY X-ray structure[56] was used as a template for the missing ECL2 in 3UZC. The apo protein
was embedded in POPC membrane. For Abl, we built five different model
systems. Two included the apo wild-type KD alone in both the DFG-in
(KDin) and DFG-out conformations (KDout). Two
comprised the KD mutated at the gatekeeper residue T315 (located at
the ATP binding site) with an isoleucine residue. These mutated forms
are either in the apo form (T315I-KDin) or in complex with
myristate (Myr/T315I-KDin). The last system is the wild-type
KD in complex with myristate (Myr/KDin). The KDout system was built starting from the 1OPL chain B X-ray structure[96] after removal of the SH2 domain. The remaining systems
were modeled starting from the 2F4J X-ray structure.[97] For further details, see the Supporting Information.
MD Simulations
PNP was parametrized using the Amber
ff99SB-ILDN force field,[98] while the General
Amber Force Field (GAFF)[99] was used to
parametrize the ligands and the ions. Ligand partial charges were
fitted using the RESP procedure via Antechamber.[100] The system was immersed in a TIP3P water box[101] and comprised ∼100,000 atoms. After
350 ns of equilibration, which led the system to a temperature of
300 K and to a pressure of 1 bar, we ran the production in NVT ensemble.
For the analysis, we collected ∼700 ns of simulation stored
in ∼1800 frames.A2A was parametrized using
the Amber ff99SB-ILDN force field.[98] The
protein and the POPC membrane were immersed in a TIP3P water box[101] reaching a total of 65,802 atoms. Here too,
the system was first equilibrated to reach a temperature of 300 K
and a pressure of 1 bar, while the production was run for 100 ns in
NPT ensemble. We extracted ∼1000 frames from the trajectory
for analysis. For adiabatic bias simulations, we used Plumed2.[102] The reaction coordinate was the distance between
the sodium ion and the geometric center of the alpha carbons of Leu267,
Asp175, and Gln163 (pdb 3EML). The line connecting this geometric center and the
sodium ion is quasi parallel to the principal axis of the GPCR and
thus is a reasonable steering coordinate. The spring constant was
set to 500 kJ/mol/ Å2.The five Abl kinase systems
were parametrized using the Amber ff99SB
force field.[103] For Myr/T315I-KDin and Myr/KDin models, the myristate was parametrized using
the General Amber Force Field (GAFF)[99] after
computing the point charges at the HF/6-31G* level of theory. The
systems were solvated in TIP3P water and comprised ∼45000 atoms.
After 5 ns of equilibration, the systems reached a temperature of
300 K and a pressure of 1 bar, and we then ran the production in NPT
ensemble for a simulation time that ranged from ∼0.8 μs
to ∼2.5 μs. Here, our algorithm was used to analyze ∼16,000
to ∼50,000 frames. See the Supporting Information for more details.
Algorithm Parameters
For each analysis,
we set the
small and large probe radii to 1.4 and 3 Å, respectively. To
avoid unnecessary noise due to the detection of very small pockets,
we set the minimum volume of a detectable pocket to the equivalent
of 5 water molecules for PNP and A2A, and 4 water molecules
for Abl kinase. Finally, for tracking purposes, we removed all the
ligands, ions, membrane, and water molecules from each trajectory,
taking into account the sole protein.
Authors: Laura Riccardi; Jose M Arencibia; Luca Bono; Andrea Armirotti; Stefania Girotto; Marco De Vivo Journal: Biochim Biophys Acta Mol Cell Biol Lipids Date: 2017-01-12 Impact factor: 4.698
Authors: Javier Suarez; Antti M Haapalainen; Sean M Cahill; Meng-Chiao Ho; Funing Yan; Steven C Almo; Vern L Schramm Journal: Chem Biol Date: 2013-02-21
Authors: Bhushan Nagar; Oliver Hantschel; Matthew A Young; Klaus Scheffzek; Darren Veach; William Bornmann; Bayard Clarkson; Giulio Superti-Furga; John Kuriyan Journal: Cell Date: 2003-03-21 Impact factor: 41.582