Amaurys A Ibarra1, Gail J Bartlett2, Zsöfia Hegedüs3,4, Som Dutt3,4, Fruzsina Hobor4,5, Katherine A Horner3,4, Kristina Hetherington3,4, Kirstin Spence4,5, Adam Nelson3,4, Thomas A Edwards4,5, Derek N Woolfson1,2,6, Richard B Sessions1,6, Andrew J Wilson3,4. 1. School of Biochemistry , University of Bristol , Medical Sciences Building, University Walk , Bristol BS8 1TD , U.K. 2. School of Chemistry , University of Bristol , Cantock's Close , Bristol BS8 1TS , U.K. 3. School of Chemistry , University of Leeds , Woodhouse Lane , Leeds LS2 9JT , U.K. 4. Astbury Centre for Structural Molecular Biology , University of Leeds , Woodhouse Lane , Leeds LS2 9JT , U.K. 5. School of Molecular and Cellular Biology , University of Leeds , Woodhouse Lane , Leeds LS2 9JT , U.K. 6. BrisSynBio , University of Bristol , Life Sciences Building, Tyndall Avenue , Bristol BS8 1TQ , U.K.
Abstract
Protein-protein interactions (PPIs) are vital to all biological processes. These interactions are often dynamic, sometimes transient, typically occur over large topographically shallow protein surfaces, and can exhibit a broad range of affinities. Considerable progress has been made in determining PPI structures. However, given the above properties, understanding the key determinants of their thermodynamic stability remains a challenge in chemical biology. An improved ability to identify and engineer PPIs would advance understanding of biological mechanisms and mutant phenotypes and also provide a firmer foundation for inhibitor design. In silico prediction of PPI hot-spot amino acids using computational alanine scanning (CAS) offers a rapid approach for predicting key residues that drive protein-protein association. This can be applied to all known PPI structures; however there is a trade-off between throughput and accuracy. Here we describe a comparative analysis of multiple CAS methods, which highlights effective approaches to improve the accuracy of predicting hot-spot residues. Alongside this, we introduce a new method, BUDE Alanine Scanning, which can be applied to single structures from crystallography and to structural ensembles from NMR or molecular dynamics data. The comparative analyses facilitate accurate prediction of hot-spots that we validate experimentally with three diverse targets: NOXA-B/MCL-1 (an α-helix-mediated PPI), SIMS/SUMO, and GKAP/SHANK-PDZ (both β-strand-mediated interactions). Finally, the approach is applied to the accurate prediction of hot-spot residues at a topographically novel Affimer/BCL-xL protein-protein interface.
Protein-protein interactions (PPIs) are vital to all biological processes. These interactions are often dynamic, sometimes transient, typically occur over large topographically shallow protein surfaces, and can exhibit a broad range of affinities. Considerable progress has been made in determining PPI structures. However, given the above properties, understanding the key determinants of their thermodynamic stability remains a challenge in chemical biology. An improved ability to identify and engineer PPIs would advance understanding of biological mechanisms and mutant phenotypes and also provide a firmer foundation for inhibitor design. In silico prediction of PPI hot-spot amino acids using computational alanine scanning (CAS) offers a rapid approach for predicting key residues that drive protein-protein association. This can be applied to all known PPI structures; however there is a trade-off between throughput and accuracy. Here we describe a comparative analysis of multiple CAS methods, which highlights effective approaches to improve the accuracy of predicting hot-spot residues. Alongside this, we introduce a new method, BUDE Alanine Scanning, which can be applied to single structures from crystallography and to structural ensembles from NMR or molecular dynamics data. The comparative analyses facilitate accurate prediction of hot-spots that we validate experimentally with three diverse targets: NOXA-B/MCL-1 (an α-helix-mediated PPI), SIMS/SUMO, and GKAP/SHANK-PDZ (both β-strand-mediated interactions). Finally, the approach is applied to the accurate prediction of hot-spot residues at a topographically novel Affimer/BCL-xL protein-protein interface.
Protein–protein
interactions
(PPIs) play a central regulatory role in the majority of cellular
signaling processes. Aberrant PPIs lead to disease and represent intervention
points for targeted therapeutics, motivating efforts to discover small-molecule
or peptide/peptidomimetic ligands.[1−7] PPIs generally occur between significantly larger and topographically
shallower surfaces than the lock-and-key-like pockets found in conventional
drug targets, for example, enzymes and receptors, such as G-protein
coupled receptors (GPCRs), making PPIs a challenge for ligand discovery.[2,5,8−12] Significant recent successes, for example, ABT-199[13] and RG7112,[14] have
countered the perception that PPIs are undruggable. Nonetheless, it
is not clear which PPIs will be therapeutically tractable and relevant
to disease,[15−17] a problem that may be explored using chemical probes.[18] In turn, this prompts the question: which PPIs might feasibly be modulated with small molecules or peptides/peptidomimetics? Though significant progress has been made in structurally characterizing
PPIs, in order to understand the molecular basis of recognition it
is also necessary to establish the determinants of favorable interaction,
that is, which amino-acid residues contribute most to binding. Such
information is essential in understanding biological mechanism, rationalizing
dynamic conformational transitions, interpreting the functional role
of mutation, and identifying starting points for inhibitor design.Cunningham and Wells first used experimental alanine scanning to
map PPIs.[19] In this approach, variant proteins
or peptides with Xaa-to-Ala mutations are used to evaluate the role
of each amino acid side chain in a PPI. Assuming the absence of any
changes in the backbone conformation of the protein, the changes indicate
the contribution that the side chain makes to the binding free energy.
Such studies have established that the thermodynamic driving force
for PPIs can be dominated by a few key amino acid residues, termed
hot-spots[20] or hot regions.[21] This experimental approach has been widely applied,[22] but it is time-consuming and difficult to implement
across entire protein–protein interactomes.[23] Moreover, assay-dependent variations in binding affinities
make comparative analyses of the literature problematic.[24]In silico methods can
be implemented to predict
hot-spot residues from extant protein structures,[24−28] and efforts have been made to classify PPIs in the
Protein Data Bank (PDB)[29] by secondary
or tertiary structure at the interfaces and to predict hot-spot residues en masse.[30−32] For computational alanine scanning (CAS), the binding
free energies for the native and variant PPIs are calculated and the
difference in these two values (ΔΔG)
is used to evaluate the participation of the variant amino acid in
the interface. Several approaches to CAS have been based on classical
molecular dynamics simulations to capture the free energy change on
amino acid variation. These methods include: Free Energy Perturbation,[33−35] Thermodynamic Integration,[34−36] the popular hybrid MM/PBSA method[37] that has recently been improved by inclusion
of interaction-entropy calculation,[38] and
by localizing the simulation volume around the residue in question.[39] The Flex_ddG method in Rosetta[28] uses a combination of sophisticated Monte Carlo sampling,
minimization, and specialized force fields to provide one of the most
accurate current methods for CAS. We compare our new fast approach,
BudeAlaScan, with this method and other rapid methods that use a variety
of scoring functions, which fall broadly into the following categories:
physicochemical energy functions or statistical functions. Examples
currently accessible via web servers include KFC,[40] HotPoint,[41] DrugScore-PPI,[42] and PredHS.[43] We
chose FoldX[44,45] as one of the first methods for
rapid computational alanine scanning using a physicochemical force
field, Rosetta Flex_ddG[28] as the most recent
tool exploiting physicochemical energy terms, and BeAtMuSiC[46] and mCSM[47] as tools
employing statistical potentials and machine learning. We have avoided
CAS methods that rely on molecular dynamics simulations for conformational
sampling as being slow for high throughput calculations.Typically,
fast CAS methods operate on single structures and thus
do not account for protein dynamics. In some cases, the methods can
be, and have been, performed on multiple structures,[25,48−50] for example, from NMR ensembles or MD trajectories.
This is an important consideration given that many PPIs exploit intrinsically
disordered regions (IDRs)[51] and transient
or dynamic noncovalent contacts can play important mechanistic roles
in recognition.[52,53] Here we perform a comparative
analysis of common fast tools for CAS enabling informed consensus
selection of hot-spot residues. This is benchmarked against an extensive
literature data set and further validated through experimental analyses
of four diverse PPIs, for which further analyses of NMR ensembles
or molecular dynamics trajectories provides insight on dynamics to
identify bona fide hot-spot residues.FoldX
is based on empirical potentials built from optimized combinations
of various physical energy terms.[44,45] The original
Robetta method[54] is a physical energy function
parametrized on a monomeric protein data set (ProTherm).[55,56] The Rosetta method used herein is Flex_ddG, based on the ΔΔG monomer method, using both the current general Rosetta
force field,[28] Ref2015, and the specialized
force field Talaris2014.[28,57−59] BeAtMuSiC[46] is a coarse-grained predictor
of changes in binding free energy induced by point mutations and uses
a set of statistical potentials derived from known protein structures.
The statistical potentials are trained on data from ProTherm[56] and validated using the SKEMPI database.[60] SKEMPI is a database of 3047 binding free energy
changes upon mutation collated from the literature, for PPIs with
known structure. In this study, we used all the ΔΔG data for the 748 single mutations to alanine present in
the SKEMPI database that were compiled, ratified, and published recently.[26] The method mCSM[47] describes the protein environment upon mutation using signature
vector patterns for each amino acid and was trained and tested on
SKEMPI.[60] Finally, BudeAlaScan, which we
introduce here, is an empirical free-energy approach adapted from
a small-molecule-docking algorithm, BUDE,[61] using the standard force field (version heavy_by_atom_2016.bhff)
provided with the current BUDE release 1.2.9. BudeAlaScan is a command-line
application where the user assigns each protein of the complex as
the receptor and the ligand. The application varies the ligand residues
for alanine and reports the interfacial ΔΔG as ΔGALA – ΔGWT (hence a hot residue will have a positive
value). For residues in the wild-type complex that may carry a charge
(types DERKH) a rotamer library is used to estimate and account for
the configurational entropy loss on forming an interfacial salt bridge.
We will make a web-based graphical user interface available for BudeAlaScan
in due course. All our selected tools for this comparison could be
used to perform calculations on the SKEMPI data set or had been trained
on this data set, further supporting our choice of in silico tools.To compare the predictive capability of the five methods
further,
we perform detailed experimental analyses from the predictions on
three diverse PPI targets: (i) the NOXA-B/MCL-1[62] interaction, an important current target in oncology;[63] (ii) the SIMS/SUMO[64] interaction, representative of a number of regulatory PPIs that
depend on the SUMOylation post-translational modification;[65] (iii) the GKAP/SHANK-PDZ interaction,[66] which performs a scaffolding function at synaptic
junctions.[67] Example i is an α-helix-mediated
PPI, while examples ii and iii are both β-strand-mediated interactions.
Thus, these provide diversity in secondary structure interfaces and
biological functions. Comparison of predicted and experimental data
reveals that averaging the ΔΔG values
for each residue across the five methods leads to more accurate prediction
than any single method alone. Finally, we present further support
for the implementation of this approach through prediction of hot
residues for a topographically novel Affimer/BCL-xL PPI in which two loops project into a hydrophobic cleft.
This allowed selection of a minimal number of residues that were experimentally
validated.
Results and Discussion
Scope of the Methods
The CAS methods
used here process
structure types differently, that is, single X-ray crystal structures,
NMR ensembles, MD trajectories. Figure illustrates the capabilities of each in silico prediction tool and how structures are processed (see also Table S1 and Computational Methods in Supporting Information (SI)). BudeAlaScan is
the only tool that allows processing of structure ensembles and scanning
of multiple mutations to alanine at a time (i.e., hot-residue clusters).
Figure 1
Schematic
overview of approach for experimentally validated predictive
alanine scanning using different methods. Different structural starting
points can be used (single structure, black; NMR ensemble, blue; MD
ensemble, orange) together with different in silico hot-spot residue prediction tools. Users can navigate through this
workflow in different ways according to their requirements.
Schematic
overview of approach for experimentally validated predictive
alanine scanning using different methods. Different structural starting
points can be used (single structure, black; NMR ensemble, blue; MD
ensemble, orange) together with different in silico hot-spot residue prediction tools. Users can navigate through this
workflow in different ways according to their requirements.
Different in Silico Prediction
Tools Function
in Different Ways and Provide Varying Results
First, we compared
the results of alanine scanning on the SKEMPI data set for each of
the tools, focusing on PPIs with natural amino acids and without post-translational
modifications. Data for FoldX, mCSM, BeAtMuSiC, and BudeAlaScan were
calculated in-house, while data for the Rosetta ddG methods were taken
from the literature.[28,46] This analysis took 8 min for
FoldX, 5 min for BudeAlaScan, and 1–2 h per amino acid variation
for Rosetta Flex_ddG on a single core of a linux X86_64 workstation
running at 3.5 GHz. The Pearson correlation coefficient between the
computed and experimental data in Figure are given in Table .
Figure 2
Comparison of experimental (SKEMPI) and predicted
ΔΔG values for different prediction tools:
(a–f) correlation
plots (solid line is the 1:1 correlation) for predicted versus experimental
ΔΔG for each of (a) BudeAlaScan, (b)
FoldX, (c) Flex_ddG Talaris, (d) Flex_ddG Ref2015, (e) BeAtMuSiC,
(f) mCSM, (g) Average, (h) Fraction Correct overall, and (i) Fraction
Correct by residue type. Data for panels c and d are taken from Barlow
et al.[28] and plotted assuming 1 REU = 1
kcal/mol.
Table 1
Pearson Correlation
Coefficient between
Experimental and Predicted ΔΔG Values
for Mutations to Alanine in the SKEMPI Dataset
Flex_ddG
BUDE
FoldX
Talaris2014
Ref2015
BeAtMuSiC
mCSM
Average
Pearson R
0.50
0.28
0.51
0.49
0.47
0.66
0.60
fraction correct
0.76
0.51
0.76
0.74
0.72
0.77
0.78
Comparison of experimental (SKEMPI) and predicted
ΔΔG values for different prediction tools:
(a–f) correlation
plots (solid line is the 1:1 correlation) for predicted versus experimental
ΔΔG for each of (a) BudeAlaScan, (b)
FoldX, (c) Flex_ddG Talaris, (d) Flex_ddG Ref2015, (e) BeAtMuSiC,
(f) mCSM, (g) Average, (h) Fraction Correct overall, and (i) Fraction
Correct by residue type. Data for panels c and d are taken from Barlow
et al.[28] and plotted assuming 1 REU = 1
kcal/mol.The predictive performance of
BudeAlaScan is comparable with Rosetta
Flex_ddG and BeAtMuSiC. mCSM outperforms the physicochemical based
methods, but it should be emphasized that this machine-learning approach
is trained on the SKEMPI data it is fitting. Classifying the SKEMPI
data as being positive (a hot residue) ΔΔG ≥ 4.184 kJ/mol, neutral (little effect) 4.184 kJ/mol <
ΔΔG > −4.184 kJ/mol, or negative
(affinity enhancing) ΔΔG ≤ −4.184
kJ/mol allows the determination of the fraction of correct predictions
(FC). These results vary across the series (FC 0.51–0.78, see Figure h and Tables S2 and S3), although crucially different
residues are correctly or incorrectly predicted from tool to tool.
Calculating the fraction correct within each of these three categories
(positive, neutral, and negative, Supporting Information, see Figure S1) reveals that the accuracy in correctly predicting
a hot-residue decreases for all methods (FC ≈ 0.6) with FoldX
performing least well. Conversely, the accuracy in predicting neutral
residues increases for all methods (FC ≈ 0.8) apart from FoldX.
There are only four negative examples in the SKEMPI set where mutation
to alanine improves binding, and only FoldX predicts 2 of these, albeit
in the context of many false negative predictions. BudeAlaScan uses
a set of side-chain rotamers to estimate the configurational entropy
loss on forming interactions involving DERKH residues and a fixed
backbone. FoldX performs a local side-chain relaxation and Rosetta
Flex_ddG allows local backbone and side-chain sampling. When operating
on a single structure, none of the methods has sufficient conformational
sampling, consideration of dynamics, or explicit solvation to provide
the accuracy that might be achieved through more computationally intensive
methods.[26,28,40−47]PPI’s may possess interfaces where the binding energy
is
evenly distributed across the surface or concentrated in a few hot
residues or regions. Prediction of the most influential residues in
the former case is likely to be hampered by the accumulation of systematic
errors. Thus, when using a single method, it is necessary to exercise
caution in prioritizing one predicted residue over another based on
the magnitude of its predicted ΔΔG.
Comparison of Computational and Experimental Alanine Scanning
for Three Representative PPIs
While the SKEMPI data set provided
an extensive set of data from the literature, as noted previously
this is drawn from multiple different studies exploiting different
assay methods, motivating our choice to supplement our comparative
analysis with high-quality in-house experimental data. For comparison
between prediction and experiment, we focused on protein/peptide interactions
for several reasons: (1) peptide interacting motifs (PIMS)[68] play a crucial role in many PPIs and are known
to be effective starting points[69] for chemical
probe discovery campaigns; (2) a significant proportion of structures
in the PDB are protein/peptide interactions with the peptide excised
from the full-length protein; (3) ease of preparation/purification
and standard N-terminal fluorescent labeling support quantification
and a consistent assay format.
Experimental Alanine Scanning
Experimental
alanine
scanning was performed for structural representatives of three target
systems: NOXA-B/MCL-1 (2JM6), SIMS/SUMO (2LAS), and GKAP/SHANK-PDZ (1Q3P). Each of the proteins was expressed
in E. coli, and a series of labeled and unlabeled
variant peptides was synthesized and purified (see SI). A number of different biophysical assays were implemented
for each target (Table and Figure ). For
NOXA-B/MCL-1 (Figure a), a competition fluorescence anisotropy (FA) assay (Figure b and SI, Figure S2a and Table S4) was used in which titration of variant
peptides was used to displace a FITC-labeled NOXA-B sequence from
MCL-1 (Kd of the FITC-labeled tracer =
80 nM). ΔΔG values were calculated using
the difference in IC50 values. We also carried out a smaller
number of direct titrations using FITC-labeled variant peptides with
results concordant to those obtained by competition titration (see SI, Figure S2b and Table S5). Circular dichroism
(CD) spectroscopy (see SI Figure S4) established
that the effects of the amino-acid changes on helicity of the peptides
were small compared to effects arising from side-chain contacts with
the protein target.
Table 2
Kd and
IC50 Values for NOXA-B/MCL-1 (2JM6), SIMS/SUMO (2LAS), and GKAP/SHANK-PDZ (1Q3P) Variantsa
NOXA
wt
L78A
R79A
R80A
I81A
D83A
V85A
Kd (nM)b
10 ± 1
e
19 ± 3
e
1300 ± 50
4300 ± 200
e
IC50 (μM)c
1.77 ± 0.03
7000 ± 2000
8.3 ± 0.3
1.9 ± 0.3
1220 ± 70
13000 ± 8000
1200 ± 200
Amino acid numbering
taken from
PDB ID.
Fluorescence anisotropy
direct titration.
Fluorescence
anisotropy competition
assay.
Isothermal titration
calorimetry.
Not tested.
Did not bind/inhibit.
Figure 3
Experimental
alanine scanning results for PPIs (amino acid numbering
taken from PDB ID): (a) NMR structure for NOXA-B/MCL-1 (PDB ID 2JM6, model 1)), (b)
representative competition fluorescence anisotropy inhibition curves
for inhibition of the FITC–Ahx–NOXA-B/MCL-1 (50 mM Tris,
150 mM NaCl, 0.01% Triton-X, pH 7.4. using FITC–Ahx–NOXA-B
as a tracer at 25 nM concentration and MCL-1 at 200 nM concentration)
interaction using variant NOXA-B peptides, (c) NMR structure for SIMS/SUMO
(PDB ID 2LAS, model 1), (d) representative fluorescence anisotropy titration
curves for interaction of FAM labeled variant SIMS peptides with SUMO
(20 mM Tris, 150 mM NaCl, 0.01% Triton-X, pH 7.4, using 50 nM FAM–PEG–SIM
tracer), (e) crystal structure for GKAP/SHANK-PDZ (PDB ID 1Q3P), (f) representative
isothermal titration calorimetry curves for interaction of acetylated
GKAP variant peptides with SHANK-PDZ (20 mM Tris, 150 mM NaCl, pH
7.5, at 25 °C, 150 μM SHANK-PDZ).
Amino acid numbering
taken from
PDB ID.Fluorescence anisotropy
direct titration.Fluorescence
anisotropy competition
assay.Isothermal titration
calorimetry.Not tested.Did not bind/inhibit.Experimental
alanine scanning results for PPIs (amino acid numbering
taken from PDB ID): (a) NMR structure for NOXA-B/MCL-1 (PDB ID 2JM6, model 1)), (b)
representative competition fluorescence anisotropy inhibition curves
for inhibition of the FITC–Ahx–NOXA-B/MCL-1 (50 mM Tris,
150 mMNaCl, 0.01% Triton-X, pH 7.4. using FITC–Ahx–NOXA-B
as a tracer at 25 nM concentration and MCL-1 at 200 nM concentration)
interaction using variant NOXA-B peptides, (c) NMR structure for SIMS/SUMO
(PDB ID 2LAS, model 1), (d) representative fluorescence anisotropy titration
curves for interaction of FAM labeled variant SIMS peptides with SUMO
(20 mM Tris, 150 mMNaCl, 0.01% Triton-X, pH 7.4, using 50 nM FAM–PEG–SIM
tracer), (e) crystal structure for GKAP/SHANK-PDZ (PDB ID 1Q3P), (f) representative
isothermal titration calorimetry curves for interaction of acetylated
GKAP variant peptides with SHANK-PDZ (20 mM Tris, 150 mMNaCl, pH
7.5, at 25 °C, 150 μM SHANK-PDZ).For SIMS/SUMO (Figure c), a full series of FAM-labeled 13-residue SIMS peptides
was prepared with each amino acid sequentially exchanged for alanine.
Direct FA titration against SUMO (Figure d and SI, Figure
S5, Table S6) provided Kd values that
were used to determine ΔΔG based on comparison
with the wild-type (Kd of the FAM-labeled
wild-type tracer = 3.7 μM). As a secondary evaluation, several
unlabeled peptides were used to compete against the FAM-labeled wild-type
peptide for the SUMO binding site, and again the results were consistent
with those obtained from direct titration (see SI, Figure S6 and Table S7). Here, CD spectra (see SI, Figure S7) indicated that all peptides were
unstructured in solution except for the Glu2709 → Ala and Glu2715
→ Ala; both of these variants promoted an alternative conformation
with significant β-strand content. Given that such a conformation
might be expected to bind more effectively due to preorganization,
we interpret the moderate loss in potency observed for both variants
as indicating a minor role for the side chains in the PPI.Finally,
for GKAP/SHANK-PDZ (Figure e), isothermal titration calorimetry (Figure f and SI, Figure
S8 and Table S8) was used to obtain Kd values (WT Kd = 800 nM) from
the titration of variant peptides (including the C-terminal amide
and Ala2 → Gly variants) against SHANK-PDZ, while a competition
assay using a FITC-labeled GKAP peptide was elaborated to give IC50 values (see SI, Figure S9 and
Table S9). The two methods yielded similar values, while CD spectra
(see SI, Figure S10) indicated that the
6-residue peptides were unstructured in solution as expected. Average
ΔΔG values for each of the six predictive
methods and the experimental data were then compared (Figures and 5).
Figure 4
Comparison of experimental (from this work) and predicted ΔΔG values for different prediction tools: (a–f) correlation
plots (dotted line is the 1:1 correlation) for predicted versus experimental
ΔΔG for each of (a) BudeAlaScan, (b)
FoldX, (c) Flex_ddG Talaris, (d) Robetta, (e) BeAtMuSiC, (f) mCSM,
(g) Average, (h) Fraction Correct overall, and (i) Fraction Correct
by residue type. There are 6 fewer points for panel d due to Robetta
automatically defining the interface. Data for panels c and d are
plotted assuming 1 REU = 1 kcal/mol.
Figure 5
Comparison
of predictive and experimental ΔΔG values
(bars represent experimental data for each target):
(a) NOXA-B/MCL-1 (PDB model 1); (b) SIMS/SUMO (PDB model 1); (c) GKAP/SHANK-PDZ
(chains A and C). Two data points for FoldX are missing from the plot
since they are less than −5.0 kJ/mol (R79A and W2714A).
Comparison of experimental (from this work) and predicted ΔΔG values for different prediction tools: (a–f) correlation
plots (dotted line is the 1:1 correlation) for predicted versus experimental
ΔΔG for each of (a) BudeAlaScan, (b)
FoldX, (c) Flex_ddG Talaris, (d) Robetta, (e) BeAtMuSiC, (f) mCSM,
(g) Average, (h) Fraction Correct overall, and (i) Fraction Correct
by residue type. There are 6 fewer points for panel d due to Robetta
automatically defining the interface. Data for panels c and d are
plotted assuming 1 REU = 1 kcal/mol.Comparison
of predictive and experimental ΔΔG values
(bars represent experimental data for each target):
(a) NOXA-B/MCL-1 (PDB model 1); (b) SIMS/SUMO (PDB model 1); (c) GKAP/SHANK-PDZ
(chains A and C). Two data points for FoldX are missing from the plot
since they are less than −5.0 kJ/mol (R79A and W2714A).
Computational Alanine Scanning
Computational
alanine
scans were carried out for the same three target systems: NOXA-B/MCL-1
(2JM6), SIMS/SUMO
(2LAS), and
GKAP/SHANK-PDZ (1Q3P). We chose the simplest approach, namely, a single structure, in
the first round of CAS; hence only the first structure of each NMR
ensemble for 2JM6 and 2LAS was
used along with the crystal structure 1Q3P. Each structure was subjected to each
of six representative computational alanine-scanning methods. The
GKAP sequence had an alanine (Ala2), and Gly was used to evaluate
the role of this side chain. Crucially, crystallographic analyses
suggested a key role for the C-terminal carboxylate,
not just the side chains. Since BudeAlaScan is built on BUDE, it is
simple to submit the modified backbone to BUDE itself to dissect out
the roles of the side chain and carboxylate independently yielding
a ΔΔG of 19.6 kJ/mol for replacing the
carboxylate by the corresponding amide compared with the experimental
value of 11.4 kJ/mol (Table ). This feature is unique to BudeAlaScan among the in silico methods and, therefore, applicable to multiple
other unconventional changes to peptide structure (vide infra).Based on comparison of the in silico and
experimental data, in each case, most of the computational methods
correctly identified the majority of experimentally defined hot residues
(Figures and 5), but they differed in the residues predicted incorrectly.
For example, mCSM predicted Thr4 in GKAP as a hot-residue particularly
well and with good accuracy, but wrongly predicted Arg5 as a hot-residue
in GKAP. On the other hand, BudeAlaScan over-predicted the contribution of Arg79 in NOXA-B,
but under-predicted Asp83. Encouragingly, by taking the average ΔΔG of all the methods for each residue, all but two hot-spot
residues were correctly predicted giving a fraction correct of 0.92
(Table ). One residue
(Ile2711) is predicted rather well, but the experimental value is
close to the selection criterion of 4.184 kJ/mol and the predicted
and experimental values straddle this cutoff. The other (Val2713)
is solvent exposed and hydrophobic, favors beta structure, and is
poorly predicted by all methods; hence it may perform a structural
role in supporting the extended β-strand conformation required
for binding, rather than contributing to the PPI per se.
Table 3
Pearson Correlation Coefficient between
Experimental and Predicted ΔΔG Values
for Mutations to Alanine in This Work
BUDE
FoldX
Flex_ddG
Talaris2014
Robetta
BeAtMuSiC
mCSM
Average
Pearson R
0.58
0.52
0.56
0.71
0.71
0.36
0.76
fraction correct
0.80
0.68
0.72
0.88
0.84
0.68
0.92
The Use of Multiple Structures
Improves Predictive Alanine Scanning
and Includes the Role of Dynamic Conformational Variation
A single protein structure, for example, that determined by X-ray
crystallography, represents a single point on the global potential-energy
landscape, but it is well established that proteins in their native
state occupy a family of conformations clustered around such a local
minimum. In other words, proteins are dynamic and show thermally induced
fluctuations in their native folded states. This dynamic behavior
makes accurate predictions of the free energy difference between two
variants difficult to achieve with a single pair of structures. A
number of the tools we evaluate could be used for this purpose, however,
here as a proof-of-concept we used BudeAlaScan to probe these fluctuations.
The BUDE force field goes some way to mitigating the effects of dynamics
by using a soft-core potential to ameliorate geometrical inaccuracies
in a structure. BudeAlaScan also employs multiple rotamers for side
chains that can carry charge (DERKH) to account for entropic loss
on freezing such side chains in interfacial salt bridges. However,
we anticipated that greater accuracy would be achieved using an ensemble
of conformations that include backbone flexibility. There are two
ways to do this: by using (a) structures from solution phase NMR ensembles
or (b) snapshots from molecular dynamics (MD) simulations. Although
the two methods typically report on different time scales, the purpose
here is not to compare the data from NMR with MD, rather to highlight
that either can be useful in establishing where potential hot-residues
are persistent. Crucially, such an approach begins to acknowledge
the dynamic and conformationally varied nature of PPIs. The calculation
of interaction energies with BUDE is very fast; hence BudeAlaScan
rapidly processes ensembles of conformations. For our analysis, each
structure was subjected to a 1 μs MD simulation, and the BudeAlaScan
calculation was carried out on each of 100 snapshots (every 10 ns
of a 1 μs trajectory). In addition, BudeAlaScan was performed
on each of the submitted models in the NMR ensembles for the two PPIs
with NMR structures. In general, the predictions improved through
the use of this approach for all targets (Figure ).
Figure 6
Comparison of BudeAlaScan predicted and experimental
ΔΔG values (gray bars). Green, average
and standard deviation
of 100 structures from 1 μs MD simulations; blue average and
standard deviation of 20 (2JM6) and 10 (2LAS) NMR structures; (a) NOXA-B/MCL-1; (b) SIMS/SUMO;
(c) GKAP/SHANK-PDZ.
Comparison of BudeAlaScan predicted and experimental
ΔΔG values (gray bars). Green, average
and standard deviation
of 100 structures from 1 μs MD simulations; blue average and
standard deviation of 20 (2JM6) and 10 (2LAS) NMR structures; (a) NOXA-B/MCL-1; (b) SIMS/SUMO;
(c) GKAP/SHANK-PDZ.These data also allow
the average strength and persistence of interactions
to be assessed as a standard deviation: residues that are tightly
packed into the interface with little conformational freedom tended
to show low standard deviations. This behavior is exemplified by Leu78,
Ile81, and Val85 for the NOXA-B/MCL-1 interaction (Figure a). In contrast, the surface-exposed
residues Arg79 and Asp83 for the NOXA-B/MCL-1 interaction that transiently
form salt bridges exhibit higher standard deviations. SIMS/SUMO shows
similar behavior (Figure b). The smaller interface in GKAP/SHANK (Figure c) is much more mobile and
shows large standard deviations throughout the sequence. In the absence
of a corresponding NMR structure, we are unable to say if this is
a consequence of excessive mobility in the MD simulation, although
the poorer match with experimental alanine scanning results suggests
this is the case. The match between prediction and experiment is generally
somewhat improved by using ensembles of structures. Hence, the Pearson
correlation coefficient for NOXA-B/MCL-1 and SIMS/SUMO between the
BudeAlaScan data and experiment is 0.58 for single structures (Figure a,b) and 0.66 and
0.62 for the MD and NMR ensembles, respectively (Figure a,b).
Use of Multiple Tools to
Predict Hot-Residues at a Topographically
Distinct PPI
Having experimentally validated the approach
with three model systems, we applied the workflow to a novel protein–protein
interface with a different topography. For this, we selected a recently
described PPI between an Affimer and BCL-xL,[70] which, like MCL-1, is an antiapoptotic member
of the BCL-2 family of apoptotic regulators (Figure a).[71] In this
structure, two nine-residue loops from the Affimer interact with the
BH3 helix binding cleft on BCL-xL.
Figure 7
Application of predictive
workflow to a topographically distinct
protein–protein interaction. (a) Affimer/BCL-xL cocrystal
structure (PDB ID 6HJL), highlighting residues Trp41, Trp44, and Trp76 (orange) identified
for experimental characterization (Affimer in cyan, BCL-xL in green).
(b) Predicted ΔΔG values calculated using
each predictive tool together with average values for alanine variants
of residues within the two 9-residue variable loops of the Affimer
scaffold; gray boxes show ΔΔG limits
estimated from the experimental W to A changes; residues 41 and 44
≥ 15 kJ/mol, residue 76 ≤ 1 kJ/mol, (3 points for Foldx,
D43A, E46A, and W76A, are missing from the plot as their values are
less than −5 kJ/mol). (c) Competition fluorescence anisotropy
inhibition curves for inhibition of the FITC–BID/BCL-xL interaction (20 mM Tris, 150 mM NaCl, 0.01% Triton X-100,
pH 7.4, using FITC–BID as a tracer at 25 nM and BCL-xL at 100 nM concentration) using variant Affimer sequences.
Application of predictive
workflow to a topographically distinct
protein–protein interaction. (a) Affimer/BCL-xL cocrystal
structure (PDB ID 6HJL), highlighting residues Trp41, Trp44, and Trp76 (orange) identified
for experimental characterization (Affimer in cyan, BCL-xL in green).
(b) Predicted ΔΔG values calculated using
each predictive tool together with average values for alanine variants
of residues within the two 9-residue variable loops of the Affimer
scaffold; gray boxes show ΔΔG limits
estimated from the experimental W to A changes; residues 41 and 44
≥ 15 kJ/mol, residue 76 ≤ 1 kJ/mol, (3 points for Foldx,
D43A, E46A, and W76A, are missing from the plot as their values are
less than −5 kJ/mol). (c) Competition fluorescence anisotropy
inhibition curves for inhibition of the FITC–BID/BCL-xL interaction (20 mM Tris, 150 mMNaCl, 0.01% Triton X-100,
pH 7.4, using FITC–BID as a tracer at 25 nM and BCL-xL at 100 nM concentration) using variant Affimer sequences.Predictions for each loop were performed using
each of the tools,
and the average ΔΔG was determined (Figure b). We then selected
three residues to test experimentally: two predicted to be hot-residues
(Trp41 and Trp44), and one predicted as not being a hot-residue (Trp76)
according to the average values of ΔΔG across the six methods. The parent and variant Affimer sequences
were cloned, expressed, and purified and then tested in a competitive
FA assay, that is, by displacement of FITC–BID fluorescently
labeled peptide from BCL-xL. These experiments fully confirmed
the predictions (Figure c and SI, Table S9). Crucially, at least
one of the individual prediction tools would not have predicted these
residues accurately, further validating the use of comparative analyses.
Conclusions
We outline effective approaches for rapid predictive in
silico examination of PPI interfaces. This employs multiple
computational tools, allowing comparative analyses to predict hot-residues
accurately. In the broader context of providing improved and accessible
methods for in silico Ala scanning, we introduce
BudeAlaScan. This is a versatile tool for the rapid analysis of PPIs
using X-ray structures, NMR ensembles, or snapshots from MD trajectories
as input files. Exploiting six in silico prediction
tools illustrates that use of multiple methods leads to improved accuracy
in the prediction of hot-residues. We demonstrate this further by
comparing the predictions with experimental studies for three different
PPIs. We note that the predicted ΔΔG values
for single-alanine variants vary between the in silico methods and also change through structural ensembles analyzed with
any single method. Thus, in terms of informing on the role of individual
amino acids in PPIs and developing starting points for ligand design,
we advocate using a combination of methods to predict hot-residues
and hot-spots. Additional use of NMR ensembles or snapshots from MD
trajectories allows an assessment of the persistence of noncovalent
interactions. Taken together this allows prioritization of potential
hot-residues to reduce the number of predictions that require experimental
validation and provides greater confidence in those that should be
selected for mimicry in ligand based design.
Methods
Computational
Methods and MD
SKEMPI Data Set and Predictions
The set of 748 single
mutations to alanine from the SKEMPI database with associated binding
data is that described by Barlow et al.[28] and provided in that paper’s Supporting Information (jp7b11367_si_002.xlsx).
FoldX
The program FoldX was downloaded from http://FoldXsuite.crg.eu and
used in AlaScan mode with default parameters.
Rosetta
Flex_ddG
Data were taken from the Supporting Information
of previously published work[28] for benchmarking
against SKEMPI. Mutations specific to this work were calculated using
the flex_ddG method in Rosetta Commons release and the flex_ddG.xml
script provided in the SI of Barlow et al.[28] The Talaris 2014 force field was used, and each result is the average
of 50 repeats.
Robetta
Mutations specific to this
work were also calculated
with the Robetta server, http://robetta.bakerlab.org or via the Rosetta Commons release.
BudeAlaScan
BudeAlaScan
is a command-line python application
for computational alanine scanning. It employs ISAMBARD[5] for structure manipulation and a customized version
of the Bristol University Docking Engine (BUDE)[6] for energy calculations. The program was run in scan mode
with default parameters. The application is available via the BAlaS
server: http://coiledcoils.chm.bris.ac.uk/balas
BeAtMuSiC
Data used in the BeAtMuSiC publication were
taken from the MCSM Web site, http://biosig.unimelb.edu.au/mcsm/data, for benchmarking against SKEMPI. Mutations specific to this work
were calculated using the BeAtMuSiC server: http://babylone.ulb.ac.be/beatmusic.
mCSM
The mCSM data were calculated in-house by querying
the server http://biosig.unimelb.edu.au/mcsm/protein_protein with a python
script.
Molecular Dynamics Simulations
All simulations were
performed using the GROMACS 5.1.4 suite and the following general
protocols. Structures from the protein database (SIMS/SUMO 2LAS; GKAP/SHANK-PDZ 1Q3P; NOXA-B/MCL-1 2JM6) were processed
with the GROMACS tool chain. The utility pdb2gmx was used to add hydrogen
atoms consistent with pH 7 and virtual-site hydrogens for mobile groups
and parametrize the protein with the amber99SB-ildn force field. The
protein was placed in an orthorhombic box 2 nm larger than the protein
in each dimension and filled with TIP3P water containing 0.15 M sodium
chloride ions to give a charge-neutral system overall. After 10000
steps of steepest descent minimization, molecular dynamics was initiated
with random velocities while restraining the protein backbone to its
original position with a force constant of 1000 kJ/nm for 0.2 ns.
Simulations were developed for a further 1 μs without backbone
position restraints under periodic boundary conditions. The Particle
Mesh Ewald’s method was used for long-range electrostatic interactions,
while short-range Coulombic and van der Waals energies were truncated
at 1.2 nm. The temperature was maintained at 310 K using the v-rescale
method and the pressure at 1 bar with the Berendsen barostat. The
use of virtual-site hydrogens allowed a 5 fs time step for the leapfrog
integrator. Bond constraints were implemented with the LINCS method,
and SETTLE was used for waters. Trajectories were processed and analyzed
with the GROMACS tools and visualized with VMD 1.9.3.
Expression
and Purification of Proteins
Proteins were
expressed using standard protocols and characterized as described
in the Supporting Information.
Peptide Synthesis
and Purification
Peptides were prepared
using microwave assisted solid-phase FMoc based synthesis using a
CEM Liberty Blue peptide synthesizer and purified by reverse phase
HPLC. Full details and characterization are available in the Supporting Information.
Isothermal Titration Calorimetry
(ITC)
ITC experiments
were carried out using Microcal ITC200i instrument (Malvern) at 25
°C in 20 mM Tris, 150 mMNaCl, pH 7.5, buffer. SHANK-PDZ was
dialyzed against the buffer prior to experiment; lyophilized peptides
were dissolved in the same buffer. SHANK-PDZ (150 μM) was present
in the cell and titrated with 1.4–2 mM peptide solutions loaded
into the syringe using 2 μL injections with 120 s spacing between
the injections for 20 injections. Heats of peptide dilution were subtracted
from each measurement raw data. Data was analyzed using Microcal Origin
8 and fitted to a one binding site model.
Fluorescence Anisotropy
Fluorescence anisotropy assays
were performed in 384-well plates (Greiner Bio-one). Each experiment
was run in triplicate, and the fluorescence anisotropy was measured
using a PerkinElmer EnVisionTM 2103 MultiLabel plate reader with excitation
at 480 nm (30 nm bandwidth), polarized dichroic mirror at 505 nm,
and emission at 535 nm (40 nm bandwidth, S and P polarized) for FAM
and FITC labeled peptides. The excitation and emission wavelengths
for BODIPY labeled BAK peptide were set to 531 and 595 nm, respectively.
The excitation and emission wavelengths for FITC labeled BID peptide
were set to 490 and 535 nm, respectively.
Direct Binding Assays
Fluorescence anisotropy direct
titration assays were performed with protein concentration diluted
over 16–24 points using 1/2 dilutions. Twenty microliters of
buffer was first added to each well. Twenty microliters of a solution
of protein was added to the first column. The solution was well mixed,
and 20 μL was taken out and added to the next column and so
on. This operation consists of serial dilution of the protein across
the plate. Finally, 20 μL of tracer was added to the wells.
For control wells, the tracer peptide was replaced with an identical
volume of assay buffer, and plates were read after 1 h.
Competition
Binding Assays
FA competition assays were
performed in 384 well plates with the concentration of variant peptide
competitor typically from 10 to 1500 μM, diluted over 16–24
points in 1/2 regime with fixed protein and tracer concentrations.
For control wells, the tracer peptide was replaced with an identical
volume of assay buffer. The total volume in each well was 60 μL.
Plates were read after 1 h (and 16 h for BCL-xL assays)
of incubation at RT.Additional details on buffer composition,
reagent concentration for individual PPIs, and data analyses are given
in the Supporting Information.
Authors: Richard T Bradshaw; Bhavesh H Patel; Edward W Tate; Robin J Leatherbarrow; Ian R Gould Journal: Protein Eng Des Sel Date: 2010-07-23 Impact factor: 1.650
Authors: Isabelle Ray-Coquard; Jean-Yves Blay; Antoine Italiano; Axel Le Cesne; Nicolas Penel; Jianguo Zhi; Florian Heil; Ruediger Rueger; Bradford Graves; Meichun Ding; David Geho; Steven A Middleton; Lyubomir T Vassilev; Gwen L Nichols; Binh Nguyen Bui Journal: Lancet Oncol Date: 2012-10-17 Impact factor: 41.316
Authors: Fruzsina Hóbor; Zsófia Hegedüs; Amaurys Avila Ibarra; Vencel L Petrovicz; Gail J Bartlett; Richard B Sessions; Andrew J Wilson; Thomas A Edwards Journal: RSC Chem Biol Date: 2022-04-11
Authors: A Sofia F Oliveira; Amaurys Avila Ibarra; Isabel Bermudez; Lorenzo Casalino; Zied Gaieb; Deborah K Shoemark; Timothy Gallagher; Richard B Sessions; Rommie E Amaro; Adrian J Mulholland Journal: Biophys J Date: 2021-02-18 Impact factor: 4.033
Authors: Sergio Celis; Fruzsina Hobor; Thomas James; Gail J Bartlett; Amaurys A Ibarra; Deborah K Shoemark; Zsófia Hegedüs; Kristina Hetherington; Derek N Woolfson; Richard B Sessions; Thomas A Edwards; David M Andrews; Adam Nelson; Andrew J Wilson Journal: Chem Sci Date: 2021-03-02 Impact factor: 9.825
Authors: Zsófia Hegedüs; Fruzsina Hóbor; Deborah K Shoemark; Sergio Celis; Lu-Yun Lian; Chi H Trinh; Richard B Sessions; Thomas A Edwards; Andrew J Wilson Journal: Chem Sci Date: 2021-01-06 Impact factor: 9.825