Literature DB >> 31525028

Predicting and Experimentally Validating Hot-Spot Residues at Protein-Protein Interfaces.

Amaurys A Ibarra¹, Gail J Bartlett², Zsöfia Hegedüs^3,4, Som Dutt^3,4, Fruzsina Hobor^4,5, Katherine A Horner^3,4, Kristina Hetherington^3,4, Kirstin Spence^4,5, Adam Nelson^3,4, Thomas A Edwards^4,5, Derek N Woolfson^1,2,6, Richard B Sessions^1,6, Andrew J Wilson^3,4.

Abstract

Protein-protein interactions (PPIs) are vital to all biological processes. These interactions are often dynamic, sometimes transient, typically occur over large topographically shallow protein surfaces, and can exhibit a broad range of affinities. Considerable progress has been made in determining PPI structures. However, given the above properties, understanding the key determinants of their thermodynamic stability remains a challenge in chemical biology. An improved ability to identify and engineer PPIs would advance understanding of biological mechanisms and mutant phenotypes and also provide a firmer foundation for inhibitor design. In silico prediction of PPI hot-spot amino acids using computational alanine scanning (CAS) offers a rapid approach for predicting key residues that drive protein-protein association. This can be applied to all known PPI structures; however there is a trade-off between throughput and accuracy. Here we describe a comparative analysis of multiple CAS methods, which highlights effective approaches to improve the accuracy of predicting hot-spot residues. Alongside this, we introduce a new method, BUDE Alanine Scanning, which can be applied to single structures from crystallography and to structural ensembles from NMR or molecular dynamics data. The comparative analyses facilitate accurate prediction of hot-spots that we validate experimentally with three diverse targets: NOXA-B/MCL-1 (an α-helix-mediated PPI), SIMS/SUMO, and GKAP/SHANK-PDZ (both β-strand-mediated interactions). Finally, the approach is applied to the accurate prediction of hot-spot residues at a topographically novel Affimer/BCL-xL protein-protein interface.

Entities: Chemical Disease Gene Mutation Species

Mesh：

Substances：

Year: 2019 PMID： 31525028 PMCID： PMC6804253 DOI： 10.1021/acschembio.9b00560

Source DB: PubMed Journal: ACS Chem Biol ISSN： 1554-8929 Impact factor: 5.100

Protein–protein interactions (PPIs) play a central regulatory role in the majority of cellular signaling processes. Aberrant PPIs lead to disease and represent intervention points for targeted therapeutics, motivating efforts to discover small-molecule or peptide/peptidomimetic ligands.[1−7] PPIs generally occur between significantly larger and topographically shallower surfaces than the lock-and-key-like pockets found in conventional drug targets, for example, enzymes and receptors, such as G-protein coupled receptors (GPCRs), making PPIs a challenge for ligand discovery.[2,5,8−12] Significant recent successes, for example, ABT-199[13] and RG7112,[14] have countered the perception that PPIs are undruggable. Nonetheless, it is not clear which PPIs will be therapeutically tractable and relevant to disease,[15−17] a problem that may be explored using chemical probes.[18] In turn, this prompts the question: which PPIs might feasibly be modulated with small molecules or peptides/peptidomimetics? Though significant progress has been made in structurally characterizing PPIs, in order to understand the molecular basis of recognition it is also necessary to establish the determinants of favorable interaction, that is, which amino-acid residues contribute most to binding. Such information is essential in understanding biological mechanism, rationalizing dynamic conformational transitions, interpreting the functional role of mutation, and identifying starting points for inhibitor design. Cunningham and Wells first used experimental alanine scanning to map PPIs.[19] In this approach, variant proteins or peptides with Xaa-to-Ala mutations are used to evaluate the role of each amino acid side chain in a PPI. Assuming the absence of any changes in the backbone conformation of the protein, the changes indicate the contribution that the side chain makes to the binding free energy. Such studies have established that the thermodynamic driving force for PPIs can be dominated by a few key amino acid residues, termed hot-spots[20] or hot regions.[21] This experimental approach has been widely applied,[22] but it is time-consuming and difficult to implement across entire protein–protein interactomes.[23] Moreover, assay-dependent variations in binding affinities make comparative analyses of the literature problematic.[24] In silico methods can be implemented to predict hot-spot residues from extant protein structures,[24−28] and efforts have been made to classify PPIs in the Protein Data Bank (PDB)[29] by secondary or tertiary structure at the interfaces and to predict hot-spot residues en masse.[30−32] For computational alanine scanning (CAS), the binding free energies for the native and variant PPIs are calculated and the difference in these two values (ΔΔG) is used to evaluate the participation of the variant amino acid in the interface. Several approaches to CAS have been based on classical molecular dynamics simulations to capture the free energy change on amino acid variation. These methods include: Free Energy Perturbation,[33−35] Thermodynamic Integration,[34−36] the popular hybrid MM/PBSA method[37] that has recently been improved by inclusion of interaction-entropy calculation,[38] and by localizing the simulation volume around the residue in question.[39] The Flex_ddG method in Rosetta[28] uses a combination of sophisticated Monte Carlo sampling, minimization, and specialized force fields to provide one of the most accurate current methods for CAS. We compare our new fast approach, BudeAlaScan, with this method and other rapid methods that use a variety of scoring functions, which fall broadly into the following categories: physicochemical energy functions or statistical functions. Examples currently accessible via web servers include KFC,[40] HotPoint,[41] DrugScore-PPI,[42] and PredHS.[43] We chose FoldX[44,45] as one of the first methods for rapid computational alanine scanning using a physicochemical force field, Rosetta Flex_ddG[28] as the most recent tool exploiting physicochemical energy terms, and BeAtMuSiC[46] and mCSM[47] as tools employing statistical potentials and machine learning. We have avoided CAS methods that rely on molecular dynamics simulations for conformational sampling as being slow for high throughput calculations. Typically, fast CAS methods operate on single structures and thus do not account for protein dynamics. In some cases, the methods can be, and have been, performed on multiple structures,[25,48−50] for example, from NMR ensembles or MD trajectories. This is an important consideration given that many PPIs exploit intrinsically disordered regions (IDRs)[51] and transient or dynamic noncovalent contacts can play important mechanistic roles in recognition.[52,53] Here we perform a comparative analysis of common fast tools for CAS enabling informed consensus selection of hot-spot residues. This is benchmarked against an extensive literature data set and further validated through experimental analyses of four diverse PPIs, for which further analyses of NMR ensembles or molecular dynamics trajectories provides insight on dynamics to identify bona fide hot-spot residues. FoldX is based on empirical potentials built from optimized combinations of various physical energy terms.[44,45] The original Robetta method[54] is a physical energy function parametrized on a monomeric protein data set (ProTherm).[55,56] The Rosetta method used herein is Flex_ddG, based on the ΔΔG monomer method, using both the current general Rosetta force field,[28] Ref2015, and the specialized force field Talaris2014.[28,57−59] BeAtMuSiC[46] is a coarse-grained predictor of changes in binding free energy induced by point mutations and uses a set of statistical potentials derived from known protein structures. The statistical potentials are trained on data from ProTherm[56] and validated using the SKEMPI database.[60] SKEMPI is a database of 3047 binding free energy changes upon mutation collated from the literature, for PPIs with known structure. In this study, we used all the ΔΔG data for the 748 single mutations to alanine present in the SKEMPI database that were compiled, ratified, and published recently.[26] The method mCSM[47] describes the protein environment upon mutation using signature vector patterns for each amino acid and was trained and tested on SKEMPI.[60] Finally, BudeAlaScan, which we introduce here, is an empirical free-energy approach adapted from a small-molecule-docking algorithm, BUDE,[61] using the standard force field (version heavy_by_atom_2016.bhff) provided with the current BUDE release 1.2.9. BudeAlaScan is a command-line application where the user assigns each protein of the complex as the receptor and the ligand. The application varies the ligand residues for alanine and reports the interfacial ΔΔG as ΔGALA – ΔGWT (hence a hot residue will have a positive value). For residues in the wild-type complex that may carry a charge (types DERKH) a rotamer library is used to estimate and account for the configurational entropy loss on forming an interfacial salt bridge. We will make a web-based graphical user interface available for BudeAlaScan in due course. All our selected tools for this comparison could be used to perform calculations on the SKEMPI data set or had been trained on this data set, further supporting our choice of in silico tools. To compare the predictive capability of the five methods further, we perform detailed experimental analyses from the predictions on three diverse PPI targets: (i) the NOXA-B/MCL-1[62] interaction, an important current target in oncology;[63] (ii) the SIMS/SUMO[64] interaction, representative of a number of regulatory PPIs that depend on the SUMOylation post-translational modification;[65] (iii) the GKAP/SHANK-PDZ interaction,[66] which performs a scaffolding function at synaptic junctions.[67] Example i is an α-helix-mediated PPI, while examples ii and iii are both β-strand-mediated interactions. Thus, these provide diversity in secondary structure interfaces and biological functions. Comparison of predicted and experimental data reveals that averaging the ΔΔG values for each residue across the five methods leads to more accurate prediction than any single method alone. Finally, we present further support for the implementation of this approach through prediction of hot residues for a topographically novel Affimer/BCL-xL PPI in which two loops project into a hydrophobic cleft. This allowed selection of a minimal number of residues that were experimentally validated.

Results and Discussion

Scope of the Methods

The CAS methods used here process structure types differently, that is, single X-ray crystal structures, NMR ensembles, MD trajectories. Figure illustrates the capabilities of each in silico prediction tool and how structures are processed (see also Table S1 and Computational Methods in Supporting Information (SI)). BudeAlaScan is the only tool that allows processing of structure ensembles and scanning of multiple mutations to alanine at a time (i.e., hot-residue clusters).

Figure 1

Schematic overview of approach for experimentally validated predictive alanine scanning using different methods. Different structural starting points can be used (single structure, black; NMR ensemble, blue; MD ensemble, orange) together with different in silico hot-spot residue prediction tools. Users can navigate through this workflow in different ways according to their requirements.

Different in Silico Prediction Tools Function in Different Ways and Provide Varying Results

First, we compared the results of alanine scanning on the SKEMPI data set for each of the tools, focusing on PPIs with natural amino acids and without post-translational modifications. Data for FoldX, mCSM, BeAtMuSiC, and BudeAlaScan were calculated in-house, while data for the Rosetta ddG methods were taken from the literature.[28,46] This analysis took 8 min for FoldX, 5 min for BudeAlaScan, and 1–2 h per amino acid variation for Rosetta Flex_ddG on a single core of a linux X86_64 workstation running at 3.5 GHz. The Pearson correlation coefficient between the computed and experimental data in Figure are given in Table .

Figure 2

Table 1

Pearson Correlation Coefficient between Experimental and Predicted ΔΔG Values for Mutations to Alanine in the SKEMPI Dataset

			Flex_ddG
	BUDE	FoldX	Talaris2014	Ref2015	BeAtMuSiC	mCSM	Average
Pearson R	0.50	0.28	0.51	0.49	0.47	0.66	0.60
fraction correct	0.76	0.51	0.76	0.74	0.72	0.77	0.78

Comparison of experimental (SKEMPI) and predicted ΔΔG values for different prediction tools: (a–f) correlation plots (solid line is the 1:1 correlation) for predicted versus experimental ΔΔG for each of (a) BudeAlaScan, (b) FoldX, (c) Flex_ddG Talaris, (d) Flex_ddG Ref2015, (e) BeAtMuSiC, (f) mCSM, (g) Average, (h) Fraction Correct overall, and (i) Fraction Correct by residue type. Data for panels c and d are taken from Barlow et al.[28] and plotted assuming 1 REU = 1 kcal/mol. The predictive performance of BudeAlaScan is comparable with Rosetta Flex_ddG and BeAtMuSiC. mCSM outperforms the physicochemical based methods, but it should be emphasized that this machine-learning approach is trained on the SKEMPI data it is fitting. Classifying the SKEMPI data as being positive (a hot residue) ΔΔG ≥ 4.184 kJ/mol, neutral (little effect) 4.184 kJ/mol < ΔΔG > −4.184 kJ/mol, or negative (affinity enhancing) ΔΔG ≤ −4.184 kJ/mol allows the determination of the fraction of correct predictions (FC). These results vary across the series (FC 0.51–0.78, see Figure h and Tables S2 and S3), although crucially different residues are correctly or incorrectly predicted from tool to tool. Calculating the fraction correct within each of these three categories (positive, neutral, and negative, Supporting Information, see Figure S1) reveals that the accuracy in correctly predicting a hot-residue decreases for all methods (FC ≈ 0.6) with FoldX performing least well. Conversely, the accuracy in predicting neutral residues increases for all methods (FC ≈ 0.8) apart from FoldX. There are only four negative examples in the SKEMPI set where mutation to alanine improves binding, and only FoldX predicts 2 of these, albeit in the context of many false negative predictions. BudeAlaScan uses a set of side-chain rotamers to estimate the configurational entropy loss on forming interactions involving DERKH residues and a fixed backbone. FoldX performs a local side-chain relaxation and Rosetta Flex_ddG allows local backbone and side-chain sampling. When operating on a single structure, none of the methods has sufficient conformational sampling, consideration of dynamics, or explicit solvation to provide the accuracy that might be achieved through more computationally intensive methods.[26,28,40−47] PPI’s may possess interfaces where the binding energy is evenly distributed across the surface or concentrated in a few hot residues or regions. Prediction of the most influential residues in the former case is likely to be hampered by the accumulation of systematic errors. Thus, when using a single method, it is necessary to exercise caution in prioritizing one predicted residue over another based on the magnitude of its predicted ΔΔG.

Comparison of Computational and Experimental Alanine Scanning for Three Representative PPIs

While the SKEMPI data set provided an extensive set of data from the literature, as noted previously this is drawn from multiple different studies exploiting different assay methods, motivating our choice to supplement our comparative analysis with high-quality in-house experimental data. For comparison between prediction and experiment, we focused on protein/peptide interactions for several reasons: (1) peptide interacting motifs (PIMS)[68] play a crucial role in many PPIs and are known to be effective starting points[69] for chemical probe discovery campaigns; (2) a significant proportion of structures in the PDB are protein/peptide interactions with the peptide excised from the full-length protein; (3) ease of preparation/purification and standard N-terminal fluorescent labeling support quantification and a consistent assay format.

Experimental Alanine Scanning

Experimental alanine scanning was performed for structural representatives of three target systems: NOXA-B/MCL-1 (2JM6), SIMS/SUMO (2LAS), and GKAP/SHANK-PDZ (1Q3P). Each of the proteins was expressed in E. coli, and a series of labeled and unlabeled variant peptides was synthesized and purified (see SI). A number of different biophysical assays were implemented for each target (Table and Figure ). For NOXA-B/MCL-1 (Figure a), a competition fluorescence anisotropy (FA) assay (Figure b and SI, Figure S2a and Table S4) was used in which titration of variant peptides was used to displace a FITC-labeled NOXA-B sequence from MCL-1 (Kd of the FITC-labeled tracer = 80 nM). ΔΔG values were calculated using the difference in IC50 values. We also carried out a smaller number of direct titrations using FITC-labeled variant peptides with results concordant to those obtained by competition titration (see SI, Figure S2b and Table S5). Circular dichroism (CD) spectroscopy (see SI Figure S4) established that the effects of the amino-acid changes on helicity of the peptides were small compared to effects arising from side-chain contacts with the protein target.

Table 2

Kd and IC50 Values for NOXA-B/MCL-1 (2JM6), SIMS/SUMO (2LAS), and GKAP/SHANK-PDZ (1Q3P) Variantsa

NOXA	wt	L78A	R79A	R80A	I81A	D83A	V85A
K_d (nM)b	10 ± 1	e	19 ± 3	e	1300 ± 50	4300 ± 200	e
IC₅₀ (μM)c	1.77 ± 0.03	7000 ± 2000	8.3 ± 0.3	1.9 ± 0.3	1220 ± 70	13000 ± 8000	1200 ± 200

Amino acid numbering taken from PDB ID.

Fluorescence anisotropy direct titration.

Fluorescence anisotropy competition assay.

Isothermal titration calorimetry.

Not tested.

Did not bind/inhibit.

Figure 3

Experimental alanine scanning results for PPIs (amino acid numbering taken from PDB ID): (a) NMR structure for NOXA-B/MCL-1 (PDB ID 2JM6, model 1)), (b) representative competition fluorescence anisotropy inhibition curves for inhibition of the FITC–Ahx–NOXA-B/MCL-1 (50 mM Tris, 150 mM NaCl, 0.01% Triton-X, pH 7.4. using FITC–Ahx–NOXA-B as a tracer at 25 nM concentration and MCL-1 at 200 nM concentration) interaction using variant NOXA-B peptides, (c) NMR structure for SIMS/SUMO (PDB ID 2LAS, model 1), (d) representative fluorescence anisotropy titration curves for interaction of FAM labeled variant SIMS peptides with SUMO (20 mM Tris, 150 mM NaCl, 0.01% Triton-X, pH 7.4, using 50 nM FAM–PEG–SIM tracer), (e) crystal structure for GKAP/SHANK-PDZ (PDB ID 1Q3P), (f) representative isothermal titration calorimetry curves for interaction of acetylated GKAP variant peptides with SHANK-PDZ (20 mM Tris, 150 mM NaCl, pH 7.5, at 25 °C, 150 μM SHANK-PDZ).

Amino acid numbering taken from PDB ID. Fluorescence anisotropy direct titration. Fluorescence anisotropy competition assay. Isothermal titration calorimetry. Not tested. Did not bind/inhibit. Experimental alanine scanning results for PPIs (amino acid numbering taken from PDB ID): (a) NMR structure for NOXA-B/MCL-1 (PDB ID 2JM6, model 1)), (b) representative competition fluorescence anisotropy inhibition curves for inhibition of the FITC–Ahx–NOXA-B/MCL-1 (50 mM Tris, 150 mM NaCl, 0.01% Triton-X, pH 7.4. using FITC–Ahx–NOXA-B as a tracer at 25 nM concentration and MCL-1 at 200 nM concentration) interaction using variant NOXA-B peptides, (c) NMR structure for SIMS/SUMO (PDB ID 2LAS, model 1), (d) representative fluorescence anisotropy titration curves for interaction of FAM labeled variant SIMS peptides with SUMO (20 mM Tris, 150 mM NaCl, 0.01% Triton-X, pH 7.4, using 50 nM FAM–PEG–SIM tracer), (e) crystal structure for GKAP/SHANK-PDZ (PDB ID 1Q3P), (f) representative isothermal titration calorimetry curves for interaction of acetylated GKAP variant peptides with SHANK-PDZ (20 mM Tris, 150 mM NaCl, pH 7.5, at 25 °C, 150 μM SHANK-PDZ). For SIMS/SUMO (Figure c), a full series of FAM-labeled 13-residue SIMS peptides was prepared with each amino acid sequentially exchanged for alanine. Direct FA titration against SUMO (Figure d and SI, Figure S5, Table S6) provided Kd values that were used to determine ΔΔG based on comparison with the wild-type (Kd of the FAM-labeled wild-type tracer = 3.7 μM). As a secondary evaluation, several unlabeled peptides were used to compete against the FAM-labeled wild-type peptide for the SUMO binding site, and again the results were consistent with those obtained from direct titration (see SI, Figure S6 and Table S7). Here, CD spectra (see SI, Figure S7) indicated that all peptides were unstructured in solution except for the Glu2709 → Ala and Glu2715 → Ala; both of these variants promoted an alternative conformation with significant β-strand content. Given that such a conformation might be expected to bind more effectively due to preorganization, we interpret the moderate loss in potency observed for both variants as indicating a minor role for the side chains in the PPI. Finally, for GKAP/SHANK-PDZ (Figure e), isothermal titration calorimetry (Figure f and SI, Figure S8 and Table S8) was used to obtain Kd values (WT Kd = 800 nM) from the titration of variant peptides (including the C-terminal amide and Ala2 → Gly variants) against SHANK-PDZ, while a competition assay using a FITC-labeled GKAP peptide was elaborated to give IC50 values (see SI, Figure S9 and Table S9). The two methods yielded similar values, while CD spectra (see SI, Figure S10) indicated that the 6-residue peptides were unstructured in solution as expected. Average ΔΔG values for each of the six predictive methods and the experimental data were then compared (Figures and 5).

Figure 4

Figure 5

Comparison of predictive and experimental ΔΔG values (bars represent experimental data for each target): (a) NOXA-B/MCL-1 (PDB model 1); (b) SIMS/SUMO (PDB model 1); (c) GKAP/SHANK-PDZ (chains A and C). Two data points for FoldX are missing from the plot since they are less than −5.0 kJ/mol (R79A and W2714A).

Comparison of experimental (from this work) and predicted ΔΔG values for different prediction tools: (a–f) correlation plots (dotted line is the 1:1 correlation) for predicted versus experimental ΔΔG for each of (a) BudeAlaScan, (b) FoldX, (c) Flex_ddG Talaris, (d) Robetta, (e) BeAtMuSiC, (f) mCSM, (g) Average, (h) Fraction Correct overall, and (i) Fraction Correct by residue type. There are 6 fewer points for panel d due to Robetta automatically defining the interface. Data for panels c and d are plotted assuming 1 REU = 1 kcal/mol. Comparison of predictive and experimental ΔΔG values (bars represent experimental data for each target): (a) NOXA-B/MCL-1 (PDB model 1); (b) SIMS/SUMO (PDB model 1); (c) GKAP/SHANK-PDZ (chains A and C). Two data points for FoldX are missing from the plot since they are less than −5.0 kJ/mol (R79A and W2714A).

Computational Alanine Scanning

Computational alanine scans were carried out for the same three target systems: NOXA-B/MCL-1 (2JM6), SIMS/SUMO (2LAS), and GKAP/SHANK-PDZ (1Q3P). We chose the simplest approach, namely, a single structure, in the first round of CAS; hence only the first structure of each NMR ensemble for 2JM6 and 2LAS was used along with the crystal structure 1Q3P. Each structure was subjected to each of six representative computational alanine-scanning methods. The GKAP sequence had an alanine (Ala2), and Gly was used to evaluate the role of this side chain. Crucially, crystallographic analyses suggested a key role for the C-terminal carboxylate, not just the side chains. Since BudeAlaScan is built on BUDE, it is simple to submit the modified backbone to BUDE itself to dissect out the roles of the side chain and carboxylate independently yielding a ΔΔG of 19.6 kJ/mol for replacing the carboxylate by the corresponding amide compared with the experimental value of 11.4 kJ/mol (Table ). This feature is unique to BudeAlaScan among the in silico methods and, therefore, applicable to multiple other unconventional changes to peptide structure (vide infra). Based on comparison of the in silico and experimental data, in each case, most of the computational methods correctly identified the majority of experimentally defined hot residues (Figures and 5), but they differed in the residues predicted incorrectly. For example, mCSM predicted Thr4 in GKAP as a hot-residue particularly well and with good accuracy, but wrongly predicted Arg5 as a hot-residue in GKAP. On the other hand, BudeAlaScan over-predicted the contribution of Arg79 in NOXA-B, but under-predicted Asp83. Encouragingly, by taking the average ΔΔG of all the methods for each residue, all but two hot-spot residues were correctly predicted giving a fraction correct of 0.92 (Table ). One residue (Ile2711) is predicted rather well, but the experimental value is close to the selection criterion of 4.184 kJ/mol and the predicted and experimental values straddle this cutoff. The other (Val2713) is solvent exposed and hydrophobic, favors beta structure, and is poorly predicted by all methods; hence it may perform a structural role in supporting the extended β-strand conformation required for binding, rather than contributing to the PPI per se.

Table 3

Pearson Correlation Coefficient between Experimental and Predicted ΔΔG Values for Mutations to Alanine in This Work

	BUDE	FoldX	Flex_ddG Talaris2014	Robetta	BeAtMuSiC	mCSM	Average
Pearson R	0.58	0.52	0.56	0.71	0.71	0.36	0.76
fraction correct	0.80	0.68	0.72	0.88	0.84	0.68	0.92

The Use of Multiple Structures Improves Predictive Alanine Scanning and Includes the Role of Dynamic Conformational Variation

A single protein structure, for example, that determined by X-ray crystallography, represents a single point on the global potential-energy landscape, but it is well established that proteins in their native state occupy a family of conformations clustered around such a local minimum. In other words, proteins are dynamic and show thermally induced fluctuations in their native folded states. This dynamic behavior makes accurate predictions of the free energy difference between two variants difficult to achieve with a single pair of structures. A number of the tools we evaluate could be used for this purpose, however, here as a proof-of-concept we used BudeAlaScan to probe these fluctuations. The BUDE force field goes some way to mitigating the effects of dynamics by using a soft-core potential to ameliorate geometrical inaccuracies in a structure. BudeAlaScan also employs multiple rotamers for side chains that can carry charge (DERKH) to account for entropic loss on freezing such side chains in interfacial salt bridges. However, we anticipated that greater accuracy would be achieved using an ensemble of conformations that include backbone flexibility. There are two ways to do this: by using (a) structures from solution phase NMR ensembles or (b) snapshots from molecular dynamics (MD) simulations. Although the two methods typically report on different time scales, the purpose here is not to compare the data from NMR with MD, rather to highlight that either can be useful in establishing where potential hot-residues are persistent. Crucially, such an approach begins to acknowledge the dynamic and conformationally varied nature of PPIs. The calculation of interaction energies with BUDE is very fast; hence BudeAlaScan rapidly processes ensembles of conformations. For our analysis, each structure was subjected to a 1 μs MD simulation, and the BudeAlaScan calculation was carried out on each of 100 snapshots (every 10 ns of a 1 μs trajectory). In addition, BudeAlaScan was performed on each of the submitted models in the NMR ensembles for the two PPIs with NMR structures. In general, the predictions improved through the use of this approach for all targets (Figure ).

Figure 6

Comparison of BudeAlaScan predicted and experimental ΔΔG values (gray bars). Green, average and standard deviation of 100 structures from 1 μs MD simulations; blue average and standard deviation of 20 (2JM6) and 10 (2LAS) NMR structures; (a) NOXA-B/MCL-1; (b) SIMS/SUMO; (c) GKAP/SHANK-PDZ. These data also allow the average strength and persistence of interactions to be assessed as a standard deviation: residues that are tightly packed into the interface with little conformational freedom tended to show low standard deviations. This behavior is exemplified by Leu78, Ile81, and Val85 for the NOXA-B/MCL-1 interaction (Figure a). In contrast, the surface-exposed residues Arg79 and Asp83 for the NOXA-B/MCL-1 interaction that transiently form salt bridges exhibit higher standard deviations. SIMS/SUMO shows similar behavior (Figure b). The smaller interface in GKAP/SHANK (Figure c) is much more mobile and shows large standard deviations throughout the sequence. In the absence of a corresponding NMR structure, we are unable to say if this is a consequence of excessive mobility in the MD simulation, although the poorer match with experimental alanine scanning results suggests this is the case. The match between prediction and experiment is generally somewhat improved by using ensembles of structures. Hence, the Pearson correlation coefficient for NOXA-B/MCL-1 and SIMS/SUMO between the BudeAlaScan data and experiment is 0.58 for single structures (Figure a,b) and 0.66 and 0.62 for the MD and NMR ensembles, respectively (Figure a,b).

Use of Multiple Tools to Predict Hot-Residues at a Topographically Distinct PPI

Having experimentally validated the approach with three model systems, we applied the workflow to a novel protein–protein interface with a different topography. For this, we selected a recently described PPI between an Affimer and BCL-xL,[70] which, like MCL-1, is an antiapoptotic member of the BCL-2 family of apoptotic regulators (Figure a).[71] In this structure, two nine-residue loops from the Affimer interact with the BH3 helix binding cleft on BCL-xL.

Figure 7

Application of predictive workflow to a topographically distinct protein–protein interaction. (a) Affimer/BCL-xL cocrystal structure (PDB ID 6HJL), highlighting residues Trp41, Trp44, and Trp76 (orange) identified for experimental characterization (Affimer in cyan, BCL-xL in green). (b) Predicted ΔΔG values calculated using each predictive tool together with average values for alanine variants of residues within the two 9-residue variable loops of the Affimer scaffold; gray boxes show ΔΔG limits estimated from the experimental W to A changes; residues 41 and 44 ≥ 15 kJ/mol, residue 76 ≤ 1 kJ/mol, (3 points for Foldx, D43A, E46A, and W76A, are missing from the plot as their values are less than −5 kJ/mol). (c) Competition fluorescence anisotropy inhibition curves for inhibition of the FITC–BID/BCL-xL interaction (20 mM Tris, 150 mM NaCl, 0.01% Triton X-100, pH 7.4, using FITC–BID as a tracer at 25 nM and BCL-xL at 100 nM concentration) using variant Affimer sequences. Predictions for each loop were performed using each of the tools, and the average ΔΔG was determined (Figure b). We then selected three residues to test experimentally: two predicted to be hot-residues (Trp41 and Trp44), and one predicted as not being a hot-residue (Trp76) according to the average values of ΔΔG across the six methods. The parent and variant Affimer sequences were cloned, expressed, and purified and then tested in a competitive FA assay, that is, by displacement of FITC–BID fluorescently labeled peptide from BCL-xL. These experiments fully confirmed the predictions (Figure c and SI, Table S9). Crucially, at least one of the individual prediction tools would not have predicted these residues accurately, further validating the use of comparative analyses.

Conclusions

We outline effective approaches for rapid predictive in silico examination of PPI interfaces. This employs multiple computational tools, allowing comparative analyses to predict hot-residues accurately. In the broader context of providing improved and accessible methods for in silico Ala scanning, we introduce BudeAlaScan. This is a versatile tool for the rapid analysis of PPIs using X-ray structures, NMR ensembles, or snapshots from MD trajectories as input files. Exploiting six in silico prediction tools illustrates that use of multiple methods leads to improved accuracy in the prediction of hot-residues. We demonstrate this further by comparing the predictions with experimental studies for three different PPIs. We note that the predicted ΔΔG values for single-alanine variants vary between the in silico methods and also change through structural ensembles analyzed with any single method. Thus, in terms of informing on the role of individual amino acids in PPIs and developing starting points for ligand design, we advocate using a combination of methods to predict hot-residues and hot-spots. Additional use of NMR ensembles or snapshots from MD trajectories allows an assessment of the persistence of noncovalent interactions. Taken together this allows prioritization of potential hot-residues to reduce the number of predictions that require experimental validation and provides greater confidence in those that should be selected for mimicry in ligand based design.

Methods

Computational Methods and MD

SKEMPI Data Set and Predictions

The set of 748 single mutations to alanine from the SKEMPI database with associated binding data is that described by Barlow et al.[28] and provided in that paper’s Supporting Information (jp7b11367_si_002.xlsx).

FoldX

The program FoldX was downloaded from http://FoldXsuite.crg.eu and used in AlaScan mode with default parameters.

Rosetta

Flex_ddG

Data were taken from the Supporting Information of previously published work[28] for benchmarking against SKEMPI. Mutations specific to this work were calculated using the flex_ddG method in Rosetta Commons release and the flex_ddG.xml script provided in the SI of Barlow et al.[28] The Talaris 2014 force field was used, and each result is the average of 50 repeats.

Robetta

Mutations specific to this work were also calculated with the Robetta server, http://robetta.bakerlab.org or via the Rosetta Commons release.

BudeAlaScan

BudeAlaScan is a command-line python application for computational alanine scanning. It employs ISAMBARD[5] for structure manipulation and a customized version of the Bristol University Docking Engine (BUDE)[6] for energy calculations. The program was run in scan mode with default parameters. The application is available via the BAlaS server: http://coiledcoils.chm.bris.ac.uk/balas

BeAtMuSiC

Data used in the BeAtMuSiC publication were taken from the MCSM Web site, http://biosig.unimelb.edu.au/mcsm/data, for benchmarking against SKEMPI. Mutations specific to this work were calculated using the BeAtMuSiC server: http://babylone.ulb.ac.be/beatmusic.

mCSM

The mCSM data were calculated in-house by querying the server http://biosig.unimelb.edu.au/mcsm/protein_protein with a python script.

Molecular Dynamics Simulations

All simulations were performed using the GROMACS 5.1.4 suite and the following general protocols. Structures from the protein database (SIMS/SUMO 2LAS; GKAP/SHANK-PDZ 1Q3P; NOXA-B/MCL-1 2JM6) were processed with the GROMACS tool chain. The utility pdb2gmx was used to add hydrogen atoms consistent with pH 7 and virtual-site hydrogens for mobile groups and parametrize the protein with the amber99SB-ildn force field. The protein was placed in an orthorhombic box 2 nm larger than the protein in each dimension and filled with TIP3P water containing 0.15 M sodium chloride ions to give a charge-neutral system overall. After 10000 steps of steepest descent minimization, molecular dynamics was initiated with random velocities while restraining the protein backbone to its original position with a force constant of 1000 kJ/nm for 0.2 ns. Simulations were developed for a further 1 μs without backbone position restraints under periodic boundary conditions. The Particle Mesh Ewald’s method was used for long-range electrostatic interactions, while short-range Coulombic and van der Waals energies were truncated at 1.2 nm. The temperature was maintained at 310 K using the v-rescale method and the pressure at 1 bar with the Berendsen barostat. The use of virtual-site hydrogens allowed a 5 fs time step for the leapfrog integrator. Bond constraints were implemented with the LINCS method, and SETTLE was used for waters. Trajectories were processed and analyzed with the GROMACS tools and visualized with VMD 1.9.3.

Expression and Purification of Proteins

Proteins were expressed using standard protocols and characterized as described in the Supporting Information.

Peptide Synthesis and Purification

Peptides were prepared using microwave assisted solid-phase FMoc based synthesis using a CEM Liberty Blue peptide synthesizer and purified by reverse phase HPLC. Full details and characterization are available in the Supporting Information.

Isothermal Titration Calorimetry (ITC)

ITC experiments were carried out using Microcal ITC200i instrument (Malvern) at 25 °C in 20 mM Tris, 150 mM NaCl, pH 7.5, buffer. SHANK-PDZ was dialyzed against the buffer prior to experiment; lyophilized peptides were dissolved in the same buffer. SHANK-PDZ (150 μM) was present in the cell and titrated with 1.4–2 mM peptide solutions loaded into the syringe using 2 μL injections with 120 s spacing between the injections for 20 injections. Heats of peptide dilution were subtracted from each measurement raw data. Data was analyzed using Microcal Origin 8 and fitted to a one binding site model.

Fluorescence Anisotropy

Fluorescence anisotropy assays were performed in 384-well plates (Greiner Bio-one). Each experiment was run in triplicate, and the fluorescence anisotropy was measured using a PerkinElmer EnVisionTM 2103 MultiLabel plate reader with excitation at 480 nm (30 nm bandwidth), polarized dichroic mirror at 505 nm, and emission at 535 nm (40 nm bandwidth, S and P polarized) for FAM and FITC labeled peptides. The excitation and emission wavelengths for BODIPY labeled BAK peptide were set to 531 and 595 nm, respectively. The excitation and emission wavelengths for FITC labeled BID peptide were set to 490 and 535 nm, respectively.

Direct Binding Assays

Fluorescence anisotropy direct titration assays were performed with protein concentration diluted over 16–24 points using 1/2 dilutions. Twenty microliters of buffer was first added to each well. Twenty microliters of a solution of protein was added to the first column. The solution was well mixed, and 20 μL was taken out and added to the next column and so on. This operation consists of serial dilution of the protein across the plate. Finally, 20 μL of tracer was added to the wells. For control wells, the tracer peptide was replaced with an identical volume of assay buffer, and plates were read after 1 h.

Competition Binding Assays

FA competition assays were performed in 384 well plates with the concentration of variant peptide competitor typically from 10 to 1500 μM, diluted over 16–24 points in 1/2 regime with fixed protein and tracer concentrations. For control wells, the tracer peptide was replaced with an identical volume of assay buffer. The total volume in each well was 60 μL. Plates were read after 1 h (and 16 h for BCL-xL assays) of incubation at RT. Additional details on buffer composition, reagent concentration for individual PPIs, and data analyses are given in the Supporting Information.

66 in total

1. Protein structure prediction and analysis using the Robetta server.

Authors: David E Kim; Dylan Chivian; David Baker
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

2. Comparing experimental and computational alanine scanning techniques for probing a prototypical protein-protein interaction.

Authors: Richard T Bradshaw; Bhavesh H Patel; Edward W Tate; Robin J Leatherbarrow; Ian R Gould
Journal: Protein Eng Des Sel Date: 2010-07-23 Impact factor: 1.650

Review 3. Control of apoptosis by the BCL-2 protein family: implications for physiology and therapy.

Authors: Peter E Czabotar; Guillaume Lessene; Andreas Strasser; Jerry M Adams
Journal: Nat Rev Mol Cell Biol Date: 2014-01 Impact factor: 94.444

4. PredHS: a web server for predicting protein-protein interaction hot spots by using structural neighborhood properties.

Authors: Lei Deng; Qiangfeng Cliff Zhang; Zhigang Chen; Yang Meng; Jihong Guan; Shuigeng Zhou
Journal: Nucleic Acids Res Date: 2014-05-22 Impact factor: 16.971

Review 5. A million peptide motifs for the molecular biologist.

Authors: Peter Tompa; Norman E Davey; Toby J Gibson; M Madan Babu
Journal: Mol Cell Date: 2014-07-17 Impact factor: 17.970

Review 6. The SUMO pathway: emerging mechanisms that shape specificity, conjugation and recognition.

Authors: Jaclyn R Gareau; Christopher D Lima
Journal: Nat Rev Mol Cell Biol Date: 2010-12 Impact factor: 94.444

7. HotPoint: hot spot prediction server for protein interfaces.

Authors: Nurcan Tuncbag; Ozlem Keskin; Attila Gursoy
Journal: Nucleic Acids Res Date: 2010-05-05 Impact factor: 16.971

8. Effect of the MDM2 antagonist RG7112 on the P53 pathway in patients with MDM2-amplified, well-differentiated or dedifferentiated liposarcoma: an exploratory proof-of-mechanism study.

Authors: Isabelle Ray-Coquard; Jean-Yves Blay; Antoine Italiano; Axel Le Cesne; Nicolas Penel; Jianguo Zhi; Florian Heil; Ruediger Rueger; Bradford Graves; Meichun Ding; David Geho; Steven A Middleton; Lyubomir T Vassilev; Gwen L Nichols; Binh Nguyen Bui
Journal: Lancet Oncol Date: 2012-10-17 Impact factor: 41.316

9. KFC Server: interactive forecasting of protein interaction hot spots.

Authors: Steven J Darnell; Laura LeGault; Julie C Mitchell
Journal: Nucleic Acids Res Date: 2008-06-06 Impact factor: 16.971

10. Computational prediction of alanine scanning and ligand binding energetics in G-protein coupled receptors.

Authors: Lars Boukharta; Hugo Gutiérrez-de-Terán; Johan Aqvist
Journal: PLoS Comput Biol Date: 2014-04-17 Impact factor: 4.475

19 in total

1. Rapid prediction of crucial hotspot interactions for icosahedral viral capsid self-assembly by energy landscape atlasing validated by mutagenesis.

Authors: Ruijin Wu; Rahul Prabhu; Aysegul Ozkan; Meera Sitharam
Journal: PLoS Comput Biol Date: 2020-10-20 Impact factor: 4.475

Review 2. Computational Tools and Strategies to Develop Peptide-Based Inhibitors of Protein-Protein Interactions.

Authors: Maxence Delaunay; Tâp Ha-Duong
Journal: Methods Mol Biol Date: 2022

3. Understanding p300-transcription factor interactions using sequence variation and hybridization.

Authors: Fruzsina Hóbor; Zsófia Hegedüs; Amaurys Avila Ibarra; Vencel L Petrovicz; Gail J Bartlett; Richard B Sessions; Andrew J Wilson; Thomas A Edwards
Journal: RSC Chem Biol Date: 2022-04-11

4. A potential interaction between the SARS-CoV-2 spike protein and nicotinic acetylcholine receptors.

Authors: A Sofia F Oliveira; Amaurys Avila Ibarra; Isabel Bermudez; Lorenzo Casalino; Zied Gaieb; Deborah K Shoemark; Timothy Gallagher; Richard B Sessions; Rommie E Amaro; Adrian J Mulholland
Journal: Biophys J Date: 2021-02-18 Impact factor: 4.033

5. Computational Hot-Spot Analysis of the SARS-CoV-2 Receptor Binding Domain/ACE2 Complex*.

Authors: Pedro A Rosario; Brian R McNaughton
Journal: Chembiochem Date: 2020-12-04 Impact factor: 3.164

6. Query-guided protein-protein interaction inhibitor discovery.

Authors: Sergio Celis; Fruzsina Hobor; Thomas James; Gail J Bartlett; Amaurys A Ibarra; Deborah K Shoemark; Zsófia Hegedüs; Kristina Hetherington; Derek N Woolfson; Richard B Sessions; Thomas A Edwards; David M Andrews; Adam Nelson; Andrew J Wilson
Journal: Chem Sci Date: 2021-03-02 Impact factor: 9.825

7. Identification of β-strand mediated protein-protein interaction inhibitors using ligand-directed fragment ligation.

Authors: Zsófia Hegedüs; Fruzsina Hóbor; Deborah K Shoemark; Sergio Celis; Lu-Yun Lian; Chi H Trinh; Richard B Sessions; Thomas A Edwards; Andrew J Wilson
Journal: Chem Sci Date: 2021-01-06 Impact factor: 9.825

8. A Coarse-Grained Methodology Identifies Intrinsic Mechanisms That Dissociate Interacting Protein Pairs.

Authors: Haleh Abdizadeh; Farzaneh Jalalypour; Ali Rana Atilgan; Canan Atilgan
Journal: Front Mol Biosci Date: 2020-08-25

9. SARS-CoV-2 nucleocapsid and Nsp3 binding: an in silico study.

Authors: Muhammad Tahir Khan; Muhammad Tariq Zeb; Hina Ahsan; Abrar Ahmed; Arif Ali; Khalid Akhtar; Shaukat Iqbal Malik; Zhilei Cui; Sajid Ali; Anwar Sheed Khan; Manzoor Ahmad; Dong-Qing Wei; Muhammad Irfan
Journal: Arch Microbiol Date: 2020-08-04 Impact factor: 2.552

Review 10. Dynamics, a Powerful Component of Current and Future in Silico Approaches for Protein Design and Engineering.

Authors: Bartłomiej Surpeta; Carlos Eduardo Sequeiros-Borja; Jan Brezovsky
Journal: Int J Mol Sci Date: 2020-04-14 Impact factor: 5.923