Literature DB >> 35654412

PPI-Affinity: A Web Tool for the Prediction and Optimization of Protein-Peptide and Protein-Protein Binding Affinity.

Sandra Romero-Molina¹, Yasser B Ruiz-Blanco¹, Joel Mieres-Perez¹, Mirja Harms², Jan Münch^2,3, Michael Ehrmann⁴, Elsa Sanchez-Garcia¹.

Abstract

Virtual screening of protein-protein and protein-peptide interactions is a challenging task that directly impacts the processes of hit identification and hit-to-lead optimization in drug design projects involving peptide-based pharmaceuticals. Although several screening tools designed to predict the binding affinity of protein-protein complexes have been proposed, methods specifically developed to predict protein-peptide binding affinity are comparatively scarce. Frequently, predictors trained to score the affinity of small molecules are used for peptides indistinctively, despite the larger complexity and heterogeneity of interactions rendered by peptide binders. To address this issue, we introduce PPI-Affinity, a tool that leverages support vector machine (SVM) predictors of binding affinity to screen datasets of protein-protein and protein-peptide complexes, as well as to generate and rank mutants of a given structure. The performance of the SVM models was assessed on four benchmark datasets, which include protein-protein and protein-peptide binding affinity data. In addition, we evaluated our model on a set of mutants of EPI-X4, an endogenous peptide inhibitor of the chemokine receptor CXCR4, and on complexes of the serine proteases HTRA1 and HTRA3 with peptides. PPI-Affinity is freely accessible at https://protdcal.zmb.uni-due.de/PPIAffinity.

Entities: Chemical

Keywords: binding free energy; dissociation constant; machine learning; mutation; peptide design; protein−protein interaction

Mesh：

Substances：
Peptides
Proteins

Year: 2022 PMID： 35654412 PMCID： PMC9361347 DOI： 10.1021/acs.jproteome.2c00020

Source DB: PubMed Journal: J Proteome Res ISSN： 1535-3893 Impact factor: 5.370

Introduction

Protein–protein interactions (PPIs) are fundamental to most biological processes.[1] Prominent disorders, such as cancer and degenerative diseases, are related to aberrant PPIs.[2] In therapy, optimized PPIs are also critical for the strong binding of antibodies to their protein antigens.[3] Therefore, the characterization of PPIs in terms of their binding affinity (BA) is highly relevant to the design of new biologics and therapeutic compounds.[4] Notably, peptides are a promising class of bioactive compounds, which often have higher specificity and reduced side effects compared to small-molecule pharmaceuticals.[5] Currently, there are more than 60 approved peptide drugs, and hundreds of peptidic compounds undergo clinical or preclinical trials.[6] However, the design of peptide drugs remains a challenging task due to their flexible structures and diversity of binding sites.[7] Virtual screening approaches, based on BA predictions, reduce the time of the drug development pipelines.[3] Thus, over the past decades, several screening tools based on the BA of protein–protein complexes have been introduced.[8−26] Relevant examples of these methods are DFIRE,[9] CP_PIE,[16] ISLAND,[24] and the web server PRODIGY,[21,27] (which tailored and improved the original model introduced by Kastritis et al.).[20] Protein–peptide complexes are usually scored with functions derived from binding affinity data of small molecules. Examples of such scoring methods are RF-Score[28] and Kdeep,[29] which used a random forest algorithm[30] and a convolutional neural network,[31,32] respectively, to train their models. Supporting Information Table SI-1 summarizes the above-mentioned BA predictors, as well as other state-of-the-art methods that contribute to the broad field of PPI prediction tools. Noteworthy, to the best of our knowledge, there is no publicly available web tool specifically designed to predict and optimize the BA of diverse protein–peptide complexes, by considering as a peptide an amino acid sequence with less than 30 residues. Several works have approached this aim specifically for the identification of binders of the major histocompatibility complexes MHC-I, II.[33,34] However, given that their training is restricted to MHC data, these models are not applicable to predict the binding free energy of other protein–peptide complexes. Besides, the available tools typically do not leverage the possibility of optimizing the primary structure of the peptide to improve the affinity of the complex. We evaluated the performance of the above-mentioned screening tools for estimating the BA of protein–peptide complexes by testing 100 randomly selected protein–peptide complexes from the Biolip[35] database. This test set contained complexes with peptides ranging from 4 to 29 amino acids, coupled to receptors with sizes between 51 and 496 amino acids. The binding free energy of the complexes covered the range between −12.6 and −4.6 kcal/mol. The highest correlation was delivered by Kdeep (R = 0.32), while the correlation with all of the other methods was in the range R = [0.13, 0.24] (Supporting Information Table SI-2). This comparison evidences the rather low dependability of state-of-the-art screening tools for the prediction of the BA of protein–peptide complexes. Given the remarkable scaffold that peptides represent for drug development, we addressed this issue by developing machine-learning-based predictors of BA that are specific for protein–peptide complexes. In addition, we present a predictor of protein–protein BA that rivals the performance of the state-of-the-art screening tools built for such systems. Both predictors are integrated into a novel tool named PPI-Affinity, which is a web server designed to score protein–protein and protein–peptide complexes based on their predicted BA, as well as to optimize the affinity of a complex by mutating and screening selected residues. With such functionalities, PPI-Affinity can be employed at early steps of drug design processes, which are focused on the screening and optimization of protein/peptide binders for a given protein target.

Materials and Methods

In this section, we describe the dataset and the modeling procedure used to develop both predictors. In both scenarios, protein–protein and protein–peptide systems, the performance of the models is evaluated with cross-validation and hold-out test sets.

Data Collection: Protein–Protein Complexes

We initially retrieved 2 852 protein–protein complexes with known BA data deposited in the PDBbind (v.2020)[36] database. Subsequently, we curated these data by extracting only dimeric complexes and removed those in common with the benchmark used by Vangone and Bonvin[21] to assess their model. We excluded those cases in which the binding affinity values were reported with measures other than Kd, Ki, or ΔGbind. Likewise, those instances with imprecise BA values (i.e., reporting ranges of Kd values instead of precise values) were removed. Cases with binding free energy values outside the range [−18.1, −3.1] kcal/mol were also excluded, as these instances are sparingly represented and the difference between their BA value and the rest of the distribution was ΔG ≥ 0.5. The application of such filters rendered a dataset of 833 protein–protein complexes (Supporting Information Figure SI-1 depicts the distribution of binding free energy values for this dataset). All of the structures were preprocessed by adding hydrogens and other missing atoms with MODELLER v9.23[37−40] and PDB2PQR.[41]

Features Generation

We employed the 3D-structure descriptors implemented in the protein codification package ProtDCal.[42] This program is accessible via the web server ProtDCal-Suite,[43] which also provides access to other models developed by us using this codification approach. ProtDCal has a workflow of four automated steps, which are: (i) residue-wise codification, (ii) modification based on the vicinity, (iii) residue grouping based on amino acid types, and (iv) numerical aggregation using descriptive statistics. The options selected at each step are used in a combinatorial scheme to generate an array of numerical descriptors that characterize the input structure. Each descriptor in the array is the result of the combination of one residue property (e.g., reside-wise contact order, RWCO),[44] a vicinity operator (e.g., autocorrelation, AC), a grouping criterion (e.g., nonpolar residues, NPR), and an aggregation operator (e.g., variance, V); thus, every element in the array is identified by a unique label (e.g., RWCO_AC_NPR_V). In the Supporting Information Section SI-1, we provide the configuration file used to compute the descriptors employed in this study. This configuration generates 23 040 descriptors for each protein–protein complex. In ProtDCal, interchain residue contacts are determined by a maximum spatial distance (d) measured between the α carbons of the residues. We set up the calculation of such contacts with a spatial cutoff d = 10 Å. ProtDCal has been successfully applied by us and other authors to model post-translational modifications,[42,45,46] protein–protein interaction,[47] enzyme-like amino acid sequences,[48] critical residues for protein function,[49] and antibacterial peptides.[50,51]

Modeling Protocol for Protein–Protein Affinity Data

The learning process was carried out with Weka 3.8.4.[52] Support Vector Machine (SVM) was selected as the learning algorithm after performing a preliminary study that showed that it is the simplest and best-performing solution for developing the models (Supporting Information Section SI-2). SVM is a successful approach widely validated in drug discovery.[53,54] This technique has delivered a worthy performance on small and mid-sized datasets,[55] where deeper machine learning (ML) approaches, such as various neural network architectures, tend to overfit.[56,57] The implementation of SVM for regression in the package SMOreg[58] was employed to develop the models. We randomly split the collected data into three datasets: a training set with 653 complexes, a development set with 90 complexes, and a test set with 90 complexes (Script SI-1). The purpose of the development set was to monitor the generalization of multiple configurations of the hyperparameters during the training process. The hold-out test set was exclusively used to compare the final model with external predictors. Initially, every instance in the training set was represented as a vector of 23 040 molecular descriptors generated with ProtDCal. The steps of feature selection included the removal of invariant descriptors and those carrying missing values. Next, an optimal subset of attributes was searched with a supervised exploration guided by the error in the prediction of the binding affinity for cases in the training set. The resulting optimal subset comprised 26 descriptors, which were used to train the final models (Supporting Information Section SI-3). Supporting Information File SI-1 provides a configuration file for ProtDCal to compute this specific list of descriptors. We adopted an ensemble learning approach by creating four partially overlapping subsets of the training data with a distribution of the output variable flatter than the complete training set (Supporting Information Figure SI-2). Thus, we trained independent models with each subset and combined their outcomes using the Vote method implemented in Weka 3.8.4,[52,59,60] with combination schemes based on the average, maximum, and minimum predicted values among selected models. With each training subset, the hyperparameters of the SVM were adjusted using a grid search.[61] The search space was defined by the following range of hyperparameter values: Complexity (C): 2–5, 2–4.5, 2–4, 2–3.5, 2–3,..., 21, 21.5, 22, 22.5, 23, 23.5 and 24 Degree (D) of the polynomial function kernel: 1, 2, and 3 The models generated during the search were assessed using the Pearson’s correlation coefficient (R) (eq ) of the estimations in the training data via resubstitution and 10-fold cross-validation (10-fold CV), as well as in the development set.The terms y and y̅ are the actual affinity values and their mean in the datasets. Analogously, the terms ypred and y̅pred correspond to the same type of values but as predicted by the model. To identify the optimum values of the hyperparameters, we formulated an ad hoc fitness-robustness score (FRS) (eq ) as a function of the correlation coefficient, which combines in a single measure the performance of a model in terms of goodness of fit and robustness. The FRS function has an optimum maximum value at FRS = 1; thus, we selected the configuration that maximized the FRS function. The terms RTS, RCV, and RDEV are the correlation coefficients obtained on the training set, in 10-fold CV, and development set, respectively. R̅ is the arithmetic mean of the performance on these three tests. The first term of the function aims to combine the performance on the training set (goodness of fit), with the performance in cross-validation and development set (generalization). The next two terms reduce the deviations between the performance on the training set and the performance when evaluating in cross-validation and development set. These weighting terms improve the robustness of the selected model by considering the generalization power of the predictor in addition to the goodness of fit. Overall, such a unified quantity is a highly informative measure to guide the optimization of hyperparameters during the modeling. We intend to continue challenging this formulation in further studies and applications. Figure summarizes the results of the optimization of the hyperparameter values in each training subset. A complete list of all of the intermediate models and performance measures is provided in Supporting Information Table SI-3.

Figure 1

Results of the optimization of hyperparameters. The selected model per training subset is marked with a red square and corresponds to the combination of degree (D) and complexity (log2 C) values that maximized the fitness-robustness score defined in eq . All possible combinations with at least two models were evaluated in the development set to determine the best ensemble model. We summarized all of the performance measures for the independent models and the distinct ensembles in Supporting Information Table SI-4.

Data Collection: Protein–Peptide Complexes

The model was developed with data extracted from the Biolip database, which also incorporates data from the PDBbind Protein-Ligand[36] database. We downloaded the nonredundant dataset of Biolip, containing 105 152 entries. Only protein–peptide complexes with less than 90% of identity between the binding site′s residues and the full receptor sequence are included in these data. We extracted the complexes containing single-chain receptor only and a peptide formed by standard residues with a minimum length of three residues. Instances reported with post-translational modifications or fusion constructs (peptide/nonpeptide) were discarded. Subsequently, we selected the cases for which their BA values are reported in terms of the dissociation (Kd) or inhibition (Ki) constants only. We excluded the complexes with ambiguous Kd or Ki values (i.e., reporting a range of values), and those with binding free energies outside the range from −14.4 to −3.6 kcal/mol, as these instances were poorly represented and the difference between their BA value and the rest of the distribution was ΔG ≥ 0.5. The curated dataset contained 1149 complexes, with peptides of length ranging between 3 and 29 amino acids and receptors of sizes between 31 and 957 amino acids (Supporting Information Figure SI-3). Hydrogen atoms and other missing atoms were added to the structures using MODELLER v9.23[37−40] and PDB2PQR.[41]

Modeling Protocol for the Protein–Peptide BA Data

We used SVM, implemented in the package SMOreg[58] in Weka 3.8.4, following a protocol equivalent to that employed for the protein-protein model described above. In summary, we randomly divided the dataset into three subsets: training dataset (949 instances), development dataset (100 instances), and test dataset (100 instances) (Script SI-1). Subsequently, we extracted numerical descriptors from the complexes’ structure using ProtDCal[42,43] with the configuration file summarized in Supporting Information Section SI-1. This step generated 23 040 molecular descriptors for each instance in the dataset. Next, we reduced this large multidimensional space to 37 descriptors through a features selection process (Supporting Information Section SI-4; a list of the extracted set of descriptors can be found in the file SI-2). The training scheme used to develop the final predictor followed an ensemble approach. Four individual models were built with partially overlapping subsets of the training set. The subsets are random samples of the training data with a flatter distribution of the BA values (Supporting Information Figure SI-4) compared to the entire training set. This is achieved by undersampling the region next to the mode value of the distribution, while keeping the tails. Such transformation in the distribution of the data allows a balanced error weight along the entire range of the response variable. Subsequently, we optimized the hyperparameters of each model independently and combined their predictions with the Vote method implemented in Weka 3.8.4, according to the average, maximum, and minimum ensemble rules. To create the models, the hyperparameters of the estimator were adjusted in a grid search following the same methodology as for the protein–protein model. During the optimization of the hyperparameters, the performance of the models was monitored in the training data, in 10-fold CV, and the development set. The optimum set of hyperparameter values for each training dataset was selected according to the FRS defined in eq . Figure shows the results of the hyperparameters tuning process for each training subset. A complete list of all of the intermediate models and performance measures can be found in Supporting Information Table SI-5.

Figure 2

Results of the optimization of hyperparameters. The selected model per training subset is marked with a red square and corresponds to the combination of degree (D) and complexity (log2 C) values that maximized the FRS. Next, the best ensemble model was selected by evaluating all possible combinations of models on the development set (Supporting Information Table SI-6). The selected ensemble model was evaluated on the hold-out test set of 100 complexes and compared with other available state-of-the-art protein–protein and protein–ligand BA predictors. Finally, the ranking power of the model was assessed on a set of mutants of the peptide EPI-X4, an endogenous inhibitor of CXCR4 whose activity was experimentally determined and on a set of complexes between peptides and the serine proteases HTRA1 or HTRA3.

Results and Discussion

In this section, we analyze the performance of the developed models and compare them with other state-of-the-art predictors.

Performance of the Protein–Protein Model

The best ensemble model contains two out of the four training subsets, whose individual correlations are R2 = R3 = 0.50. The ensemble model improved the individual ones achieving a correlation of R = 0.53 on the development set. The rule of minimum predicted value was used to build the ensemble predictor. Recently, Vangone and Bonvin[21] developed a model that predicts the BA based on two structural descriptors: the network of inter-residue contacts (ICs) and the noninteracting surface (NIS). The model, named ICs/NIS-based predictor was implemented in the web server PRODIGY.[27] This tool delivered a much better performance (R = 0.74 and MAE = 1.4) than other state-of-the-art methods on a benchmark set of 79 protein–protein complexes. Thus, we employed this benchmark set to evaluate our method (Table , Test set 1) against the ICs/NIS-based model, the other two top-ranked and currently available tools in that assessment, and the ISLAND method. We estimated the performance of our ensemble model on this benchmark set and obtained a correlation coefficient R = 0.62 and MAE = 1.8 kcal/mol (Test set 1) that ranks our method second, after PRODIGY.

Table 1

Summary of the Evaluation of PPI-Affinity and State-of-the-Art BA Predictors on Two Sets of Protein–Protein Affinity Dataa

	test set 1		test set 2
method	R	MAE (kcal/mol)	R	MAE (kcal/mol)
PRODIGY	0.74	1.4	0.31	2.5
DFIRE	0.60	4.6	0.10	25.4
CP_PIE	–0.52	8.8	–0.10	11.0
ISLAND	0.38	2.1	0.27	2.2
PPI-Affinity	0.62	1.8	0.50	1.8

The performance is expressed as the Pearson’s correlation coefficient (R) between experimental and predicted BA. The test set 1 corresponds to the benchmark employed by Vangone and Bonvin,[21] while the test set 2 corresponds to the hold-out set of 90 data points taken from PDBbind (v.2020). The performance for the other methods on test set 1 was reported by Vangone and Bonvin.[21] The negative values of the correlation coefficients indicate that the corresponding method predicts unbinding free energy. Next, we challenged the predictors with a larger hold-out test set of 90 complexes, which was initially extracted from the data collected from the PDBbind (v.2020) protein–protein dataset (Table , test set 2). In this test, our model renders a correlation coefficient of R = 0.50, with an error MAE = 1.8 kcal/mol (Test set 2), performance which is only marginally inferior to that obtained in the benchmark set of Vangone and Bonvin[21] (R = 0.62; MAE = 1.8 kcal/mol). The performance of ISLAND was diminished with respect to our method. Nevertheless, ISLAND delivered consistent results, in general superior to those delivered by other methods, in both test sets. The other predictors (PRODIGY, DFIRE, CP_PIE) show a large decrease in their performance with respect to their results in the test set 1 (Table , test set 1), with PRODIGY being the second best with a correlation coefficient R = 0.31 and MAE = 2.5 kcal/mol. Such a dramatic decay in the predictions suggests the presence of overfitting toward the previous benchmark set, specifically in the case of PRODIGY. Nonetheless, the analysis of the performance of the other predictors is hindered by the lack of applicability domain (AD) definition for using the methods. The absence of a defined AD limits the analysis of the errors of the predictions, as it is not possible to examine whether test samples are simply outside the scope of these predictors or there is a specific structural issue that affects the quality of the prediction. In short, the data evidence the generalization achieved by our predictor, which performs consistently well in different external test sets. Supporting Information Figure SI-5 displays plots of experimental versus predicted BA values for PPI-Affinity in both test sets. Next, we evaluated the performance of PPI-Affinity on a set of mutants taken from the SKEMPI v2.0[62] dataset (Figure ). This database reports the binding affinity changes of 7085 mutations of 345 protein–protein interactions for which the structure of the complex has been resolved. We selected a subset from this dataset by applying the following filtering steps: (I) we extracted the dimeric complexes having at least 30 amino acids in each protein sequence; (II) we removed the complexes overlapping between the selected data, the PDBbind (v.2020) dataset, and the benchmark set of Vangone and Bonvin;[21] and (III) we removed the wild-type systems with more than one binding affinity value reported, as well as all mutants with ambiguous or unreported binding affinities. The output of the filtering steps reduced the dataset to 34 wild-type complexes and 182 mutants. We fed the conformed test set to PPI-Affinity. Eight wild-type complexes and their related mutants were found outside the applicability domain of the model. Thus, the final test set contained 26 wild-type structures and 151 mutants. The assessed mutants featured between one and six mutations per protein sequence, with 80% of the structures accounting for only one mutation. The binding free energy of all of the complexes was in the range of −16.3 to −5.5 kcal/mol.

Figure 3

Performance of PPI-Affinity on the test set of 26 wild-type complexes and 151 mutants of protein–protein affinity data points taken from the SKEMPI dataset. The performance is reported as the Pearson’s correlation coefficient (R) between experimental and predicted BA. The green points correspond to BA values of the wild-type complexes, and the orange points correspond to the BA values of the mutants. The performance of PPI-Affinity (R = 0.78 and MAE = 1.4 kcal/mol) on the SKEMPI dataset is superior to that obtained in the Vangone and Bonvin benchmark[21] (Table , Test set 1) and in the 90 protein–protein complexes taken from the PDBbind (v.2020) database (Table , Test set 2). When considering only the wild-type complexes, PPI-Affinity delivered a performance of R = 0.77 and MAE = 1.1 kcal/mol. These results evidence the robustness of our protein–protein model in a third external dataset, at the time that show the ability of PPI-Affinity to characterize the changes upon mutations of protein–protein complexes. Additionally, we assessed the performance of PPI-Affinity against the LUPIA[26] classifier. LUPIA uses a threshold value to classify as “high” or “low” the binding affinity of protein–protein complexes. This evaluation (Supporting Information Section SI-5) evidenced the shortcomings of discretizing the BA values to train classifiers rather than regressors models. Taken together, the performance of our protein–protein model on several tests ranks our ensemble model on the top of state-of-the-art fast protein–protein BA predictors.

Performance of the Protein–Peptide Model

The best ensemble model was obtained by two out of the four training subsets, improving their performances (R1 = 0.53, R3 = 0.54) to R = 0.56 on the development set. The rule of maximum value was used to build the ensemble model. The model was assessed on the test set of 100 protein–peptide complexes initially hold-out from the data extracted from the Biolip database (Supporting Information Figure SI-6). Here, we compared the results of our method with state-of-the-art protein–protein and protein–ligand BA predictors. The considered tools include Kdeep and RF-Score for protein–ligand complexes, as well as the above-presented PRODIGY, DFIRE, and CP_PIE methods. ISLAND requires a minimum size of 20 amino acids for each protein sequence. For this reason, the tool was applied to the assessment of only the protein–protein model. Our protein–peptide affinity model outperformed all the other tools, showing a correlation coefficient of R = 0.55 with MAE = 1.1 kcal/mol (Table ). The low correlations values delivered by other state-of-the-art tools can be related to the fact that the functions of Kdeep and RF-Score are fitted primarily using small organic ligands.[63,64] This suggests that fitting with mostly small ligand data cannot capture the differences imposed by the larger size of the peptides, as well as the diversity of peptide interactions with the target and the solvent.

Table 2

Correlation Coefficient (R) of Protein–Protein and Protein–Ligand BA Predictors on the Test Set of 100 Protein–Peptide Complexesa

method	R	MAE (kcal/mol)
PRODIGY	0.13	1.9
DFIRE	0.29	8.7
CP_PIE	–0.28	9.0
Kdeep*	0.32	10.7
RF-Score*	0.23	1.8
PPI-Affinity	0.55	1.1

Protein–ligand methods are marked with a star. The negative values of the correlation coefficients indicate that the corresponding method predicts unbinding free energy.

Case Study I: Ranking the Affinity of Mutants of EPI-X4

EPI-X4 (Endogenous Peptide Inhibitor of CXCR4) is a fragment of albumin identified as an endogenous antagonist of the CXC chemokine receptor 4 (CXCR4) by Zirafi et al.[65] Given the implication of CXCR4 in viral (HIV) infection, inflammation and cancer,[66,67] this peptide represents a highly promising scaffold to develop therapeutic drugs targeting the CXCR4 receptor. Recently, mutants of EPI-X4 have been screened to identify derivatives with enhanced stability and affinity. For this, the affinity values (in terms of IC50 nanomolar, nM) of EPI-X4 and 56 derivatives to CXCR4 were estimated using an antibody competition assay.[67] The scheme of this assay is based on the competitive binding of a fluorescently labeled anti-CXCR4 antibody (clone 12G5) with CXCR4 ligands (Supporting Information Section SI-6).[67] These derivatives are peptides with size in the range of 6–16 amino acids. From the experiments, 26 mutants out of 30 active (IC50 < 10 000nM) derivatives were found to be more active compared to EPI-X4. Here, we employed these data to evaluate the ranking power of our protein–peptide affinity model. For doing this, we use the enrichment factor (EF),[68] defined as (eq ):where I represents a fixed number of top-ranked instances based on predicted values; Nactivetop is the number of active peptides, within the top I instances of the dataset; Nactive is the number of active peptides; and Ntotal is the total number of peptides in the whole dataset. Peptides with IC50 < 10 000 nM were defined as active for an initial test. In a second and more stringent evaluation, the active peptides were considered as those more active than EPI-X4. The structures of the complexes, formed by CXCR4 and each peptide mutant, were generated via homology modeling using as a template the structural model of the complex CXCR4/EPI-X4, reported by Sokkar et al.[69,70] We estimated the BA of each complex with different predictors, as well as the associated enrichment factors. Figure shows the results of the tests corresponding to (1) the conventional activity test using the activity threshold determined by the competition assay (IC50 = 10 000 nM) and (2) the peptides with an affinity higher than EPI-X4. PPI-Affinity achieved the maximum enrichment EF5 = EF10 = EF15 = 1.9 for this test, i.e., the 15 top-ranked instances according to the PPI-Affinity estimation are active. In the case of the stringent test, taking only those with higher affinity than EPI-X4 as active, the maximum EF was obtained within the top 10 peptides, i.e., EF5 = EF10 = 1.9, while the enrichment within the top 15 derivatives also reached a high value EF15 = 1.8. The high enrichment factor within the top 5, 10, and 15 peptides, representing more than 25% of the dataset, evidenced the remarkable ranking capabilities of PPI-Affinity. The other tools show moderate to high enrichment values although below the level reached by PPI-Affinity (Figure ). Noteworthy are CP_PIE and Kdeep, which deliver a steady high performance in the different tests.

Figure 4

Performance of protein–protein and protein–ligand BA predictors on the set of 56 derivatives of EPI-X4. The performance of the methods is based on the enrichment factor (EF) obtained among the top 5, 10, and 15 ranked candidates. Two results are shown per tool, one corresponding to the activity test (AT), and the second corresponding to the peptides with an affinity higher than EPI-X4 (ATEPI-X4).

Case Study II: Ranking the Affinity of Peptides for the PDZ Domains of HtrAs

High-temperature requirement serine proteases (HtrAs) are involved in many physiological processes and neurodegenerative diseases such as Alzheimer’s disease and CARASIL.[71] These proteases are largely regulated by an allosteric mechanism whose initial step is the interaction of polypeptide chains with the peripheric PDZ domain. Here, we challenged PPI-Affinity with a series of peptides bound to PDZ domains of two human HtrAs: HTRA1 and HTRA3. For these sets of protein–peptide interactions, we compared the relative ranking based on the binding affinity predicted by PPI-Affinity to the ranking based on the experimentally determined IC50 values[72] (Tables and 4).

Table 3

Ranking of BA of Protein–Peptide Interactions in HTRA1 as Predicted by PPI-Affinity and Based on the Experimental IC50 Valuesc

PPI-Affinity		experimental
ranking	ΔG (kcal/mol)	ranking	(IC50 (μM))a
(1) DSAIWWV	–8.8	(1) DSRIWWV	0.9 ± 0.1
(2) GWKTWIL	–8.5	(2) DARIWWV	1.3 ± 0.1
(3) WDKIWHV	–8.2	(3) DSAIWWV	2.5 ± 0.4
(4) DSRIWWV	–8.1	(4) WDKIWHV	2.8 ± 0.3
(5) DARIWWV	–8.0	(5) ASRIWWV	2.8 ± 0.3
(6) ASRIWWV	–8.0	(6) DSRIWWA	3.5 ± 0.9
(7) DIETWLL	–7.8	(7) DSRIWAV	6 ± 1
(8) DSRIWWA	–7.3	(8) GWKTWIL	7.7 ± 0.6
(9) DSRAWWV	–7.3	(9) DSRAWWV	13 ± 1
(10) DSRIWAV	–7.2	(10) DIGPVCFL	16 ± 3
(11) DIGPVCFLb	–7.1	(11) DIETWLL	23 ± 3
(12) EVKIMVVb	–7.0	(12) EVKIMVV	24 ± 8
(13) DSRIAWV	–6.9	(13) DSRIAWV	40 ± 5

Values taken from ref (72).

Protein–peptide structures that are at the border of the applicability domain of PPI-Affinity.

No protein–peptide structure is outside the applicability domain.

Table 4

Ranking of BA of Protein–Peptide Interactions in HTRA3 as Predicted by PPI-Affinity and Based on the Experimental IC50 Values

PPI-Affinity		experimental
ranking	ΔG (kcal/mol)	ranking	(IC50 (μM))a
(1) FGAWVc	–7.7	(1) FGRWV	0.6 ± 0.1
(2) FGRWVb	–7.5	(2) RSWWV	0.6 ± 0.1
(3) FGRWIc	–7.5	(3) FGAWVb	0.9 ± 0.1
(4) RSWWV	–7.4	(4) FGRWIb	1.0 ± 0.1
(5) FGRWFc	–7.3	(5) GRWV	1.0 ± 0.1
(6) GRWVb	–7.2	(6) FARWVb	1.1 ± 0.2
(7) FGRWAb	–7.1	(7) RWV	1.3 ± 0.1
(8) FGRAVb	–6.9	(8) FGRWL	2.9 ± 0.3
(9) FGRWLb	–6.6	(9) FGRWA	3.5 ± 0.3
(10) WVc	–6.6	(10) WVb	4.7 ± 0.4
(11) WAc	–6.5	(11) FGRWFb	7.7 ± 0.8
(12) FARWVc	–6.4	(12) WAb	14 ± 1
(13) WGc	–6.2	(13) WGb	22 ± 3
(14) RWVb	–5.2	(14) FGRAV	270 ± 110

Values from ref (72).

Protein–peptide structures are at the border of the applicability domain of PPI-Affinity.

Protein–peptide structures that are outside the applicability domain of PPI-Affinity.

Values taken from ref (72). Protein–peptide structures that are at the border of the applicability domain of PPI-Affinity. No protein–peptide structure is outside the applicability domain. Values from ref (72). Protein–peptide structures are at the border of the applicability domain of PPI-Affinity. Protein–peptide structures that are outside the applicability domain of PPI-Affinity. The prediction of PPI-Affinity suggests that three of the peptides have more affinity for the PDZ domain of HTRA1 than the experimentally determined best binder DSRIWWV (in bold, Table ). However, the calculated binding affinities of those peptides differ from that of DSRIWWV by only 0.1, 0.4, and 0.7 kcal/mol (for WDKIWHV, GWKTWIL, and DSAIWWV, respectively). These differences are very small and within the MAE of PPI-Affinity (Table ). The rest of the peptides are correctly predicted by the method as weaker binders than DSRIWWV to the PDZ domain of HTRA1. We note, however, that the binding energy differences for most of these peptides also fall within the MAE reported for our method based on the test set of protein–peptide affinity (Table ). Interestingly, even the two peptides (DIGPVCFL and EVKIMVV) at the border of the applicability domain of our method are correctly predicted as weaker binders to the PDZ domain of HTRA1 compared to DSRIWWV. Nevertheless, the predicted values for peptides that are outside the applicability domain of PPI-Affinity should be considered with care. The Kd of the complex between HTRA1 and DSRIWWV is experimentally reported as 1.3 ± 0.2 μM, which corresponds to ΔG = −8.2 ± 0.1 kcal/mol at the experimental temperature of 301.15 K. This value is in excellent agreement with the value of −8.1 kcal/mol predicted by PPI-Affinity (Table ). We also tested our method on peptides binding to the PDZ domain of HTRA3 (Table ). In this set of protein–peptide complexes, eight systems are outside the applicability domain of PPI-Affinity. As shown in Table , only one peptide (FGAWV) is incorrectly predicted to have better BA for HTRA3 than the best experimental binder FGRWV. We note that FGRWV lies outside the applicability domain of the model, and this prediction should be taken with caution. In addition, we measured the overall ranking power of PPI-Affinity by calculating Kendall’s correlation coefficient[73] on both test sets, HTRA1 and HTRA3 protein–peptide complexes (Table ). The results obtained with PRODIGY, Kdeep, RF-Score, DFIRE, and CP_PIE are also shown for comparison. The binding affinity values delivered by each tool are listed in Supporting Information Section SI-7.

Table 5

Correlation of IC50 Values with the Estimations from PPI-Affinity and Other State-of-the-Art BA Predictors on the Sets of HTRA1’s and HTRA3’s PDZ Binders

	R		τ
method	HTRA1	HTRA3	HTRA1	HTRA3
PRODIGY	0.25	0.29	0.18	0.01
DFIRE	0.59	0.24	0.38	0.15
CP_PIE	–0.11	–0.03	–0.25	–0.08
Kdeep*	0.38	0.11	0.16	0.41
RF-Score*	0.56	0.13	0.56	0.27
PPI-Affinity	0.63	0.02	0.59	0.42

Kendall’s tau coefficient is a robust nonparametric measure highly suitable to discriminate correlated from uncorrelated variables. This measure involves two main magnitudes: the number of concordant and discordant pairs. Two observations (x, y) and (x, y) are classified as a concordant pair if x > x, and y > y, or vice versa x < x and y < y. If none of these conditions are true, the pair is known as discordant. The Kendall’s tau correlation coefficient (τ) is defined as (eq )where Nc and Nd are the number of concordant and discordant pairs, respectively; Nt is the number of ties in the order of the binding affinity determined by PPI-Affinity; and Nu is the number of ties in the experimental binding affinity. The Kendall’s correlation coefficient is equal to 1 if the calculated ranking of the peptides completely agrees with the ranking determined experimentally. Importantly, here we correlate IC50 values with predicted binding free energy changes; these magnitudes are expected to be linked by a nonlinear relation, which then violates the linearity requirement for the proper interpretation of Pearson’s correlation coefficient. Besides, the small size of the data induces violations of the bivariate normality requisite of this coefficient. Consequently, the use of a robust nonparametric measure such as Kendall’s tau coefficient is a requirement to achieve a correct interpretation of the correlation tests. The Pearson’s correlation coefficient obtained by PPI-Affinity (R = 0.02) in the HTRA3 test set evidences the lack of correlation with the experimentally measured IC50 values. Beyond the previously noted shortcomings of Pearson’s coefficient to assess these data, this result can be also a consequence of half of the peptides in the test set being outside the applicability domain of PPI-Affinity. For HTRA1 binders, where all cases are within the applicability domain, PPI-Affinity delivers the highest correlation coefficient value (R = 0.63). In terms of Kendall’s tau coefficient, PPI-Affinity produces the highest values for both targets (τ = 0.59 and τ = 0.42 for HTRA1 and HTRA3, respectively), evidencing its better ranking power compared to other state-of-the-art predictors. Interestingly, RF-Score gives the second-best performance on the HTRA1 test set with τ = 0.56, which is diminished by about half on the HTRA3 test set (τ = 0.27). Conversely, Kdeep’s performance is close to PPI-Affinity’s on the HTRA3 test set (τ = 0.41), but Kdeep delivers the lowest positive Kendall’s tau value on the HTRA1 test set (τ = 0.16). The weak performance of PRODIGY on both test sets might be related to the limitations of its protein–protein affinity predictor to evaluate smaller protein–peptide complexes. An important advantage of PPI-Affinity is that, thanks to two tailored models for protein–protein and protein–peptide complexes, PPI-Affinity can deliver comparable performance levels for both types of biomolecular systems. In short, the evaluation of BA predictors on different tests and test sets evidenced that state-of-the-art BA predictors, either intended for protein–protein or protein–ligand complexes, deliver low accuracy in the prediction of the BA of protein–peptide complexes. Although it improves the accuracy of the state-of-the-art approaches only slightly, PPI-Affinity is a solution to that issue since it delivers a significantly better and robust performance among different tests. This fact tackles the inherent issues in generalization and overfitting that apparently affect other predictors. The shortcomings in the accuracy of predictions of absolute binding free energy values are to a large extent a consequence of the inherent deviations in the experimental data available for training, which arise from uneven standards under the experimental conditions and measurement procedures. Nonetheless, within such a noisy scenario, the support vector machine models developed by us allow making predictions that improve to some extent the state of the art at the same time of showing steady performance in both absolute binding affinity prediction and ranking assessments.

Web Server Implementation

We implemented PPI-Affinity as a web server to facilitate the use of our models. PPI-Affinity is a suite organized in three modules, which are summarized below by order of increasing complexity of their functionality.

Binding Affinity Predictor

This method is the direct application of the prediction models in a set of PDB files with protein–protein and protein–peptide complexes that should be provided by the users. The module has the functionality to characterize diverse complexes, whose structures were obtained from external sources, based on their binding affinity.

Build & Predict

This module follows the same purpose as the previous one. However, instead of receiving coordinate files with the complexes of interest, it only requires a template file (in PDB format) and a list of amino acid sequences (in FASTA format). Then, the server builds five structural homology models for each sequence in the list, using MODELLER, and selects the structure with the lowest DOPE as the best candidate to represent the complex. Subsequently, it scores all the complexes using the predictor of binding affinity. The Build & Predict module is particularly suitable for screening structurally similar complexes for which no structure has been elucidated.

Protein Engineering

This third and most comprehensive module allows for the automatic generation and scoring of mutations at the interface of the complexes, which aims to optimize the affinity of these complexes. Figure depicts the workflow of the module, which encompasses the next steps:

Figure 5

Workflow of the three modules included in PPI-Affinity. For each module, the required input data and associated steps are indicated. The main difference among them lies in the input data: the Binding Affinity predictor module receives as input a set of protein–protein/peptide complexes (in PDB format); the Build & Predict module generates the complexes from a template file and a list of amino acid sequences (in FASTA format) provided by the user; and the Protein Engineering module receives as input only a template model, generates a list of derivatives for one of the protein/peptide contained in the PDB file, and calculates homology models for all created mutants. Regardless of the module, once the PDB files have been prepared, PPI-Affinity computes the structural descriptors with ProtDCal, estimates the BA values with the machine learning models, and returns the list of derivatives ranked by their BA values.

Step 1: Template Input

The user provides an input structure of the complex (in PDB format) and optionally the amino acid sequences of the chains in FASTA format. In addition, the user must specify which chain will be optimized.

Step 2: Mutants Generation

Next, a maximum of 10 000 derivatives is constructed for the specified chain. The generation of mutants is controlled by the following parameters: the maximum number of modifications to the reference sequence, deletion and mutation probabilities, maximum molecular weight, type of mutations, and a mutability vector whose elements take value 0 for residues that remain unmodified and a value between 0 and 1 representing the probability of modifying each position. The types of mutations are defined by groups of amino acids; the types are “Conservative” formed by the groups Polar (NCQHSTG), Acid (DE), Basic (KR), Non-polar (AILMPV), and Aromatic (WYF) residues; “Conservative extended” for Polar extended (NCQHSTGDEKR) and Non-polar extended (AILMPVWYF) residues; and “Unrestrained” allows the residues to be changed by any other type of residue.

Step 3: Homology Modeling

In this step, five structural models per derivative are generated with MODELLER. Among them, the structure with the lowest DOPE value is selected as the best model for the corresponding mutant.

Step 4: BA Estimation

The binding free energy is calculated using either the protein–protein or the protein–peptide affinity predictor. The selection of the estimator depends on the lengths of the chains in the structure; if a chain contains less than 30 residues, the protein–peptide predictor is used, otherwise the protein–protein estimator is applied.

Step 5: Ranking of Mutants

Finally, the mutants will be arranged in decreasing or increasing order of affinity, and either all or a selection of top candidates are returned to the user. The two sorting schemes allow the use of this module not only as an optimizer of the complex but also to spot mutations that can largely destabilize the complex of interest.

Applicability Domain

Defining the applicability domain (AD) of a model is an important step before the deployment of a predictor as it allows to provide insights into the reliability of the estimations in new systems.[74] Here, the AD is the subspace defined by the value range of the variables of the models (structural descriptors) in our training dataset (Supporting Information Tables SI-7 and SI-8). Thus, the descriptors’ value of a new complex is checked to determine whether this structure is within the AD of the model. The result of this analysis is provided to the users alongside the predicted binding affinity value. Estimations associated with instances outside the AD should be interpreted with special care and probably corroborated by other methods. Additionally, Supporting Information Table SI-9 summarizes the sizes of the peptides and receptors used to train and test the protein–peptide model. Supporting Information Tables SI-10 and SI-11 present the descriptive statistics of the training, development, and test sets used in the modeling.

Implementation Details

The frontend of PPI-Affinity was implemented with the Python Framework Django v3.1.1 for uploading and validating the user data. The backend, which constitutes the core of the program, was programmed in Python 3. Internally, PPI-Affinity uses MODELLER[37−40] to create the new homology structures. ProtDCal[42] is used to calculate the structural descriptors required by the SVM models, and Weka 3.8.4[52] is the interpreter of these predictors to predict the binding free energy of the generated structures. Finally, a queuing system is employed for the management of the jobs sent by the users.

Conclusions

We developed PPI-Affinity, a binding free energy predictor targeting protein–protein and protein–peptide complexes specifically. This fast-screening tool moderately outperformed the predictions and ranking power of similar empirical predictors. The performance of the models was evaluated on various test sets, which include a largely used benchmark set for empirical binding free energy predictors and scoring functions, as well as new augmented datasets gathered in this work from BioLip and PDBbind. We also evaluated the ranking power of the protein–peptide model on a set of EPI-X4 derivatives and HtrAs peptide binders. Altogether, these tests highlight PPI-Affinity, not only as a top-ranked predictor but also as the most robust tool with respect to performance in different tests. Furthermore, we implemented our models in a freely available web server that incorporates diverse functionalities that allow the screening of protein complexes as well as the engineering of the amino acid composition at the interface of the complex, to enhance the binding affinity or to spotlight critical mutants that may destabilize the interaction. The PPI-Affinity web server is thus a versatile tool with a direct impact on the design of peptide binders as well as in protein engineering and design.

63 in total

1. PDB2PQR: an automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations.

Authors: Todd J Dolinsky; Jens E Nielsen; J Andrew McCammon; Nathan A Baker
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

2. Scoring by intermolecular pairwise propensities of exposed residues (SIPPER): a new efficient potential for protein-protein docking.

Authors: Carles Pons; David Talavera; Xavier de la Cruz; Modesto Orozco; Juan Fernandez-Recio
Journal: J Chem Inf Model Date: 2011-01-07 Impact factor: 4.956

3. FireDock: fast interaction refinement in molecular docking.

Authors: Nelly Andrusier; Ruth Nussinov; Haim J Wolfson
Journal: Proteins Date: 2007-10-01

4. Quantitative prediction of protein-protein binding affinity with a potential of mean force considering volume correction.

Authors: Yu Su; Ao Zhou; Xuefeng Xia; Wen Li; Zhirong Sun
Journal: Protein Sci Date: 2009-12 Impact factor: 6.725

Review 5. Constraining cyclic peptides to mimic protein structure motifs.

Authors: Timothy A Hill; Nicholas E Shepherd; Frederik Diness; David P Fairlie
Journal: Angew Chem Int Ed Engl Date: 2014-10-06 Impact factor: 15.336

6. PRODIGY: a web server for predicting the binding affinity of protein-protein complexes.

Authors: Li C Xue; João Pglm Rodrigues; Panagiotis L Kastritis; Alexandre Mjj Bonvin; Anna Vangone
Journal: Bioinformatics Date: 2016-08-08 Impact factor: 6.937

7. Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds.

Authors: Raquel Rodríguez-Pérez; Martin Vogt; Jürgen Bajorath
Journal: J Chem Inf Model Date: 2017-04-10 Impact factor: 4.956

8. Systematic Identification of Machine-Learning Models Aimed to Classify Critical Residues for Protein Function from Protein Structure.

Authors: Ricardo Corral-Corral; Jesús A Beltrán; Carlos A Brizuela; Gabriel Del Rio
Journal: Molecules Date: 2017-10-09 Impact factor: 4.411

Review 9. Machine Learning Methods in Drug Discovery.

Authors: Lauv Patel; Tripti Shukla; Xiuzhen Huang; David W Ussery; Shanzhi Wang
Journal: Molecules Date: 2020-11-12 Impact factor: 4.411

Review 10. Recent advances in the development of protein-protein interactions modulators: mechanisms and clinical trials.

Authors: Haiying Lu; Qiaodan Zhou; Jun He; Zhongliang Jiang; Cheng Peng; Rongsheng Tong; Jianyou Shi
Journal: Signal Transduct Target Ther Date: 2020-09-23