Ruben Sanchez-Garcia, C O S Sorzano, J M Carazo, Joan Segura.
Abstract
Motivation: Protein-protein interactions (PPIs) are essential for most cellular processes, so unveiling how proteins interact is a crucial question, one that can be better understood by identifying which residues are responsible for the interaction. Computational approaches are orders of magnitude cheaper and faster than experimental ones, which has led to a proliferation of methods that aim to predict which residues belong to the interface of an interaction.
Year: 2019 PMID: 30020406 PMCID: PMC6361243 DOI: 10.1093/bioinformatics/bty647
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. BIPSPI workflow. Sequence-based and structural features are used to encode pairs of residues. In the first step, an XGBoost classifier is fed the encoded pairs to obtain interacting-pair predictions. The interacting-pair scores are then combined with the original features and fed to a second-step classifier. Lastly, the pair predictions obtained in step two are converted to binding site predictions using our scoring function.
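The data flow in the Fig. 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BIPSPI implementation: `two_step_pair_scores`, `step1` and `step2` are hypothetical names, the scorers are plain callables standing in for trained XGBoost classifiers, and appending each pair's own step-one score to its feature vector is an assumed way of "combining" scores with the original features.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]                # (residue index in protein A, residue index in protein B)
Features = List[float]
Scorer = Callable[[Features], float]  # hypothetical stand-in for a trained classifier


def two_step_pair_scores(
    pair_features: Dict[Pair, Features],
    step1: Scorer,
    step2: Scorer,
) -> Dict[Pair, float]:
    """Score residue pairs twice; the second pass also sees the first-pass score."""
    # Step one: score every encoded residue pair.
    first = {pair: step1(f) for pair, f in pair_features.items()}
    # Assumption: "combining" means appending the step-one score to the features.
    augmented = {pair: list(f) + [first[pair]] for pair, f in pair_features.items()}
    # Step two: re-score the augmented pairs.
    return {pair: step2(f) for pair, f in augmented.items()}
```

For a quick check, any function that maps a feature list to a number can play the role of both scorers, for example `lambda f: sum(f) / len(f)`.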
Performance evaluation of BIPSPI by leave-one-out cross-validation over the DBv5, DBv3 and DImS complexes, and comparison with other methods
| Algorithm | Dataset | Input | Residue–residue contact prediction | | | Binding site prediction | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BIPSPI | DImS | Seq | 0.7469 | 0.7300 | 0.0170 | 0.6883 | 0.6741 | 0.3375 | 0.2330 | 0.3592 | 0.4264 | 0.8219 | 0.8595 |
| BIPSPI | DImS | Struc* | 0.8800 | 0.8909 | 0.0432 | 0.7940 | 0.7816 | 0.4739 | 0.3679 | 0.4750 | 0.5098 | 0.8680 | 0.8832 |
| BIPSPI | DImS | Struc | 0.8789 | 0.8875 | 0.0439 | 0.7985 | 0.7847 | 0.4772 | 0.3779 | 0.4416 | 0.5983 | 0.8228 | 0.8974 |
| BIPSPI | DBv5 | Seq | 0.8024 | 0.8137 | 0.0110 | 0.7286 | 0.7527 | 0.3049 | 0.2791 | 0.3003 | 0.4828 | 0.8349 | 0.9322 |
| BIPSPI | DBv5 | Struc* | 0.9011 | 0.9184 | 0.0238 | 0.8046 | 0.8154 | 0.3967 | 0.3721 | 0.4012 | 0.5079 | 0.9037 | 0.9353 |
| BIPSPI | DBv5 | Struc | 0.9052 | 0.9188 | 0.0234 | 0.8235 | 0.8225 | 0.4104 | 0.3855 | 0.3910 | 0.5585 | 0.8895 | 0.9407 |
| BIPSPI | DBv3 | Seq | 0.8153 | 0.8154 | 0.0113 | 0.7361 | 0.7492 | 0.3041 | 0.2830 | 0.3233 | 0.4396 | 0.8828 | 0.9251 |
| BIPSPI | DBv3 | Struc* | 0.9024 | 0.9186 | 0.0269 | 0.8103 | 0.8136 | 0.4081 | 0.3712 | 0.4223 | 0.4815 | 0.9112 | 0.9287 |
| BIPSPI | DBv3 | Struc | 0.9044 | 0.9131 | 0.0234 | 0.8157 | 0.8163 | 0.4058 | 0.3730 | 0.3831 | 0.5458 | 0.8871 | 0.9383 |
| PAIRpred | DBv3 | Seq | 0.809 | NA | NA | 0.708 | 0.708 | NA | NA | NA | NA | NA | NA |
| PAIRpred | DBv3 | Struc-d | 0.8783 | 0.8930 | 0.0125 | 0.7587 | 0.6913 | 0.2012 | 0.1807 | 0.1680 | 0.7809 | 0.5030 | 0.9470 |
| PAIRpred | DBv3 | Struc-p | 0.8783 | 0.8930 | 0.0125 | 0.7689 | 0.7741 | 0.3412 | 0.3112 | 0.3716 | 0.4197 | 0.8987 | 0.9256 |
| PPiPP | DBv3 | Seq | 0.729 | NA | NA | 0.661 | 0.661 | NA | NA | NA | NA | NA | NA |
Note: Seq, sequence-based features only; Struc*, structural and sequence-based features, one step; Struc, structural and sequence-based features, two steps (default); Struc-d, PAIRpred structural and sequence-based features with the maximum as scoring function (default); Struc-p, PAIRpred structural and sequence-based features with the proposed scoring function; NA, not available.
Performance evaluation for BIPSPI interface scores estimated by a leave-one-out cross-validation over the complexes compiled in DBv5 using different scoring strategies
| Algorithm | Input | Binding site prediction | | | | | |
|---|---|---|---|---|---|---|---|
| Proposed scoring function | Seq | 0.2968 | 0.2740 | 0.3005 | 0.4679 | 0.8617 | 0.9272 |
| Proposed scoring function | Struc | 0.4043 | 0.3826 | 0.3947 | 0.5444 | 0.8940 | 0.9392 |
| Proposed scoring function + wAVG | Seq | 0.3049 | 0.2791 | 0.3003 | 0.4828 | 0.8349 | 0.9322 |
| Proposed scoring function + wAVG | Struc | 0.4104 | 0.3855 | 0.3910 | 0.5585 | 0.8895 | 0.9407 |
| Maximum | Seq | 0.1955 | 0.1684 | 0.1761 | 0.6459 | 0.6163 | 0.9320 |
| Maximum | Struc | 0.3199 | 0.2977 | 0.2679 | 0.6394 | 0.7780 | 0.9444 |
Note: + wAVG, proposed scoring function followed by averaging along the sequence (default); Seq, sequence-based features only; Struc, structural and sequence-based features, two steps (default).
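Two of the strategies named in this table, taking the maximum pair score per residue and averaging scores along the sequence, can be sketched as below. The sketch does not reproduce the proposed scoring function, whose definition is not given here. The function names, the window half-width and the uniform weights are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (residue index in protein A, residue index in protein B)


def max_site_scores(pair_scores: Dict[Pair, float], n_residues: int, side: int = 0) -> List[float]:
    """Per-residue score = highest score among the pairs that contain the residue.

    side=0 scores the residues of protein A, side=1 those of protein B.
    Residues that appear in no pair keep a score of 0.0.
    """
    scores = [0.0] * n_residues
    for pair, s in pair_scores.items():
        idx = pair[side]
        scores[idx] = max(scores[idx], s)
    return scores


def window_average(scores: List[float], half_width: int = 1) -> List[float]:
    """Smooth per-residue scores with an unweighted mean over sequence neighbours.

    The window is truncated at the sequence ends (assumed behaviour).
    """
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half_width), min(len(scores), i + half_width + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out
```

Chaining the two, `window_average(max_site_scores(pair_scores, n))`, gives a smoothed per-residue profile in which isolated high scores are damped and runs of neighbouring high scores are reinforced.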
Fig. 2. BIPSPI interface predictions for the proteins included in PDB entry 4ov6, bioassembly number 2. Subtilase domain of the PCSK9 protein (PDB chain E), surface representation. Peptide inhibitor domain of PCSK9 (PDB chain D), green ribbon or trace schema. PCSK9-binding adnectin protein (PDB chain G), magenta ribbon or trace schema. (A) Normalized binding site prediction scores for the PCSK9 subtilase domain (heat map surface) interacting with the peptide inhibitor domain (green ribbon). Scores for all residues are displayed. (B) Normalized binding site prediction scores for the PCSK9 subtilase domain (heat map surface) interacting with the PCSK9-binding adnectin protein (magenta ribbon). Scores for all residues are displayed. (C) Compact representation of (A) and (B) in which only the highest-scoring binding site residues for each interacting binding site are depicted. For the PCSK9 subtilase domain (grey surface), residues that interact with the peptide inhibitor domain (green ribbon) are coloured in lemon-green, and residues that interact with the PCSK9-binding adnectin protein (magenta) are coloured in light-pink. (D) Sphere representation of the four highest-scoring residue predictions, coloured in light-pink for the PCSK9-binding adnectin protein (magenta) and in lemon-green for the peptide inhibitor domain (green). (Color version of this figure is available at Bioinformatics online.)