Sebastian Daberdaku, Carlo Ferrari.
Abstract
BACKGROUND: The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary considerably depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task.
Keywords: 3D Zernike Descriptors; Protein–protein interface prediction; SVM
Year: 2018 PMID: 29409446 PMCID: PMC5802066 DOI: 10.1186/s12859-018-2043-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The HQI8 subset of amino acid indices from the AAindex database
| Entry name | Description |
|---|---|
| BLAM930101 | Alpha helix propensity of position 44 in T4 lysozyme |
| BIOV880101 | Information value for accessibility; average fraction 35% |
| MAXF760101 | Normalized frequency of alpha-helix |
| TSAJ990101 | Volumes including the crystallographic waters using the ProtOr |
| NAKH920108 | AA composition of MEM of multi-spanning proteins |
| CEDJ970104 | Composition of amino acids in intracellular proteins (percent) |
| LIFS790101 | Conformational preference for all beta-strands |
| MIYS990104 | Optimized relative partition energies - method C |
The four basic kernel functions
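| Kernel name | Mathematical formulation |
|---|---|
| Linear | $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{T}\mathbf{x}_j$ |
| Polynomial | $K(\mathbf{x}_i, \mathbf{x}_j) = (\gamma\,\mathbf{x}_i^{T}\mathbf{x}_j + r)^{d},\ \gamma > 0$ |
| Radial basis function (RBF) | $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}),\ \gamma > 0$ |
| Sigmoid | $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\gamma\,\mathbf{x}_i^{T}\mathbf{x}_j + r)$ |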
γ, r and d are kernel parameters
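
The four kernels can be written out directly. The sketch below assumes the parameterisation shown in the table (the same one used by LIBSVM and scikit-learn's `SVC`, where `gamma`, `coef0` and `degree` correspond to γ, r and d); the function names are illustrative and not taken from the source.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product x^T y."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x: Vector, y: Vector) -> float:
    # x^T y
    return dot(x, y)


def polynomial_kernel(x: Vector, y: Vector, gamma: float, r: float, d: int) -> float:
    # (gamma * x^T y + r)^d
    return (gamma * dot(x, y) + r) ** d


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    # exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x: Vector, y: Vector, gamma: float, r: float) -> float:
    # tanh(gamma * x^T y + r)
    return math.tanh(gamma * dot(x, y) + r)
```

For x = (1, 2) and y = (3, 0.5), `linear_kernel` gives 4.0 and `polynomial_kernel(x, y, 0.5, 1.0, 2)` gives (0.5 · 4 + 1)² = 9.0.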
Performance measures for the binary classification problem: TP – true positives, TN – true negatives, FP – false positives, FN – false negatives
| Measure | Mathematical formulation | Comment |
|---|---|---|
| Accuracy | $A = \frac{TP + TN}{TP + TN + FP + FN}$ | Fraction of correct predictions over all predictions; of limited value on imbalanced data. |
| Precision | $P = \frac{TP}{TP + FP}$ | Fraction of relevant instances among the retrieved ones. |
| Recall | $R = \frac{TP}{TP + FN}$ | Fraction of all relevant instances that have been retrieved. |
| F1 score | $F = 2\,\frac{P \cdot R}{P + R}$ | Harmonic mean of precision and recall. |
| Matthews correlation coefficient | $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ | Returns a value between −1 and +1: +1 represents a perfect prediction, 0 a prediction no better than random, and −1 total disagreement between prediction and observation. |
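
As a minimal sketch, the measures above can be computed from the four confusion-matrix counts. Returning 0.0 when a denominator is zero is an assumption of this example, not something the table specifies. The usage line illustrates the comment on accuracy: a classifier that labels every sample negative on a set with 5% positives reaches 95% accuracy but zero recall and zero MCC.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, F1 and MCC from confusion-matrix counts.

    Undefined ratios (zero denominator) are reported as 0.0; this is an
    assumed convention for the example.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# All-negative predictions on 5 positives and 95 negatives.
print(classification_metrics(tp=0, tn=95, fp=0, fn=5))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'mcc': 0.0}
```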
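Training and test split for each of the 16 datasets (the receptor and ligand sets of the eight protein classes) in the Protein–Protein Docking Benchmark 5.0. For each class, the first row lists the receptor proteins and the second row the ligand proteins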
| Dataset | Training set | Test set |
|---|---|---|
| A | 1AY1.HL (1BGX), 1BVL.BA (1BVK), 2FAT.HL (2FD6), 2I24.N (2I25), 3EO0.AB (3EO1), 3G6A.LH (3G6D), 3HMW.LH (3HMX), 3L7E.LH (3L5W), 3MXV.LH (3MXW), 3V6F.AB (3V6Z), 4GXV.HL (4GXU) | 1FGN.LH (1AHW), 1DQQ.CD (1DQJ), 1QBL.HL (1WEJ), 1GIG.LH (2VIS), 2VXU.HL (2VXT), 3RVT.CD (3RVW), 4G5Z.HL (4G6J) |
| A | 1TAQ.A (1BGX), 3LZT (1BVK), 1A43 (1E6J), 1YWH.A (2FD6), 1IK0.A (3G6D), 1F45.AB (3HMX), 3M1N.A (3MXW), 3F5V.A (3RVW), 3KXS.F (3V6Z), 1DOL.A (4DN4), 4I1B.A (4G6J), 1RUZ.HIJKLM (4GXU) | 1TFH.A (1AHW), 1HRC (1WEJ), 2VIU.ACE (2VIS), 1J0S.A (2VXT), 1QM1.A (2W9E), 1TGJ.AB (3EO1), 3F74.A (3EOA), 2FK0.ABCDEF (4FQI) |
| AB | 1BJ1.HL (1BJ1), 1FSK.BC (1FSK), 1I9R.HL (1I9R), 1K4C.AB (1K4C), 1KXQ.H (1KXQ), 2JEL.HL (2JEL), 1QFW.HL (9QFW) | 1IQD.AB (1IQD), 1NCA.HL (1NCA), 1NSN.HL (1NSN), 1QFW.IM (1QFW), 2HMI.CD (2HMI) |
| AB | 2VPF.GH (1BJ1), 1BV1 (1FSK), 1D7P.M (1IQD), 7NN9 (1NCA), 1HRP.AB (1QFW), 1S6P.AB (2HMI), 1POH (2JEL) | 1ALY.ABC (1I9R), 1JVM.ABCD (1K4C), 1PPI (1KXQ), 1KDC (1NSN) |
| EI | 1QQU.A (1AVX), 1PIG (1BVN), 1JAE.A (1CLV), 1EAX.A (1EAW), 1TRM.A (1EZU), 4PEP (1F34), 2PKA.XY (1HIA), 1AKL.A (1JIW), 3GMU.B (1JTG), 1QLP.A (1OPH), 1SCD.A (1OYV), 1X9Y.A (1PXV), 2DCY.A (2B42), 966C.A (2J0T), 1ZM8.A (2O3B), 1SUP (2SIC), 1A3S.A (3A4S), 2QA9.E (3SGQ), 3VLA.A (3VLB), 4HWX.AB (4HX3), 1UNK.D (7CEI) | 2CGA.B (1ACB), 1RGH.B (1AY7), 1HCL (1BUH), 2TGT (1D6R), 9RSA.B (1DFJ), 9EST.A (1FLE), 1CK7.A (1GXD), 3QI0.A (1JTD), 1J06.B (1MAH), 1UDH. (1UDI), 2GHU.A (1YVB), 1KWM.A (1ZLI), 8CPA.A (4CPA), 1ERK.A (4IZ7) |
| EI | 1EGL (1ACB), 1BA7.B (1AVX), 1HOE (1BVN), 1HPT (1CGI), 1QFD.A (1CLV), 1F32.A (1F34), 1PMC.A (1GL1), 1BX8 (1HIA), 1BTL.A (1JTD), 1ZG4.A (1JTG), 1UTQ.A (1OPH), 1PJU.A (1OYV), 1LU0.A (1PPE), 1NYC.A (1PXV), 1B1U.A (1TMQ), 1CEW.I (1YVB), 2JTO.A (1ZLI), 1ZFI.A (2ABZ), 1T6E.X (2B42), 1D2B.A (2J0T), 2NNR.A (2OUL), 2CI2.I (2SNI), 2UUX.A (2UUY), 3A4R.A (3A4S), 3VL8.A (3VLB), 1C7K.A (4HX3) | 1A19.B (1AY7), 1DKS.A (1BUH), 1K9B.A (1D6R), 2BNH (1DFJ), 9PTI (1EAW), 1ECZ.AB (1EZU), 2REL.A (1FLE), 1BR9.A (1GXD), 2RN4.A (1JIW), 1FSC (1MAH), 2GKR.I (1R0R), 2UGI.B (1UDI), 1J57.A (2O3B), 3SSI (2SIC), 1H20.A (4CPA), 2LS7.A (4IZ7), 1M08.B (7CEI) |
| ER | 1IXM.AB (1F51), 1BU6.O (1GLA), 1AUQ (1M10), 1JXQ.A (1NW9), 1B3K.A (1OC0), 1R6C.X (1R6Q), 2FXS.A (1US7), 2AYN.A (2AYO), 3OWG.A (2GAF), 1L7E.AB (2OOR), 1YZU.A (2OT3), 2YVF.A (2YVJ), 2D1I.A (2Z0E), 2EDI.A (3FN1), 1BPB.A (3K75), 1UPL.A (4FZA) | 1AUQ (1IJK), 1JMJ.A (1JMO), 3EED.AB (1JWH), 1JZO.AB (1JZD), 1V8Z.AB (1WDW), 1MH1 (2NZ8), 4JJ7.AB (3H11), 3LVM.AB (3LVK), 3PC6.A (3PC8), 1XVB.ABCDEF (4GAM) |
| ER | 1SRR.C (1F51), 1FVU.AB (1IJK), 2OPY.A (1NW9), 2W0G.A (1US7), 1GEQ.A (1WDW), 1VPT.A (2GAF), 1NTY.A (2NZ8), 1E3T.A (2OOR), 1TXU.A (2OT3), 2E4P.A (2YVJ), 1V49.A (2Z0E), 2LQ7.A (3FN1), 1DCJ.A (3LVK), 3PC7.A (3PC8), 3GGF.A (4FZA), 1CKV.A (4GAM) | 1F3Z.A (1GLA), 2CN0.HL (1JMO), 3C13.A (1JWH), 1JPE.A (1JZD), 1M0Z.B (1M10), 2JQ8.A (1OC0), 2W9R.A (1R6Q), 2FCN.A (2AYO), 3H13.A (3H11), 3K77.A (3K75) |
| ES | 1E1N.A (1E6E), 1GJR.A (1EWY), 1B39.A (1FQ1), 1N0V.C (1ZM4), 3UIU.A (2A1A), 2BBK.JM (2MTA), 1SUR.A (2O8V), 2OOA.A (2OOB), 1GIQ.A (4H03), 4LW2.AB (4LW4) | 1CL0.A (1F6M), 1QUP.A (1JK9), 1JB1.ABC (1KKL), 1L6P (1Z5Y), 1U90.A (2A9K), 1J54.A (2IDO), 1CCP (2PCC) |
| ES | 1CJE.D (1E6E), 1CZP.A (1EWY), 1FPZ.F (1FQ1), 2JCW.A (1JK9), 2HPR (1KKL), 1Q46.A (2A1A), 2C8B.X (2A9K), 1SE7.A (2IDO), 2RAC.A (2MTA), 1NI7.A (4LW4) | 2TIR.A (1F6M), 2B1K.A (1Z5Y), 1XK9.A (1ZM4), 1YJ1.A (2OOB), 1YCC (2PCC), 1IJJ.A (4H03) |
| OG | 1QG4.A (1A2K), 1AB8.AB (1AZS), 1CTQ.A (1BKD), 1MH1 (1E96), 1MH1 (1I4D), 5P21.A (1LFD), 6Q21.D (1WQ1), 2ZKM.X (2FJU), 1GFI.A (2GTP), 1MH1 (2H7V), 3CPI.G (3CPH) | 1TND.C (1FQJ), 1A4R.A (1GRN), 1MH1 (1HE1), 821P (1HE8), 1RRP.AB (1K5D), 1HUR.A (1R8S), 2BME.A (1Z0K), 1FKM.A (2G77) |
| OG | 1OUN.AB (1A2K), 1AZT.A (1AZS), 1HH8.A (1E96), 1RGP (1GRN), 1HE9.A (1HE1), 1OXZ.A (1J2J), 1LXD.A (1LFD), 1R8M.E (1R8S), 1WER (1WQ1), 1YZM.A (1Z0K), 1Z06.A (2G77) | 1FQI.A (1FQJ), 1TBG.DH (1GP2), 1A12.A (1I2M), 1F59.A (1IBR), 1YRG.B (1K5D), 2BV1.A (2GTP), 1G16.A (3CPH) |
| OR | 1BUY.A (1EER), 1QFK.HL (1FAK), 1B98.AM (1HCF), 1NOB.F (1KAC), 1MKF.AB (1ML0), 1FZV.AB (1RV6), 1BEC (1SBB), 1ACC.A (1T6B), 1U5Y.ABD (1XU1), 1JX6.A (1ZHH), 1YWH.A (2I9B), 3L88.ABC (3L89), 1H0C.AB (3R9A), 1N6U.A (3S9D) | 3AVE.AB (1E4K), 1C3D (1GHQ), 1G0Y.R (1IRA), 1MZN.AB (1K74), 1TGK (1KTZ), 1BQU.A (1PVH), 1R42.A (2AJF), 2BBA.A (2HLE), 1S62.A (2X9A) |
| OR | 1LY2.A (1GHQ), 1WWB.X (1HCF), 1EMR.A (1PVH), 1QSZ.A (1RV6), 1SHU.X (1T6B), 2HJE.A (1ZHH), 2GHV.E (2AJF), 1IKO.P (2HLE), 2I9A.A (2I9B), 2X9B.A (2X9A), 1CKL.A (3L89), 2C0M.A (3R9A), 1ITF.A (3S9D), 1M1U.A (4M76) | 1FNL.A (1E4K), 1ERN.AB (1EER), 1TFH.B (1FAK), 1ILR.1 (1IRA), 1ZGY.AB (1K74), 1F5W.B (1KAC), 1M9Z.A (1KTZ), 1DOL (1ML0), 1SE4 (1SBB), 1XUT.A (1XU1) |
| OX | 2CPL (1AK4), 2CLR.DE (1AKJ), 1IJJ.B (1ATN), 1D6O.A (1B6C), 1BDD (1FC2), 3CHY.A (1FFW), 1GRI.B (1GCQ), 1THF.D (1GPW), 1EAN.A (1H9D), 1D4T.AB (1M27), 1IAM.A (1MQ8), 1OFT.AB (1OFU), 1SYQ.A (1RKE), 2PAB.ABCD (1RLB), 1QGV.A (1SYX), 1XQR.A (1XQS), 2FXU.A (1Y64), 1FCH.A (2C0L), 1SZ7.A (2CFH), 2HRA.A (2HRK), 1NG1.A (2J7P), 3CX9.A (2VDB), 3AA7.AB (3AAA), 3BIX.A (3BIW), 1C3D.A (3D5S), 1P97.A (3F1P), 3MYI.A (3H2V), 3KOV.AB (3P57) | 1AVV.A (1EFN), 1QRQ.ABCD (1EXB), 1FC1.AB (1FCC), 1QJB.AB (1IB1), 1H15.AB (1KLU), 3MIN.ABCD (1N2C), 1HNF (1QA9), 2F0R.A (1S1Q), 1UCH (1XD3), 1M4Z.A (1ZHI), 1Y20.A (2A5T), 1BIZ.AB (2B4J), 1CRZ.A (2HQS), 3HEC.A (2OZA), 1EQF.A (3AAD), 1Z6R.AB (3BP8), 3BX8.A (3BX7), 3ODQ.AB (3SZK), 1VDD.ABCD (4JCV) |
| OX | 4J93.A (1AK4), 3DNI (1ATN), 1CX8.AB (1DE4), 1G83.A (1EFN), 1FC1.AB (1FC2), 2IGG.A (1FCC), 1FWP.A (1FFW), 1GCP.B (1GCQ), 1D0N.B (1H1V), 1STE (1KLU), 1MQ9.A (1MQ8), 2VAW.A (1OFU), 1CCZ.A (1QA9), 3MYI.A (1RKE), 1L2Z.A (1SYX), 1Z1A.A (1ZHI), 1Z9E.A (2B4J), 2BJN.A (2CFH), 1OAP.A (2HQS), 2IYL.D (2J7P), 3FYK.X (2OZA), 1MYO.A (3AAA), 1TEY.A (3AAD), 2R1D.A (3BIW), 2GOM.A (3D5S), 2HD7.A (3DAW), 1WI6.A (3H2V), 3IO2.A (3P57), 2H3K.A (3SZK), 1W3S.A (4JCV) | 1CD8.AB (1AKJ), 1IAS.A (1B6C), 1QDV.ABCD (1EXB), 1K9V.F (1GPW), 1ILF.A (1H9D), 1KUY.A (1IB1), 1KW2.B (1KXP), 2NIP.AB (1N2C), 1HBP (1RLB), 1YJ1.A (1S1Q), 1S3X.A (1XQS), 1UX5.A (1Y64), 2A5S.A (2A5T), 1PNE (2BTF), 1C44.A (2C0L), 2HQT.A (2HRK), 2J5Y.A (2VDB), 3BP3.A (3BP8), 3OSK.A (3BX7), 1X0O.A (3F1P) |
The table gives the PDB code and chain ID of each protein used in this study (the PDB code in parentheses identifies the corresponding bound complex in the DB5 database)
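
Each entry follows a `PDB code.chain IDs (DB5 complex)` pattern. A small parser such as the hypothetical `parse_cell` below could turn one table cell into structured records. It assumes single-character chain IDs and treats a missing or empty chain field (as in `3LZT (1BVK)` or `1UDH. (1UDI)`) as "no chains listed".

```python
import re
from typing import List, NamedTuple


class Entry(NamedTuple):
    pdb_code: str      # unbound structure used for training or testing
    chains: List[str]  # chain IDs (empty if none are given)
    complex_id: str    # corresponding bound complex in DB5


# code, optional ".chains", then "(complex)"; chain IDs assumed single-character
_ENTRY = re.compile(
    r"^([0-9A-Za-z]{4})(?:\.([0-9A-Za-z]*))?\s*\(([0-9A-Za-z]{4})\)$")


def parse_cell(cell: str) -> List[Entry]:
    """Parse a comma-separated table cell into Entry records (hypothetical helper)."""
    entries = []
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue
        m = _ENTRY.match(item)
        if m is None:
            raise ValueError(f"unrecognised entry: {item!r}")
        code, chains, complex_id = m.groups()
        entries.append(Entry(code, list(chains or ""), complex_id))
    return entries


records = parse_cell("1AY1.HL (1BGX), 3LZT (1BVK), 1UDH. (1UDI)")
```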
The number of interface (positive samples) and non-interface (negative samples) local surface patches in the balanced and unbalanced versions of the training set and in the test set for each protein class
| Protein complex class | Receptor/ligand | Bound/unbound | Balanced training: interface patches | Balanced training: non-interface patches | Balanced training: total | Unbalanced training: interface patches | Unbalanced training: non-interface patches | Unbalanced training: total | Test: interface patches | Test: non-interface patches | Test: total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | r | b | 4520 | 5859 | 10379 | 1162 | 31174 | 32336 | 629 | 22141 | 22770 |
| A | r | u | 4545 | 5867 | 10412 | 1169 | 31070 | 32239 | 621 | 22315 | 22936 |
| A | l | b | 4533 | 6164 | 10697 | 1155 | 32987 | 34142 | 674 | 20382 | 21056 |
| A | l | u | 4809 | 5926 | 10735 | 1224 | 31529 | 32753 | 683 | 26222 | 26905 |
| AB | r | b | 2207 | 3806 | 6013 | 566 | 20284 | 20850 | 378 | 18852 | 19230 |
| AB | r | u | 2234 | 3805 | 6039 | 580 | 20181 | 20761 | 444 | 19265 | 19709 |
| AB | l | b | 2472 | 2315 | 4787 | 633 | 12329 | 12962 | 333 | 6175 | 6508 |
| AB | l | u | 2432 | 2319 | 4751 | 624 | 12316 | 12940 | 332 | 7877 | 8209 |
| EI | r | b | 8862 | 7350 | 16212 | 2254 | 39035 | 41289 | 1268 | 28696 | 29964 |
| EI | r | u | 7927 | 7350 | 15277 | 2026 | 38872 | 40898 | 1299 | 29502 | 30801 |
| EI | l | b | 11291 | 4172 | 15463 | 2890 | 22222 | 25112 | 1541 | 17243 | 18784 |
| EI | l | u | 13344 | 4170 | 17514 | 3397 | 22154 | 25551 | 1471 | 17085 | 18556 |
| ER | r | b | 5512 | 7404 | 12916 | 1392 | 39672 | 41064 | 1458 | 37028 | 38486 |
| ER | r | u | 5072 | 7418 | 12490 | 1295 | 39631 | 40926 | 1165 | 36294 | 37459 |
| ER | l | b | 7615 | 3779 | 11394 | 1953 | 20199 | 22152 | 973 | 13578 | 14551 |
| ER | l | u | 7218 | 3770 | 10988 | 1834 | 20293 | 22127 | 829 | 13053 | 13882 |
| ES | r | b | 2328 | 5429 | 7757 | 606 | 29129 | 29735 | 486 | 14498 | 14984 |
| ES | r | u | 1821 | 5361 | 7182 | 462 | 28721 | 29183 | 401 | 14013 | 14414 |
| ES | l | b | 3004 | 2231 | 5235 | 763 | 11934 | 12697 | 310 | 7704 | 8014 |
| ES | l | u | 2274 | 2227 | 4501 | 574 | 11866 | 12440 | 348 | 7658 | 8006 |
| OG | r | b | 3960 | 5557 | 9517 | 1008 | 29796 | 30804 | 873 | 18550 | 19423 |
| OG | r | u | 3469 | 5700 | 9169 | 882 | 30468 | 31350 | 764 | 19318 | 20082 |
| OG | l | b | 4501 | 2756 | 7257 | 1142 | 14773 | 15915 | 734 | 14996 | 15730 |
| OG | l | u | 3781 | 2803 | 6584 | 988 | 14887 | 15875 | 857 | 14844 | 15701 |
| OR | r | b | 5109 | 7344 | 12453 | 1298 | 39365 | 40663 | 696 | 19769 | 20465 |
| OR | r | u | 4218 | 7305 | 11523 | 1082 | 39031 | 40113 | 1079 | 19306 | 20385 |
| OR | l | b | 4691 | 3273 | 7964 | 1205 | 17471 | 18676 | 1012 | 14860 | 15872 |
| OR | l | u | 4163 | 3424 | 7587 | 1057 | 18219 | 19276 | 1635 | 14425 | 16060 |
| OX | r | b | 8894 | 10487 | 19381 | 2280 | 55923 | 58203 | 1831 | 62163 | 63994 |
| OX | r | u | 9096 | 10630 | 19726 | 2332 | 56765 | 59097 | 1591 | 62829 | 64420 |
| OX | l | b | 10146 | 8392 | 18538 | 2583 | 44821 | 47404 | 2035 | 32141 | 34176 |
| OX | l | u | 9560 | 9393 | 18953 | 2443 | 50234 | 52677 | 2119 | 33604 | 35723 |
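
The counts make the class imbalance explicit. The short calculation below uses the A/receptor/bound row and gives about 3.6% interface patches in the unbalanced training set and about 2.8% in the test set, which is why accuracy alone says little about these predictors.

```python
# Interface / non-interface patch counts for the A, receptor, bound row
# (copied from the table above).
counts = {
    "balanced training": (4520, 5859),
    "unbalanced training": (1162, 31174),
    "test": (629, 22141),
}


def positive_fraction(pos: int, neg: int) -> float:
    """Fraction of interface (positive) patches in a set."""
    return pos / (pos + neg)


for name, (pos, neg) in counts.items():
    print(f"{name}: {positive_fraction(pos, neg):.1%} interface patches, "
          f"{neg / pos:.1f} non-interface patches per interface patch")
```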
The number of selected features belonging to each physico-chemical property and for each protein class. The + and − signs indicate, respectively, the descriptors of the positive and negative parts of the corresponding amino acid index
| Protein complex class | Receptor/ligand | Bound/unbound | Total | BLAM930101 + | BLAM930101 − | BIOV880101 + | BIOV880101 − | MAXF760101 | TSAJ990101 | NAKH920108 | CEDJ970104 | LIFS790101 | MIYS990104 + | MIYS990104 − |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | r | b | 109 | 6 | 9 | 0 | 0 | 13 | 11 | 24 | 2 | 0 | 14 | 30 |
| A | r | u | 96 | 9 | 9 | 6 | 0 | 4 | 13 | 20 | 3 | 2 | 11 | 19 |
| A | l | b | 89 | 5 | 11 | 1 | 4 | 7 | 21 | 5 | 21 | 7 | 5 | 2 |
| A | l | u | 85 | 8 | 7 | 2 | 6 | 10 | 21 | 3 | 18 | 4 | 4 | 2 |
| AB | r | b | 117 | 28 | 0 | 0 | 8 | 4 | 10 | 17 | 20 | 0 | 1 | 29 |
| AB | r | u | 108 | 26 | 0 | 0 | 8 | 3 | 11 | 13 | 16 | 0 | 0 | 31 |
| AB | l | b | 78 | 9 | 8 | 3 | 7 | 7 | 6 | 3 | 19 | 5 | 5 | 6 |
| AB | l | u | 75 | 13 | 12 | 7 | 6 | 5 | 6 | 1 | 9 | 5 | 8 | 3 |
| EI | r | b | 105 | 2 | 0 | 0 | 19 | 11 | 31 | 14 | 10 | 6 | 5 | 7 |
| EI | r | u | 129 | 10 | 12 | 1 | 24 | 8 | 30 | 14 | 11 | 10 | 3 | 6 |
| EI | l | b | 91 | 6 | 2 | 0 | 4 | 15 | 21 | 1 | 4 | 6 | 17 | 15 |
| EI | l | u | 80 | 4 | 3 | 0 | 5 | 11 | 12 | 5 | 11 | 3 | 3 | 23 |
| ER | r | b | 115 | 7 | 0 | 1 | 5 | 23 | 12 | 1 | 23 | 6 | 12 | 25 |
| ER | r | u | 126 | 11 | 1 | 1 | 2 | 3 | 14 | 1 | 27 | 25 | 20 | 21 |
| ER | l | b | 100 | 6 | 8 | 10 | 14 | 24 | 4 | 1 | 8 | 3 | 14 | 8 |
| ER | l | u | 100 | 9 | 5 | 12 | 10 | 10 | 18 | 3 | 10 | 7 | 8 | 8 |
| ES | r | b | 84 | 14 | 0 | 0 | 8 | 1 | 10 | 2 | 18 | 7 | 4 | 20 |
| ES | r | u | 79 | 4 | 4 | 7 | 13 | 5 | 20 | 6 | 8 | 3 | 1 | 8 |
| ES | l | b | 83 | 0 | 0 | 9 | 10 | 15 | 7 | 0 | 5 | 15 | 15 | 7 |
| ES | l | u | 86 | 11 | 5 | 14 | 7 | 3 | 13 | 4 | 10 | 7 | 12 | 0 |
| OG | r | b | 102 | 6 | 9 | 4 | 5 | 29 | 7 | 1 | 8 | 12 | 18 | 3 |
| OG | r | u | 107 | 11 | 13 | 2 | 5 | 23 | 16 | 6 | 11 | 9 | 10 | 1 |
| OG | l | b | 92 | 8 | 3 | 6 | 4 | 19 | 7 | 8 | 4 | 8 | 7 | 18 |
| OG | l | u | 78 | 10 | 3 | 2 | 1 | 8 | 9 | 7 | 8 | 12 | 5 | 13 |
| OR | r | b | 97 | 14 | 0 | 1 | 4 | 23 | 12 | 1 | 5 | 13 | 7 | 17 |
| OR | r | u | 68 | 11 | 0 | 3 | 6 | 5 | 9 | 1 | 18 | 4 | 0 | 11 |
| OR | l | b | 79 | 3 | 0 | 0 | 2 | 14 | 6 | 1 | 17 | 17 | 17 | 2 |
| OR | l | u | 100 | 11 | 9 | 0 | 8 | 19 | 8 | 3 | 9 | 10 | 7 | 16 |
| OX | r | b | 141 | 7 | 14 | 1 | 1 | 9 | 22 | 19 | 34 | 15 | 11 | 8 |
| OX | r | u | 122 | 19 | 4 | 2 | 8 | 12 | 12 | 9 | 24 | 10 | 6 | 16 |
| OX | l | b | 132 | 19 | 9 | 2 | 14 | 27 | 12 | 3 | 10 | 10 | 15 | 11 |
| OX | l | u | 118 | 13 | 15 | 1 | 12 | 10 | 17 | 2 | 18 | 8 | 10 | 12 |
| Generic model |  | b | 83 | 7 | 0 | 0 | 8 | 8 | 9 | 0 | 15 | 5 | 13 | 18 |
| Generic model |  | u | 76 | 3 | 8 | 1 | 5 | 10 | 10 | 2 | 12 | 4 | 6 | 15 |
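
The table does not define how an index is divided into positive and negative parts. One plausible reading, assumed here and not confirmed by the source, is that a signed index $v$ is split into the non-negative parts $v^{+} = \max(v, 0)$ and $v^{-} = \max(-v, 0)$, with $v = v^{+} - v^{-}$, so that each part can be described separately. The values below are toy numbers, not AAindex data.

```python
from typing import Dict, Tuple


def split_index(index: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split a signed per-residue index into two non-negative parts.

    Assumed definition (not taken from the source):
      positive[a] = max(index[a], 0), negative[a] = max(-index[a], 0),
    so that index[a] == positive[a] - negative[a].
    """
    positive = {aa: max(v, 0.0) for aa, v in index.items()}
    negative = {aa: max(-v, 0.0) for aa, v in index.items()}
    return positive, negative


# Toy values for three residues, for illustration only (not AAindex data).
toy_index = {"A": 0.5, "G": -1.2, "L": 0.0}
pos, neg = split_index(toy_index)
```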
The selected (best) SVM model for each protein class, i.e. the penalty C, the kernel function and its parameters (γ, d, r)
| Protein complex class | Receptor/ligand | Bound/unbound | No. features | Kernel function | C | γ | d | r |
|---|---|---|---|---|---|---|---|---|
| A | r | b | 109 | sigmoid | 495.33 | 0.00054 | N/A | 1.44470 |
| A | r | u | 96 | rbf | 1365.14 | 0.00039 | N/A | N/A |
| A | l | b | 89 | linear | 46.05 | N/A | N/A | N/A |
| A | l | u | 85 | linear | 221.64 | N/A | N/A | N/A |
| AB | r | b | 117 | poly | 23.87 | 0.03006 | 2 | 1.73464 |
| AB | r | u | 108 | poly | 426.47 | 0.01110 | 3 | 0.01539 |
| AB | l | b | 78 | poly | 2157.88 | 0.01906 | 7 | 0.17614 |
| AB | l | u | 75 | poly | 4362.45 | 0.03470 | 10 | -0.03613 |
| EI | r | b | 105 | poly | 1514.50 | 0.00003 | 3 | -0.15922 |
| EI | r | u | 129 | sigmoid | 33.32 | 0.00029 | N/A | -1.61953 |
| EI | l | b | 91 | sigmoid | 213.15 | 0.00065 | N/A | 0.47294 |
| EI | l | u | 80 | poly | 1916.02 | 0.01531 | 4 | 0.13840 |
| ER | r | b | 115 | rbf | 9.22 | 0.00366 | N/A | N/A |
| ER | r | u | 126 | rbf | 298.47 | 0.00222 | N/A | N/A |
| ER | l | b | 100 | sigmoid | 157.32 | 0.00024 | N/A | -0.24272 |
| ER | l | u | 100 | poly | 1001.44 | 0.00597 | 5 | 0.00039 |
| ES | r | b | 84 | linear | 196.85 | N/A | N/A | N/A |
| ES | r | u | 79 | linear | 7010.36 | N/A | N/A | N/A |
| ES | l | b | 83 | poly | 954.76 | 0.00581 | 6 | 1.00104 |
| ES | l | u | 86 | poly | 721.43 | 0.02692 | 6 | 0.00022 |
| OG | r | b | 102 | poly | 8543.28 | 0.01682 | 6 | 0.00004 |
| OG | r | u | 107 | rbf | 12.42 | 0.00062 | N/A | N/A |
| OG | l | b | 92 | poly | 257.51 | 0.00575 | 3 | -0.00191 |
| OG | l | u | 78 | poly | 3421.90 | 0.01659 | 8 | 0.00014 |
| OR | r | b | 97 | linear | 281.56 | N/A | N/A | N/A |
| OR | r | u | 68 | linear | 1804.59 | N/A | N/A | N/A |
| OR | l | b | 79 | poly | 5502.26 | 0.01908 | 9 | 0.00113 |
| OR | l | u | 100 | sigmoid | 63.94 | 0.00261 | N/A | -1.90377 |
| OX | r | b | 141 | rbf | 60.29 | 0.00029 | N/A | N/A |
| OX | r | u | 122 | rbf | 747.39 | 0.00006 | N/A | N/A |
| OX | l | b | 132 | poly | 383.62 | 0.02146 | 8 | 0.04259 |
| OX | l | u | 118 | poly | 779.96 | 0.02933 | 9 | 0.05214 |
| Generic model |  | b | 83 | sigmoid | 148.639 | 0.02312 | N/A | -1.44779 |
| Generic model |  | u | 76 | sigmoid | 3218.238 | 0.00196 | N/A | 1.92731 |
The “No. features” column indicates the number of selected features resulting from the Randomized Logistic Regression algorithm
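
The kernel names in the table (`linear`, `poly`, `rbf`, `sigmoid`) match scikit-learn's `SVC` kernel identifiers. Assuming that library (the table itself does not say so), a row could be mapped to `SVC` keyword arguments, with γ, d and r becoming `gamma`, `degree` and `coef0`, and N/A entries left out. The helper `svc_kwargs` is hypothetical.

```python
from typing import Dict, Optional, Union

Param = Union[str, float, int, None]


def svc_kwargs(kernel: str, C: float, gamma: Optional[float] = None,
               d: Optional[int] = None,
               r: Optional[float] = None) -> Dict[str, Param]:
    """Map one table row to scikit-learn SVC keyword arguments.

    Hypothetical helper: gamma -> gamma, d -> degree, r -> coef0.
    Parameters shown as N/A in the table are simply not passed.
    """
    kwargs: Dict[str, Param] = {"kernel": kernel, "C": C}
    if kernel in ("poly", "rbf", "sigmoid"):
        kwargs["gamma"] = gamma
    if kernel == "poly":
        kwargs["degree"] = d
    if kernel in ("poly", "sigmoid"):
        kwargs["coef0"] = r
    return kwargs


# AB / receptor / bound row of the table.
params = svc_kwargs("poly", 23.87, gamma=0.03006, d=2, r=1.73464)
# If scikit-learn is installed: from sklearn.svm import SVC; model = SVC(**params)
```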
Mean and standard deviation (in parentheses) of the F1 score, classification accuracy, precision, recall, MCC and ROC-AUC obtained at the local surface patch level using the corresponding best SVM model
| Protein complex class | Receptor/ligand | Bound/unbound | F1 score | Accuracy | Precision | Recall | MCC | ROC-AUC |
|---|---|---|---|---|---|---|---|---|
| A | r | b | 0.272 (0.101) | 0.862 (0.033) | 0.166 (0.073) | 0.917 (0.056) | 0.346 (0.346) | 0.954 (0.019) |
| A | r | u | 0.274 (0.121) | 0.876 (0.026) | 0.169 (0.086) | 0.883 (0.084) | 0.341 (0.341) | 0.939 (0.044) |
| A | l | b | 0.093 (0.096) | 0.811 (0.045) | 0.067 (0.071) | 0.182 (0.152) | 0.019 (0.019) | 0.538 (0.154) |
| A | l | u | 0.097 (0.055) | 0.059 (0.030) | 0.052 (0.030) | 0.987 (0.014) | -0.016 (-0.016) | 0.473 (0.053) |
| AB | r | b | 0.230 (0.104) | 0.910 (0.023) | 0.161 (0.080) | 0.590 (0.176) | 0.256 (0.256) | 0.890 (0.032) |
| AB | r | u | 0.228 (0.116) | 0.913 (0.020) | 0.156 (0.073) | 0.546 (0.284) | 0.250 (0.250) | 0.845 (0.112) |
| AB | l | b | 0.183 (0.101) | 0.653 (0.170) | 0.112 (0.069) | 0.553 (0.139) | 0.110 (0.110) | 0.655 (0.119) |
| AB | l | u | 0.115 (0.083) | 0.246 (0.093) | 0.063 (0.049) | 0.931 (0.086) | 0.071 (0.071) | 0.667 (0.180) |
| EI | r | b | 0.156 (0.073) | 0.604 (0.084) | 0.089 (0.044) | 0.770 (0.214) | 0.158 (0.158) | 0.764 (0.130) |
| EI | r | u | 0.148 (0.070) | 0.645 (0.070) | 0.087 (0.045) | 0.705 (0.243) | 0.146 (0.146) | 0.747 (0.137) |
| EI | l | b | 0.253 (0.101) | 0.535 (0.119) | 0.154 (0.068) | 0.793 (0.233) | 0.167 (0.167) | 0.725 (0.177) |
| EI | l | u | 0.203 (0.104) | 0.360 (0.095) | 0.118 (0.065) | 0.865 (0.192) | 0.086 (0.086) | 0.673 (0.150) |
| ER | r | b | 0.145 (0.077) | 0.733 (0.043) | 0.089 (0.060) | 0.580 (0.186) | 0.136 (0.136) | 0.734 (0.096) |
| ER | r | u | 0.109 (0.063) | 0.747 (0.055) | 0.065 (0.044) | 0.465 (0.163) | 0.092 (0.092) | 0.663 (0.092) |
| ER | l | b | 0.214 (0.151) | 0.494 (0.102) | 0.136 (0.127) | 0.851 (0.145) | 0.167 (0.167) | 0.774 (0.147) |
| ER | l | u | 0.137 (0.101) | 0.087 (0.063) | 0.077 (0.062) | 0.998 (0.005) | 0.019 (0.019) | 0.685 (0.144) |
| ES | r | b | 0.031 (0.026) | 0.954 (0.020) | 0.086 (0.088) | 0.023 (0.022) | 0.023 (0.023) | 0.712 (0.077) |
| ES | r | u | 0.121 (0.108) | 0.861 (0.040) | 0.090 (0.087) | 0.281 (0.193) | 0.096 (0.096) | 0.709 (0.153) |
| ES | l | b | 0.150 (0.070) | 0.665 (0.057) | 0.087 (0.043) | 0.665 (0.193) | 0.142 (0.142) | 0.703 (0.144) |
| ES | l | u | 0.169 (0.110) | 0.636 (0.074) | 0.102 (0.077) | 0.670 (0.163) | 0.148 (0.148) | 0.720 (0.140) |
| OG | r | b | 0.184 (0.110) | 0.704 (0.039) | 0.113 (0.072) | 0.552 (0.203) | 0.145 (0.145) | 0.700 (0.078) |
| OG | r | u | 0.144 (0.114) | 0.805 (0.031) | 0.103 (0.097) | 0.340 (0.253) | 0.100 (0.100) | 0.631 (0.158) |
| OG | l | b | 0.127 (0.031) | 0.373 (0.071) | 0.069 (0.017) | 0.927 (0.080) | 0.125 (0.125) | 0.722 (0.069) |
| OG | l | u | 0.108 (0.034) | 0.089 (0.028) | 0.058 (0.019) | 0.996 (0.007) | 0.037 (0.037) | 0.653 (0.119) |
| OR | r | b | 0.121 (0.079) | 0.662 (0.068) | 0.073 (0.053) | 0.558 (0.203) | 0.093 (0.093) | 0.659 (0.109) |
| OR | r | u | 0.103 (0.087) | 0.115 (0.047) | 0.057 (0.052) | 0.968 (0.032) | 0.024 (0.024) | 0.626 (0.162) |
| OR | l | b | 0.172 (0.099) | 0.269 (0.063) | 0.098 (0.063) | 0.980 (0.026) | 0.120 (0.120) | 0.723 (0.085) |
| OR | l | u | 0.190 (0.100) | 0.592 (0.082) | 0.153 (0.171) | 0.593 (0.183) | 0.095 (0.095) | 0.658 (0.119) |
| OX | r | b | 0.108 (0.088) | 0.677 (0.081) | 0.063 (0.057) | 0.555 (0.221) | 0.089 (0.089) | 0.665 (0.154) |
| OX | r | u | 0.081 (0.054) | 0.670 (0.087) | 0.045 (0.032) | 0.490 (0.221) | 0.056 (0.056) | 0.614 (0.146) |
| OX | l | b | 0.168 (0.070) | 0.369 (0.094) | 0.095 (0.043) | 0.896 (0.109) | 0.120 (0.120) | 0.720 (0.113) |
| OX | l | u | 0.151 (0.066) | 0.352 (0.061) | 0.085 (0.041) | 0.846 (0.092) | 0.077 (0.077) | 0.668 (0.110) |
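
ROC-AUC summarises ranking quality over all decision thresholds, so it can differ sharply from accuracy measured at a single threshold (compare accuracy 0.059 and ROC-AUC 0.473 for A/ligand/unbound). A minimal sketch of the rank-based definition, the probability that a randomly chosen positive patch is scored above a randomly chosen negative one, with ties counted as one half:

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via pairwise comparison (Mann-Whitney statistic).

    labels: 1 for interface (positive), 0 for non-interface (negative).
    O(n_pos * n_neg): fine for illustration, too slow for tens of
    thousands of patches, where a sort-based version would be preferable.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.5])  # 3 of 4 pairs ordered correctly
```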
Fig. 1 Average Receiver Operating Characteristic curve comparison of the class-specific and generic predictors at the local surface patch level, for each protein class
Average pairwise sequence identity (in %) for each protein class
| Set | A (r, u) | A (r, b) | A (l, u) | A (l, b) | AB (r, u) | AB (r, b) | AB (l, u) | AB (l, b) | EI (r, u) | EI (r, b) | EI (l, u) | EI (l, b) | ER (r, u) | ER (r, b) | ER (l, u) | ER (l, b) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Whole set | 44.04 | 44.02 | 41.52 | 42.32 | 42.98 | 42.95 | 38.19 | 38.57 | 41.80 | 40.86 | 42.36 | 43.96 | 33.70 | 35.43 | 41.84 | 40.48 |
| Training set | 47.02 | 46.88 | 34.44 | 40.12 | 45.15 | 45.33 | 41.93 | 41.93 | 39.91 | 39.40 | 42.75 | 43.02 | 33.07 | 36.28 | 43.93 | 42.61 |
| Test set | 46.49 | 46.40 | 44.43 | 53.07 | 39.41 | 39.41 | 31.43 | 43.17 | 38.98 | 39.70 | 49.09 | 50.80 | 36.65 | 36.20 | 43.98 | 43.62 |
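
| Set | ES (r, u) | ES (r, b) | ES (l, u) | ES (l, b) | OG (r, u) | OG (r, b) | OG (l, u) | OG (l, b) | OR (r, u) | OR (r, b) | OR (l, u) | OR (l, b) | OX (r, u) | OX (r, b) | OX (l, u) | OX (l, b) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|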
| Whole set | 41.61 | 40.21 | 42.31 | 44.60 | 34.63 | 37.31 | 38.57 | 37.26 | 36.60 | 35.83 | 37.91 | 37.71 | 38.48 | 39.98 | 38.61 | 35.68 |
| Training set | 41.52 | 42.25 | 46.96 | 47.31 | 37.15 | 33.91 | 36.34 | 34.19 | 37.98 | 37.30 | 39.07 | 38.82 | 40.32 | 39.99 | 39.25 | 35.24 |
| Test set | 35.79 | 37.69 | 42.30 | 47.33 | 35.41 | 37.05 | 44.49 | 41.81 | 40.82 | 45.26 | 31.76 | 30.76 | 37.20 | 37.61 | 37.95 | 39.09 |
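
The table does not state how sequence identity was computed. As an illustration only, the sketch below averages percent identity over all unordered pairs in a set. It assumes the sequences are already pairwise aligned (the `align` argument is a placeholder for a real aligner) and counts identity over gap-free columns, which is one of several common definitions.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Aligner = Callable[[str, str], Tuple[str, str]]


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences ('-' marks a gap),
    counted over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    same = sum(1 for x, y in cols if x == y)
    return 100.0 * same / len(cols)


def _already_aligned(a: str, b: str) -> Tuple[str, str]:
    # Placeholder aligner: assumes the inputs are already aligned.
    return a, b


def average_pairwise_identity(seqs: Sequence[str],
                              align: Aligner = _already_aligned) -> float:
    """Mean percent identity over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(*align(a, b)) for a, b in pairs) / len(pairs)
```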
Fig. 2 Average Receiver Operating Characteristic curve comparison of the proposed PPI interface prediction method, NPS-HomPPI, PrISE and SPPIDER at the residue level, for each protein class
Fig. 3 Average Precision–Recall curve comparison of the proposed PPI interface prediction method, NPS-HomPPI, PrISE and SPPIDER at the residue level, for each protein class