Abstract
BACKGROUND: Protein-protein interactions (PPIs) are central to protein function and provide an effective means of understanding function at the cellular level. Identifying PPIs is a crucial foundation for predicting drug-target interactions. Although traditional biological experiments for identifying PPIs are becoming available, they remain extremely time-consuming and expensive. Therefore, various computational models have been introduced to identify PPIs. In a protein-protein interaction network (PPIN), a Hub protein, as a highly connected node, can coordinate PPIs and carry out biological functions. Detecting hot regions on Hub protein interaction interfaces is therefore a problem worth investigating.
Keywords: Clustering; Hub protein; Optimization; Protein–protein interactions
Year: 2021 PMID: 33941163 PMCID: PMC8094484 DOI: 10.1186/s12911-020-01350-4
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Proportion of hot spots and non-hot spots in different datasets
| Dataset | Hot spots | Non-hot spots | Total | Proportion |
|---|---|---|---|---|
| ASEdb | 65 | 90 | 155 | 0.72:1 |
| DateHub | 1056 | 1850 | 2906 | 0.57:1 |
| PartyHub | 1033 | 1980 | 3013 | 0.52:1 |
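The Total and Proportion columns follow directly from the two count columns. A minimal sketch of that arithmetic, assuming "Proportion" means hot spots per non-hot spot rounded to two decimals (the function name `class_ratio` is illustrative and not from the paper):

```python
# Counts copied from the table above: (hot spots, non-hot spots).
datasets = {
    "ASEdb": (65, 90),
    "DateHub": (1056, 1850),
    "PartyHub": (1033, 1980),
}


def class_ratio(hot, non_hot):
    """Return hot spots per non-hot spot, rounded to two decimals."""
    return round(hot / non_hot, 2)


for name, (hot, non_hot) in datasets.items():
    total = hot + non_hot
    print(f"{name}: total={total}, proportion={class_ratio(hot, non_hot)}:1")
```

All three datasets are imbalanced toward non-hot spots, which is worth keeping in mind when reading the precision and recall values reported later.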
Selected optimal feature subset of protein residues
| Rank | Feature | Rank | Feature |
|---|---|---|---|
| 1 | BsRASA | 12 | BbASA |
| 2 | BsASA | 13 | BbRASA |
| 3 | BsmDI | 14 | UbASA |
| 4 | BnRASA | 15 | UbRASA |
| 5 | BminPI | 16 | UsASA |
| 6 | BpRASA | 17 | UnASA |
| 7 | BmaxPI | 18 | UnRASA |
| 8 | BpASA | 19 | UtASA |
| 9 | BnASA | 20 | UsmDI |
| 10 | BtmDI | 21 | UmaxDI |
| 11 | BmaxDI | 22 | UminPI |
Prediction method of hot regions based on residue coordination number optimization and improved K-means (RCNOIK)
LCSD algorithm based on PPRA optimization
| Input: Datasets | |
|---|---|
| Step 1: Feature selection for residues in datasets; | |
| Step 2: A clustering-based boundary point recognition algorithm is used to obtain the predicted set of hot regions | |
| Step 3: | |
| Step 4: | |
| Step 5: Repeat Step 4 until no missing residues remain to be processed; | |
| Step 6: Output the optimized hot regions | |
RCNOIK algorithm based on PPRA optimization
| Input: Datasets | |
|---|---|
| Step 1: Feature selection for residues in datasets; | |
| Step 2: Calculate the sum of squared distances of residues and the average silhouette value to obtain the optimal k value; | |
| Step 3: K-means is used to obtain the hot regions | |
| Step 4: | |
| Step 5: | |
| Step 6: Output the optimized hot regions | |
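Steps 2 and 3 of the table above (choosing k from the within-cluster sum of squared distances and the average silhouette value, then running K-means) can be sketched as follows. This is a plain K-means illustration under stated assumptions, not the RCNOIK implementation: the coordinates are made-up 3D points, the farthest-first seeding is a choice made here for determinism, and the residue coordination number optimization and the PPRA steps are not modelled.

```python
import math


def farthest_first_centers(points, k):
    """Deterministic seeding (an assumption): start at points[0], then add the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means; returns (labels, centers)."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous center if a cluster ends up empty.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers


def within_cluster_ss(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def average_silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treated as 0 here
    total = 0.0
    for i, p in enumerate(points):
        def mean_dist(cluster):
            d = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == cluster and j != i]
            return sum(d) / len(d) if d else None
        a = mean_dist(labels[i])
        if a is None:
            continue
        b = min(mean_dist(c) for c in clusters if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_values):
    """Return the k with the largest average silhouette, plus per-k diagnostics."""
    report = {}
    for k in k_values:
        labels, centers = kmeans(points, k)
        report[k] = (average_silhouette(points, labels), within_cluster_ss(points, labels, centers))
    best = max(report, key=lambda k: report[k][0])
    return best, report


# Hypothetical residue coordinates forming two spatially separated groups.
residue_coords = [
    (0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.5, 1.0, 0.5), (1.5, 1.0, 0.0),
    (10.0, 10.0, 10.0), (11.0, 10.5, 10.0), (10.5, 11.0, 10.5), (11.5, 11.0, 10.0),
]
best_k, report = choose_k(residue_coords, range(2, 5))
labels, _ = kmeans(residue_coords, best_k)
print(best_k, labels)
```

The sum of squared distances always shrinks as k grows, so on its own it cannot pick k; here the silhouette value makes the final choice and the sum of squares is only reported alongside it.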
Fig. 1 Workflow of predicting hot regions
Performance of predicting hot regions by different methods
| Method | Precision | Recall | F1-Score |
|---|---|---|---|
| Tuncbag | 1 | 0.20 | 0.33 |
| Nan | 0.67 | 0.40 | 0.49 |
| Hu | 0.78 | 0.70 | 0.74 |
| LCSD | 0.78 | 0.83 | 0.80 |
| RCNOIK | 0.80 | 0.82 | 0.83 |
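The F1-Score column is the harmonic mean of precision and recall. A short sketch of that formula, using rows from the table above as inputs; recomputing from the rounded, published precision and recall can differ slightly from the reported F1, which was presumably derived from unrounded counts:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
rows = {"Tuncbag": (1.0, 0.20), "Hu": (0.78, 0.70), "LCSD": (0.78, 0.83)}
for method, (p, r) in rows.items():
    print(method, round(f1_score(p, r), 2))
```

The Tuncbag row shows why F1 is reported at all: perfect precision with a recall of 0.20 still gives a low combined score.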
Fig. 2 Average silhouette width (ASW) for different k values for 1A22, 1FCC and 3HFM. a 1A22, k = 2 at maximum ASW; b 1FCC, k = 2 at maximum ASW; c 3HFM, k = 3 at maximum ASW
Fig. 3 Silhouette width plots at the optimized k value. a 1A22, average silhouette width is 0.55 at k = 2; b 1FCC, average silhouette width is 0.76 at k = 2; c 3HFM, average silhouette width is 0.57 at k = 3
Fig. 4 Optimum feature combination for clustering
Prediction performance of two methods on DateHub and PartyHub (before optimization)
| Dataset | Method | Precision | Recall | F1-Score |
|---|---|---|---|---|
| DateHub | LCSD | 0.58 | 0.54 | 0.56 |
| DateHub | RCNOIK | 0.64 | 0.54 | 0.59 |
| PartyHub | LCSD | 0.48 | 0.51 | 0.49 |
| PartyHub | RCNOIK | 0.51 | 0.51 | 0.51 |
Prediction performance based on PPRA optimization (after optimization)
| Dataset | Method | Precision | Recall | F1-Score |
|---|---|---|---|---|
| DateHub | PPRA_LCSD | 0.78 | 0.70 | 0.74 |
| DateHub | PPRA_RCNOIK | 0.89 | 0.70 | 0.78 |
| PartyHub | PPRA_LCSD | 0.73 | 0.62 | 0.67 |
| PartyHub | PPRA_RCNOIK | 0.89 | 0.62 | 0.73 |
Prediction results of 1A0A and 1E9G by clustering based on RCNOIK
| PDB ID | Natural hot spots predicted | Natural hot spots unpredicted | False hot spots predicted |
|---|---|---|---|
| 1A0A | A16, A19, A49, A52, A53, B16, B29, B52, B53 | A20, A23, A46, A50, A56, B23, B43, B46, B49, B50 | A22, A43, A47, A57, B13, B19, B22, B28, B42, B47, B54, B57 |
| 1E9G | A51, A52, A90, A279, A281, B51, B90 | A84, A87, A178, A181, B52, B84, B87, B178, B181, B279 | A82, A126, A127, A128, A184, A278, B82, B16, B128, B179, B180, B278, B281, B283 |
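Per-structure precision and recall can be read off the residue lists above. A minimal sketch for 1A0A, assuming the three columns correspond to true positives, false negatives and false positives respectively; the helper name `parse` is illustrative. The dataset-level figures reported elsewhere aggregate over many structures, so a single structure need not match them.

```python
def parse(residues):
    """Split a comma-separated residue list such as 'A16, A19' into a set."""
    return {r.strip() for r in residues.split(",") if r.strip()}


# 1A0A row of the table above.
predicted_hot = parse("A16, A19, A49, A52, A53, B16, B29, B52, B53")
missed_hot = parse("A20, A23, A46, A50, A56, B23, B43, B46, B49, B50")
false_hot = parse("A22, A43, A47, A57, B13, B19, B22, B28, B42, B47, B54, B57")

tp, fn, fp = len(predicted_hot), len(missed_hot), len(false_hot)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"1A0A: precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```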
Prediction results of 1A0A and 1E9G by PPRA optimization
| PDB ID | Hot spots recovered | Hot spots unrecovered | False hot spots recovered | False hot spots |
|---|---|---|---|---|
| 1A0A | A23, A46, A50, A56, B49, B50 | A20, B23, B43, B46 | A43, A47, B22, B28, B47, B54, B57 | A22, A57, B13, B19, B42 |
| 1E9G | A87, A178, A181, B84, B87, B178 | A84, B52, B181, B279 | A82, A126, A127, A184, A278, B16, B180, B281, B283 | A128, B82, B128, B179, B278 |
Prediction results of 1A0A and 1E9G by classification methods
| PDB ID | Method | Natural hot spots predicted | Natural hot spots unpredicted | False hot spots |
|---|---|---|---|---|
| 1A0A | Boosting | A16, A19, A49, A53, A56, B16, B29, B49, B50, B52, B53 | A20, A23, A46, A50, A52, B23, B43, B46 | A15, A22, A57, B13, B19, B22, B42 |
| 1A0A | Gradient boosting | A16, A19, A23, A46, A49, A52, A53, B16, B29, B49, B52, B53 | A50, B23, B43, B46 | A22, A26, B13, B19, B42, B57 |
| 1A0A | Random forest | A16, A19, A23, A46, A49, A50, A52, A53, B16, B29, B43, B49, B52, B50, B53, A56 | A20, B23, B46 | A22, A57 |
| 1E9G | Boosting | A51, A52, A90, A279, A281, B51, A178, A181, B84, B90 | A87, A84, B52, B87, B178, B181, B279 | A82, A126, A128, B82, B128, B179 |
| 1E9G | Gradient boosting | A51, A52, A87, A90, A279, A281, B51, B52, A178, A181, B84, B178 | A84, B87, B90, B181, B279 | A126, A128, B82, B126, B128, B278 |
| 1E9G | Random forest | A51, A52, A87, A90, A279, A281, B51, A178, A181, B84, B87, B90, B279 | A84, B52, B178, B181 | A128, B179 |
Fig. 5 Three-dimensional spatial structures of DateHub proteins 1A0A and 1E9G. a 1A0A; b 1E9G