JunYan Zhang1,2, Yinghua Lyu1, Zhiqiang Ma1.
Abstract
Prediction of protein-protein interaction (PPI) sites is one of the most perplexing problems in drug discovery and computational biology. Although significant progress has been made by combining different machine learning techniques with a variety of distinct characteristics, the problem remains unresolved. In this study, a technique for predicting PPI sites is presented that combines a random forest (RF) algorithm with the minimum redundancy maximal relevance (mRMR) approach and incremental feature selection (IFS). Physicochemical properties of proteins are incorporated together with features describing residue disorder, sequence conservation, secondary structure, and solvent accessibility. Five 3D structural characteristics are also used to predict PPI sites. Feature analysis shows that 3D structural features such as relative solvent-accessible surface area (RASA) and surface curvature (SC) help in the prediction of PPI sites. Results show that the proposed predictor outperforms several other state-of-the-art predictors, with an average prediction accuracy of 81.44%, sensitivity of 82.17%, and specificity of 80.71%. The proposed predictor is expected to become a helpful tool for finding PPI sites, and the feature analysis presented in this study will give useful insights into protein interaction mechanisms.
Year: 2022 PMID: 36246558 PMCID: PMC9553539 DOI: 10.1155/2022/5892627
Source DB: PubMed Journal: Dis Markers ISSN: 0278-0240 Impact factor: 3.464
Figure 1: Proposed methodology.
Prediction results for the single-residue feature space.
| Dataset | Sensitivity | Specificity | Precision | Accuracy | MCC |
|---|---|---|---|---|---|
| Single | 0.267 | 0.980 | 0.787 | 0.835 | 0.390 |
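The five columns reported in the tables are standard binary-classification metrics. As a minimal sketch, assuming the usual confusion-matrix definitions (the source does not spell them out), they can be computed as follows. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute the metrics reported in the tables from confusion-matrix counts.

    tp/fn count interface residues predicted correctly/incorrectly;
    tn/fp count non-interface residues predicted correctly/incorrectly.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, chosen only to illustrate an imbalanced case.
example = binary_metrics(tp=40, fp=10, tn=90, fn=60)
```

With imbalanced data, accuracy is dominated by the majority class, which is why a low sensitivity can coexist with a high accuracy, as in the single-residue row above. MCC is less affected by the class ratio.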
Figure 2: Prediction results for the sliding-window feature space.
Best prediction results for the sliding-window feature space.
| Dataset | Sensitivity | Specificity | Precision | Accuracy | MCC |
|---|---|---|---|---|---|
| SW11 | 0.463 | 0.974 | 0.824 | 0.870 | 0.553 |
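A sliding-window feature space can be sketched as below. This assumes that "SW11" denotes a window of 11 consecutive residues centered on the target residue, and that positions beyond the chain termini are zero-padded. Both are assumptions, since the record does not describe the construction. `sliding_window_features` is a hypothetical helper name.

```python
def sliding_window_features(per_residue, window=11):
    """Concatenate each residue's feature vector with those of its sequence neighbors.

    per_residue: list of equal-length feature lists, one per residue.
    window: odd window size; (window - 1) / 2 neighbors are taken on each side.
    Positions outside the sequence are filled with zeros (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    if not per_residue:
        return []

    half = window // 2
    pad = [0.0] * len(per_residue[0])
    rows = []
    for i in range(len(per_residue)):
        row = []
        for j in range(i - half, i + half + 1):
            inside = 0 <= j < len(per_residue)
            row.extend(per_residue[j] if inside else pad)
        rows.append(row)
    return rows


# Toy sequence of three residues with one feature each, window of 3.
toy = sliding_window_features([[1.0], [2.0], [3.0]], window=3)
```

Each output row has `window` times as many features as a single residue, so the classifier sees the local sequence context of the residue being predicted.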
Figure 3: Prediction results for the patch feature space.
Best prediction results for the patch feature space.
| Dataset | Sensitivity | Specificity | Precision | Accuracy | MCC |
|---|---|---|---|---|---|
| Patch15 | 0.471 | 0.968 | 0.793 | 0.866 | 0.542 |
Comparison of the best results for the single-residue, sliding-window, and patch feature spaces.
| Dataset | Sensitivity | Specificity | Precision | Accuracy | MCC |
|---|---|---|---|---|---|
| Single | 0.267 | 0.980 | 0.778 | 0.835 | 0.390 |
| SW11 | 0.463 | 0.974 | 0.824 | 0.870 | 0.553 |
| Patch15 | 0.471 | 0.968 | 0.793 | 0.866 | 0.542 |
Figure 4: Sliding-window and patch mRMR results.
Sliding-window and patch IFS results.
| Dataset | Sensitivity | Specificity | Precision | Accuracy | MCC |
|---|---|---|---|---|---|
| SW11_32 | 0.480 | 0.975 | 0.833 | 0.874 | 0.570 |
| Patch15_37 | 0.449 | 0.977 | 0.831 | 0.869 | 0.547 |
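Incremental feature selection can be sketched as follows: given a feature list already ranked by mRMR, evaluate the classifier on the top 1, top 2, ..., top N features and keep the prefix with the best score. The row labels above suggest that 32 and 37 features were retained, but that reading is an assumption. In the code, `evaluate` is a hypothetical callback (in practice it might be a cross-validated RF score), and the toy scores are invented purely for illustration.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score growing prefixes of a ranked feature list and return the best one.

    ranked_features: feature names ordered from most to least relevant.
    evaluate: callable taking a list of feature names and returning a score
              where higher is better (for example, cross-validated MCC).
    Returns (best_subset, best_score, history), where history holds
    (number_of_features, score) pairs for every prefix tried.
    """
    best_subset, best_score = [], float("-inf")
    history = []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        history.append((k, score))
        if score > best_score:  # strict '>' keeps the smallest best prefix
            best_subset, best_score = list(subset), score
    return best_subset, best_score, history


# Toy stand-in for a real model evaluation: invented gains per feature,
# with a small penalty for every uninformative feature.
TOY_GAIN = {"RASA": 0.30, "SC": 0.20, "conservation": 0.10}


def toy_score(subset):
    return sum(TOY_GAIN.get(name, -0.05) for name in subset)


ranked = ["RASA", "SC", "conservation", "noise_1", "noise_2"]
best_subset, best_score, history = incremental_feature_selection(ranked, toy_score)
```

In this toy run the score peaks after the third feature and then declines, so the first three features are returned. Plotting `history` gives the kind of curve typically used to choose the cut-off.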
Figure 5: Influence of the number of trees on the results.
Influence of the number of trees on the results.
| Dataset | Sensitivity | Specificity | Precision | Accuracy | MCC |
|---|---|---|---|---|---|
| Default trees | 0.498 | 0.975 | 0.838 | 0.878 | 0.584 |
| 197_trees | 0.550 | 0.976 | 0.854 | 0.889 | 0.627 |
Figure 6: Influence of the cross-validation adjustment on the results.
Influence of the cross-validation adjustment on the results.
| Dataset | Sensitivity | Specificity | Precision | Accuracy | MCC |
|---|---|---|---|---|---|
| 10-fold cross-validation | 0.551 | 0.976 | 0.855 | 0.877 | 0.636 |
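For reference, a 10-fold split can be sketched with the standard library alone. This is a generic shuffled k-fold, not necessarily the exact protocol behind the table. In particular, it splits at the sample level, whereas a study of this kind may instead keep all residues of one protein in the same fold. `k_fold_indices` is a hypothetical helper name.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for shuffled k-fold cross-validation.

    Every sample appears in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]      # round-robin assignment
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy usage: 25 samples split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

The metrics for a cross-validated model are then usually obtained by pooling or averaging the predictions over the `k` test folds.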
Figure 7: Results averaged over 100 runs.
Results on the imbalanced and trimmed datasets.
| Dataset | Sensitivity | Specificity | Precision | Accuracy | MCC |
|---|---|---|---|---|---|
| Imbalanced | 0.551 | 0.976 | 0.855 | 0.877 | 0.636 |
| Trimmed | 0.822 | 0.807 | 0.810 | 0.814 | 0.667 |
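The record does not say how the trimmed dataset was built. One common way to obtain a balanced set, shown here only as an assumed illustration, is to randomly undersample the majority class (typically non-interface residues) until both classes have the same size. `undersample_majority` is a hypothetical helper name.

```python
import random


def undersample_majority(labels, seed=0):
    """Return sorted indices of a class-balanced subset of binary labels (0/1).

    All minority-class samples are kept; an equally sized random sample
    of the majority class is drawn without replacement.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    kept = random.Random(seed).sample(majority, len(minority))
    return sorted(minority + kept)


# Toy labels: 3 interface residues (1) and 12 non-interface residues (0).
toy_labels = [1] * 3 + [0] * 12
balanced_idx = undersample_majority(toy_labels)
```

Because each random draw discards different majority samples, repeating the procedure with different seeds and averaging the metrics gives a more stable estimate than a single draw.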
Performance comparison with other methods.
| Methods | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|
| Wang et al. | 0.698 | 0.666 | 0.729 | 0.230 |
| Nguyen and Rajapakse | 0.436 | 0.926 | 0.803 | 0.349 |
| Ofran and Rost | 0.763 | 0.786 | 0.863 | 0.376 |
| Proposed method | 0.822 | 0.807 | 0.814 | 0.667 |