Brijesh K Sriwastava, Subhadip Basu, Ujjwal Maulik, Dariusz Plewczynski.
Abstract
The physico-chemical properties of interaction interfaces play a crucial role in the characterization of protein-protein interactions (PPI). In silico prediction of participating amino acids helps to identify interface residues for further experimental verification using mutational analysis, or for inhibition studies that screen a library of ligands against a given protein. Given the unbound structure of a protein and the fact that it forms a complex with another known protein, the objective of this work is to identify the residues that are involved in the interaction. We attempt to predict interaction sites in protein complexes using the local composition of amino acids together with their physico-chemical characteristics. The local sequence segments (LSS) are extracted from the protein sequences using a sliding window of 21 amino acids. The list of LSSs is passed to a support vector machine (SVM) predictor, which identifies interacting residue pairs considering their inter-atom distances. We have analyzed three model organisms, Escherichia coli, Saccharomyces cerevisiae and Homo sapiens, for which the numbers of considered hetero-complexes are 40, 123 and 33 respectively. Moreover, a unified multi-organism PPI meta-predictor is also developed in the current work by combining the training databases of the above organisms. The performance of the PPIcons interface residue prediction method, measured by the area under the ROC curve (AUC), is 0.82, 0.75, 0.72 and 0.76 for the aforementioned organisms and the meta-predictor respectively.
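The sliding-window step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy sequence, and the choice to skip windows that would run past a terminus are assumptions made here for illustration; the abstract states only the window length of 21 and the use of local amino acid composition.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WINDOW = 21  # window length stated in the abstract


def local_sequence_segments(sequence, window=WINDOW):
    """Return (center_index, segment) pairs for every full-length window.

    Assumption: windows that would extend past either terminus are
    skipped; the source does not say how sequence ends are handled.
    """
    half = window // 2
    return [
        (i, sequence[i - half:i + half + 1])
        for i in range(half, len(sequence) - half)
    ]


def composition(segment):
    """Fraction of each standard amino acid within one segment."""
    counts = Counter(segment)
    return [counts[aa] / len(segment) for aa in AMINO_ACIDS]


# Hypothetical toy sequence, for illustration only.
toy_sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
for center, segment in local_sequence_segments(toy_sequence):
    features = composition(segment)  # 20 composition values per LSS
    print(center, segment, round(max(features), 3))
```

Each segment is centered on one residue, so a feature vector built from it (here only the composition; the physico-chemical descriptors are omitted) would describe that residue's local sequence context.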
Year: 2013 PMID: 23729008 PMCID: PMC3744667 DOI: 10.1007/s00894-013-1886-9
Source DB: PubMed Journal: J Mol Model ISSN: 0948-5023 Impact factor: 1.810
Fig. 1 A schematic diagram showing the training data preparation steps for the organism-specific PPI database
Results on the AUC-optimized network over the E. coli CV set and test set
| Run | Accuracy (%) | Recall | Precision | Specificity | AUC | MCC | F-measure |
|---|---|---|---|---|---|---|---|
| CV run#1 | 80.3571 | 0.737024 | 0.760714 | 0.84738 | 0.792202 | 0.587729 | 0.748682 |
| CV run#2 | 81.3443 | 0.716263 | 0.793103 | 0.877273 | 0.796768 | 0.60559 | 0.752727 |
| CV run#3 | 80.9328 | 0.709343 | 0.788462 | 0.875 | 0.792171 | 0.596719 | 0.746812 |
| CV run#4 | 82.3288 | 0.741379 | 0.799257 | 0.877273 | 0.809326 | 0.627545 | 0.769231 |
| CV run#5 | 79.6982 | 0.695502 | 0.770115 | 0.863636 | 0.779569 | 0.570494 | 0.730909 |
| CV run#6 | 82.4417 | 0.754325 | 0.792727 | 0.870455 | 0.81239 | 0.630533 | 0.77305 |
| CV run#7 | 83.0137 | 0.758621 | 0.80292 | 0.877273 | 0.817947 | 0.642617 | 0.780142 |
| CV run#8 | 81.07 | 0.705882 | 0.793774 | 0.879545 | 0.792714 | 0.599392 | 0.747253 |
| CV run#9 | 80.5213 | 0.743945 | 0.759717 | 0.845455 | 0.7947 | 0.591595 | 0.751748 |
| CV run#10 | 78.4932 | 0.724138 | 0.731707 | 0.825 | 0.774569 | 0.550128 | 0.727903 |
Bold entries represent average CV results and Test set results
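The columns of these tables are standard confusion-matrix metrics, and the following Python sketch shows how they relate. In the rows checked, the tabulated AUC equals the mean of recall and specificity; for E. coli CV run#1, (0.737024 + 0.84738) / 2 = 0.792202. The counts used below (TP = 213, FN = 76, TN = 372, FP = 67) are not reported in the source. They were back-calculated here from that row as an assumption, and the function name is hypothetical.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute the tabulated metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)  # in percent
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Mean of recall and specificity; matches the AUC column in the
    # rows checked (an observation, not a statement from the source).
    auc_single_point = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "auc": auc_single_point,
        "mcc": mcc,
        "f_measure": f_measure,
    }


# Assumed counts, back-calculated from E. coli CV run#1 for illustration.
for name, value in binary_metrics(tp=213, fn=76, tn=372, fp=67).items():
    print(f"{name}: {value:.6f}")
```

Note that accuracy is given in percent, while the other columns are fractions between 0 and 1 (MCC ranges from -1 to 1).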
Results on the AUC-optimized network over the yeast CV set and test set
| Run | Accuracy (%) | Recall | Precision | Specificity | AUC | MCC | F-measure |
|---|---|---|---|---|---|---|---|
| CV run#1 | 75.5585 | 0.683871 | 0.706667 | 0.804878 | 0.744375 | 0.49141 | 0.695082 |
| CV run#2 | 74.4094 | 0.654839 | 0.697595 | 0.80531 | 0.730074 | 0.465255 | 0.675541 |
| CV run#3 | 74.4094 | 0.687097 | 0.684887 | 0.783186 | 0.735141 | 0.470046 | 0.68599 |
| CV run#4 | 74.574 | 0.684887 | 0.68932 | 0.787611 | 0.736249 | 0.472979 | 0.687097 |
| CV run#5 | 75.3281 | 0.703226 | 0.694268 | 0.787611 | 0.745418 | 0.489872 | 0.698718 |
| CV run#6 | 72.6675 | 0.680645 | 0.659375 | 0.758315 | 0.71948 | 0.436918 | 0.669841 |
| CV run#7 | 75.3604 | 0.697749 | 0.697749 | 0.792035 | 0.744892 | 0.489785 | 0.697749 |
| CV run#8 | 74.5407 | 0.7 | 0.68239 | 0.776549 | 0.738274 | 0.474736 | 0.691083 |
| CV run#9 | 75.4593 | 0.690323 | 0.701639 | 0.798673 | 0.744498 | 0.490283 | 0.695935 |
| CV run#10 | 74.443 | 0.700965 | 0.68125 | 0.774336 | 0.73765 | 0.473305 | 0.690967 |
Bold entries represent average CV results and Test set results
Results on the AUC-optimized network over the Homo sapiens CV set and test set
| Run | Accuracy (%) | Recall | Precision | Specificity | AUC | MCC | F-measure |
|---|---|---|---|---|---|---|---|
| CV run#1 | 83.6872 | 0.741379 | 0.821656 | 0.897623 | 0.819501 | 0.652729 | 0.779456 |
| CV run#2 | 83.9286 | 0.73639 | 0.831715 | 0.904936 | 0.820663 | 0.657941 | 0.781155 |
| CV run#3 | 84.0402 | 0.74212 | 0.830128 | 0.903108 | 0.822614 | 0.660444 | 0.783661 |
| CV run#4 | 84.8214 | 0.739255 | 0.851485 | 0.917733 | 0.828494 | 0.677197 | 0.791411 |
| CV run#5 | 83.2589 | 0.74212 | 0.811912 | 0.890311 | 0.816216 | 0.644075 | 0.775449 |
| CV run#6 | 84.581 | 0.767241 | 0.824074 | 0.895795 | 0.831518 | 0.672559 | 0.794643 |
| CV run#7 | 82.1429 | 0.74212 | 0.787234 | 0.872029 | 0.807075 | 0.621285 | 0.764012 |
| CV run#8 | 81.9196 | 0.716332 | 0.798722 | 0.884826 | 0.800579 | 0.614878 | 0.755287 |
| CV run#9 | 85.2679 | 0.767908 | 0.840125 | 0.906764 | 0.837336 | 0.687094 | 0.802395 |
| CV run#10 | 84.1518 | 0.759312 | 0.820433 | 0.893967 | 0.82664 | 0.663478 | 0.78869 |
Bold entries represent average CV results and Test set results
Results on the AUC-optimized network over the multi-organism meta-data CV set and test set
| Run | Accuracy (%) | Recall | Precision | Specificity | AUC | MCC | F-measure |
|---|---|---|---|---|---|---|---|
| CV run#1 | 78.5415 | 0.718354 | 0.735421 | 0.829624 | 0.773989 | 0.550257 | 0.726788 |
| CV run#2 | 80.2681 | 0.739451 | 0.757838 | 0.844336 | 0.791894 | 0.586334 | 0.748532 |
| CV run#3 | 81.4489 | 0.742887 | 0.779867 | 0.86171 | 0.802298 | 0.609998 | 0.760928 |
| CV run#4 | 80.176 | 0.712025 | 0.771429 | 0.860918 | 0.786472 | 0.58178 | 0.740538 |
| CV run#5 | 80.4858 | 0.724974 | 0.770437 | 0.85754 | 0.791257 | 0.589146 | 0.747014 |
| CV run#6 | 79.3884 | 0.690928 | 0.766979 | 0.86171 | 0.776319 | 0.564125 | 0.72697 |
| CV run#7 | 80.1341 | 0.739451 | 0.755388 | 0.842142 | 0.790797 | 0.583781 | 0.747335 |
| CV run#8 | 80.1508 | 0.748156 | 0.751323 | 0.836692 | 0.792424 | 0.585272 | 0.749736 |
| CV run#9 | 79.5978 | 0.71519 | 0.757542 | 0.849201 | 0.782195 | 0.570451 | 0.735757 |
| CV run#10 | 79.6482 | 0.726027 | 0.753005 | 0.842946 | 0.784487 | 0.572722 | 0.73927 |
Bold entries represent average CV results and Test set results
Comparison of our current work with the existing techniques
| No. | Method | AUC | Sensitivity | Specificity |
|---|---|---|---|---|
| 1 | Wang et al. | 0.71933 | 0.68640 | 0.65417 |
| 2 | Nguyen et al. | 0.74943 | 0.3598 | 0.92949 |
| 3 | Deng et al. | 0.79761 | 0.76765 | 0.63158 |
| 4 | Bordner et al. | – | 0.57 | 0.26 |
| 5 | Singh et al. | – | 0.6 | 0.75 |
| 6 | PPIcons (E. coli) | 0.814687 | 0.736842 | 0.892532 |
| 7 | PPIcons (Yeast) | 0.75378 | 0.741602 | 0.765957 |
| 8 | PPIcons (Homo sapiens) | 0.722559 | 0.721839 | 0.72328 |
| 9 | PPIcons (meta-data) | 0.754744 | 0.722739 | 0.786748 |
Fig. 2 The performance of PPIcons on the testing dataset in comparison with the existing state-of-the-art tools