Zhijie Dong, Keyu Wang, Truong Khanh Linh Dang, Mehmet Gültas, Marlon Welter, Torsten Wierschin, Mario Stanke, Stephan Waack.
Abstract
BACKGROUND: The identification of protein-protein interaction sites is a computationally challenging task and is important for understanding the biology of protein complexes. There is a rich literature in this field. A broad class of approaches assigns to each candidate residue a real-valued score that measures how likely it is that the residue belongs to the interface. The prediction is obtained by thresholding this score. Some probabilistic models classify the residues on the basis of posterior probabilities. In this paper, we introduce pairwise conditional random fields (pCRFs) in which edges are not restricted to the backbone, as they are in the linear-chain CRFs utilized by Li et al. (2007). In fact, any 3D-neighborhood relation can be modeled. On the basis of a generalized Viterbi inference algorithm and a piecewise training process for pCRFs, we demonstrate how to utilize pCRFs to enhance a given residue-wise score-based protein-protein interface predictor on the surface of the protein under study. The features of the pCRF are based solely on the interface prediction scores of the predictor whose performance is to be improved.
Year: 2014 PMID: 25124108 PMCID: PMC4150965 DOI: 10.1186/1471-2105-15-277
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example history set and its boundary.
Figure 2. Computing the connected components of the new history set - case one.
Figure 3. Computing the connected components of the new history set - case two.
Figure 4. Viterbi recursion step in case . After adding node v to the history set, and will be replaced by . In this example, for every assignment of the new boundary the score of is maximized by varying over the assignments of v and and using the Viterbi variables of and .
Figure 5. The distribution of the score for complexes from the data set.
Classification results on PlaneDimers and the KL-subset, where the distributions according to which the synthetic scores were drawn are defined by Equations 7, 8, 9 and 10

| Score precision | 0.8 | 0.9 | 1.0 | 1.1 | 1.2 |
|---|---|---|---|---|---|
| Surface AUC ratio (PlaneDimers) | 1.084 | 1.091 | 1.093 | 1.089 | 1.093 |
| Surface AUC ratio (KL-subset) | 1.032 | 1.039 | 1.045 | 1.045 | 1.050 |

Depending on the variances determined by ς, the enhancer increases the AUC referred to the protein surface by 8.4%-9.3% on PlaneDimers, and by 3.2%-5.0% on the KL-subset.
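The percentage gains quoted above follow directly from the tabulated surface AUC ratios. The short sketch below makes that arithmetic explicit. It assumes that the "Surface AUC ratio" is the enhancer's AUC divided by the AUC of the unenhanced scores, and that the first row of ratios belongs to PlaneDimers and the second to the KL-subset, as the quoted ranges suggest. The names `AUC_RATIO`, `percent_gain` and `gain_range` are illustrative only.

```python
# Surface AUC ratios as tabulated, one value per score precision 0.8 ... 1.2.
# Assumption: ratio = AUC(enhancer) / AUC(unenhanced score).
AUC_RATIO = {
    "PlaneDimers": [1.084, 1.091, 1.093, 1.089, 1.093],
    "KL-subset": [1.032, 1.039, 1.045, 1.045, 1.050],
}


def percent_gain(ratio: float) -> float:
    """Convert an AUC ratio into a relative gain in percent."""
    return (ratio - 1.0) * 100.0


def gain_range(ratios):
    """Return the smallest and largest percentage gain over all ratios."""
    gains = [percent_gain(r) for r in ratios]
    return min(gains), max(gains)


if __name__ == "__main__":
    for name, ratios in AUC_RATIO.items():
        low, high = gain_range(ratios)
        print(f"{name}: {low:.1f}%-{high:.1f}%")
```

Under these assumptions the computed ranges agree with the 8.4%-9.3% and 3.2%-5.0% stated in the text.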
Comparing the enhancer with the threshold classifier of approximately equal specificity on synthetic scores assigned to surface residues of protein complexes taken from PlaneDimers and the KL-subset

| Data Set | Score Precision | Classifier | Specificity | Sensitivity | MCC |
|---|---|---|---|---|---|
| PlaneDimers | 0.8 | Threshold Predictor | 0.9672 | 0.2562 | 0.3253 |
| PlaneDimers | 0.8 | Enhancer | 0.9666 | 0.4281 | 0.4911 |
| PlaneDimers | 0.9 | Threshold Predictor | 0.9618 | 0.2556 | 0.3077 |
| PlaneDimers | 0.9 | Enhancer | 0.9624 | 0.4086 | 0.4610 |
| PlaneDimers | 1.0 | Threshold Predictor | 0.9611 | 0.2428 | 0.2912 |
| PlaneDimers | 1.0 | Enhancer | 0.9612 | 0.3872 | 0.4379 |
| PlaneDimers | 1.1 | Threshold Predictor | 0.9681 | 0.2100 | 0.2753 |
| PlaneDimers | 1.1 | Enhancer | 0.9677 | 0.3307 | 0.4045 |
| PlaneDimers | 1.2 | Threshold Predictor | 0.9649 | 0.2100 | 0.2648 |
| PlaneDimers | 1.2 | Enhancer | 0.9647 | 0.3213 | 0.3854 |
| KL-subset | 0.8 | Threshold Predictor | 0.9568 | 0.2936 | 0.3549 |
| KL-subset | 0.8 | Enhancer | 0.9577 | 0.3586 | 0.4210 |
| KL-subset | 0.9 | Threshold Predictor | 0.9533 | 0.2843 | 0.3369 |
| KL-subset | 0.9 | Enhancer | 0.9531 | 0.3290 | 0.3820 |
| KL-subset | 1.0 | Threshold Predictor | 0.9570 | 0.2559 | 0.3152 |
| KL-subset | 1.0 | Enhancer | 0.9571 | 0.2971 | 0.3591 |
| KL-subset | 1.1 | Threshold Predictor | 0.9615 | 0.2279 | 0.2949 |
| KL-subset | 1.1 | Enhancer | 0.9614 | 0.2743 | 0.3459 |
| KL-subset | 1.2 | Threshold Predictor | 0.9604 | 0.2199 | 0.2828 |
| KL-subset | 1.2 | Enhancer | 0.9599 | 0.2516 | 0.3175 |
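The comparison above pits the enhancer against a threshold classifier "of approximately equal specificity". One plausible way to obtain such a baseline, sketched here as an assumption rather than as the authors' procedure, is to scan the candidate thresholds and keep the one whose specificity on the labelled residues is closest to the target. The function names and the toy scores are hypothetical.

```python
import bisect
import math


def specificity_at(threshold, sorted_negative_scores):
    """Fraction of non-interface residues scoring below the threshold."""
    below = bisect.bisect_left(sorted_negative_scores, threshold)
    return below / len(sorted_negative_scores)


def sensitivity_at(threshold, scores, labels):
    """Fraction of interface residues scoring at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def matched_threshold(scores, labels, target_specificity):
    """Pick the threshold whose specificity is closest to the target.

    A residue is predicted as interface when its score is >= threshold.
    Ties are resolved in favour of the smallest threshold.
    """
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not negatives:
        raise ValueError("at least one non-interface residue is required")
    candidates = sorted(set(scores)) + [math.inf]
    return min(
        candidates,
        key=lambda t: abs(specificity_at(t, negatives) - target_specificity),
    )


if __name__ == "__main__":
    # Toy data: label 1 marks an interface residue, 0 a non-interface residue.
    scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    t = matched_threshold(scores, labels, target_specificity=0.75)
    print(t, sensitivity_at(t, scores, labels))
```

With the threshold fixed this way, the sensitivity and MCC of the two classifiers can be compared at essentially the same false-positive rate, which is the design of the table.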
Comparing the enhancer with the threshold classifier of approximately equal specificity on synthetic scores assigned to surface residues of protein complexes taken from the CGNK-subset
| Classifier | Specificity | Sensitivity | MCC |
|---|---|---|---|
| Threshold predictor | 0.9399 | 0.3782 | 0.3387 |
| Enhancer | 0.9400 | 0.3104 | 0.2767 |
Enhancing above various thresholds. In each pair of rows, the second row is the threshold classifier whose threshold was chosen such that its specificity approximately equals that of enhancing

| Classifier | tp | tn | fp | fn | Spec. | Sen. | MCC |
|---|---|---|---|---|---|---|---|
| Enhancing above 0.500 | 2181 | 23182 | 4145 | 1414 | 0.848 | 0.607 | 0.362 |
| Threshold classifier (matched specificity) | 2100 | 23197 | 4130 | 1495 | 0.849 | 0.584 | 0.346 |
| Enhancing above 0.525 | 2303 | 22917 | 4410 | 1292 | 0.839 | 0.641 | 0.373 |
| Threshold classifier (matched specificity) | 2206 | 22912 | 4415 | 1389 | 0.838 | 0.614 | 0.353 |
| Enhancing above 0.550 | 2507 | 22103 | 5224 | 1088 | 0.809 | 0.697 | 0.375 |
| Threshold classifier (matched specificity) | 2419 | 22102 | 5225 | 1176 | 0.809 | 0.673 | 0.358 |
| Enhancing above 0.575 | 2560 | 21992 | 5335 | 1035 | 0.805 | 0.712 | 0.380 |
| Threshold classifier (matched specificity) | 2463 | 21915 | 5412 | 1132 | 0.802 | 0.685 | 0.358 |
| Enhancing above 0.600 | 2379 | 22685 | 4642 | 1216 | 0.830 | 0.662 | 0.376 |
| Threshold classifier (matched specificity) | 2253 | 22780 | 4547 | 1342 | 0.834 | 0.627 | 0.356 |
| Enhancing above 0.625 | 2287 | 23044 | 4283 | 1308 | 0.843 | 0.636 | 0.376 |
| Threshold classifier (matched specificity) | 2136 | 23049 | 4278 | 1459 | 0.843 | 0.594 | 0.346 |
The sensitivity increased that way by 4%-7%. For every pair of experiments, the numbers of true negatives (tn), false negatives (fn), false positives (fp) and true positives (tp) are displayed.
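The specificity, sensitivity and MCC columns can be recomputed from the four confusion counts using the standard definitions. The sketch below does so for the "Enhancing above 0.500" pair of rows. The function name `confusion_metrics` is illustrative, and the formulas are the textbook ones, which the record itself does not spell out.

```python
import math


def confusion_metrics(tp: int, tn: int, fp: int, fn: int):
    """Return (specificity, sensitivity, MCC) from confusion counts."""
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when a marginal count is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return specificity, sensitivity, mcc


if __name__ == "__main__":
    # Counts copied from the "Enhancing above 0.500" row and its baseline row.
    enhanced = confusion_metrics(tp=2181, tn=23182, fp=4145, fn=1414)
    baseline = confusion_metrics(tp=2100, tn=23197, fp=4130, fn=1495)
    print("enhanced: spec=%.3f sen=%.3f mcc=%.3f" % enhanced)
    print("baseline: spec=%.3f sen=%.3f mcc=%.3f" % baseline)
    # Relative sensitivity gain of the enhancer over the matched baseline.
    print("relative gain: %.1f%%" % ((enhanced[1] / baseline[1] - 1.0) * 100.0))
```

For this pair the relative sensitivity gain comes out at roughly 4%, the lower end of the 4%-7% range stated above.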
Figure 6. Comparison of the enhancer and the service at the same specificity on the protein with PDB entry 1QM4. (A) Green spheres on the left show the interface surface residues correctly predicted by both tools. (B) Red spheres on the right indicate additional true positives of the enhancer.