Joan Segura, Pamela F Jones, Narcis Fernandez-Fuentes.
Abstract
BACKGROUND: Protein binding site prediction by computational means can yield valuable information that complements and guides experimental approaches to determine the structure of protein complexes. Predictions become even more relevant and timely given the current resolution of protein interaction maps, where there is a very large and still expanding gap between the available information on (i) which proteins interact and (ii) how proteins interact. Proteins interact through exposed residues that present differential physicochemical properties, and these can be exploited to identify protein interfaces.
Year: 2011 PMID: 21861881 PMCID: PMC3171731 DOI: 10.1186/1471-2105-12-352
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overview of the VORFFIP method. The first-step RF uses residue and environment-based features as input variables. The second-step RF also includes variables derived from the score values assigned by the first-step RF, yielding a final prediction score. Numbers in parentheses refer to the different equations described in the Method section.
AUC values for different combinations of features and environment definitions
| Features | Voronoi Diagrams | Sphere | Sliding Window | Single |
|---|---|---|---|---|
| | 0.79 | 0.75 | 0.77 | 0.72 |
| | 0.77 | 0.72 | 0.75 | 0.71 |
| | 0.76 | 0.74 | 0.72 | 0.65 |
| | 0.74 | 0.71 | 0.69 | 0.61 |
| | 0.78 | 0.75 | 0.77 | 0.73 |
| | 0.82 | 0.78 | 0.81 | 0.77 |
| | 0.79 | 0.75 | 0.77 | 0.73 |
| | 0.81 | 0.75 | 0.77 | 0.76 |
| | 0.77 | 0.73 | 0.75 | 0.72 |
| | 0.76 | 0.74 | 0.72 | 0.68 |
| | 0.82 | 0.78 | 0.81 | 0.77 |
| | 0.79 | 0.75 | 0.77 | 0.73 |
| | 0.82 | 0.78 | 0.80 | 0.78 |
| | 0.81 | 0.75 | 0.77 | 0.76 |
| | 0.85 | 0.78 | 0.81 | 0.77 |
The test consisted of a 5-fold cross-validation using dataset B100, where interface residues were defined using DIMPLOT (Wallace et al., 1995). The first column indicates the combination of features used: structural (s), energy (e), conservation (c), and B-factors (b). The second to fifth columns contain AUC values for Voronoi Diagrams, sphere (15 Å cut-off), 9-residue sliding window, and single residue (no environment), respectively.
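As a rough illustration of how an AUC value like those in the table can be obtained, the sketch below computes the area under the ROC curve as the fraction of (interface, non-interface) residue pairs that are ranked correctly, and splits a list of items into five folds. The function names `roc_auc` and `k_folds` are hypothetical, and the interleaved fold assignment is an assumption; the source does not state how the B100 folds were built.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_folds(items, k=5):
    """Yield (train, test) pairs for k-fold cross-validation.
    Assumption: items are assigned to folds in an interleaved fashion."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Toy example: 1 = interface residue, 0 = non-interface residue.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
print(roc_auc(labels, scores))
```

In practice each fold's test items would be scored by a model trained on the remaining four folds, and the AUC computed over the held-out predictions.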
Figure 2. ROC curves combining structure, energy, conservation and B-factor information and different environment definitions. Red, green, blue and yellow lines represent ROC curves using VDs, sphere, sliding window, and single residues (i.e. no environment) as environment descriptors, respectively. The purple line represents a random prediction.
Figure 3. Evaluating the effect of environment descriptors. The binding site of CI-2-SUBTILISIN NOVO (PDB code: 2sni, chain E; surface representation) was predicted using structural, energy, conservation, and B-factor information and three different types of environment definitions. (A) Interface as in the crystal structure (highlighted in red). (B) Prediction using a 9-residue sliding window. (C) Prediction using a distance threshold (15 Å cut-off). (D) Prediction using VDs. The gradient colour represents score values (s) where: blue (0 ≤ s < 0.5), green (0.5 ≤ s ≤ 0.7), yellow (0.7 ≤ s < 0.9), and red (s > 0.9). Solid and dashed circles represent differences in the prediction of non-interface and interface residues, respectively.
Comparing SPPIDER and VORFFIP
| Method | MCC (a) | Q2 (%) (b) | R (%) (c) | P (%) (d) | AUC (e) |
|---|---|---|---|---|---|
| VORFFIP | 0.58 | 83.8 | 74.7 | 63.4 | 0.90 |
| SPPIDER | 0.42 | 74.2 | 60.3 | 63.7 | 0.76 |
(a) Matthews correlation coefficient, (b) Q2, the overall two-state classification accuracy, (c) Recall, (d) Precision, (e) Area under the ROC curve. VORFFIP values were obtained using the default predictor. SPPIDER values were taken from [20].
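For readers unfamiliar with these measures, the sketch below shows the standard definitions computed from the counts of a binary confusion matrix. It assumes Q2 denotes overall two-state accuracy; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts.
    tp/fn: interface residues predicted correctly/missed;
    tn/fp: non-interface residues predicted correctly/wrongly flagged."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total                      # Q2 (assumed meaning)
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # R
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # P
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"MCC": mcc, "Q2": accuracy, "R": recall, "P": precision}


# Illustrative counts only.
m = binary_metrics(tp=50, fp=10, tn=30, fn=10)
print({name: round(value, 3) for name, value in m.items()})
```

Unlike accuracy, MCC stays informative when interface residues are a small minority of the surface, which is one reason it is commonly reported alongside recall and precision.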
Comparing WHISCY, WHISCYMATE and VORFFIP
| Method | R (%) (a) | P (%) (b) | MCC (c) |
|---|---|---|---|
| VORFFIP | 47 | 42 | 0.38 |
| WHISCY | 27 | 39 | 0.27 |
| WHISCYMATE | 28 | 36 | 0.26 |
(a) Recall, (b) Precision, (c) Matthews correlation coefficient. VORFFIP values were obtained using the default predictor. WHISCY and WHISCYMATE values were taken from [24].
Figure 4. Different definitions of residues' structural environment or neighbourhood. (A) Single residue (red), i.e. no environment. (B) 9-residue sliding window (as in Sikic et al. [25]); the central residue is shown in red and flanking residues in yellow. (C) Euclidean distance cut-off; residues enclosed in a sphere of radius R = 15 Å (yellow), as in Porollo et al. [20], centred on the given residue (red). (D) Voronoi Diagrams; residue of interest (red) with a colour gradient showing neighbouring residues; orange: residues sharing more than 16 edges with the residue of interest; yellow: between 8 and 16; green: fewer than 8. The inset shows the 2D projection of a VD between two residues.
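The two simpler environment definitions in panels (B) and (C) can be sketched as follows. This is a minimal illustration, assuming each residue is represented by a single 3D point (for example its Cα atom, which the caption does not specify) and that residues are indexed along the sequence; the function names are hypothetical. The Voronoi-based definition in panel (D) requires a computational-geometry library and is not shown.

```python
import math


def sliding_window(index, n_residues, size=9):
    """Sequence neighbours of residue `index` within a window of `size`
    residues centred on it, truncated at the chain termini."""
    half = size // 2
    lo = max(0, index - half)
    hi = min(n_residues - 1, index + half)
    return [i for i in range(lo, hi + 1) if i != index]


def sphere_neighbours(index, coords, radius=15.0):
    """Spatial neighbours: residues whose representative point lies within
    `radius` angstroms of residue `index` (Euclidean distance)."""
    centre = coords[index]
    return [
        i for i, point in enumerate(coords)
        if i != index and math.dist(centre, point) <= radius
    ]


# Toy coordinates in angstroms, one point per residue.
coords = [(0, 0, 0), (10, 0, 0), (20, 0, 0), (0, 15, 0)]
print(sliding_window(10, 20))        # four residues on each side of residue 10
print(sphere_neighbours(0, coords))  # residues within 15 angstroms of residue 0
```

The window only sees residues adjacent in sequence, whereas the sphere captures residues that are close in space but possibly distant in sequence; both differ from the VD definition, which counts only residues in direct contact.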