Shide Liang, Chi Zhang, Song Liu, Yaoqi Zhou.
Abstract
Most biological processes are mediated by interactions between proteins and their interacting partners, including proteins, nucleic acids and small molecules. This work establishes a method called PINUP for binding site prediction of monomeric proteins. With only two weight parameters to optimize, PINUP produces not only 42.2% coverage of actual interfaces (percentage of correctly predicted interface residues in actual interface residues) but also 44.5% accuracy in predicted interfaces (percentage of correctly predicted interface residues in the predicted interface residues) in a cross validation using a 57-protein dataset. By comparison, the expected accuracy via random prediction (percentage of actual interface residues in surface residues) is only 15%. The binding sites of the 57-protein set are found to be easier to predict than those of an independent test set of 68 proteins. The average coverage and accuracy for this independent test set are 30.5% and 29.4%, respectively. The significant gain of PINUP over expected random prediction is attributed to (i) an effective residue-energy score and an accessible-surface-area-dependent interface propensity, (ii) isolation of functional constraints contained in the conservation score from the structural constraints through the combination of residue-energy score (for structural constraints) and conservation score and (iii) a consensus region built on top-ranked initial patches.
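The three percentages defined in the abstract (coverage, accuracy and expected random accuracy) can be illustrated with a minimal sketch. The function name `interface_metrics` and the toy residue numbers are assumptions made for illustration only; they are not taken from PINUP.

```python
def interface_metrics(predicted, actual, surface):
    """Return (coverage %, accuracy %, expected random accuracy %).

    predicted: residues predicted to be in the interface (hypothetical input)
    actual:    residues observed in the interface
    surface:   all surface residues of the protein
    """
    predicted, actual, surface = set(predicted), set(actual), set(surface)
    correct = predicted & actual  # correctly predicted interface residues
    coverage = 100.0 * len(correct) / len(actual)
    accuracy = 100.0 * len(correct) / len(predicted) if predicted else 0.0
    expected = 100.0 * len(actual) / len(surface)  # random-prediction baseline
    return coverage, accuracy, expected


# Toy example (made-up residue numbers, not data from the paper)
surface = range(1, 21)       # 20 surface residues
actual = {2, 3, 4, 5}        # 4 observed interface residues
predicted = {4, 5, 6, 7, 8}  # 5 predicted residues, 2 of them correct
coverage, accuracy, expected = interface_metrics(predicted, actual, surface)
print(coverage, accuracy, expected)
```

In this toy case two of the four actual residues are recovered and two of the five predictions are correct, while a random pick from the surface would hit the interface one time in five.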
Year: 2006 PMID: 16893954 PMCID: PMC1540721 DOI: 10.1093/nar/gkl454
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The values of for 20 amino acid residues
| Amino acid | Value | Amino acid | Value |
|---|---|---|---|
| Ala | −0.925 | Leu | 1.07 |
| Arg | 0.291 | Lys | −0.991 |
| Asn | −0.248 | Met | 2.22 |
| Asp | −0.571 | Phe | 3.00 |
| Cys | 2.78 | Pro | −0.553 |
| Gln | −0.685 | Trp | 4.39 |
| Glu | −0.881 | Val | 0.278 |
| Gly | 0.042 | Ser | −0.749 |
| His | 1.56 | Thr | −0.730 |
| Ile | 2.46 | Tyr | 3.76 |
Leave-one-out cross validation for 57 unbound protein structures
| Unbound proteinᵃ | PDB code | Complex code | Interface residues | Surface residues | Coverage (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| Barstar | 1a19A | 1brsA | 17 | 70 | 82.4 | 73.7 |
| Barnase | 1a2pA | 1brsD | 18 | 91 | 72.2 | 61.9 |
| Tumor suppressor p16ink4a | 1a5e- | 1bi7B | 31 | 125 | 22.6 | 38.9 |
| Acetylcholinesterase | 1acl- | 1fssA | 25 | 355 | 56.0 | 35.9 |
| Plastocyanin | 1ag6- | 2pcfA | 23 | 78 | 39.1 | 60.0 |
| cdc42hs | 1aje- | 1am4D | 18 | 160 | 33.3 | 60.0 |
| Rhogdi | 1ajw- | 1cc0E | 13 | 127 | 69.2 | 69.2 |
| Fkbp-rapamycin-binding domain | 1aueA | 1fapB | 8 | 78 | 37.5 | 15.8 |
| Trypsin inhibitor | 1avu- | 1avwB | 15 | 138 | 93.3 | 66.7 |
| Human procarboxypeptidase a2 | 1aye- | 1dtdA | 24 | 304 | 54.2 | 44.8 |
| Hydrolase angiogenin | 1b1eA | 1a4yB | 34 | 101 | 47.1 | 88.9 |
| Bifunctional trypsin/alpha-amylase inhibitor (rbi) | 1bip- | 1tmqB | 27 | 119 | 37.0 | 58.8 |
| Cytochrome | 1ctm- | 2pcfB | 25 | 201 | 36.0 | 36.0 |
| Granulocyte colony stimulating factor | 1cto- | 1cd9B | 6 | 103 | 100.0 | 35.3 |
| Receptor chey mutant | 1cye- | 1eayA | 13 | 97 | 7.7 | 6.7 |
| Calcium-free equine plasma gelsolin | 1d0nA | 1c0fS | 24 | 589 | 0.0 | 0.0 |
| Hydrolase inhibitor | 1d2bA | 1ueaB | 22 | 109 | 59.1 | 81.2 |
| Transferase | 1ekxA | 1d09A | 21 | 232 | 0.0 | 0.0 |
| Bovine chymotrypsinogen a | 1ex3A | 1cgiE | 28 | 188 | 46.4 | 68.4 |
| Neuronal t-snare syntaxin-1a | 1ez3A | 1dn1B | 18 | 116 | 44.4 | 47.1 |
| N-terminal domain of enzyme I from | 1eza- | 3ezaA | 24 | 249 | 0.0 | 0.0 |
| rgs4 | 1eztA | 1agrE | 21 | 115 | 19.0 | 23.5 |
| Enteropathogenic | 1f00I | 1f02I | 16 | 215 | 0.0 | 0.0 |
| Coxsackie virus and adenovirus receptor | 1f5wA | 1kacB | 18 | 98 | 22.2 | 23.5 |
| Fk506 binding protein | 1fkl- | 1b6cA | 20 | 88 | 45.0 | 69.2 |
| Uracil-DNA glycosylase | 1flzA | 1euiA | 26 | 172 | 69.2 | 78.3 |
| Neuronal sec1 | 1fvhA | 1dn1 | 41 | 423 | 39.0 | 55.2 |
| Hydrolase | 1g4kA | 1ueaA | 31 | 133 | 22.6 | 41.2 |
| Radixin ferm domain | 1gc7A | 1ef1A | 59 | 248 | 11.9 | 31.8 |
| Granulocyte colony stimulating factor (rhg-csf) | 1gnc- | 1cd9A | 18 | 174 | 16.7 | 17.6 |
| N-terminal region of p67phox | 1hh8A | 1e96B | 14 | 150 | 78.6 | 44.0 |
| Lipase (EC 3.1.1.3) | 1hplA | 1ethA | 21 | 325 | 9.5 | 8.0 |
| p53 core DNA-binding domain | 1hu8A | 1ycsA | 16 | 155 | 68.8 | 29.7 |
| Interleukin-1 beta | 1iob- | 1itbA | 42 | 133 | 19.0 | 47.1 |
| Actin | 1j6zA | 1c0fA | 30 | 281 | 30.0 | 33.3 |
| α-Amylase | 1jae- | 1tmqA | 33 | 316 | 45.5 | 78.9 |
| (EC 3.5.1.28) mutant | 1lba- | 1aroL | 16 | 112 | 37.5 | 40.0 |
| Knob domain from adenovirus serotype 12 | 1nobA | 1kacA | 18 | 139 | 0.0 | 0.0 |
| Nitric oxide synthase oxygenase domain | 1nos- | 1nocA | 1 | 251 | 0.0 | 0.0 |
| Porcine pancreatic procolipase b | 1pco- | 1ethB | 18 | 85 | 22.2 | 18.2 |
| Profilin | 1pne- | 1hluP | 25 | 107 | 60.0 | 93.8 |
| Phosphotransferase (hpr) | 1poh- | 1ggrB | 16 | 69 | 75.0 | 66.7 |
| Papain (EC 3.4.22.2) | 1ppp- | 1stfE | 28 | 160 | 32.1 | 50.0 |
| Streptokinase domain b | 1qqrA | 1bmlC | 10 | 118 | 70.0 | 38.9 |
| Rhogap | 1rgp- | 1am4A | 17 | 155 | 41.2 | 35.0 |
| Selenosubtilisin | 1selA | 1cseE | 23 | 177 | 47.8 | 61.1 |
| Cyclin a | 1vin- | 1finB | 29 | 194 | 51.7 | 50.0 |
| P120gap | 1wer- | 1wq1G | 34 | 251 | 35.3 | 66.7 |
| β-Lactamase tem1 | 1xpb- | 1jtgA | 33 | 186 | 54.5 | 90.0 |
| Ribonuclease inhibitor | 2bnh- | 1a4yA | 39 | 370 | 35.9 | 36.8 |
| Cyclophilin a | 2cpl- | 1ak4A | 19 | 120 | 63.2 | 75.0 |
| Glucose-specific phosphocarrier | 2f3gA | 1ggrA | 20 | 104 | 60.0 | 63.2 |
| Negative factor (fprotein) | 2nef- | 1avzB | 14 | 128 | 42.9 | 33.3 |
| RalGEF-rbd streptomyces | 2rgf- | 1lfdA | 15 | 79 | 33.3 | 26.3 |
| Subtilisin inhibitor | 3ssi- | 2sicI | 13 | 92 | 100.0 | 68.4 |
| Cytochrome c peroxidase (EC 1.11.1.5) mutant | 6ccp- | 2pcbA | 13 | 226 | 69.2 | 29.0 |
| BLIP | Bound | 1jtgB | 30 | 136 | 40.0 | 63.2 |
| Mean | | | 22 | 174 | 42.2 | 44.5 |
| Enzyme ( | | | 21 | 174 | 43 | 47.3 |
| Inhibitor ( | | | 21 | 149 | 68.6 | 65.6 |
| Others (25) | | | 23 | 181 | 34.1 | 35.8 |
ᵃ The same dataset was used by Neuvirth et al. (10), except for BLIP. The unbound structure of BLIP was not available in the PDB, so the bound structure was used instead.
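Because coverage and accuracy share the same numerator, a row of the table above implies how many residues were predicted and how many of those were correct. The sketch below back-calculates these counts for the Barstar row (17 interface residues, 70 surface residues, 82.4% coverage, 73.7% accuracy). The helper `implied_counts` is a hypothetical name, and the counts are inferred from rounded percentages rather than reported in the source.

```python
def implied_counts(n_interface, coverage_pct, accuracy_pct):
    """Infer (correct, predicted) residue counts from rounded percentages."""
    # coverage = correct / n_interface
    correct = round(n_interface * coverage_pct / 100.0)
    # accuracy = correct / predicted; undefined when nothing was correct
    if accuracy_pct > 0:
        predicted = round(correct / (accuracy_pct / 100.0))
    else:
        predicted = None
    return correct, predicted


# Barstar row: 17 interface residues, 70 surface residues
correct, predicted = implied_counts(17, 82.4, 73.7)
expected_random = 100.0 * 17 / 70  # baseline for this one protein
print(correct, predicted, round(expected_random, 1))
```

Under these assumptions the Barstar row corresponds to 14 correct residues out of 19 predicted, against a per-protein random baseline of about 24%.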
Effect of individual and combined scores
| No. | Energy scoreᵃ | Conservation scoreᵃ | Interface propensityᵃ | Coverage (%) | Accuracy (%) | Interface rankᵇ | Differenceᶜ |
|---|---|---|---|---|---|---|---|
| 1 | 1 | | | 19.2 | 18.7 | 2.7 | 0.46 (kcal·mol⁻¹) |
| 2 | | 1 | | 22.6 | 24.7 | 4.6 | 0.12 |
| 3 | | | 1 | 37.1 | 35.2 | 2.8 | 0.20 |
| 4 | 1 | 1.2 | | 25.9 | 27.0 | 2.3 | 0.61 |
| 5 | 1 | | 7 | 38.7 | 39.6 | 2.0 | 1.88 |
| 6 | | 1 | 3 | 37.7 | 39.5 | 2.8 | 0.73 |
| 7 | 1 | 1.6 | 6 | 43.1 | 45.2 | 2.0 | 1.87 |
ᵃ The weights of the combined scores are optimized to achieve the highest prediction accuracy on the training set of 57 proteins.
ᵇ For each protein, the observed interface is ranked against generated surface patches of the same size as the interface, and the rank is assigned to one of 10 equally sized categories; the category ranks of the 57 proteins are then averaged.
ᶜ The difference between the interface and the rest of the surface residues as calculated by individual or combined scores.
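One way to read the weights in this table is as coefficients of a linear combination, with the energy score fixed at weight 1 and the other two weights optimized. The sketch below assumes that linear form (the table itself gives only the weights) and checks whether the single-score differences, combined with the weights 1, 1.6 and 6, come close to the reported combined difference of 1.87. The function name `combined_score` is hypothetical.

```python
def combined_score(energy, conservation, propensity, w_cons=1.6, w_prop=6.0):
    """Weighted linear sum of the three terms (assumed form, energy weight = 1)."""
    return energy + w_cons * conservation + w_prop * propensity


# Interface-vs-surface differences of the single scores, as listed in the table
single = {"energy": 0.46, "conservation": 0.12, "propensity": 0.20}

combined_diff = combined_score(
    single["energy"], single["conservation"], single["propensity"]
)
print(round(combined_diff, 3))  # compare with the reported 1.87
```

The weighted sum of the rounded single-score differences is 1.852, close to the reported 1.87; the small gap is consistent with rounding of the tabulated values, although the source does not state how the combined difference was computed.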
Figure 1. Comparison of protein interfaces predicted by single and combined scoring terms. Red, predicted interface; green, observed interface; yellow, overlapped regions between predicted and observed interfaces. The interfaces were predicted by residue interface propensity (a), conservation score (b), energy score (c) and a combination of them (d), respectively. There are actually two separate interfaces predicted by the conservation score. The smaller one overlaps with the observed interface.
The average side chain energies of interface, surrounding and other surface residuesᵃ
| | Interface | Surrounding | Other surface |
|---|---|---|---|
| Mean energies (kcal·mol⁻¹) | −0.79 | −1.29 | −1.18 |
| SD | 1.80 | 1.80 | 1.91 |
| No. of residues | 1271 | 1065 | 7579 |
ᵃ The surrounding residues have a lower energy than interface residues and other surface residues. The P-values are 5.9 × 10⁻¹² and 0.038, respectively, as calculated by Student's t-test. The surrounding residues are defined as those surface residues with side chain atoms within 1 Å plus the sum of the van der Waals radii of the two interacting atoms from any side chain in the interface.
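The comparison in this footnote can be approximated from the summary statistics in the table alone. The sketch below computes a pooled-variance (Student's) t statistic from means, SDs and residue counts, and converts it to a one-sided P-value using a normal approximation, which is reasonable only because the degrees of freedom are in the thousands. The choice of a one-sided normal approximation is an assumption; the source does not say which variant of the test was used, and the tabulated means and SDs are rounded, so the results need not match the reported P-values exactly.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


def one_sided_p_normal(t):
    """Upper-tail probability of |t| under a standard normal (large-df approximation)."""
    return 0.5 * math.erfc(abs(t) / math.sqrt(2.0))


# Interface vs surrounding residues (values from the table)
t_int, df_int = pooled_t(-0.79, 1.80, 1271, -1.29, 1.80, 1065)
# Other surface vs surrounding residues
t_other, df_other = pooled_t(-1.18, 1.91, 7579, -1.29, 1.80, 1065)

p_int = one_sided_p_normal(t_int)
p_other = one_sided_p_normal(t_other)
print(round(t_int, 2), df_int, p_int)
print(round(t_other, 2), df_other, p_other)
```

Both t statistics are positive, meaning the surrounding residues have the lower mean energy in each comparison, in line with the footnote.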
Testing PINUP with the whole (in parentheses) and a subset of non-homologous proteins in the protein–protein docking benchmark 2.0ᵃ
| Subsetᵇ | Category | No. of non-homologous proteins (all)ᶜ | Average coverage (%) (all) | Average accuracy (%) (all) | Average expected accuracy (%) (all) |
|---|---|---|---|---|---|
| Rigid body | Enzyme | 6 (21) | 30.9 (42.1) | 31.0 (50.5) | 14.6 (14.3) |
| | Inhibitor | 11 (21) | 49.5 (60.4) | 50.6 (57.4) | 24.7 (25.7) |
| | Others | 25 (44) | 29.0 (31.7) | 23.5 (29.5) | 10.8 (13.1) |
| Medium difficult | Enzyme | 1 (2) | 0.0 (32.5) | 0.0 (26) | 5.1 (8.1) |
| | Inhibitor | 1 (2) | 84.6 (70.1) | 61.1 (61.8) | 20 (23.4) |
| | Others | 16 (20) | 25.0 (24.7) | 28.1 (30.3) | 12.1 (13.1) |
| Difficult | Others | 8 (14) | 16.9 (18.4) | 20.0 (21.4) | 14.7 (13.0) |
| All | | 68 (124) | 30.5 (36.3) | 29.4 (37.5) | 14.2 (15.5) |
ᵃ Benchmark 2.0 of Chen et al. (45), excluding antibody–antigen complexes.
ᵇ Subsets are based on the magnitude of conformational change after binding (45).
ᶜ The number of proteins that share <35% sequence identity with any protein in the 57-protein dataset compiled by Neuvirth et al. (10). Values in parentheses refer to all proteins in the category.
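The "All" row for the 68 non-homologous proteins can be cross-checked against the category rows by taking means weighted by the number of proteins in each category. The sketch below does this for coverage, accuracy and expected accuracy; the tuple layout and the helper name `weighted_mean` are illustrative choices, while the numbers are copied from the table.

```python
# (n proteins, coverage %, accuracy %, expected accuracy %) for the
# non-homologous subset, one tuple per category row of the table
rows = [
    (6, 30.9, 31.0, 14.6),   # rigid body, enzyme
    (11, 49.5, 50.6, 24.7),  # rigid body, inhibitor
    (25, 29.0, 23.5, 10.8),  # rigid body, others
    (1, 0.0, 0.0, 5.1),      # medium difficult, enzyme
    (1, 84.6, 61.1, 20.0),   # medium difficult, inhibitor
    (16, 25.0, 28.1, 12.1),  # medium difficult, others
    (8, 16.9, 20.0, 14.7),   # difficult, others
]


def weighted_mean(rows, col):
    """Mean of column `col`, weighted by the protein count in column 0."""
    total = sum(r[0] for r in rows)
    return sum(r[0] * r[col] for r in rows) / total


n_total = sum(r[0] for r in rows)
coverage_all = weighted_mean(rows, 1)
accuracy_all = weighted_mean(rows, 2)
expected_all = weighted_mean(rows, 3)
print(n_total, round(coverage_all, 1), round(accuracy_all, 1), round(expected_all, 1))
```

The weighted means agree with the reported "All" row (68 proteins; 30.5, 29.4 and 14.2) to one decimal place, which suggests the overall figures are per-protein averages rather than averages over categories.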