Benoît H Dessailly, Marc F Lensink, Shoshana J Wodak.
Abstract
BACKGROUND: Most methods for predicting functional sites in protein 3D structures rely on information from related proteins and cannot be applied to proteins with no known relatives. Another limitation of these methods is the lack of a well-annotated set of functional sites to use as a benchmark for validating their predictions. Experimental findings and theoretical considerations suggest that residues involved in function often contribute unfavorably to the stability of the native state. We examine the possibility of systematically exploiting this intrinsic property to identify functional sites, using an original procedure that detects destabilizing regions in protein structures. In addition, to relate destabilizing regions to known functional sites, we derive a novel benchmark consisting of a diverse set of hand-curated protein functional sites.
RESULTS: A procedure for detecting clusters of destabilizing residues in protein structures is presented. Individual residue contributions to protein stability are evaluated using detailed atomic models and a force field that has been successfully applied in computational protein design. The most destabilizing residues, and some of their closest neighbours, are clustered into destabilizing regions following a rigorous protocol. Our procedure is applied to high-quality apo structures of 63 unrelated proteins. The biologically relevant binding sites of these proteins were annotated using all available information, including structural data and literature curation, resulting in the largest hand-curated dataset of binding sites in proteins available to date. Comparing the destabilizing regions with the annotated binding sites in these proteins, we find that the overlap is on average limited, but significantly better than random. Results depend on the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for most nucleic acid-binding sites. These differences are rationalised in terms of the geometry and energetics of the binding site.
CONCLUSION: We find that, although destabilizing regions as detected here cannot in general be used to predict binding sites in protein structures, they can provide useful information, particularly on the location of functional sites that bind polysaccharides and small ligands. This information can be exploited in methods for predicting function in protein structures with no known relatives. Our publicly available benchmark of hand-curated functional sites in proteins should help other workers derive and validate new prediction methods.
Year: 2007 PMID: 17470296 PMCID: PMC1890302 DOI: 10.1186/1471-2105-8-141
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Properties of known binding sites of the dataset proteins.
| N res | F res (%) | ASA | F ASA (%) | Cleft |
|---|---|---|---|---|
| 13 | 4.2 | 272 | 2.1 | T | ||
| 8 | 6.9 | 193 | 1.0 | T | ||
| 31 | 8.5 | 1158 | 3.7 | T | ||
| 24 | 8.3 | 976 | 7.4 | T | ||
| 4 | 6.0 | 119 | 0.9 | F | ||
| 14 | 20.9 | 110 | 0.8 | F | ||
| 24 | 10.8 | 947 | 8.7 | T | ||
| 4 | 1.5 | 329 | 1.4 | F | ||
| 17 | 11.3 | 1006 | 2.8 | T | ||
| 22 | 16.4 | 663 | 3.0 | T | ||
| 34 | 10.1 | 820 | 5.6 | F | ||
| 25 | 10.7 | 339 | 0.8 | T | ||
| 15 | 7.4 | 669 | 4.0 | T | ||
| 19 | 6.1 | 437 | 3.2 | T | ||
| 12 | 11.2 | 783 | 11.9 | T | ||
| 15 | 4.3 | 268 | 1.0 | T | ||
| 18 | 11.5 | 505 | 3.9 | T | ||
| 10 | 6.6 | 560 | 6.2 | T | ||
| 8 | 8.9 | 450 | 5.0 | F | ||
| 12 | 4.1 | 107 | 0.3 | T | ||
| 27 | 19.0 | 1191 | 5.6 | F | ||
| 12 | 3.1 | 471 | 3.2 | T | ||
| 15 | 4.2 | 472 | 3.5 | T | ||
| 41 | 8.5 | 1572 | 8.9 | T | ||
| 16 | 3.2 | 565 | 1.5 | T | ||
| 14 | 4.6 | 590 | 5.2 | T | ||
| 10 | 2.9 | 265 | 1.8 | T | ||
| 16 | 3.9 | 256 | 1.7 | T | ||
| 8 | 6.7 | 642 | 11.2 | T | ||
| 20 | 15.5 | 891 | 13.6 | T | ||
| 12 | 9.0 | 376 | 5.9 | F | ||
| 9 | 6.8 | 275 | 4.2 | T | ||
| 18 | 2.5 | 485 | 1.7 | T | ||
| 7 | 2.3 | 97 | 0.9 | T | ||
| 21 | 5.9 | 450 | 3.4 | F | ||
| 17 | 14.2 | 854 | 4.2 | T | ||
| 16 | 5.7 | 885 | 6.6 | T | ||
| 29 | 23.8 | 1760 | 13.8 | F | ||
| 15 | 13.0 | 971 | 5.0 | F | ||
| 10 | 7.9 | 775 | 11.7 | F | ||
| 15 | 11.8 | 1212 | 18.4 | T | ||
| 26 | 22.8 | 1819 | 28.2 | F | ||
| 12 | 9.5 | 604 | 5.2 | F | ||
| 21 | 31.3 | 1405 | 33.2 | F | ||
| 36 | 26.7 | 2246 | 28.9 | F | ||
| 22 | 12.2 | 1023 | 6.1 | T | ||
| 14 | 19.7 | 937 | 21.7 | T | ||
| 15 | 28.8 | 1236 | 29.5 | T | ||
| 19 | 8.1 | 1620 | 13.5 | F | ||
| 20 | 8.5 | 1386 | 11.6 | F | ||
| 18 | 9.2 | 1301 | 13.1 | F | ||
| 42 | 16.0 | 2796 | 21.6 | F | ||
| 12 | 14.3 | 770 | 8.8 | T | ||
| 8 | 5.1 | 352 | 2.0 | T | ||
| 18 | 15.8 | 1375 | 21.3 | F | ||
| 9 | 6.7 | 245 | 3.2 | T | ||
| 30 | 29.1 | 2289 | 30.2 | T | ||
| 5 | 4.3 | 497 | 7.4 | F | ||
| 30 | 13.5 | 1539 | 15.0 | T | ||
| 19 | 9.7 | 1229 | 12.4 | F | ||
| 17 | 6.5 | 364 | 2.8 | T | ||
| 10 | 15.2 | 691 | 9.0 | F | ||
| 10 | 9.0 | 1009 | 15.0 | F | ||
| 21 | 11.6 | 350 | 2.1 | T | ||
| 14 | 3.8 | 465 | 2.9 | F | ||
| 7 | 5.5 | 350 | 5.3 | T | ||
| 6 | 1.6 | 67 | 0.4 | T | ||
| 26 | 12.6 | 1526 | 11.9 | F | ||
| 19 | 8.6 | 365 | 2.2 | T | ||
| 16 | 4.5 | 627 | 4.4 | T | ||
| 19 | 3.6 | 872 | 3.9 | F | ||
| 19 | 8.0 | 586 | 1.8 | F | ||
| 17 | 13.9 | 492 | 3.8 | T | ||
| 49 | 10.6 | 2224 | 11.3 | F |
Properties of known binding sites of the dataset proteins. Binding sites are classified according to their type of ligand. The last 5 categories refer to binding sites where 2 types of ligand can bind.
Pdb id: Pdb identifier of the structure used for energy calculations.
Holo-pdb ids: Pdb identifiers of the structures of the protein-ligand complexes used to define the binding site.
N res: number of residues in the binding site.
F res: fraction of protein residues in the binding site (in %).
ASA: total accessible surface area of the binding-site residues.
F ASA: fraction of protein ASA in the binding site (in %).
Cleft: True (T) if the binding site sits in a cleft, False (F) otherwise.
1gus appears twice here because it has 2 distinct binding sites for small ligands. The same applies to 1e6l and 1uns, which have 2 distinct binding sites for different proteins.
Figure 1. Amino acid composition of the proteins in our dataset, of functional sites and of destabilizing regions. Mean proportions of residue types for all residues in the dataset, for binding-site residues and for destabilizing-region residues. Residues are sorted by increasing hydrophobicity according to the Kyte-Doolittle scale [76].
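Figures 1 and 2 both order residue types by the Kyte-Doolittle scale. The sketch below is a hypothetical helper, not taken from the source. It lists the standard Kyte-Doolittle hydropathy values, sorts residue types by increasing hydrophobicity and computes residue-type proportions for a set of residues. It makes two assumptions. First, averaging per-set proportions with equal weight is one reading of "mean proportions". Second, the four residues tied at -3.5 (E, Q, D, N) are kept in an arbitrary fixed order, which may differ from the figure.

```python
from collections import Counter
from typing import Dict, List

# Standard Kyte-Doolittle hydropathy values; higher means more hydrophobic.
KYTE_DOOLITTLE: Dict[str, float] = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def kd_order() -> List[str]:
    """Residue types by increasing hydrophobicity; ties keep the dict order (stable sort)."""
    return sorted(KYTE_DOOLITTLE, key=KYTE_DOOLITTLE.__getitem__)


def composition(residues: str) -> Dict[str, float]:
    """Fraction of each residue type in a one-letter string, keyed in Kyte-Doolittle order."""
    counts = Counter(r for r in residues.upper() if r in KYTE_DOOLITTLE)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues in input")
    return {aa: counts[aa] / total for aa in kd_order()}


def mean_composition(residue_sets: List[str]) -> Dict[str, float]:
    """Mean of per-set proportions, each set (protein, site or region) weighted equally (assumption)."""
    comps = [composition(s) for s in residue_sets]
    return {aa: sum(c[aa] for c in comps) / len(comps) for aa in kd_order()}
```

For example, `mean_composition(["AAG", "GG"])` averages 1/3 and 1 for glycine. Weighting by residue count instead would pool all residues before counting.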
Properties of destabilizing regions detected in the dataset proteins.
| N Reg | N res | F res (%) | ASA | F ASA (%) | Cleft |
|---|---|---|---|---|---|
| 1 | 7 | 5.3 | 196 | 3.0 | F | |
| 1 | 9 | 2.9 | 331 | 2.5 | T | |
| 1 | 10 | 3.2 | 290 | 2.2 | F | |
| 1 | 19 | 6.1 | 256 | 1.9 | F | |
| 1 | 18 | 3.9 | 170 | 0.9 | T | |
| 1 | 18 | 2.5 | 450 | 1.6 | T | |
| 1 | 11 | 1.5 | 405 | 1.4 | T | |
| 1 | 4 | 0.6 | 143 | 0.5 | T | |
| 1 | 6 | 4.7 | 410 | 6.2 | T | |
| 1 | 4 | 1.3 | 207 | 1.9 | T | |
| 1 | 17 | 5.5 | 495 | 4.5 | F | |
| 1 | 7 | 2.3 | 546 | 4.9 | F | |
| 1 | 9 | 2.9 | 82 | 0.7 | F | |
| 1 | 4 | 1.3 | 196 | 1.8 | T | |
| 1 | 4 | 1.6 | 230 | 2.0 | F | |
| 1 | 10 | 14.9 | 644 | 15.2 | F | |
| 4 | 22 | 9.3 | 152 | 2.0 | F | |
| 1 | 8 | 5.9 | 573 | 7.4 | F | |
| 1 | 8 | 2.2 | 309 | 2.3 | F | |
| 1 | 10 | 2.8 | 114 | 0.9 | F | |
| 1 | 10 | 2.8 | 534 | 4.0 | F | |
| 2 | 6 | 1.6 | 434 | 2.8 | T | |
| 2 | 11 | 3.0 | 465 | 3.0 | T | |
| 1 | 6 | 2.1 | 422 | 3.2 | F | |
| 1 | 20 | 6.9 | 571 | 4.4 | T | |
| 1 | 4 | 1.4 | 289 | 2.2 | T | |
| 1 | 6 | 2.1 | 234 | 1.8 | F | |
| 4 | 13 | 12.9 | 262 | 7.6 | T | |
| 1 | 12 | 5.4 | 107 | 1.0 | T | |
| 1 | 6 | 2.7 | 386 | 3.5 | T | |
| 1 | 12 | 2.7 | 546 | 3.3 | T | |
| 2 | 9 | 4.1 | 179 | 2.2 | T | |
| 1 | 21 | 5.9 | 493 | 3.5 | F | |
| 1 | 4 | 1.1 | 195 | 1.4 | T | |
| 2 | 5 | 1.9 | 425 | 3.6 | F | |
| 2 | 18 | 6.8 | 816 | 6.8 | T | |
| 6 | 7 | 4.7 | 114 | 1.8 | F | |
| 6 | 13 | 8.7 | 748 | 12.6 | T | |
| 4 | 4 | 3.0 | 185 | 3.2 | T | |
| 1 | 14 | 4.2 | 553 | 3.8 | T | |
| 1 | 5 | 1.5 | 256 | 1.7 | F | |
| 1 | 19 | 5.7 | 316 | 2.2 | F | |
| 1 | 8 | 2.4 | 662 | 4.5 | T | |
| 4 | 5 | 4.2 | 429 | 8.4 | T | |
| 1 | 11 | 5.3 | 705 | 5.5 | T | |
| 1 | 6 | 2.9 | 408 | 3.2 | F | |
| 1 | 4 | 1.9 | 304 | 2.4 | T | |
| 1 | 25 | 6.5 | 587 | 4.0 | F | |
| 1 | 6 | 1.6 | 255 | 1.7 | F | |
| 1 | 6 | 1.6 | 415 | 2.8 | F | |
| 1 | 6 | 1.6 | 144 | 1.0 | F | |
| 1 | 11 | 2.4 | 376 | 1.9 | F | |
| 1 | 11 | 2.4 | 679 | 3.5 | F | |
| 1 | 13 | 2.8 | 608 | 3.1 | F | |
| 1 | 5 | 1.1 | 315 | 1.6 | F | |
| 1 | 4 | 3.5 | 346 | 5.1 | T | |
| 1 | 7 | 2.0 | 198 | 1.5 | T | |
| 1 | 5 | 1.4 | 296 | 2.2 | F | |
| 1 | 7 | 2.0 | 313 | 2.3 | F | |
| 1 | 4 | 1.1 | 211 | 1.6 | F | |
| 1 | 4 | 1.1 | 248 | 1.8 | F | |
| 1 | 7 | 2.0 | 190 | 1.4 | T | |
| 1 | 6 | 1.2 | 314 | 1.8 | F | |
| 1 | 7 | 1.5 | 226 | 1.3 | T | |
| 1 | 4 | 0.8 | 138 | 0.8 | F | |
| 1 | 7 | 1.5 | 93 | 0.5 | T | |
| 1 | 4 | 1.1 | 217 | 1.3 | T | |
| 1 | 15 | 4.2 | 318 | 1.9 | T | |
| 3 | 10 | 2.1 | 547 | 3.6 | F | |
| 6 | 12 | 5.1 | 276 | 3.6 | T | |
| 3 | 23 | 4.9 | 54 | 0.3 | T | |
| 1 | 6 | 2.1 | 97 | 0.7 | F | |
| 1 | 5 | 1.8 | 105 | 0.8 | T | |
| 1 | 7 | 2.5 | 377 | 2.8 | T | |
| 1 | 9 | 3.2 | 423 | 3.1 | F | |
| 2 | 11 | 5.4 | 395 | 4.8 | T | |
| 2 | 9 | 4.4 | 324 | 3.8 | T | |
| 1 | 11 | 1.1 | 455 | 1.2 | F | |
| 1 | 15 | 1.5 | 149 | 0.4 | F | |
| 2 | 14 | 2.8 | 765 | 4.0 | F | |
| 2 | 4 | 0.8 | 236 | 1.2 | F | |
| 1 | 4 | 0.8 | 291 | 1.3 | F | |
| 1 | 17 | 3.3 | 438 | 2.0 | F | |
| 1 | 14 | 2.7 | 641 | 2.9 | F | |
| 1 | 20 | 3.8 | 504 | 2.3 | T | |
| 1 | 5 | 1.0 | 325 | 1.5 | F | |
| 1 | 4 | 1.8 | 300 | 2.9 | F | |
| 1 | 14 | 6.3 | 704 | 6.9 | T | |
| 1 | 6 | 2.7 | 329 | 3.2 | T | |
| 1 | 24 | 7.9 | 328 | 2.9 | T | |
| 1 | 5 | 1.7 | 170 | 1.5 | F | |
| 1 | 5 | 1.5 | 359 | 2.4 | T | |
| 1 | 33 | 9.6 | 1164 | 7.9 | F | |
| 1 | 7 | 2.0 | 165 | 1.1 | T | |
| 1 | 6 | 1.6 | 129 | 0.8 | T | |
| 1 | 4 | 1.1 | 319 | 2.0 | T | |
| 1 | 6 | 1.6 | 417 | 2.6 | F | |
| 1 | 21 | 5.7 | 828 | 5.2 | T | |
| 1 | 4 | 1.3 | 330 | 2.4 | F | |
| 1 | 22 | 7.0 | 663 | 4.9 | T | |
| 2 | 5 | 4.1 | 233 | 3.6 | T | |
| 1 | 4 | 1.7 | 246 | 2.1 | T | |
| 1 | 7 | 3.0 | 262 | 2.2 | F | |
| 1 | 7 | 3.6 | 368 | 3.7 | F | |
| 1 | 5 | 1.9 | 251 | 1.9 | T | |
| 1 | 30 | 4.3 | 715 | 2.7 | T | |
| 2 | 13 | 3.8 | 340 | 2.6 | T | |
| 2 | 8 | 5.1 | 304 | 4.8 | F | |
| 1 | 13 | 3.2 | 50 | 0.3 | F | |
| 1 | 5 | 1.2 | 53 | 0.3 | T | |
| 1 | 6 | 1.5 | 182 | 1.2 | F | |
| 1 | 11 | 2.7 | 356 | 2.3 | F | |
| 1 | 10 | 8.3 | 455 | 7.9 | F | |
| 1 | 7 | 4.6 | 431 | 4.8 | T | |
| 2 | 13 | 14.4 | 750 | 16.6 | T | |
| 8 | 7 | 4.8 | 131 | 2.4 | F | |
| 8 | 8 | 5.5 | 83 | 1.6 | F | |
| 1 | 14 | 10.9 | 592 | 9.0 | F | |
| 1 | 8 | 1.4 | 274 | 1.3 | T | |
| 2 | 8 | 2.8 | 1 | 0.0 | F | |
| 4 | 10 | 7.0 | 415 | 7.6 | T |
Pdb id: Pdb identifier of the structure used for energy calculations.
N Reg: number of equivalent destabilizing regions identified in the structure (relevant for multimers only).
N res: number of residues in the destabilizing region.
F res: fraction of protein residues in the destabilizing region (in %).
ASA: total accessible surface area of the destabilizing-region residues.
F ASA: fraction of protein ASA in the destabilizing region (in %).
Cleft: True (T) if the destabilizing region sits in a cleft, False (F) otherwise.
Figure 2. Average van der Waals, electrostatic and solvation contributions, and total free energy difference, for each residue type. Average values of the van der Waals, electrostatic and solvation terms, and of the total free energy difference, for each residue type, computed over (a) all residues in the dataset and (b) destabilizing residues. Standard deviations are shown as error bars. Residues are sorted by increasing hydrophobicity according to the Kyte-Doolittle scale [76].
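The per-residue-type averages and error bars in Figure 2 amount to a group-by over residues. The sketch below assumes each residue arrives as a hypothetical `ResidueEnergy` record carrying its van der Waals, electrostatic, solvation and total terms; the field names are illustrative, not from the source. Panel (b) would be the same call restricted to destabilizing residues. The selection criterion for those residues is defined in the source's Methods and is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


@dataclass
class ResidueEnergy:
    """Hypothetical per-residue energy record (illustrative field names)."""
    restype: str  # residue name, e.g. "ALA"
    vdw: float    # van der Waals term
    elec: float   # electrostatic term
    solv: float   # solvation term
    total: float  # total free energy difference, as supplied by the energy model


TERMS = ("vdw", "elec", "solv", "total")


def per_type_stats(
    records: Iterable[ResidueEnergy],
) -> Dict[str, Dict[str, Tuple[float, float]]]:
    """Return {restype: {term: (mean, standard deviation)}} over the given residues."""
    groups: Dict[str, List[ResidueEnergy]] = defaultdict(list)
    for rec in records:
        groups[rec.restype].append(rec)

    stats: Dict[str, Dict[str, Tuple[float, float]]] = {}
    for restype, recs in groups.items():
        stats[restype] = {}
        for term in TERMS:
            values = [getattr(r, term) for r in recs]
            # Population standard deviation; the caption does not say which estimator was used.
            stats[restype][term] = (mean(values), pstdev(values))
    return stats
```

The population estimator is used so that a residue type seen only once yields a deviation of 0 rather than an error. `statistics.stdev` would need at least two values per type.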
Details of the intersection between binding sites and destabilizing regions.
| N site | N des. | N IR | Sens. (%) | PPV (%) | Exp N IR | P-value |
|---|---|---|---|---|---|---|
| 13 | 38 | 7 | 53.8 | 18.4 | 1.6 | 0.00023 | |
| 32 | 18 | 8 | 25.0 | 44.4 | 1.2 | 5.9e-06 | |
| 62 | 53 | 10 | 16.1 | 18.9 | 4.5 | 0.00999 | |
| 24 | 36 | 14 | 58.3 | 38.9 | 3.0 | 1.5e-08 | |
| 24 | 73 | 0 | 0.0 | 0.0 | 4.4 | 1.0 | |
| 84 | 73 | 41 | 48.8 | 56.2 | 15.2 | 6.2e-14 | |
| 24 | 18 | 7 | 29.2 | 38.9 | 1.9 | 0.00100 | |
| 8 | 46 | 0 | 0.0 | 0.0 | 0.7 | 1.0 | |
| 102 | 126 | 30 | 29.4 | 23.8 | 14.3 | 1.2e-05 | |
| 88 | 16 | 16 | 18.2 | 100.0 | 2.6 | 8.2e-14 | |
| 34 | 46 | 12 | 35.3 | 26.1 | 4.7 | 0.00061 | |
| 150 | 190 | 60 | 40.0 | 31.6 | 20.3 | 3.3e-18 | |
| 30 | 46 | 13 | 43.3 | 28.3 | 3.4 | 2.9e-06 | |
| 19 | 26 | 6 | 31.6 | 23.1 | 1.6 | 0.00232 | |
| 30 | 56 | 14 | 46.7 | 25.0 | 2.4 | 5.6e-09 | |
| 36 | 28 | 15 | 41.7 | 53.6 | 3.2 | 5.5e-09 | |
| 10 | 7 | 6 | 60.0 | 85.7 | 0.5 | 9.6e-08 | |
| 16 | 24 | 0 | 0.0 | 0.0 | 2.1 | 1.0 | |
| 48 | 101 | 16 | 33.3 | 15.8 | 4.1 | 6.5e-07 | |
| 108 | 66 | 0 | 0.0 | 0.0 | 12.6 | 1.0 | |
| 12 | 43 | 7 | 58.3 | 16.3 | 1.4 | 7.3e-05 | |
| 15 | 34 | 12 | 80.0 | 35.3 | 1.4 | 3.2e-11 | |
| 41 | 24 | 8 | 19.5 | 33.3 | 2.0 | 0.00037 | |
| 32 | 53 | 4 | 12.5 | 7.5 | 1.7 | 0.08496 | |
| 14 | 29 | 10 | 71.4 | 34.5 | 1.3 | 1e-08 | |
| 10 | 45 | 4 | 40.0 | 8.9 | 1.3 | 0.03039 | |
| 16 | 35 | 10 | 62.5 | 28.6 | 1.4 | 3.1e-08 | |
| 8 | 10 | 0 | 0.0 | 0.0 | 0.7 | 1.0 | |
| 20 | 14 | 6 | 30.0 | 42.9 | 2.2 | 0.00887 | |
| 9 | 7 | 0 | 0.0 | 0.0 | 0.5 | 1.0 | |
| 18 | 33 | 4 | 22.2 | 12.1 | 0.8 | 0.00748 | |
| 7 | 41 | 0 | 0.0 | 0.0 | 0.9 | 1.0 | |
| 21 | 28 | 4 | 19.0 | 14.3 | 1.6 | 0.07134 | |
| 68 | 22 | 12 | 17.6 | 54.5 | 3.1 | 5.6e-06 | |
| 16 | 27 | 5 | 31.2 | 18.5 | 1.5 | 0.01154 | |
| 60 | 18 | 0 | 0.0 | 0.0 | 2.3 | 1.0 | |
| 10 | 6 | 3 | 30.0 | 50.0 | 0.5 | 0.00632 | |
| 15 | 6 | 0 | 0.0 | 0.0 | 0.7 | 1.0 | |
| 24 | 4 | 0 | 0.0 | 0.0 | 0.4 | 1.0 | |
| 21 | 10 | 2 | 9.5 | 20.0 | 3.1 | 0.89027 | |
| 36 | 8 | 6 | 16.7 | 75.0 | 2.1 | 0.00465 | |
| 44 | 19 | 11 | 25.0 | 57.9 | 2.3 | 9.6e-07 | |
| 19 | 4 | 0 | 0.0 | 0.0 | 0.9 | 1.0 | |
| 20 | 4 | 4 | 20.0 | 100.0 | 0.9 | 0.00852 | |
| 18 | 7 | 2 | 11.1 | 28.6 | 0.7 | 0.12811 | |
| 42 | 5 | 4 | 9.5 | 80.0 | 0.8 | 0.00254 | |
| 5 | 4 | 0 | 0.0 | 0.0 | 0.2 | 1.0 | |
| 30 | 24 | 12 | 40.0 | 50.0 | 3.2 | 3.7e-06 | |
| 19 | 7 | 0 | 0.0 | 0.0 | 0.7 | 1.0 | |
| 9 | 8 | 0 | 0.0 | 0.0 | 0.5 | 1.0 | |
| 17 | 5 | 0 | 0.0 | 0.0 | 0.3 | 1.0 | |
| 42 | 19 | 7 | 16.7 | 36.8 | 2.2 | 0.00305 | |
| 14 | 37 | 4 | 28.6 | 10.8 | 1.4 | 0.04088 | |
| 7 | 6 | 0 | 0.0 | 0.0 | 0.3 | 1.0 | |
| 6 | 37 | 1 | 16.7 | 2.7 | 0.6 | 0.47097 | |
| 26 | 21 | 6 | 23.1 | 28.6 | 2.6 | 0.03234 | |
| 38 | 33 | 17 | 44.7 | 51.5 | 2.9 | 8.6e-12 | |
| 16 | 25 | 10 | 62.5 | 40.0 | 1.1 | 2.8e-09 | |
| 19 | 60 | 14 | 73.7 | 23.3 | 2.2 | 1.2e-10 | |
| 76 | 88 | 48 | 63.2 | 54.5 | 7.0 | 1.4e-36 | |
| 34 | 19 | 0 | 0.0 | 0.0 | 2.6 | 1.0 | |
| 49 | 40 | 12 | 24.5 | 30.0 | 4.2 | 0.00032 | |
No destabilizing regions were detected in 11 entries of the dataset (1utx, 1gv2, 1eao, 1e7l, 1vyi, 1w9s, 1upq, 1w53, 1tgr, 1r29, 1sif); these entries are not listed in this table.
Pdb id: Pdb identifier of the structure used for energy calculations.
N site: number of residues in the binding site.
N des.: number of residues in the destabilizing region.
N IR: number of residues in the intersection region.
Sens.: sensitivity, N IR / N site (in %).
PPV: positive predictive value, N IR / N des. (in %).
Exp N IR: expected number of residues in the intersection region (see text).
1gus appears twice here because it has 2 distinct binding sites for small ligands. The same applies to 1e6l and 1uns, which have 2 distinct binding sites for different proteins.
Average sensitivity (and standard deviation) for the given ligand type.
Average PPV (and standard deviation) for the given ligand type.
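Sensitivity and PPV follow directly from the counts in the table. In the first row, for example, 7/13 = 53.8% and 7/38 = 18.4%.

The expected count and P-value are defined in the source's Methods, which is not reproduced here. The sketch below therefore assumes a hypergeometric null model: the destabilizing residues are treated as a random draw from all protein residues, and the P-value is the upper tail P(X ≥ N IR). Under this assumed model a site with an empty intersection gets P = 1, as in the zero-overlap rows of the table. The function names and the protein length of 309 residues in the example are hypothetical.

```python
from math import comb
from typing import Tuple


def overlap_stats(n_site: int, n_des: int, n_ir: int, n_protein: int) -> Tuple[float, float, float]:
    """Sensitivity (%), PPV (%) and expected intersection size under random placement."""
    sensitivity = 100.0 * n_ir / n_site
    ppv = 100.0 * n_ir / n_des
    # Mean of the hypergeometric distribution: n_des residues drawn from n_protein,
    # of which n_site belong to the binding site (assumed null model).
    expected = n_site * n_des / n_protein
    return sensitivity, ppv, expected


def hypergeom_upper_tail(n_ir: int, n_site: int, n_des: int, n_protein: int) -> float:
    """P(X >= n_ir) for X ~ Hypergeometric(n_protein, n_site, n_des), exact integer arithmetic."""
    upper = min(n_site, n_des)
    numerator = sum(
        comb(n_site, x) * comb(n_protein - n_site, n_des - x)
        for x in range(n_ir, upper + 1)
    )
    return numerator / comb(n_protein, n_des)


# Counts from the first row of the table, with a hypothetical protein length of 309 residues.
sens, ppv, exp_ir = overlap_stats(n_site=13, n_des=38, n_ir=7, n_protein=309)
p_value = hypergeom_upper_tail(n_ir=7, n_site=13, n_des=38, n_protein=309)
```

Summing integer numerators before a single division keeps the tail exact. In particular, P(X ≥ 0) evaluates to exactly 1.0.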
Figure 3. Examples of known binding sites and destabilizing regions identified in 3 proteins. Each protein is represented twice. Its binding sites (residues colored green) and ligands (displayed and colored in CPK style) are shown in the left panel, whereas destabilizing regions (residues colored orange or cyan) are shown in the right panel. All represented residues are displayed as balls-and-sticks. Ligands considered biologically irrelevant are displayed in each panel as CPK-colored balls-and-sticks. The Pdb ids used to reference the subfigures are those used in the text and tables.
(a) Endoglucanase B (Pdb ids 1qi2 and 1qhz used for the left and right panels, respectively), a protein with a polysaccharide-binding site. The backbone is displayed as a coil and colored grey. PPV = 34.5%, Sensitivity = 71.4%. Two destabilizing regions are detected in this protein (one in orange, the other in cyan).
(b) Phytase (Pdb ids 1h6l and 2poo used for the left and right panels, respectively), a protein with a small ligand-binding site. The backbone is displayed as a cartoon and colored grey. PPV = 40.0%, Sensitivity = 62.5%. Two destabilizing regions are detected in this protein (one in orange, the other in cyan).
(c) AML-1 (Pdb ids 1h9d and 1eao used for the left and right panels, respectively), a protein with a protein-binding site and a nucleic acid-binding site. The bound protein, CBF-β, is represented as a cartoon and colored dark red. AML-1 is displayed as a coil and colored grey. No destabilizing region was detected in this protein.
Figures 3 and 5 were drawn with Molscript [77] and rendered with Raster3D [78].
Overlap between destabilizing regions and binding sites according to ligand type
| Ligand type | Nu | Po | Pr | Pe | Sm | Me | Li | Total |
|---|---|---|---|---|---|---|---|---|
| Sig. Overlap | 1 (5.7) | 8 (6.2) | 7 (9.6) | 5 (5.7) | 19 (13.6) | 5 (4.5) | 2 (1.7) | 47 |
| No. Overlap | 9 (4.3) | 3 (4.8) | 10 (7.4) | 5 (4.3) | 5 (10.4) | 3 (3.5) | 1 (1.3) | 36 |
| Total | 10 | 11 | 17 | 10 | 24 | 8 | 3 | 83 |
Contingency table used to perform a Fisher exact test of homogeneity of the fraction of statistically significant overlaps between destabilizing regions and known binding sites, across categories of binding sites defined by ligand type. A statistically significant overlap has a P-value ≤ 0.05 (see Methods for the meaning of the P-value). Abbreviations used: Nu, nucleic acid; Po, polysaccharide; Pr, protein; Pe, peptide; Sm, small compound; Me, metal ion; Li, lipid; Sig. Overlap, statistically significant overlap; No. Overlap, statistically non-significant or absent overlap. Expected counts under homogeneity are given in parentheses after the corresponding observed counts.
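The expected counts in parentheses are the usual ones under homogeneity: row total × column total / grand total. The short sketch below reproduces them from the observed counts. The exact test itself requires enumerating all 2 × 7 tables with these margins and is not reproduced here. The function and variable names are illustrative.

```python
from typing import Dict, List


def expected_counts(observed: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Expected cell counts under homogeneity: row total * column total / grand total."""
    rows = list(observed)
    n_cols = len(observed[rows[0]])
    row_totals = {r: sum(observed[r]) for r in rows}
    col_totals = [sum(observed[r][j] for r in rows) for j in range(n_cols)]
    grand = sum(row_totals.values())
    return {r: [row_totals[r] * c / grand for c in col_totals] for r in rows}


# Observed counts from the table; columns are Nu, Po, Pr, Pe, Sm, Me, Li.
OBSERVED = {
    "Sig. Overlap": [1, 8, 7, 5, 19, 5, 2],
    "No. Overlap": [9, 3, 10, 5, 5, 3, 1],
}

expected = expected_counts(OBSERVED)
```

Rounded to one decimal, `expected` matches the bracketed values in the table. Examples are 47 × 10 / 83 ≈ 5.7 for nucleic-acid sites and 36 × 24 / 83 ≈ 10.4 for small-compound sites.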
Figure 4. Thermodynamic cycle for calculating the contribution of a side chain to the protein folding free energy. $\Delta G_i$ is the folding free energy of the protein in the presence of all amino acids, including the one at position $i$. $\Delta G_i(\mathrm{BB})$ is the folding free energy of the protein in the absence of the side chain at position $i$. $\Delta G^{\mathrm{w}}_i(\mathrm{SC})$ is the free energy cost of introducing the side chain of residue $i$ into the water solvent. $\Delta G^{\mathrm{p}}_i(\mathrm{SC})$ is the free energy cost of introducing the same side chain into the folded protein structure. $\Delta G^{\mathrm{p}}_i(\mathrm{SC})$ includes the energy of interaction of the side chain with the surrounding residues in the protein structure, as well as the cost of burying the atoms of both the side chain and the surrounding protein structure.
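Closing the thermodynamic cycle of Figure 4 gives the side-chain contribution explicitly. The equations below are a sketch derived from the caption rather than quoted from the source. Notation:

- $\Delta G_i$ is the folding free energy with side chain $i$ present.
- $\Delta G_i(\mathrm{BB})$ is the folding free energy without that side chain.
- The superscripts w (side chain inserted into water) and p (side chain inserted into the folded protein) are assumed labels for the two side-chain insertion terms.

Because free energy is a state function, the two paths from the unfolded protein lacking side chain $i$ to the folded protein carrying it must have equal free energy changes. This assumes that in the unfolded state the side chain is fully solvated, so its insertion cost is the water term:

$$
\begin{aligned}
\Delta G^{\mathrm{w}}_i(\mathrm{SC}) + \Delta G_i &= \Delta G_i(\mathrm{BB}) + \Delta G^{\mathrm{p}}_i(\mathrm{SC}) \\
\Delta\Delta G_i \equiv \Delta G_i - \Delta G_i(\mathrm{BB}) &= \Delta G^{\mathrm{p}}_i(\mathrm{SC}) - \Delta G^{\mathrm{w}}_i(\mathrm{SC})
\end{aligned}
$$

Take the usual convention that more negative folding free energies mean a more stable protein. Under it, $\Delta\Delta G_i > 0$ marks a side chain that makes folding less favorable, presumably the sense in which residues are called destabilizing here.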
Figure 5. Destabilizing region detection procedure. (a) Clustering of highly destabilizing residues (red) that are less than 9.0 Å apart. (b) Addition of destabilizing residues (orange) that are within 6.0 Å of a destabilizing residue already present in a destabilizing region. (c) Final result. Only destabilizing regions of at least 4 residues are considered. Panels (a) and (b) also show residues (cyan), destabilizing or not, that are enclosed in a sphere centered on a pair of destabilizing residues and are therefore added to the destabilizing region (see text for more details).
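Steps (a) to (c) can be sketched as single-linkage clustering followed by region growth and a size filter. This is a minimal illustration under stated assumptions, not the authors' protocol:

- Each residue is reduced to one hypothetical representative coordinate, whereas the real procedure may use atom-based distances.
- The split into highly destabilizing and destabilizing residues is given as input, since its thresholds are defined in the source's Methods.
- Growth is iterated until no residue can be added, and a residue joins the first region it is close to.
- The sphere-based addition of cyan residues is omitted.
- The minimum size defaults to 4, the smallest region size appearing in the table of destabilizing regions.

The function name `detect_regions` and its parameters are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def detect_regions(coords: Dict[str, Coord],
                   highly: Set[str],
                   destab: Set[str],
                   link_cutoff: float = 9.0,
                   grow_cutoff: float = 6.0,
                   min_size: int = 4) -> List[Set[str]]:
    """Sketch of steps (a)-(c); each residue ID maps to one representative coordinate."""

    def near(res: str, region: Set[str], cutoff: float, strict: bool) -> bool:
        dists = (math.dist(coords[res], coords[other]) for other in region)
        return any(d < cutoff if strict else d <= cutoff for d in dists)

    # (a) Single linkage: highly destabilizing residues less than link_cutoff apart
    #     end up in the same region; a new residue may merge several existing regions.
    regions: List[Set[str]] = []
    for res in sorted(highly):
        linked = [r for r in regions if near(res, r, link_cutoff, strict=True)]
        merged = {res}.union(*linked)
        regions = [r for r in regions if r not in linked] + [merged]

    # (b) Grow regions with other destabilizing residues within grow_cutoff of a member,
    #     repeating until nothing changes (iterating to convergence is an assumption).
    candidates = set(destab) - set(highly)
    changed = True
    while changed:
        changed = False
        for res in sorted(candidates):
            for region in regions:
                if near(res, region, grow_cutoff, strict=False):
                    region.add(res)  # joins the first matching region (assumption)
                    candidates.discard(res)
                    changed = True
                    break

    # (c) Keep only regions with at least min_size residues.
    return [r for r in regions if len(r) >= min_size]


# Toy example on a line: A and B link (8 Å), C then D grow the region (5 Å steps),
# E is too far from everything, and the isolated X is dropped by the size filter.
coords = {"A": (0.0, 0.0, 0.0), "B": (8.0, 0.0, 0.0), "C": (13.0, 0.0, 0.0),
          "D": (18.0, 0.0, 0.0), "E": (30.0, 0.0, 0.0), "X": (50.0, 0.0, 0.0)}
regions = detect_regions(coords, highly={"A", "B", "X"}, destab={"C", "D", "E"})
```

Single linkage is used in step (a) because the caption only requires each highly destabilizing residue to be within 9.0 Å of some other member. Chains of such residues therefore merge into one region.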