James D Alsop, Julie C Mitchell.
Abstract
Proteins are essential elements of biological systems, and their function typically relies on their ability to bind specific partners. Recent studies of protein interactions have emphasized hot spots, residues in the binding interface that contribute significantly to the binding energetics. In this study, we investigate how conservation of hot spots can be used to guide docking prediction. We show that the use of evolutionary data combined with hot spot prediction highlights near-native structures across a range of benchmark examples. Our approach explores various strategies for using hot spots and evolutionary data to score protein complexes, using both absolute and chemical definitions of conservation, along with refinements to these strategies that look at windowed conservation and filtering to ensure a minimum number of hot spots in each binding partner. Finally, structure-based models of orthologs were generated for comparison with sequence-based scoring. Using two data sets of 22 and 85 examples, a high rate of top 10 and top 1 predictions is observed, with up to 82% of examples returning a top 10 hit and 35% returning a top 1 hit, depending on the data set and strategy applied; upon inclusion of the native structure among the decoys, up to 55% of examples yielded a top 1 hit. The 20 examples common to both data sets show that more carefully curated interolog data yields better predictions, particularly in achieving top 1 hits. Proteins 2015; 83:1940-1946.
Keywords: hot spot; interolog; molecular evolution; mutagenesis; ortholog; protein-protein docking
Year: 2015 PMID: 25740680 PMCID: PMC5054918 DOI: 10.1002/prot.24788
Source DB: PubMed Journal: Proteins ISSN: 0887-3585
Figure 1. The benchmark example 1XQS highlights some of the possible scoring variations produced by the eight sequence-based scoring strategies. The filtered strategies remove some of the high-scoring non-native predictions. The benefit of windowed and chemical conservation varies among examples; in this case, the windowed strategies performed somewhat worse overall than the unwindowed strategies. The chemical strategies performed slightly better than the strategies based on absolute sequence conservation, particularly on the native structure.
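The eight strategy labels appear to combine three choices named in the abstract: absolute (A) or chemical (C) conservation, an optional filter (F) requiring a minimum number of hot spots in each binding partner, and optional windowing (W). The Python sketch below illustrates one way such scores could be computed. It is not the authors' implementation: the chemical classes, the window half-width, the `min_hot_spots` threshold, the toy sequences, and every function name are assumptions made for illustration, and hot spot positions are taken as given rather than predicted.

```python
from itertools import product

# Hypothetical chemical classes; the source does not list its groupings.
_GROUPS = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}
CHEMICAL_CLASS = {res: name for name, group in _GROUPS.items() for res in group}


def conservation(column, reference, chemical=False):
    """Fraction of ortholog residues in an alignment column that match the reference."""
    if not column:
        return 0.0
    if chemical:
        ref_class = CHEMICAL_CLASS.get(reference)
        hits = sum(ref_class is not None and CHEMICAL_CLASS.get(r) == ref_class for r in column)
    else:
        hits = sum(r == reference for r in column)
    return hits / len(column)


def score_partner(reference_seq, orthologs, hot_spots, chemical=False, window=False, half_width=1):
    """Sum conservation over hot spot positions (0-based, pre-aligned sequences)."""
    per_position = [
        conservation([seq[i] for seq in orthologs], reference_seq[i], chemical)
        for i in range(len(reference_seq))
    ]
    total = 0.0
    for i in hot_spots:
        if window:
            # Average conservation over the hot spot and its sequence neighbours.
            lo, hi = max(0, i - half_width), min(len(per_position), i + half_width + 1)
            total += sum(per_position[lo:hi]) / (hi - lo)
        else:
            total += per_position[i]
    return total


def score_complex(partner_a, partner_b, chemical=False, filtered=False, window=False, min_hot_spots=1):
    """Score a docked pose; each partner is (reference_seq, orthologs, hot_spots)."""
    if filtered and min(len(partner_a[2]), len(partner_b[2])) < min_hot_spots:
        return 0.0  # filtered strategies reject poses with too few hot spots in a partner
    return sum(score_partner(*p, chemical=chemical, window=window) for p in (partner_a, partner_b))


# Build the eight labels: A, AF, AW, AFW, C, CF, CW, CFW.
STRATEGIES = {
    base + f + w: dict(chemical=(base == "C"), filtered=(f == "F"), window=(w == "W"))
    for base, f, w in product("AC", ("", "F"), ("", "W"))
}

# Toy, made-up input: two partners with pre-aligned ortholog sequences.
a = ("AKDE", ["AKDE", "VRDE", "AKEE"], [0, 1])
b = ("GSTW", ["GSTW", "GTTF"], [3])
scores = {name: score_complex(a, b, **opts) for name, opts in STRATEGIES.items()}
```

In this toy input the chemical score exceeds the absolute score because substitutions such as A to V or W to F stay within one assumed class, which is the kind of difference between the A and C strategies that the caption describes.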
For Each of the Eight Strategies (A, AF, … CW, CFW), the Number of Systems Returning Top 10 and Top 1 Hits is Given, Along with the Number of Viable Systems for Which Hits Were Present
| | Viable | A | AF | AW | AFW | C | CF | CW | CFW | ANY |
|---|---|---|---|---|---|---|---|---|---|---|
| Top 10 Results | | | | | | | | | | |
| Data Set 1 | 17 | 10 | 13 | 9 | 11 | 10 | 13 | 12 | 14 | 15 |
| Data Set 1 + Native | 22 | 13 | 17 | 11 | 14 | 13 | 15 | 15 | 17 | 19 |
| Data Set 2 | 55 | 24 | 25 | 23 | 23 | 23 | 25 | 23 | 25 | 37 |
| Data Set 2 + Native | 85 | 32 | 33 | 27 | 28 | 29 | 33 | 32 | 31 | 52 |
| Top 1 Results | | | | | | | | | | |
| Data Set 1 | 17 | 2 | 4 | 5 | 6 | 2 | 5 | 6 | 6 | 12 |
| Data Set 1 + Native | 22 | 5 | 5 | 8 | 7 | 6 | 5 | 12 | 10 | 18 |
| Data Set 2 | 55 | 8 | 13 | 8 | 12 | 9 | 8 | 9 | 12 | 23 |
| Data Set 2 + Native | 85 | 12 | 17 | 12 | 13 | 12 | 13 | 17 | 15 | 36 |
By adding the native structure, hits are present in all systems, and the analysis is repeated using this data.
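The percentages quoted in the abstract can be checked against these counts. The short Python sketch below assumes that each quoted rate is the best single-strategy count divided by the number of viable systems in the same row; that reading is an inference from the numbers, not something the source states. The helper name `best_rate` is hypothetical.

```python
STRATEGIES = ["A", "AF", "AW", "AFW", "C", "CF", "CW", "CFW"]

# (viable systems, per-strategy hit counts), copied from the table above.
TOP10 = {
    "Data Set 1": (17, [10, 13, 9, 11, 10, 13, 12, 14]),
    "Data Set 1 + Native": (22, [13, 17, 11, 14, 13, 15, 15, 17]),
    "Data Set 2": (55, [24, 25, 23, 23, 23, 25, 23, 25]),
    "Data Set 2 + Native": (85, [32, 33, 27, 28, 29, 33, 32, 31]),
}
TOP1 = {
    "Data Set 1": (17, [2, 4, 5, 6, 2, 5, 6, 6]),
    "Data Set 1 + Native": (22, [5, 5, 8, 7, 6, 5, 12, 10]),
    "Data Set 2": (55, [8, 13, 8, 12, 9, 8, 9, 12]),
    "Data Set 2 + Native": (85, [12, 17, 12, 13, 12, 13, 17, 15]),
}


def best_rate(viable, counts):
    """Return (first best-scoring strategy, hit rate in whole percent)."""
    best = max(counts)
    return STRATEGIES[counts.index(best)], round(100 * best / viable)


for label, table in (("Top 10", TOP10), ("Top 1", TOP1)):
    for row, (viable, counts) in table.items():
        name, pct = best_rate(viable, counts)
        print(f"{label:6} | {row:20} | best: {name:3} | {pct}%")
```

Under that assumption, Data Set 1 gives 14/17 for top 10 hits and 6/17 for top 1 hits, and Data Set 1 + Native gives 12/22 for top 1 hits, which round to the 82%, 35%, and 55% figures in the abstract.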
Using Only the 20 Examples Common to Data Sets 1 and 2, the Number of Systems Returning Top 10 and Top 1 Hits Is Given, Along with the Number of Viable Systems for Which Hits Were Present
| | Viable | A | AF | AW | AFW | C | CF | CW | CFW | ANY |
|---|---|---|---|---|---|---|---|---|---|---|
| Top 10 Results | | | | | | | | | | |
| Data Set 1 | 16 | 10 | 12 | 8 | 10 | 10 | 12 | 11 | 13 | 14 |
| Data Set 1 + Native | 20 | 13 | 15 | 10 | 13 | 12 | 13 | 13 | 15 | 17 |
| Data Set 2 | 16 | 6 | 7 | 7 | 7 | 5 | 8 | 7 | 8 | 11 |
| Data Set 2 + Native | 20 | 8 | 9 | 7 | 9 | 6 | 10 | 8 | 9 | 13 |
| Top 1 Results | | | | | | | | | | |
| Data Set 1 | 16 | 2 | 4 | 4 | 5 | 2 | 5 | 5 | 5 | 11 |
| Data Set 1 + Native | 20 | 5 | 5 | 7 | 6 | 6 | 5 | 10 | 8 | 16 |
| Data Set 2 | 16 | 1 | 3 | 1 | 3 | 1 | 1 | 1 | 4 | 7 |
| Data Set 2 + Native | 20 | 2 | 5 | 2 | 3 | 2 | 2 | 3 | 5 | 10 |
By adding the native structure, hits are present in all systems, and the analysis is repeated using this data.
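Because both data sets have 16 viable systems among the common examples, their rows can be compared count for count. The Python sketch below subtracts the Data Set 2 counts from the Data Set 1 counts for the rows without the native structure; the helper name `differences` is hypothetical, and the source does not say here which data set is the more carefully curated one.

```python
STRATEGIES = ["A", "AF", "AW", "AFW", "C", "CF", "CW", "CFW", "ANY"]

# Counts for the 20 common examples, copied from the table above
# (rows without the native structure; 16 viable systems in each data set).
TOP10_COMMON = {
    "Data Set 1": [10, 12, 8, 10, 10, 12, 11, 13, 14],
    "Data Set 2": [6, 7, 7, 7, 5, 8, 7, 8, 11],
}
TOP1_COMMON = {
    "Data Set 1": [2, 4, 4, 5, 2, 5, 5, 5, 11],
    "Data Set 2": [1, 3, 1, 3, 1, 1, 1, 4, 7],
}


def differences(table):
    """Per-strategy count in Data Set 1 minus the count in Data Set 2."""
    return {
        s: x - y
        for s, x, y in zip(STRATEGIES, table["Data Set 1"], table["Data Set 2"])
    }


top10_diff = differences(TOP10_COMMON)
top1_diff = differences(TOP1_COMMON)
print("Top 10:", top10_diff)
print("Top 1: ", top1_diff)
```

Every difference is positive, so on these shared examples Data Set 1 returns more hits than Data Set 2 under each strategy; for the ANY column the top 1 counts are 11 versus 7 out of 16.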