Abstract
BACKGROUND: Empirical scoring functions have proven useful in protein structure modeling. Most such scoring functions depend on protein side chain conformations. However, backbone-only scoring functions do not require computationally intensive structure optimization and so are well suited to protein design, which requires fast score evaluation. Furthermore, scoring functions that account for the distinctive relative position and orientation preferences of residue pairs are expected to be more accurate than those that depend only on the separation distance.
Year: 2010 PMID: 20398384 PMCID: PMC2874805 DOI: 10.1186/1471-2105-11-192
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. 1D log-odds scores as a function of Cβ separation. The Cys-Cys function has a peak near the typical Cβ separation for disulfide bonds, in the range of 3.5-4.0 Å, and is negative for large separations. In contrast, the score for same-charge Glu-Glu pairs is negative for small separations and positive for large separations, reflecting the electrostatic energy penalty for close proximity. Both the Cys-Cys and Glu-Glu scores are among the most accurate because of these physical constraints on their separations. The Ala-Ala score, shown for comparison, oscillates, with a peak near that of the Cys-Cys score.
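The idea of a one-dimensional log-odds score can be illustrated with a minimal sketch. It assumes a simple histogram estimate with a pseudocount and a pooled reference distribution; the bin width, distance cutoff, pseudocount, and the function name `log_odds_profile` are illustrative assumptions, not the estimation procedure used in the paper.

```python
import math


def log_odds_profile(pair_distances, reference_distances,
                     bin_width=0.5, max_dist=15.0, pseudocount=1.0):
    """Return one log-odds score per distance bin.

    score[k] = ln( P(bin k | residue pair type) / P(bin k | reference) )

    Assumptions (not from the source): histogram binning, an additive
    pseudocount, and a user-supplied reference set of distances.
    """
    n_bins = int(round(max_dist / bin_width))

    def binned_probabilities(distances):
        # Start every bin at the pseudocount so no probability is zero.
        counts = [pseudocount] * n_bins
        for d in distances:
            k = int(d / bin_width)
            if 0 <= k < n_bins:
                counts[k] += 1.0
        total = sum(counts)
        return [c / total for c in counts]

    p_pair = binned_probabilities(pair_distances)
    p_ref = binned_probabilities(reference_distances)
    return [math.log(p / q) for p, q in zip(p_pair, p_ref)]


# Toy data: a pair type whose separations cluster near 3.7 Å,
# compared against a reference with one observation in every bin.
toy_pair = [3.7] * 20
toy_reference = [0.25 + 0.5 * k for k in range(30)]
toy_scores = log_odds_profile(toy_pair, toy_reference)
```

With these toy inputs the score is positive only in the bin containing 3.7 Å and negative elsewhere, which is the qualitative shape described for a pair with a strongly preferred separation.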
A comparison of the accuracy of the new scoring functions with other residue pair scoring functions from the literature in detecting correct threading solutions
| Residue pair scoring function | Rank 1 | Rank ≤ 50 | Rank ≤ 100 |
|---|---|---|---|
| | 2357 (70.8%) | 2945 (88.5%) | 3030 (91.0%) |
| | 2774 (83.4%) | 3128 (94.0%) | 3183 (95.6%) |
| | 2911 (87.5%) | 3200 (96.2%) | 3231 (97.1%) |
| | 2714 (81.6%) | 3033 (91.1%) | 3068 (92.2%) |
| | 2054 (61.7%) | 2737 (82.2%) | 2847 (85.5%) |
| | 623 (18.7%) | 1632 (49.0%) | 1906 (57.3%) |
| | 1386 (41.6%) | 2378 (71.5%) | 2559 (76.9%) |
| | 1396 (41.9%) | 2391 (71.8%) | 2548 (76.6%) |
The 5000 threading solutions for each of the 3328 template structures were ranked using either the 1D, 3D, or 6D residue pair scores or five different empirical potentials with available published parameters. The empirical potentials evaluated were from Miyazawa and Jernigan 1999 [16] (MJ1999), Tobi and Elber 2000 (TE2000), Rajgaria, McAllister, and Floudas 2006 [15] (RMF2006), and Rajgaria, McAllister, and Floudas 2008 [17] (RMF2008). The values give the number of template structures for which the native sequence ranked at or within the indicated cutoff, with the corresponding percentage of the total structures in parentheses.
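The tabulated counts can be reproduced in outline as follows. This is a minimal sketch assuming that a higher score is more favorable and that ties are counted against the native sequence; the function names `native_rank` and `summarize_ranks` are hypothetical and the tie convention is not stated in the source.

```python
def native_rank(native_score, decoy_scores, higher_is_better=True):
    """1-based rank of the native sequence among native plus decoys.

    Assumption: ties are resolved pessimistically, so a decoy with the
    same score as the native is counted as ranking ahead of it.
    """
    if higher_is_better:
        ahead = sum(1 for s in decoy_scores if s >= native_score)
    else:
        ahead = sum(1 for s in decoy_scores if s <= native_score)
    return 1 + ahead


def summarize_ranks(ranks, cutoffs=(1, 50, 100)):
    """Map each rank cutoff to (count, percentage) of structures within it."""
    total = len(ranks)
    summary = {}
    for cutoff in cutoffs:
        count = sum(1 for r in ranks if r <= cutoff)
        summary[cutoff] = (count, 100.0 * count / total)
    return summary


# Toy example: four template structures with made-up native ranks.
toy_summary = summarize_ranks([1, 30, 75, 200])
```

The percentages in the table follow the same arithmetic: for example, 2357 of 3328 structures is 70.8%.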
Prediction results for all-against-all gapless cross-threading of 333 proteins
| Residue pair scoring function | Percentile rank of native sequence ≤ 0.001% | Percentile rank of native sequence ≤ 0.01% | Percentile rank of native sequence ≤ 0.1% | Median percentile rank |
|---|---|---|---|---|
| | 118 (35.4%) | 209 (62.8%) | 261 (78.4%) | 2.47 × 10⁻³ |
| | 121 (36.3%) | 240 (72.1%) | 286 (85.9%) | 1.98 × 10⁻³ |
| | 125 (37.5%) | 247 (74.2%) | 293 (88.0%) | 1.66 × 10⁻³ |
The number of structures for which the percentile rank of the native sequence was less than or equal to the indicated cutoffs are given in columns 2-4 and the median percentile rank is given in the last column. The corresponding percentage of the total structures is given in parentheses. The threading solutions were ranked using the indicated residue pair scoring function.
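A percentile-rank summary of this kind could be computed along the following lines. The sketch assumes that the percentile rank is defined as 100 × rank / (number of threading solutions); that definition and the function names are assumptions, since this section does not state the formula.

```python
import statistics


def percentile_rank(rank, n_solutions):
    """Percentile rank in percent, assuming 100 * rank / n_solutions."""
    return 100.0 * rank / n_solutions


def summarize_percentiles(percentile_ranks, cutoffs=(0.001, 0.01, 0.1)):
    """Return ({cutoff: count of structures at or below it}, median)."""
    counts = {c: sum(1 for p in percentile_ranks if p <= c) for c in cutoffs}
    return counts, statistics.median(percentile_ranks)


# Toy example with four made-up percentile ranks (in percent).
toy_counts, toy_median = summarize_percentiles([0.001, 0.005, 0.05, 2.0])
```

Reporting the median alongside the cutoff counts summarizes the whole distribution in a way that is insensitive to a few structures with very poor ranks.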
Comparison of cross-threading results for 78 contacting transmembrane helix pairs using the 6D potential and optionally including homolog sequences or a membrane depth-dependent residue potential
| Include homolog sequences? | Include depth-dependent score? | Percentile rank of native sequence, 1% cutoff | Percentile rank of native sequence, 5% cutoff | Percentile rank of native sequence, 10% cutoff | Median percentile rank |
|---|---|---|---|---|---|
| No | No | 15 (19%) | 29 (37%) | 41 (53%) | 8.4% |
| No | Yes | 16 (21%) | 31 (40%) | 43 (55%) | 8.0% |
| Yes | No | 22 (28%) | 38 (49%) | 49 (63%) | 6.0% |
| Yes | Yes | 22 (28%) | 39 (50%) | 56 (72%) | 5.1% |
The backbone structure of each helix pair was used as a template and the amino acid sequences of all helix pairs were threaded into the structure in both possible helix correspondences, (A→ A', B→ B') and (A→ B', B→ A'), and in all possible ungapped alignments. The threading solutions for each template structure were ranked using the appropriate score and the percentile rank of the correct native sequence calculated. These results show that both including the homolog sequences and including the depth-dependent score improve the threading accuracy.
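The enumeration of threadings described above can be sketched as follows. The sketch assumes that an ungapped alignment means sliding a query helix sequence along a template helix so that every template position is filled, and that the offsets of the two helices are chosen independently; both conventions, and the function names, are illustrative assumptions rather than details given in the source.

```python
from itertools import product


def ungapped_windows(sequence, template_length):
    """All contiguous windows of `sequence` that exactly fill a template helix.

    Returns an empty list when the sequence is shorter than the template.
    """
    return [sequence[i:i + template_length]
            for i in range(len(sequence) - template_length + 1)]


def helix_pair_threadings(seq_a, seq_b, len_a_prime, len_b_prime):
    """Yield (correspondence, residues on A', residues on B').

    Both helix correspondences are tried: (A -> A', B -> B') and
    (A -> B', B -> A'), each with every combination of ungapped offsets.
    """
    correspondences = (
        ("A->A', B->B'", seq_a, seq_b),  # A threaded onto A', B onto B'
        ("A->B', B->A'", seq_b, seq_a),  # B threaded onto A', A onto B'
    )
    for label, on_a_prime, on_b_prime in correspondences:
        for w1, w2 in product(ungapped_windows(on_a_prime, len_a_prime),
                              ungapped_windows(on_b_prime, len_b_prime)):
            yield label, w1, w2


# Toy example: two 5-residue query helices, template helices of length 4 and 5.
toy_threadings = list(helix_pair_threadings("LIVAG", "FWYST", 4, 5))
```

Each yielded threading would then be scored on the template backbone, and the native sequence's position in the sorted list gives its percentile rank.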
Results for the same quantities shown in Table 3, except only for the 27 transmembrane helix pair structures that have at least 15 inter-helix residue contacts
| Include homolog sequences? | Include depth-dependent score? | Percentile rank of native sequence, 1% cutoff | Percentile rank of native sequence, 5% cutoff | Percentile rank of native sequence, 10% cutoff | Median percentile rank |
|---|---|---|---|---|---|
| No | No | 9 (33%) | 13 (48%) | 20 (74%) | 6.7% |
| No | Yes | 9 (33%) | 15 (56%) | 20 (74%) | 3.6% |
| Yes | No | 10 (37%) | 13 (48%) | 20 (74%) | 5.3% |
| Yes | Yes | 11 (41%) | 16 (59%) | 22 (81%) | 2.3% |
Comparison with the results in Table 3 shows that the threading accuracy is higher for this subset of helix pairs with large interfaces, because more inter-helix residue contacts give a stronger signal.
Similarities between the optimal and native sequences (intrastructure) and similarities between the optimal sequences for a pair of proteins in the same HOMSTRAD family (interstructure)
| | All residues: all structures | All residues: RMSD < 2.5 Å | All residues: RMSD ≥ 2.5 Å | Core residues only: all structures | Core residues only: RMSD < 2.5 Å | Core residues only: RMSD ≥ 2.5 Å |
|---|---|---|---|---|---|---|
| BP median %ID to native | 13.4% | 13.5% | 13.2% | 17.6% | 17.6% | 17.8% |
| ROSETTA median %ID to native | 25.9% | 26.5% | 23.6% | 35.6% | 36.2% | 33.2% |
| BP median interstructure %ID | 22.4% | 24.0% | 17.7% | 29.4% | 30.8% | 25.3% |
| ROSETTA median interstructure %ID | 22.8% | 25.4% | 17.2% | 29.8% | 33.6% | 21.4% |
| BP interstructure %ID > ROSETTA interstructure %ID | 183 (45%) | 118 (41%) | 65 (56%) | 182 (45%) | 116 (40%) | 66 (57%) |
| BP interstructure %ID < ROSETTA interstructure %ID | 222 (55%) | 171 (59%) | 51 (44%) | 203 (50%) | 161 (55%) | 42 (36%) |
| BP interstructure %ID = ROSETTA interstructure %ID | 2 (0.49%) | 2 (0.70%) | 0 (0%) | 22 (5.4%) | 14 (4.8%) | 8 (6.9%) |
The similarities were calculated as percent sequence identity (%ID). The optimal sequences were calculated using the 6D residue pair scoring functions with Belief Propagation (BP) and using the ROSETTA program. The last three rows give the number of structure pairs for which the BP sequence similarity was greater than, less than, or equal to the similarity calculated using ROSETTA. The calculations were performed for a total of 407 structure pairs, each in a different HOMSTRAD protein family.
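Percent sequence identity between two aligned sequences can be computed as in the minimal sketch below. It assumes the sequences are already aligned position by position (for the interstructure case, via some structure-based alignment that is not specified here), that `-` marks a gap, and that gapped columns are excluded from the denominator; these conventions and the function name are assumptions.

```python
def percent_identity(aligned_seq1, aligned_seq2, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_seq1) != len(aligned_seq2):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_seq1, aligned_seq2)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy examples: same-backbone comparison, then a comparison with one gapped column.
toy_intrastructure = percent_identity("ACDE", "ACDF")
toy_interstructure = percent_identity("AC-E", "ACDE")
```

Restricting the comparison to core residues would amount to filtering the columns with a per-position mask before counting matches.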