Nils Woetzel, Mert Karakaş, Rene Staritzbichler, Ralf Müller, Brian E Weiner, Jens Meiler.
Abstract
The topology of most experimentally determined protein domains is defined by the relative arrangement of secondary structure elements, i.e. α-helices and β-strands, which make up 50-70% of the sequence. Pairing of β-strands defines the topology of β-sheets. The packing of side chains between α-helices and β-sheets defines the majority of the protein core. Often, limited experimental datasets restrain the position of secondary structure elements while lacking detail with respect to loop or side chain conformation. At the same time, the regular structure and reduced flexibility of secondary structure elements make their interactions more predictable than those of flexible loops and side chains. To determine the topology of the protein in such settings, we introduce a tailored knowledge-based energy function that evaluates the arrangement of secondary structure elements only. Based on the …
Year: 2012 PMID: 23173051 PMCID: PMC3500277 DOI: 10.1371/journal.pone.0049242
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Amino acid neighbor count environment potential.
A shows the transition function used between the lower and upper thresholds, in which the weight of the neighbor being considered drops from 1 (4 Å) to 0 (11.4 Å) following half of a cosine function. B shows the neighbor count energy potential for all 20 amino acids with their three-letter codes.
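The half-cosine transition described in panel A can be sketched as follows. This is a minimal illustration, not the authors' implementation: only the thresholds (4 Å and 11.4 Å) and the half-cosine shape come from the caption, while the function names and the idea of summing weights over a list of distances are assumptions made for the example.

```python
import math

LOWER_THRESHOLD = 4.0   # Å, weight is 1 at or below this distance (from the caption)
UPPER_THRESHOLD = 11.4  # Å, weight is 0 at or above this distance (from the caption)


def neighbor_weight(distance, lower=LOWER_THRESHOLD, upper=UPPER_THRESHOLD):
    """Weight of a single neighbor, falling from 1 to 0 along half a cosine."""
    if distance <= lower:
        return 1.0
    if distance >= upper:
        return 0.0
    # Map the distance onto [0, 1] within the transition region.
    fraction = (distance - lower) / (upper - lower)
    # Half a cosine period: cos(0) = 1 at the lower end, cos(pi) = -1 at the upper end.
    return 0.5 * (math.cos(math.pi * fraction) + 1.0)


def neighbor_count(distances):
    """Hypothetical helper: sum the weights of all neighbors of one residue."""
    return sum(neighbor_weight(d) for d in distances)
```

With this form, a neighbor exactly halfway between the thresholds (7.7 Å) contributes a weight of 0.5, so the count changes smoothly rather than jumping at a hard cutoff.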
Figure 2. Amino acid pair distance potentials.
In A, the idealized structure of 1ubi with Cβ and Hα2 atoms is shown with the distances between Ile 32 and Leu 56 (4.7 Å) and between Lys 11 and Glu 34 (8.3 Å). B shows selected amino acid pair distance potentials: Trp-Trp as an example of π-stacking interaction, Ile-Leu as an example of vdW apolar interaction, Arg-Glu as an example of Coulomb attraction, and Arg-Lys as an example of Coulomb repulsion.
Figure 3. Loop closure potential.
A describes two β-strands connected by a loop characterized by the Euclidean distance between the two ends and the number of residues in the loop connecting those two ends. B describes the derived energy potential, where the energy is a function of the number of residues in the loop and the Euclidean distance between the ends of the main axes.
Figure 4. SSE fragment packing.
SSE fragments are shown with their geometric packing descriptors. (A) α1 and α2 are orthogonal if the shortest connection between the main axes is orthogonal. (B) The connection is not orthogonal, since the minimal interface length m cannot be achieved. (C) θ is the twist angle around the shortest connection, which is equivalent to the dihedral angle defined by main axis 1, the shortest connection, and main axis 2. (D) ω is the offset from the optimal expected position for a helix-strand interaction: at 0°, the helix is on top of the strand; at 90°, the helix would interact with the backbone of the strand. ω1 and ω2 are the offsets for a strand-strand packing: values close to 90° indicate a strand backbone pairing dominated by hydrogen bonding within a sheet, whereas values close to 0° indicate packing dominated by side chain interactions, as seen in sheet sandwiches. (E) Every SSE is represented as multiple fragments, and the SSE interaction is described by the list of all fragment interactions, leaving out additional fragments of the longer SSE with suboptimal packing (bottom grey helix fragment).
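As an illustration of the twist angle θ in panel C, the sketch below computes the signed angle between two main-axis directions as seen along the shortest connection. It assumes the axis directions and the connection vector are already known; the function name, the pure-Python vector helpers, and the sign convention (counterclockwise about the connection is positive) are assumptions for this example and may differ from the convention used for the published potentials.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _reject(u, unit):
    """Component of u perpendicular to the unit vector `unit`."""
    along = _dot(u, unit)
    return tuple(a - along * b for a, b in zip(u, unit))


def twist_angle(axis1, connection, axis2):
    """Signed angle in degrees between axis1 and axis2 around `connection`.

    Both axes are projected onto the plane perpendicular to the connection,
    which makes the result the dihedral angle axis1 - connection - axis2.
    """
    length = math.sqrt(_dot(connection, connection))
    unit = tuple(c / length for c in connection)
    p1 = _reject(axis1, unit)
    p2 = _reject(axis2, unit)
    # atan2 of the sine-like and cosine-like terms gives a signed angle.
    return math.degrees(math.atan2(_dot(unit, _cross(p1, p2)), _dot(p1, p2)))
```

For example, `twist_angle((1, 0, 0), (0, 0, 10), (1, -1, 0))` evaluates to −45° under this convention. The length of the connection vector does not affect the angle; it only sets the packing distance.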
Figure 5. Strand pairing and SSE packing potential.
Shown are all secondary structure element packing potentials with their schematic shortest connections, twist angles, and derived potentials. A shows the β-strand-β-strand pairing potential with a prominent distance of 4.75 Å and angles of −15° and 165°. B shows the α-helix-α-helix packing with a preferred packing distance of 10 Å, a preferred parallel angle of −45°, and an anti-parallel packing angle of 135°. C shows the β-sheet-β-sheet packing potential with a preferred distance of 10 Å and angles of −30° and 150°. D shows the α-helix-β-sheet packing with a packing distance around 10 Å and an anti-parallel angle of 150°–180°.
Figure 6. Contact order and square radius of gyration potential.
(A) Fold complexity is represented by the contact order potential. The potential is given as the likelihood to observe a contact order to number of residues ratio in the model. (B) Statistics for the square radius of gyration over the number of residues were directly collected in a histogram and converted into a potential.
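The two quantities behind Figure 6 can be sketched as below. This is an illustration under stated assumptions rather than the published definitions: the caption does not say which atom represents a residue, what distance counts as a contact, or how nearby residues are excluded, so the 8 Å cutoff, the `min_separation` parameter, and the use of one coordinate per residue are all hypothetical choices.

```python
import math


def square_radius_of_gyration(coords):
    """Mean squared distance of the coordinates from their centroid."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return sum(
        sum((c[k] - centroid[k]) ** 2 for k in range(3)) for c in coords
    ) / n


def contact_order(coords, cutoff=8.0, min_separation=3):
    """Average sequence separation of residue pairs closer than `cutoff` Å.

    `cutoff` and `min_separation` are assumed values for this sketch.
    """
    total_separation = 0
    contacts = 0
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                total_separation += j - i
                contacts += 1
    return total_separation / contacts if contacts else 0.0


def contact_order_ratio(coords, **kwargs):
    """Contact order divided by the number of residues, as used in panel A."""
    return contact_order(coords, **kwargs) / len(coords)
```

Both values are normalized by the number of residues before the statistics are collected, which is what allows a single potential to be applied to proteins of different sizes.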
Mean and standard deviation of predicted probabilities.
| method | mean | SD | mean | SD | mean | SD |
| JUFO | 0.67 | 0.21 | 0.58 | 0.24 | 0.59 | 0.18 |
| PSIPRED | 0.76 | 0.20 | 0.71 | 0.27 | 0.73 | 0.21 |
For secondary structure prediction (JUFO and PSIPRED) and secondary structure type, the predicted probabilities are averaged and a standard deviation is derived.
Enrichment of sets of protein models.
| RMSD100<8 Å | benchmark set | total | amino acid clash | amino acid distance | amino acid neighbor count | contact order | loop length | loop closure | radius of gyration | SSE clash | SSE packing | strand pairing | SSPred JUFO | SSPred PSIPRED | sum |
| all | rosetta | 18 | | 72 | 56 | 44 | | | 61 | | 100 | 33 | 56 | 78 | 67 |
| | perturbation | 53 | 100 | 98 | 96 | 21 | 94 | 98 | 49 | 96 | 89 | 57 | 47 | 60 | 77 |
| | fold | 14 | 64 | 57 | 29 | 29 | 64 | 79 | 29 | 36 | 29 | 0 | 29 | 29 | 43 |
| α-helical | rosetta | 12 | | 83 | 58 | 42 | | | 58 | | 100 | | 50 | 67 | 58 |
| | perturbation | 24 | 100 | 96 | 92 | 17 | 92 | 100 | 58 | 92 | 75 | | 42 | 46 | 63 |
| | fold | 10 | 60 | 70 | 30 | 30 | 50 | 80 | 20 | 30 | 40 | | 40 | 40 | 50 |
| β-sheet | rosetta | 3 | | 67 | 33 | 100 | | | 33 | | 100 | 67 | 33 | 100 | 67 |
| | perturbation | 8 | 100 | 100 | 100 | 38 | 100 | 100 | 50 | 100 | 100 | 100 | 25 | 25 | 75 |
| | fold | 3 | 67 | 33 | 33 | 33 | 100 | 67 | 67 | 67 | 0 | 0 | 0 | 0 | 33 |
| α/β | rosetta | 3 | | 33 | 67 | 0 | | | 100 | | 100 | 67 | 100 | 100 | 100 |
| | perturbation | 21 | 100 | 100 | 100 | 19 | 95 | 95 | 38 | 100 | 100 | 100 | 62 | 90 | 95 |
| | fold | 1 | 100 | 0 | 0 | 0 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ≤150 AA | rosetta | 12 | | 92 | 58 | 50 | | | 58 | | 100 | 25 | 42 | 75 | 67 |
| | perturbation | 17 | 100 | 94 | 100 | 29 | 100 | 100 | 76 | 94 | 82 | 47 | 41 | 41 | 88 |
| | fold | 9 | 67 | 44 | 22 | 22 | 78 | 89 | 22 | 22 | 11 | 0 | 22 | 22 | 22 |
| >150 AA | rosetta | 6 | | 33 | 50 | 33 | | | 67 | | 100 | 50 | 83 | 83 | 67 |
| | perturbation | 36 | 100 | 100 | 94 | 17 | 92 | 97 | 36 | 97 | 92 | 61 | 50 | 69 | 72 |
| | fold | 5 | 60 | 80 | 40 | 40 | 40 | 60 | 40 | 60 | 60 | 0 | 40 | 40 | 80 |
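| GDT_TS | benchmark set | total | amino acid clash | amino acid distance | amino acid neighbor count | contact order | loop length | loop closure | radius of gyration | SSE clash | SSE packing | strand pairing | SSPred JUFO | SSPred PSIPRED | sum |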
| all | rosetta | 30 | | 53 | 70 | 7 | | | 67 | | 83 | 47 | 63 | 80 | 80 |
| | perturbation | 52 | 71 | 75 | 94 | 35 | 87 | 79 | 40 | 62 | 98 | 60 | 71 | 87 | 94 |
| | fold | 18 | 39 | 61 | 44 | 33 | 61 | 50 | 33 | 22 | 56 | 11 | 39 | 56 | 83 |
| α-helical | rosetta | 17 | | 59 | 59 | 12 | | | 47 | | 82 | | 41 | 76 | 65 |
| | perturbation | 24 | 54 | 58 | 92 | 46 | 71 | 63 | 42 | 38 | 100 | | 79 | 92 | 96 |
| | fold | 13 | 38 | 69 | 46 | 31 | 54 | 54 | 23 | 8 | 54 | | 38 | 54 | 77 |
| β-sheet | rosetta | 5 | | 40 | 80 | 0 | | | 80 | | 80 | 80 | 80 | 80 | 100 |
| | perturbation | 8 | 63 | 75 | 88 | 38 | 100 | 88 | 50 | 75 | 100 | 100 | 75 | 75 | 88 |
| | fold | 2 | 0 | 50 | 50 | 100 | 100 | 50 | 50 | 50 | 0 | 50 | 50 | 50 | 100 |
| α/β | rosetta | 8 | | 50 | 88 | 0 | | | 100 | | 88 | 88 | 100 | 88 | 100 |
| | perturbation | 20 | 95 | 95 | 100 | 20 | 100 | 95 | 35 | 85 | 95 | 100 | 60 | 85 | 95 |
| | fold | 3 | 67 | 33 | 33 | 0 | 67 | 33 | 67 | 67 | 100 | 33 | 33 | 67 | 100 |
| ≤150 AA | rosetta | 12 | | 42 | 42 | 8 | | | 50 | | 58 | 33 | 67 | 75 | 75 |
| | perturbation | 17 | 41 | 53 | 88 | 53 | 76 | 47 | 53 | 35 | 94 | 59 | 65 | 82 | 100 |
| | fold | 11 | 27 | 64 | 55 | 36 | 73 | 45 | 36 | 27 | 36 | 9 | 45 | 64 | 82 |
| >150 AA | rosetta | 18 | | 61 | 89 | 6 | | | 78 | | 100 | 56 | 61 | 83 | 83 |
| | perturbation | 35 | 86 | 86 | 97 | 26 | 91 | 94 | 34 | 74 | 100 | 60 | 74 | 89 | 91 |
| | fold | 7 | 57 | 57 | 29 | 29 | 43 | 57 | 29 | 14 | 86 | 14 | 29 | 43 | 86 |
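| CR12 | benchmark set | total | amino acid clash | amino acid distance | amino acid neighbor count | contact order | loop length | loop closure | radius of gyration | SSE clash | SSE packing | strand pairing | SSPred JUFO | SSPred PSIPRED | sum |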
| all | rosetta | 20 | | 50 | 80 | 30 | | | 65 | | 85 | 50 | 65 | 75 | 75 |
| | perturbation | 25 | 64 | 68 | 100 | 32 | 100 | 92 | 52 | 52 | 100 | 52 | 68 | 88 | 100 |
| | fold | 35 | 51 | 83 | 69 | 20 | 71 | 60 | 23 | 43 | 69 | 40 | 26 | 34 | 46 |
| α-helical | rosetta | 11 | | 64 | 73 | 36 | | | 55 | | 82 | | 45 | 55 | 64 |
| | perturbation | 12 | 33 | 42 | 100 | 33 | 100 | 100 | 67 | 25 | 100 | | 75 | 92 | 100 |
| | fold | 12 | 42 | 75 | 58 | 25 | 50 | 58 | 17 | 33 | 58 | | 42 | 50 | 42 |
| β-sheet | rosetta | 4 | | 50 | 100 | 50 | | | 50 | | 75 | 100 | 75 | 100 | 75 |
| | perturbation | 3 | 67 | 67 | 100 | 33 | 100 | 67 | 67 | 67 | 100 | 100 | 67 | 67 | 100 |
| | fold | 8 | 38 | 88 | 75 | 25 | 88 | 63 | 13 | 63 | 63 | 63 | 13 | 13 | 38 |
| α/β | rosetta | 5 | | 20 | 80 | 0 | | | 100 | | 100 | 100 | 100 | 100 | 100 |
| | perturbation | 10 | 100 | 100 | 100 | 30 | 100 | 90 | 30 | 80 | 100 | 100 | 60 | 90 | 100 |
| | fold | 15 | 67 | 87 | 73 | 13 | 80 | 60 | 33 | 40 | 80 | 53 | 20 | 33 | 53 |
| ≤150 AA | rosetta | 13 | | 62 | 85 | 38 | | | 62 | | 85 | 38 | 46 | 69 | 69 |
| | perturbation | 11 | 55 | 64 | 100 | 36 | 100 | 91 | 73 | 45 | 100 | 36 | 55 | 82 | 100 |
| | fold | 14 | 36 | 79 | 71 | 21 | 71 | 71 | 21 | 36 | 50 | 21 | 29 | 43 | 43 |
| >150 AA | rosetta | 7 | | 29 | 71 | 14 | | | 71 | | 86 | 71 | 100 | 86 | 86 |
| | perturbation | 14 | 71 | 71 | 100 | 29 | 100 | 93 | 36 | 57 | 100 | 64 | 79 | 93 | 100 |
| | fold | 21 | 62 | 86 | 67 | 19 | 71 | 52 | 24 | 48 | 81 | 52 | 24 | 29 | 48 |
For each score and benchmark set, the percentage of protein model sets with a significant improvement in enrichment (Z-score > 1.0) is displayed for each of the knowledge-based potentials. Three classifications for native-like models were used (RMSD, GDT_TS and CR12), and protein model sets were classified as α with #helices ≥ 2, as β with #strands ≥ 2, and as αβ if both conditions are fulfilled. Proteins were also classified as small when having ≤ 150 amino acids. Cells with bold percentages highlight the cases where more protein model sets showed a significant improvement in enrichment than a worsening. Cells in italic are discussed and are expected not to enrich the respective model set (for discussion, please see text).
Weight set for consensus scoring function.
| AA distance | AA neighbor | loop length | Radius of gyration | Loop closure | AA pair clash | SSE clash | SSE packing | Strand pairing | Contact Score | SSPred JUFO | SSPred PSIPRED |
| 0.35 | 50 | 10 | 5 | 500 | 500 | 500 | 8 | 20 | 0.5 | 5 | 20 |
Monte Carlo optimization maximized the enrichment over the Rosetta model set. Loop closure, AA pair and SSE clash weights were set to 500. This weight set was used to calculate the score sum, as used to calculate enrichments for the benchmark set.
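The score sum mentioned above is a weighted combination of the individual terms. The sketch below shows that combination using the weights from the table; the dictionary layout, the function name, and the assumption that the sum is a plain linear combination of per-term scores are illustrative choices, not a description of the BCL code.

```python
# Weights copied from the table above; keys follow the table headers.
WEIGHTS = {
    "AA distance": 0.35,
    "AA neighbor": 50,
    "loop length": 10,
    "Radius of gyration": 5,
    "Loop closure": 500,
    "AA pair clash": 500,
    "SSE clash": 500,
    "SSE packing": 8,
    "Strand pairing": 20,
    "Contact Score": 0.5,
    "SSPred JUFO": 5,
    "SSPred PSIPRED": 20,
}


def score_sum(term_scores, weights=WEIGHTS):
    """Weighted sum of per-term scores for one model (assumed linear form)."""
    missing = set(weights) - set(term_scores)
    if missing:
        raise KeyError(f"missing score terms: {sorted(missing)}")
    return sum(weights[name] * term_scores[name] for name in weights)
```

Given the weights of 500, any non-zero loop closure or clash score dominates the sum, which is consistent with these three terms acting as penalties rather than as finely tuned contributions.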
Ranking of native structure within different decoys‘r’us model sets.
| set | Pdb-Chain | DFIRE | R | ModPipe-Pair | ModPipe-Surf | ModPipe-Comb | DOPE | BCL::Score |
| fisa | 1fc2 | 254 | 158 | 491 | 1 | 453 | 375 | 480 |
| fisa | 1hdd-C | 1 | 90 | 293 | 18 | 135 | 1 | 60 |
| fisa | 2cro | 1 | 26 | 11 | 146 | 19 | 1 | 1 |
| fisa | 4icb | 1 | 1 | 196 | 2 | 167 | 1 | 1 |
| fisa_casp3 | 1bg8-A | 1 | 1068 | 1 | 1180 | 282 | 1 | 9 |
| fisa_casp3 | 1bl0 | 1 | 960 | 4 | 912 | 86 | 1 | 246 |
| fisa_casp3 | 1jwe | 1 | 1177 | 1 | 1119 | 6 | 1 | 1 |
| lmds | 1b0n-B | 430 | 300 | 56 | 186 | 18 | 34 | 182 |
| lmds | 1bba | 501 | 174 | 501 | 117 | 444 | 501 | 469 |
| lmds | 1fc2 | 501 | 291 | 325 | 54 | 222 | 476 | 501 |
| lmds | 1ctf | 1 | 1 | 1 | 1 | 1 | 1 | 12 |
| lmds | 1dtk | 1 | 9 | 4 | 1 | 1 | 1 | 4 |
| lmds | 1igd | 1 | 1 | 1 | 3 | 1 | 1 | 1 |
| lmds | 1shf-A | 1 | 5 | 24 | 18 | 7 | 1 | 2 |
| lmds | 2cro | 1 | 2 | 4 | 28 | 12 | 1 | 1 |
| lmds | 2ovo | 1 | 29 | 5 | 8 | 2 | 1 | 1 |
| lmds | 4pti | 1 | 4 | 1 | 44 | 1 | 1 | 3 |
| lattice_ssfit | 1beo | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| lattice_ssfit | 1ctf | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| lattice_ssfit | 1dtk-A | 1 | 1 | 1 | 35 | 1 | 1 | 1 |
| lattice_ssfit | 1fca | 1 | 1 | 1 | 4 | 1 | 1 | 1 |
| lattice_ssfit | 1nkl | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| lattice_ssfit | 1pgb | 1 | 1 | 1 | 3 | 1 | 1 | 1 |
| lattice_ssfit | 1trl-A | 1 | 45 | 1 | 123 | 1 | 1 | 6 |
| lattice_ssfit | 4icb | 1 | 1 | 1 | 3 | 1 | 1 | 1 |
For different model sets from “decoys‘r’us” [39], the rank of the native structure was determined using different energy potentials. Ranks for DFIRE through DOPE were copied from the “DOPE” publication [20]. For each model set, the number of sets for which the native was ranked 1st was counted and reported. The counts in brackets treat ranks among the top 10 as correct.
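The counting described in this caption can be reproduced from the rank rows. The sketch below does so for the four fisa rows of the table; the data layout and the function name are assumptions for the example, and only the ranks themselves come from the table.

```python
SCORES = ["DFIRE", "R", "ModPipe-Pair", "ModPipe-Surf",
          "ModPipe-Comb", "DOPE", "BCL::Score"]

# Native ranks for the fisa set, copied from the table (one list per protein).
FISA_RANKS = {
    "1fc2":   [254, 158, 491, 1, 453, 375, 480],
    "1hdd-C": [1, 90, 293, 18, 135, 1, 60],
    "2cro":   [1, 26, 11, 146, 19, 1, 1],
    "4icb":   [1, 1, 196, 2, 167, 1, 1],
}


def count_within(ranks_by_protein, max_rank):
    """Per scoring function, count proteins whose native ranks <= max_rank."""
    counts = [0] * len(SCORES)
    for ranks in ranks_by_protein.values():
        for column, rank in enumerate(ranks):
            if rank <= max_rank:
                counts[column] += 1
    return dict(zip(SCORES, counts))


ranked_first = count_within(FISA_RANKS, 1)   # native ranked 1st
in_top_ten = count_within(FISA_RANKS, 10)    # native among the top 10
```

For these four rows, DFIRE and DOPE each place the native first in three cases and BCL::Score in two; relaxing the criterion to the top 10 changes only the ModPipe-Surf count, from one to two.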
Enrichment of native-like structures within the moulder decoy set.
| pdbid | RMSD criteria [Å] | DFIRE | R | ModPipe-Pair | ModPipe-Surf | ModPipe-Comb | DOPE | BCL::Score |
| | 3.53 | 7.00 | 7.33 | 5.00 | 8.33 | 7.33 | 8.67 | 4.22 |
| | 5.83 | 6.33 | 8.00 | 5.00 | 5.33 | 7.00 | 7.67 | 3.45 |
| | 7.81 | 5.00 | 7.00 | 4.33 | 6.67 | 5.67 | 5.00 | 3.36 |
| | 11.36 | 4.33 | 3.00 | 4.00 | 3.00 | 3.33 | 4.00 | 2.44 |
| | 4.69 | 5.33 | 5.67 | 4.33 | 5.67 | 5.00 | 5.67 | 4.48 |
| | 3.52 | 5.33 | 4.33 | 5.00 | 4.67 | 5.33 | 6.67 | 2.93 |
| | 9.34 | 5.00 | 7.00 | 5.00 | 6.67 | 6.00 | 6.00 | 3.85 |
| | 10.77 | 8.00 | 7.00 | 8.00 | 8.33 | 9.00 | 8.67 | 2.93 |
| | 5.08 | 5.33 | 3.33 | 2.67 | 3.00 | 2.33 | 5.33 | 3.19 |
| | 3.26 | 7.67 | 6.00 | 4.33 | 6.33 | 6.00 | 7.67 | 4.40 |
| | 4.83 | 8.00 | 6.33 | 7.33 | 7.67 | 7.67 | 8.67 | 4.57 |
| | 3.60 | 7.33 | 6.67 | 6.67 | 7.67 | 7.00 | 7.67 | 4.05 |
| | 5.47 | 4.67 | 6.00 | 5.00 | 5.67 | 6.67 | 4.00 | 3.45 |
| | 3.80 | 5.67 | 5.33 | 3.33 | 4.33 | 4.33 | 5.00 | 3.53 |
| | 5.06 | 7.33 | 7.00 | 7.00 | 6.67 | 7.33 | 6.33 | 3.71 |
| | 3.52 | 4.00 | 5.00 | 2.67 | 4.33 | 3.67 | 4.33 | 4.20 |
| | 3.89 | 6.33 | 5.33 | 6.33 | 7.33 | 6.67 | 7.33 | 4.05 |
| | 6.13 | 6.67 | 4.67 | 4.00 | 4.00 | 4.00 | 6.00 | 3.36 |
| | 14.50 | 5.00 | 6.00 | 5.67 | 4.33 | 5.67 | 5.00 | 4.05 |
| | 4.17 | 5.67 | 5.33 | 4.00 | 4.67 | 5.67 | 5.33 | 3.65 |
| average | | 6.00 | 5.82 | 4.98 | 5.73 | 5.78 | 6.25 | 3.69 |
| median | | 5.67 | 6.00 | 5.00 | 5.67 | 5.84 | 6.00 | 3.68 |
| minimum | | 4.00 | 3.00 | 2.67 | 3.00 | 2.33 | 4.00 | 2.44 |
| maximum | | 8.00 | 8.00 | 8.00 | 8.33 | 9.00 | 8.67 | 4.57 |
For different model sets of the “moulder” decoy set [20], 10% enrichments were calculated. The 10% enrichment for different model sets also implies different RMSD cutoffs. The average, median, minimum, and maximum enrichment over the model sets is reported.
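The summary statistics named in this caption can be recomputed from the per-set values. The sketch below does so for the DFIRE column of the table, using Python's standard `statistics` module; the variable and function names are assumptions, and the inputs are the rounded table values, so small deviations from statistics computed on unrounded data are possible.

```python
import statistics

# 10% enrichments for DFIRE, one value per moulder model set (from the table).
DFIRE_ENRICHMENTS = [
    7.00, 6.33, 5.00, 4.33, 5.33, 5.33, 5.00, 8.00, 5.33, 7.67,
    8.00, 7.33, 4.67, 5.67, 7.33, 4.00, 6.33, 6.67, 5.00, 5.67,
]


def summarize(values):
    """Average, median, minimum and maximum, rounded to two decimals."""
    return {
        "average": round(statistics.mean(values), 2),
        "median": round(statistics.median(values), 2),
        "minimum": round(min(values), 2),
        "maximum": round(max(values), 2),
    }


dfire_summary = summarize(DFIRE_ENRICHMENTS)
```

For the DFIRE column this yields an average of 6.00, a median of 5.67, a minimum of 4.00, and a maximum of 8.00.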