Chen Cao, Guishen Wang, An Liu, Shutan Xu, Lincong Wang, Shuxue Zou.
Abstract
The assignment of secondary structure elements in proteins is a key step in the analysis of their structures and functions. We have developed an algorithm, SACF (secondary structure assignment based on Cα fragments), for secondary structure element (SSE) assignment based on the alignment of Cα backbone fragments with central poses derived by clustering known SSE fragments. The assignment algorithm consists of three steps. First, the outlier fragments on known SSEs are detected. Next, the remaining fragments are clustered to obtain the central fragment of each cluster. Finally, the central fragments are used as templates to make assignments. In a large-scale comparison of 11 secondary structure assignment methods, SACF, KAKSI and PROSS are found to have similar agreement with DSSP, while PCASSO agrees with DSSP best. Taking the DSSP assignment as the standard, SACF and PCASSO tend to assign fewer residues in the N and C cap regions, whereas KAKSI, P-SEA and SEGNO tend to add residues at the termini. Moreover, our algorithm is able to assign subtle helices (3₁₀-helix, π-helix and left-handed helix) and make uniform assignments, and can detect rare SSEs within β-sheets or long helices assigned by other programs as outlier fragments. The structural uniformity should be useful for protein structure classification and prediction, while outlier fragments underlie the structure-function relationship.
Keywords: Cα backbone fragment; cluster; outlier detection; protein; secondary structure assignment
Year: 2016 PMID: 26978354 PMCID: PMC4813195 DOI: 10.3390/ijms17030333
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
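The three steps described in the abstract (outlier detection, clustering to obtain central fragments, template-based assignment) can be illustrated with a minimal sketch. It is not the SACF implementation. As a stand-in for the fragment alignment it uses a superposition-free distance RMSD (`drmsd`), the outlier test is a plain nearest-neighbour cutoff, and the clustering is a greedy radius-based scheme. All function names, thresholds and the idealised toy coordinates are assumptions made for illustration.

```python
import math
from itertools import combinations

def drmsd(a, b):
    """Distance RMSD between two equal-length lists of (x, y, z) Cα coordinates."""
    pairs = list(combinations(range(len(a)), 2))
    sq = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs)
    return math.sqrt(sq / len(pairs))

def drop_outliers(frags, k=2, cutoff=1.0):
    """Step 1 (simplified): keep fragments whose k nearest neighbours are close."""
    kept = []
    for i, f in enumerate(frags):
        nearest = sorted(drmsd(f, g) for j, g in enumerate(frags) if j != i)[:k]
        if not nearest or sum(nearest) / len(nearest) <= cutoff:
            kept.append(f)
    return kept

def central_fragments(frags, radius=0.5):
    """Step 2 (simplified): greedy clustering, then one medoid per cluster."""
    clusters = []
    for f in frags:
        for c in clusters:
            if drmsd(f, c[0]) <= radius:
                c.append(f)
                break
        else:
            clusters.append([f])
    return [min(c, key=lambda m: sum(drmsd(m, o) for o in c)) for c in clusters]

def assign(frag, templates, threshold=0.5):
    """Step 3: label of the closest central fragment, or None if none is close."""
    scored = [(drmsd(frag, t), label) for label, ts in templates.items() for t in ts]
    if not scored:
        return None
    best_dist, best_label = min(scored)
    return best_label if best_dist <= threshold else None

def ideal_helix(n, rise=1.5, radius=2.3, turn_deg=100.0):
    """Hypothetical idealised helical Cα trace, used only as toy input."""
    t = math.radians(turn_deg)
    return [(radius * math.cos(i * t), radius * math.sin(i * t), rise * i) for i in range(n)]

def ideal_strand(n, rise=3.3):
    """Hypothetical straight, fully extended Cα trace, used only as toy input."""
    return [(0.0, 0.0, rise * i) for i in range(n)]

# Toy "known SSE" fragments; the last helix is deliberately distorted.
known = {
    "H": [ideal_helix(5, r) for r in (1.45, 1.50, 1.55, 2.40)],
    "E": [ideal_strand(5, r) for r in (3.2, 3.3, 3.4)],
}
templates = {label: central_fragments(drop_outliers(frags)) for label, frags in known.items()}
print(assign(ideal_helix(5, 1.52), templates), assign(ideal_strand(5, 3.25), templates))
```

In this toy setting the distorted helix is discarded in step 1, so it never becomes a template, and a query fragment that is far from every central fragment is left unassigned.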
The agreement (%) of eleven programs on set T. The agreement percentage was computed using Q3 score.
| Method | DSSP | STRIDE | P-SEA | KAKSI | DISICL | PALSSE | SEGNO | PROSS | XTLSSTR | PCASSO |
|---|---|---|---|---|---|---|---|---|---|---|
| SACF | 84.7 | 85.1 | 81.8 | 82.6 | 76.9 | 68.4 | 80.5 | 83.1 | 76.1 | 84.3 |
| DSSP | | 95.0 | 80.9 | 83.5 | 78.9 | 72.9 | 83.0 | 84.3 | 77.2 | 93.5 |
| STRIDE | | | 81.1 | 84.1 | 78.4 | 73.6 | 82.5 | 84.8 | 80.2 | 92.0 |
| P-SEA | | | | 82.3 | 78.3 | 68.8 | 85.9 | 86.2 | 74.4 | 82.1 |
| KAKSI | | | | | 74.8 | 77.5 | 80.5 | 82.9 | 78.5 | 83.8 |
| DISICL | | | | | | 63.1 | 80.8 | 81.8 | 74.9 | 79.6 |
| PALSSE | | | | | | | 66.3 | 66.1 | 70.6 | 73.6 |
| SEGNO | | | | | | | | 87.4 | 76.4 | 82.4 |
| PROSS | | | | | | | | | 79.3 | 84.5 |
| XTLSSTR | | | | | | | | | | 79.2 |
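The Q3 score behind the agreement table is the percentage of residues that receive the same three-state label (helix, strand, coil) from two methods. A minimal sketch follows. The function names are illustrative, and the reduction of DSSP's eight states to three is one common convention that the source does not confirm.

```python
# Assumed reduction of DSSP's eight states to three (H/G/I -> H, E/B -> E,
# everything else -> C); the source does not state which mapping was used.
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def to_three_state(ss):
    """Map a per-residue assignment string to the three states H, E and C."""
    return "".join(THREE_STATE.get(c, "C") for c in ss)

def q3(assign_a, assign_b):
    """Percentage of residues with identical three-state labels."""
    if len(assign_a) != len(assign_b) or not assign_a:
        raise ValueError("assignments must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(assign_a, assign_b))
    return 100.0 * matches / len(assign_a)

print(q3("HHHHCCEE", "HHHCCCEE"))  # 7 of 8 residues agree -> 87.5
```

Because Q3 counts residues one by one, it is symmetric in the two methods, which is why only the upper triangle of the table needs to be reported.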
SOV scores (%) between any two of the eleven programs on Set T for helix. For every SOV score in the table, the corresponding method in the first column is taken as the reference method.
| Method | SACF | DSSP | STRIDE | P-SEA | KAKSI | DISICL | PALSSE | SEGNO | PROSS | XTLSSTR | PCASSO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SACF | – | 96.6 | 94.1 | 92.6 | 92.6 | 88.3 | 81.7 | 80.1 | 91.2 | 90.3 | 95.2 |
| DSSP | 91.3 | – | 93.7 | 86.0 | 88.4 | 82.9 | 81.1 | 75.8 | 86.1 | 89.2 | 94.1 |
| STRIDE | 90.1 | 95.2 | – | 86.2 | 88.0 | 84.4 | 82.5 | 77.1 | 87.4 | 92.6 | 92.7 |
| P-SEA | 96.9 | 96.7 | 94.2 | – | 95.7 | 91.3 | 84.1 | 83.7 | 95.2 | 91.6 | 96.5 |
| KAKSI | 93.8 | 96.0 | 93.4 | 92.6 | – | 84.7 | 86.3 | 79.1 | 92.8 | 91.6 | 95.0 |
| DISICL | 87.3 | 89.9 | 89.6 | 85.6 | 85.7 | – | 72.8 | 80.0 | 87.6 | 85.6 | 89.6 |
| PALSSE | 60.3 | 62.4 | 63.1 | 63.7 | 67.1 | 47.8 | – | 50.5 | 62.2 | 69.0 | 59.7 |
| SEGNO | 92.9 | 94.1 | 93.2 | 92.3 | 91.5 | 94.4 | 76.8 | – | 93.5 | 89.5 | 94.1 |
| PROSS | 95.9 | 97.4 | 97.5 | 95.6 | 96.7 | 93.9 | 83.1 | 86.2 | – | 93.9 | 97.1 |
| XTLSSTR | 82.7 | 86.8 | 89.1 | 81.5 | 83.9 | 76.7 | 85.6 | 71.7 | 81.9 | – | 84.5 |
| PCASSO | 90.9 | 96.4 | 93.2 | 87.6 | 89.5 | 84.7 | 80.3 | 77.4 | 87.6 | 89.4 | – |
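Unlike Q3, the segment overlap (SOV) score is computed per segment and relative to a reference method, which is why the table is not symmetric. The sketch below assumes the 1999 definition of SOV by Zemla et al.; the source does not say which variant was used, and the function names are illustrative.

```python
def segments(ss, state):
    """Inclusive (start, end) index pairs of maximal runs of `state` in `ss`."""
    segs, start = [], None
    for i, c in enumerate(ss + "\0"):  # sentinel closes a run at the end
        if c == state and start is None:
            start = i
        elif c != state and start is not None:
            segs.append((start, i - 1))
            start = None
    return segs

def sov(reference, other, state):
    """SOV (%) for one state, with `reference` as the reference assignment.

    Returns None when the reference contains no segment of `state`.
    """
    total, norm = 0.0, 0
    other_segs = segments(other, state)
    for a1, b1 in segments(reference, state):
        len1 = b1 - a1 + 1
        overlapping = [(a2, b2) for a2, b2 in other_segs if a2 <= b1 and b2 >= a1]
        if not overlapping:
            norm += len1  # unmatched reference segment only adds to the normaliser
            continue
        for a2, b2 in overlapping:
            len2 = b2 - a2 + 1
            minov = min(b1, b2) - max(a1, a2) + 1  # length of the actual overlap
            maxov = max(b1, b2) - min(a1, a2) + 1  # total extent of both segments
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            norm += len1
    return 100.0 * total / norm if norm else None

print(sov("CHHHHHHC", "CCHHHHCC", "H"))
```

In the usage line, the second assignment trims one residue from each end of the helix. The allowance `delta` absorbs that difference, so the helix still counts as fully recovered, whereas Q3 for the same pair would be 75%.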
SOV scores (%) between any two of eleven methods on Set T for β-sheet. For every SOV score in the table, the corresponding method in the first column is taken as the reference method.
| Method | SACF | DSSP | STRIDE | P-SEA | KAKSI | DISICL | PALSSE | SEGNO | PROSS | XTLSSTR | PCASSO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SACF | – | 86.0 | 85.4 | 78.7 | 86.0 | 78.3 | 68.9 | 80.9 | 78.6 | 71.3 | 87.1 |
| DSSP | 81.2 | – | 97.0 | 78.0 | 88.0 | 70.8 | 73.1 | 80.4 | 77.2 | 71.3 | 89.2 |
| STRIDE | 79.4 | 96.7 | – | | 87.3 | 70.2 | 73.3 | 80.2 | 75.7 | 70.7 | 87.9 |
| P-SEA | 78.7 | 78.9 | 78.6 | – | 83.2 | 77.2 | 70.8 | 87.0 | 79.0 | 68.7 | 80.6 |
| KAKSI | 84.5 | 92.0 | 91.6 | 83.8 | – | 77.0 | 76.9 | 86.4 | 80.6 | 73.0 | 91.9 |
| DISICL | 64.6 | 68.6 | 68.7 | 71.0 | 69.6 | – | 53.4 | 75.7 | 72.8 | 65.3 | 70.8 |
| PALSSE | 45.7 | 51.3 | 51.6 | 50.4 | 52.1 | 35.8 | – | 48.9 | 43.4 | 43.0 | 47.5 |
| SEGNO | 75.8 | 79.8 | 79.9 | 82.5 | 82.3 | 80.9 | 68.3 | – | 81.5 | 72.9 | 81.2 |
| PROSS | 81.7 | 83.1 | 83.4 | 83.2 | 84.4 | 88.9 | 64.8 | 91.2 | – | 76.1 | 84.5 |
| XTLSSTR | 74.9 | 77.8 | 77.9 | 73.8 | 77.2 | 79.1 | 62.9 | 82.8 | 78.0 | – | 77.0 |
| PCASSO | 84.2 | 90.8 | 89.7 | 78.8 | 87.5 | 73.0 | 70.3 | 81.1 | 76.5 | 70.3 | – |
Figure 1. The distribution of the lengths of helices (a) and β-sheets (b) from SACF and the other six methods on set T. The x-axis represents helix length (a) or β-strand length (b), while the y-axis represents the number of secondary structures of that particular length.
Discrepancies at the N and C caps of the helices assigned by DSSP and other methods.
| Method | Same | N cap +(1–2) | N cap +(>2) | N cap −(1–2) | N cap −(>2) | C cap +(1–2) | C cap +(>2) | C cap −(1–2) | C cap −(>2) |
|---|---|---|---|---|---|---|---|---|---|
| SACF | 5194 | 1407 | 23 | 1919 | 534 | 1865 | 15 | 3142 | 578 |
| STRIDE | 11,388 | 990 | 34 | 332 | 80 | 801 | 60 | 401 | 62 |
| P-SEA | 1639 | 4782 | 678 | 870 | 569 | 4405 | 610 | 1267 | 423 |
| KAKSI | 1761 | 5765 | 153 | 2269 | 217 | 5347 | 131 | 1737 | 92 |
| DISICL | 1310 | 4090 | 252 | 1828 | 369 | 1131 | 96 | 7306 | 587 |
| PALSSE | 87 | 7423 | 726 | 121 | 59 | 7153 | 728 | 121 | 26 |
| SEGNO | 2734 | 5222 | 448 | 913 | 332 | 3344 | 397 | 1182 | 253 |
| PROSS | 3037 | 2626 | 117 | 1638 | 796 | 2350 | 107 | 2326 | 592 |
| XTLSSTR | 803 | 5932 | 332 | 1855 | 600 | 1173 | 130 | 4023 | 857 |
| PCASSO | 5950 | 1211 | 50 | 1856 | 347 | 1795 | 35 | 2302 | 272 |
The second column shows the number of helices assigned by a given method (first column) that are identical to the helices assigned by DSSP. The third through tenth columns show the number of helices assigned by DSSP that differ from the helix assigned by the method in the first column by one or two residues (1–2) or by more than two residues (>2) at the N cap or C cap. Note that a helix assigned by another method can disagree with DSSP at both the N cap and the C cap. “+”: the helix assigned by the other method has more residues at the N or C cap than the helix assigned by DSSP; “−”: the helix assigned by the other method has fewer residues at the N or C cap than the helix assigned by DSSP.
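The cap categories defined in the note above can be made concrete with a short sketch. It assumes that each DSSP segment has already been paired with the corresponding segment from the other method and that segments are given as inclusive (start, end) residue indices. The pairing rule and all names are hypothetical, since the source does not describe how segments were matched.

```python
from collections import Counter

def cap_label(diff):
    """Category for a signed residue difference at one cap."""
    if diff == 0:
        return "same"
    sign = "+" if diff > 0 else "-"
    return sign + ("(1-2)" if abs(diff) <= 2 else "(>2)")

def cap_discrepancy(dssp_seg, other_seg):
    """N-cap and C-cap categories of `other_seg` relative to `dssp_seg`."""
    n_diff = dssp_seg[0] - other_seg[0]  # positive: other method starts earlier
    c_diff = other_seg[1] - dssp_seg[1]  # positive: other method ends later
    return cap_label(n_diff), cap_label(c_diff)

def tally(pairs):
    """Count categories over already-matched (dssp_seg, other_seg) pairs."""
    counts = Counter()
    for dssp_seg, other_seg in pairs:
        n_cap, c_cap = cap_discrepancy(dssp_seg, other_seg)
        if n_cap == c_cap == "same":
            counts["same"] += 1
            continue
        if n_cap != "same":
            counts["N cap " + n_cap] += 1
        if c_cap != "same":
            counts["C cap " + c_cap] += 1
    return counts

print(tally([((10, 20), (10, 20)), ((30, 42), (29, 42)), ((50, 60), (54, 59))]))
```

A segment that differs at both ends contributes to one N cap column and one C cap column, which matches the remark that a helix can disagree with DSSP at both caps.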
Discrepancies between N and C caps in the β-sheets assigned by DSSP and other methods.
| Method | Same | N cap +(1–2) | N cap +(>2) | N cap −(1–2) | N cap −(>2) | C cap +(1–2) | C cap +(>2) | C cap −(1–2) | C cap −(>2) |
|---|---|---|---|---|---|---|---|---|---|
| SACF | 2375 | 1355 | 16 | 2218 | 535 | 1902 | 11 | 2897 | 578 |
| STRIDE | 8352 | 733 | 83 | 285 | 80 | 544 | 69 | 353 | 63 |
| P-SEA | 1621 | 3260 | 568 | 853 | 486 | 3267 | 473 | 1225 | 433 |
| KAKSI | 1473 | 4138 | 71 | 2163 | 317 | 3890 | 73 | 1638 | 195 |
| DISICL | 815 | 2720 | 182 | 1602 | 371 | 749 | 85 | 5367 | 591 |
| PALSSE | 56 | 5713 | 786 | 116 | 63 | 5513 | 781 | 114 | 28 |
| SEGNO | 2364 | 3753 | 384 | 851 | 337 | 2322 | 335 | 1085 | 255 |
| PROSS | 2481 | 1820 | 83 | 1567 | 802 | 1544 | 84 | 2200 | 594 |
| XTLSSTR | 636 | 4447 | 275 | 1791 | 602 | 829 | 124 | 3507 | 863 |
| PCASSO | 4994 | 867 | 66 | 1267 | 348 | 973 | 48 | 1490 | 273 |
The second column shows the number of strands in β-sheets assigned by a given method (first column) that are identical to the strands assigned by DSSP. The third through tenth columns show the number of strands assigned by DSSP that differ from the strand assigned by the method in the first column by one or two residues (1–2) or by more than two residues (>2) at the N cap or C cap. Note that a strand assigned by another method can disagree with DSSP at both the N cap and the C cap. “+”: the strand assigned by the other method has more residues at the N or C cap than the strand assigned by DSSP; “−”: the strand assigned by the other method has fewer residues at the N or C cap than the strand assigned by DSSP.
Figure 2. Examples of disagreement between SACF and DSSP. (a–d) show differences in helix assignment between SACF and DSSP, while (e,f) illustrate differences in β-sheet assignment. The divergently assigned regions are shown in magenta in the top four panels and are marked with arrows in the bottom two panels. The PDB ID and residue numbers are labeled in the figures, and the hydrogen bond information for (a,b) is also provided (Figure S1).
Figure 3. The 5-residue-long fragments assigned by DSSP (a) and SACF (b). Three helix elements (α-helix, 3₁₀-helix and π-helix) are shown. For each assignment method, 1000 fragments were randomly selected for the three helix elements. The three helix elements assigned by SACF are better separated than those assigned by DSSP.
The normal distribution parameters and clustering information for 21 secondary structure elements.
| SSE Name | Length | μ (Å) ¹ | σ ² | Adj. R-Square | Total Number of SSEs | Number of Outliers | Number of Clusters | Max ³ |
|---|---|---|---|---|---|---|---|---|
| α-helix | 4 + 2 ⁴ | 0.411 | 0.218 | 0.969 | 4776 | 496 | 18 | 682 |
| α-helix | 5 + 2 | 0.388 | 0.173 | 0.971 | 2842 | 349 | 25 | 276 |
| α-helix | 6 + 2 | 0.393 | 0.150 | 0.979 | 3159 | 357 | 28 | 315 |
| α-helix | 7 + 2 | 0.418 | 0.185 | 0.976 | 3578 | 326 | 33 | 383 |
| α-helix | 8 + 2 | 0.435 | 0.189 | 0.970 | 3521 | 563 | 25 | 273 |
| 310-helix | 3 + 2 | 0.303 | 0.157 | 0.980 | 15,689 | 2334 | 32 | 1830 |
| π-helix | 5 + 2 | 0.516 | 0.437 | 0.955 | 1243 | 224 | 19 | 304 |
| Left-α-helix | 4 + 2 | 1.012 | 8.004 | 0.815 | 72 | 23 | 8 | 16 |
| Left-310-helix | 3 + 2 | 0.596 | 0.239 | 0.898 | 812 | 211 | 21 | 82 |
| Parallel β-ladder | 4 | 0.352 | 0.314 | 0.987 | 62,204 | 6917 | 22 | 7821 |
| Antiparallel β-ladder | 4 | 0.427 | 0.201 | 0.989 | 97,088 | 8562 | 23 | 8787 |
| Parallel β-strand | 4 | 0.383 | 0.189 | 0.999 | 5374 | 689 | 25 | 878 |
| Parallel β-strand | 5 | 0.496 | 0.486 | 0.973 | 5846 | 858 | 28 | 664 |
| Parallel β-strand | 6 | 0.776 | 0.579 | 0.966 | 4678 | 868 | 31 | 670 |
| Parallel β-strand | 7 | 1.400 | 0.623 | 0.898 | 2608 | 419 | 28 | 385 |
| Parallel β-strand | 8 | 1.631 | 1.282 | 0.921 | 1627 | 37 | 32 | 337 |
| Antiparallel β-strand | 4 | 0.543 | 0.571 | 0.984 | 6176 | 821 | 19 | 1048 |
| Antiparallel β-strand | 5 | 0.546 | 0.671 | 0.959 | 6554 | 867 | 28 | 886 |
| Antiparallel β-strand | 6 | 1.367 | 1.825 | 0.926 | 4600 | 672 | 25 | 738 |
| Antiparallel β-strand | 7 | 1.882 | 0.817 | 0.943 | 5217 | 841 | 26 | 909 |
| Antiparallel β-strand | 8 | 1.994 | 0.824 | 0.945 | 4221 | 898 | 31 | 682 |
¹ Expectation value of the dist distribution, where dist is the RMSD between any two fragments of the same length (column 2) and secondary structure (column 1); the distribution of dist is fitted to a normal distribution; ² Variance of the dist distribution; ³ Number of fragments in the largest cluster; ⁴ For a DSSP-assigned helix composed of n residues, we extend the helix by one residue at both the N and C termini, since these two residues also form hydrogen bonds with residues in the helix; the final length of the helix is thus n + 2.
Figure 4. The clusters of α-helix bend fragments. (a–o) show 15 clusters obtained by clustering α-helix bend fragments. The central fragment of each cluster is displayed as red stick and green cartoon models, and the other fragments in the cluster are displayed as green lines. Because this figure is intended only as an intuitive illustration of our algorithm, we show only the odd-numbered clusters after ordering the clusters by number of fragments.
Figure 5. A histogram of the correlation between protein–ligand binding sites and two types of fragments: outlier fragments (black bars) and other fragments (white bars). (a) shows a histogram of the two types of fragments vs. protein–ligand binding sites. The x-axis is the secondary structure type and length, while the y-axis is the probability that the secondary structure is observed at a protein–ligand binding site (distance less than 4 Å). (b) shows an example of outlier poses detected at a protein–ligand binding site: for cytochrome cd1 nitrite reductase (PDB ID: 1qks), there are three outlier helix fragments (colored green, blue and cyan) around the binding site (the ligand is colored purple). The LDOF values and residue indices of the helix fragments are also labeled in (b).
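The LDOF values labeled in Figure 5b are outlier scores. Assuming LDOF here denotes the local distance-based outlier factor, the score of an item is the mean distance to its k nearest neighbours divided by the mean pairwise distance among those neighbours. The sketch below applies this definition to plain 2D points with Euclidean distance. For fragments, the distance function would be the fragment RMSD, and the choice of k is an assumption the source does not specify.

```python
import math
from itertools import combinations

def ldof(index, items, k, dist=math.dist):
    """Local distance-based outlier factor of items[index] (needs 2 <= k < len(items))."""
    p = items[index]
    others = [q for j, q in enumerate(items) if j != index]
    neighbours = sorted(others, key=lambda q: dist(p, q))[:k]
    # Mean distance from p to its k nearest neighbours.
    knn_distance = sum(dist(p, q) for q in neighbours) / len(neighbours)
    # Mean pairwise distance among those neighbours (must be non-zero).
    inner_pairs = list(combinations(neighbours, 2))
    inner_distance = sum(dist(a, b) for a, b in inner_pairs) / len(inner_pairs)
    return knn_distance / inner_distance

points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
for i in range(len(points)):
    print(points[i], round(ldof(i, points, k=3), 2))
```

Items inside a compact group score close to 1, while an item that lies far from its neighbourhood scores much higher, so a cutoff on the score separates outlier fragments from the rest.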