Petros Kountouris, Jonathan D Hirst.
Abstract
BACKGROUND: The prediction of the secondary structure of a protein is a critical step in the prediction of its tertiary structure and, potentially, its function. Moreover, the backbone dihedral angles, highly correlated with secondary structures, provide crucial information about the local three-dimensional structure.
Year: 2009 PMID: 20025785 PMCID: PMC2811710 DOI: 10.1186/1471-2105-10-437
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic representation of our method. First, the PSSM-only predictions are calculated. They are then used to augment the input vector and improve the results.
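The augmentation step described in the Figure 1 caption can be illustrated with a minimal sketch. The one-hot encoding, the function names and the toy values below are assumptions made for illustration; the source states only that the first-pass predictions are used to augment the input vector.

```python
def one_hot(label, classes):
    """Encode a predicted label as a 0/1 vector over the given classes."""
    return [1.0 if label == c else 0.0 for c in classes]


def augment_with_predictions(pssm_rows, predicted_labels, classes):
    """Append each residue's encoded first-pass prediction to its PSSM row.

    Hypothetical helper: the encoding used by the authors is not given
    in the source, so one-hot encoding is an assumption.
    """
    if len(pssm_rows) != len(predicted_labels):
        raise ValueError("one predicted label is needed per residue")
    return [
        list(row) + one_hot(label, classes)
        for row, label in zip(pssm_rows, predicted_labels)
    ]


# Toy data: three residues with 4-value rows (real PSSM rows have 20 values).
pssm = [
    [0.1, 0.9, 0.0, 0.3],
    [0.5, 0.2, 0.7, 0.1],
    [0.0, 0.4, 0.6, 0.8],
]
first_pass = ["H", "H", "C"]  # hypothetical PSSM-only predictions

augmented = augment_with_predictions(pssm, first_pass, classes=("H", "E", "C"))
print(augmented[0])
```

In an iterative scheme, the augmented rows would be passed to a second predictor, whose output could in turn replace `first_pass` on the next iteration.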
The secondary structure prediction for CB513 dataset after three iterations.
| CB513 | NC | Q3 | QH | QE | QC | Q3 | QH | QE | QC | Q3 | QH | QE | QC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PSSM-only | 0 | 78.3 | 80.3 | 66.7 | 82.6 | | | | | | | | |
| EM | 2 | 79.3 | 82.3 | 68.5 | 82.5 | 78.3 | 81.1 | 68.0 | 81.2 | 79.0 | 83.0 | 69.1 | 81.0 |
| | 3 | 79.4 | 82.6 | 68.7 | 82.3 | 78.3 | 81.3 | 68.0 | 81.0 | 79.1 | 83.8 | 68.9 | 80.6 |
| | 4 | 79.5 | 82.8 | 68.6 | 82.4 | 78.3 | 81.3 | 68.0 | 81.0 | 79.2 | 83.8 | 69.0 | 80.8 |
| | 5 | 79.9 | 83.4 | 69.2 | 82.7 | 78.4 | 81.2 | 68.2 | 81.1 | 79.5 | 83.6 | 69.3 | 81.4 |
| | 6 | 80.0 | 83.7 | 68.9 | 82.8 | 78.2 | 81.2 | 68.2 | 80.8 | 79.5 | 83.6 | 69.3 | 81.3 |
| | 7 | | | | | 78.2 | 81.4 | 67.9 | 80.9 | 79.4 | 83.6 | 69.3 | 81.3 |
| | 8 | 79.8 | 82.9 | 68.6 | 83.1 | 78.1 | 81.7 | 67.8 | 80.2 | 79.4 | 84.2 | 69.3 | 81.6 |
| | 9 | 79.9 | 82.8 | 69.2 | 83.2 | 78.2 | 81.6 | 67.9 | 80.7 | 79.3 | 84.3 | 69.5 | 80.3 |
| | 10 | 79.8 | 83.0 | 68.5 | 83.3 | 78.3 | 81.7 | 68.2 | 80.6 | 79.4 | 83.7 | 69.7 | 80.9 |
| | 11 | 79.6 | 82.4 | 68.5 | 83.2 | 78.2 | 81.6 | 67.9 | 80.5 | 79.5 | 83.6 | 69.6 | 81.4 |
| | 12 | 79.9 | 82.7 | 68.7 | 83.5 | 78.2 | 81.8 | 67.7 | 80.3 | 79.5 | 83.6 | 69.5 | 81.4 |
| k-Means | 2 | 79.3 | 82.1 | 68.8 | 82.5 | 78.2 | 81.1 | 68.2 | 81.1 | 74.7 | 84.4 | 62.2 | 73.1 |
| | 3 | 79.6 | 82.8 | 69.2 | 82.4 | 78.2 | 81.3 | 68.0 | 81.1 | 74.8 | 85.0 | 61.8 | 72.9 |
| | 4 | 79.9 | 83.4 | 68.8 | 82.7 | 78.2 | 81.4 | 67.9 | 81.0 | 79.3 | 83.7 | 68.6 | 81.3 |
| | 5 | 79.9 | 83.3 | 69.1 | 82.7 | 78.2 | 81.5 | 67.8 | 80.9 | 79.2 | 83.9 | 68.8 | 80.7 |
| | 6 | 79.9 | 83.4 | 68.6 | 82.9 | 78.1 | 81.5 | 67.8 | 80.8 | 79.0 | 83.5 | 68.8 | 80.6 |
| | 7 | 79.9 | 83.3 | 67.9 | 83.3 | 78.0 | 81.6 | 67.7 | 80.5 | 79.2 | 83.7 | 68.8 | 80.8 |
| | 8 | 79.7 | 82.9 | 68.3 | 83.1 | 78.0 | 81.6 | 67.5 | 80.5 | 79.3 | 83.8 | 68.8 | 80.9 |
| | 9 | 79.8 | 83.4 | 67.7 | 83.2 | 78.0 | 81.7 | 67.4 | 80.5 | 79.3 | 83.6 | 68.4 | 81.4 |
| | 10 | 79.7 | 82.8 | 67.7 | 83.4 | 78.0 | 81.7 | 67.5 | 80.5 | 79.3 | 83.5 | 68.3 | 81.6 |
| | 11 | 79.8 | 83.0 | 69.0 | 82.9 | 78.0 | 81.6 | 67.6 | 80.4 | 79.2 | 83.9 | 68.9 | 80.7 |
| | 12 | 79.7 | 83.2 | 68.1 | 83.1 | 78.1 | 81.0 | 68.0 | 81.1 | 79.2 | 82.9 | 68.7 | 81.5 |
The accuracy of the initial PSSM-only prediction is shown in the first row. The most accurate predictions based on Q3 are in bold. NC = number of clusters used to predict dihedral angles; DHR = input vector augmented by predicted dihedral cluster.
Figure 2. Clustering of the dihedral angles using EM clustering with seven clusters (left) and the distribution of secondary structure in every cluster (right).
Comparison of cross-validated predictive accuracy on CB513 dataset with other secondary structure methods.
| Method | Q3 (%) | QH | QE | QC | CH | CE | CC |
|---|---|---|---|---|---|---|---|
| DISSPred | 80.0 ± 0.5 | 83.3 | 69.0 | 83.1 | 0.77 | 0.68 | 0.62 |
| PSIPRED | 78.2 | N/A | N/A | N/A | N/A | N/A | N/A |
| PHD | 74.7 | N/A | N/A | N/A | N/A | N/A | N/A |
| DESTRUCT | 79.4 | N/A | N/A | N/A | N/A | N/A | N/A |
| YASSPP | 77.8 | N/A | N/A | N/A | 0.71 | 0.64 | 0.58 |
| PMSVM | 75.2 | 80.4 | 71.5 | 72.8 | 0.71 | 0.61 | 0.61 |
| SVMfreq | 73.5 | 75.0 | 60.0 | 79.0 | 0.65 | 0.53 | 0.54 |
| SVMpsi | 76.6 | 78.1 | 65.6 | 81.1 | 0.68 | 0.60 | 0.56 |
The results for PSIPRED, PHD and DESTRUCT were obtained from reference [8].
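The measures in this comparison can be sketched as follows, assuming the Q columns are per-state accuracies (helix H, strand E, coil C) and the C columns are per-state Matthews correlation coefficients, as is common in secondary structure prediction. The source does not spell out these definitions, and the function names and toy strings are illustrative.

```python
import math


def q3(observed, predicted):
    """Percentage of residues whose three-state label is predicted correctly."""
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def q_state(observed, predicted, state):
    """Percentage of residues observed in `state` that are predicted as `state`."""
    total = sum(o == state for o in observed)
    if total == 0:
        return float("nan")
    correct = sum(o == state and p == state for o, p in zip(observed, predicted))
    return 100.0 * correct / total


def matthews(observed, predicted, state):
    """Matthews correlation coefficient for one state against the other two."""
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        if o == state and p == state:
            tp += 1
        elif o != state and p == state:
            fp += 1
        elif o == state and p != state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy sequences (not data from the paper).
observed = "HHHHEEECCC"
predicted = "HHHCEEECCH"

print(q3(observed, predicted))             # 80.0
print(q_state(observed, predicted, "H"))   # 75.0
print(round(matthews(observed, predicted, "H"), 3))
```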
Prediction of the two main types of secondary structure: helix and strand.
| CB513 | Helix | Strand |
|---|---|---|
| Q>65 | 83.7% | 72.6% |
| Short (l ≤ 8) | 65.6% | 74.8% |
| Med (8 < l ≤ 15) | 94.4% | 57.1% |
| Long (l > 15) | 97.3% | 27.3% |
| N-term res | 73.4% | 62.7% |
| C-term res | 62.5% | 59.1% |
l: length of the secondary structure element
Q>65: the percentage of elements that have more than 65% of their residues predicted correctly
Short: Q>65 of elements with length up to eight residues
Med: Q>65 of elements with length between nine and 15 residues
Long: Q>65 of elements with length more than 15 residues
N-term res: the percentage of elements whose first residue (N-terminal) is predicted correctly
C-term res: the percentage of elements whose last residue (C-terminal) is predicted correctly.
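The element-level measures defined above can be sketched in a few lines. This is a minimal illustration of the stated definitions, not the authors' code; the function names, the strict "more than 65%" comparison taken from the legend, and the toy strings are assumptions.

```python
def segments(observed, state):
    """Return (start, end) pairs, end exclusive, for each run of `state`."""
    runs, start = [], None
    for i, s in enumerate(observed):
        if s == state and start is None:
            start = i
        elif s != state and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(observed)))
    return runs


def segment_stats(observed, predicted, state, threshold=0.65):
    """Compute Q>65 and terminal-residue accuracy over elements of `state`."""
    runs = segments(observed, state)
    if not runs:
        return None
    well_predicted = n_term_ok = c_term_ok = 0
    for start, end in runs:
        correct = sum(predicted[i] == state for i in range(start, end))
        if correct / (end - start) > threshold:
            well_predicted += 1
        n_term_ok += predicted[start] == state    # first residue of the element
        c_term_ok += predicted[end - 1] == state  # last residue of the element
    n = len(runs)
    return {
        "Q>65": 100.0 * well_predicted / n,
        "N-term res": 100.0 * n_term_ok / n,
        "C-term res": 100.0 * c_term_ok / n,
    }


# Toy sequences (not data from the paper): two helices of length 6 and 4.
observed = "CCHHHHHHCCHHHHCC"
predicted = "CCCHHHHHCCHCCHCC"

stats = segment_stats(observed, predicted, "H")
print(stats)
```

The Short, Med and Long rows would follow by filtering `runs` on element length before counting.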
The cross-validated accuracy of dihedral prediction on CB513 dataset.
| CB513 | ||||||||
|---|---|---|---|---|---|---|---|---|
| 2 | 81.4 | 81.7 | 81.8 | 82.1 | 83.2 | 83.4 | 81.8 | 83.5 |
| 3 | 79.3 | 79.6 | 79.7 | 79.8 | 81.2 | 81.1 | 79.6 | 81.2 |
| 4 | 78.7 | 74.5 | 79.0 | 74.4 | 80.5 | 76.1 | 79.0 | 75.8 |
| 5 | 65.0 | 63.8 | 65.2 | 64.1 | 66.9 | 65.3 | 65.2 | 65.0 |
| 6 | 63.7 | 59.2 | 63.8 | 59.3 | 65.5 | 60.4 | 63.7 | 60.1 |
| 7 | 56.5 | 54.6 | 56.8 | 54.7 | 58.3 | 56.0 | 56.8 | 55.4 |
| 8 | 53.8 | 53.7 | 54.0 | 53.8 | 55.4 | 55.1 | 53.9 | 54.6 |
| 9 | 53.8 | 51.1 | 54.0 | 51.0 | 55.3 | 52.3 | 54.0 | 51.7 |
| 10 | 52.9 | 50.2 | 53.1 | 50.3 | 54.5 | 51.6 | 53.0 | 51.0 |
| 11 | 50.3 | 48.5 | 50.6 | 48.5 | 51.8 | 49.7 | 50.6 | 49.1 |
| 12 | 47.0 | 41.2 | 47.2 | 41.5 | 48.4 | 42.3 | 47.2 | 42.1 |
NC: the number of clusters,
PSSM-only: input vector with only PSSM values,
SSE: input vector augmented with predicted secondary structure elements.
Figure 3. Top: the mean absolute error (MAE) after each iteration of the method for . Bottom: the percentage of predicted dihedral angles within 30° (Q30) of the real values for ψ angles (left) and ϕ angles (right).
The MAE and Q30 using six and seven clusters with EM clustering.
| CB513 | ||||
|---|---|---|---|---|
| MAE (°) | 25.8 | 25.1 | 38.5 | 38.5 |
| MAEH (°) | 12.2 | 11.3 | 22.3 | 19.7 |
| MAEE (°) | 24.1 | 25.4 | 30.7 | 33.7 |
| MAEC (°) | 38.4 | 36.9 | 56.7 | 57.3 |
| Q30 (%) | 73.4 | 75.6 | 71.3 | 70.8 |
| QH30 (%) | 90.4 | 91.9 | 88.0 | 89.5 |
| QE30 (%) | 72.9 | 72.2 | 78.3 | 76.2 |
| QC30 (%) | 59.0 | 63.3 | 53.0 | 51.8 |
The mean absolute error (MAE) and the percentage of predicted dihedral angles within 30° of the real value (Q30) for both backbone dihedral angles ϕ and ψ after two iterations, using six and seven clusters with EM clustering.
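A minimal sketch of how MAE and Q30 could be computed for angles given in degrees. It assumes that errors are measured on the circle, so that 175° and -175° are 10° apart, and that "within 30°" includes the boundary; the source does not state either convention, and the names and toy values are illustrative.

```python
def angular_error(predicted_deg, observed_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(predicted_deg - observed_deg) % 360.0
    return min(diff, 360.0 - diff)


def mae(predicted, observed):
    """Mean absolute angular error in degrees."""
    errors = [angular_error(p, o) for p, o in zip(predicted, observed)]
    return sum(errors) / len(errors)


def q30(predicted, observed, tolerance=30.0):
    """Percentage of predictions within `tolerance` degrees of the observed angle."""
    hits = sum(angular_error(p, o) <= tolerance for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)


# Toy angles (not data from the paper).
observed = [-60.0, -120.0, 175.0, 60.0]
predicted = [-65.0, -95.0, -175.0, -60.0]

print(mae(predicted, observed))  # 40.0
print(q30(predicted, observed))  # 75.0
```

Per-state values such as MAE for helix residues only would follow by restricting both lists to residues observed in that state.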
The MAE of each amino acid for ϕ angle.
| CB513 | ||||
|---|---|---|---|---|
| A | 21.0 | 9.0 | 29.2 | 36.7 |
| R | 23.0 | 9.6 | 25.2 | 38.8 |
| N | 37.4 | 16.0 | 35.4 | 48.6 |
| D | 29.1 | 11.3 | 32.6 | 38.6 |
| C | 25.8 | 14.1 | 21.5 | 37.9 |
| Q | 22.1 | 9.4 | 28.1 | 36.7 |
| E | 21.2 | 9.1 | 27.2 | 36.7 |
| G | 60.3 | 32.4 | 86.9 | 64.8 |
| H | 30.9 | 15.9 | 31.7 | 41.6 |
| I | 17.0 | 9.7 | 16.4 | 29.0 |
| L | 17.8 | 9.1 | 19.2 | 31.7 |
| K | 23.3 | 10.3 | 27.0 | 35.6 |
| M | 19.7 | 9.8 | 23.9 | 32.4 |
| F | 24.2 | 12.5 | 22.3 | 39.3 |
| P | 13.4 | 10.4 | 13.1 | 14.2 |
| S | 30.1 | 13.2 | 34.8 | 38.6 |
| T | 24.1 | 12.7 | 22.2 | 32.4 |
| W | 24.5 | 12.6 | 27.3 | 36.5 |
| Y | 25.3 | 12.5 | 24.9 | 40.2 |
| V | 18.0 | 9.6 | 17.0 | 30.1 |
The mean absolute error (MAE) of each amino acid for ϕ angle after two iterations, using seven clusters with EM clustering.
The MAE of each amino acid for ψ angle.
| CB513 - ψ | ||||
|---|---|---|---|---|
| A | 32.9 | 18.1 | 32.8 | 57.9 |
| R | 35.3 | 17.5 | 33.4 | 59.2 |
| N | 45.5 | 23.4 | 46.3 | 56.7 |
| D | 44.8 | 23.6 | 43.0 | 57.9 |
| C | 42.9 | 37.0 | 29.1 | 58.2 |
| Q | 35.0 | 16.5 | 38.3 | 58.2 |
| E | 34.0 | 17.1 | 35.9 | 59.2 |
| G | 56.4 | 36.5 | 60.3 | 63.7 |
| H | 43.4 | 29.1 | 37.2 | 57.8 |
| I | 27.7 | 19.4 | 20.2 | 52.7 |
| L | 30.3 | 18.0 | 27.7 | 54.6 |
| K | 37.5 | 18.5 | 36.6 | 58.8 |
| M | 32.8 | 19.6 | 29.5 | 57.0 |
| F | 34.3 | 23.8 | 26.8 | 54.1 |
| P | 47.8 | 42.4 | 25.5 | 53.1 |
| S | 47.3 | 32.2 | 36.8 | 61.1 |
| T | 41.8 | 27.6 | 27.5 | 60.0 |
| W | 38.2 | 26.9 | 29.6 | 61.2 |
| Y | 37.0 | 25.6 | 31.6 | 55.5 |
| V | 29.1 | 19.8 | 21.9 | 52.3 |
The mean absolute error (MAE) of each amino acid for ψ angle after two iterations, using six clusters with EM clustering.
Per-residue predictive accuracy based on the SCOP classification of proteins in CB513 dataset.
| CB513 | Secondary structure | Dihedral (3 clusters) | Dihedral (7 clusters) |
|---|---|---|---|
| all-α | 83.6 | 84.2 | 67.3 |
| all-β | 76.4 | 77.8 | 48.3 |
| α/β | 81.6 | 82.2 | 61.0 |
| α+β | 79.2 | 81.4 | 58.4 |
| Other | 75.9 | 77.8 | 53.3 |
| All residues | 80.0 | 81.2 | 58.3 |
The second column shows the secondary structure predictions while columns three and four show the dihedral prediction using three and seven clusters, respectively.
Secondary structure prediction on PDB-Select25 dataset.
| PDB-Select25 | Subset1 | Subset2 |
|---|---|---|
| Q3 (%) | 79.7 | 79.7 |
| ErrSig | 0.24 | 0.24 |
| QH | 82.3 | 82.6 |
| QE | 71.9 | 71.3 |
| QC | 81.8 | 82.1 |
| CH | 0.76 | 0.76 |
| CE | 0.69 | 0.69 |
| CC | 0.62 | 0.62 |
| Info | 0.43 | 0.43 |
The second column shows the predictions for subset1 when SVMs are trained using subset2; the converse is shown in the third column. Info is a measure of the per-residue information content [4].
Dihedral prediction on PDB-Select25 dataset.
| PDB-Select25 | ||||
|---|---|---|---|---|
| 2 | 82.9 | 83.0 | 82.5 | 83.1 |
| 3 | 79.0 | 79.0 | 78.9 | 79.1 |
| 4 | 74.6 | 71.5 | 74.2 | 72.1 |
| 5 | 59.8 | 57.3 | 59.5 | 57.5 |
| 6 | 58.9 | 53.4 | 58.4 | 53.5 |
| 7 | 48.6 | 48.0 | 48.5 | 47.8 |
Columns two and three show the predictions for subset1 when SVMs are trained using subset2; the converse is shown in columns four and five.
Performance of DISSPred and other secondary structure predictors on EVA dataset.
| EVA subsets - Q3(%) | ||||
|---|---|---|---|---|
| DISSPred | 81.7 | 81.9 | 81.9 | 82.0 |
| PSIPRED | 76.8 | 77.4 | 77.3 | 77.8 |
| PHDpsi | 73.4 | 74.3 | 74.3 | 75.0 |
| PROFsec | 75.5 | 76.2 | 76.4 | 76.7 |
| SAM-T99 sec | 77.2 | 77.2 | 77.1 | N/A |
| PROFking | 71.6 | 71.7 | N/A | N/A |
| Prospect | 71.1 | N/A | N/A | N/A |
The results for the other methods were obtained from the EVA secondary structure prediction server.