Abstract
We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features: conformation parameters, net charge, hydrophobicity, and side chain mass. First, we find the optimal window size and the optimal kernel function parameters for the SVM. Then, we train the SVM using the PSSM profiles generated by PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use a filter to refine the results predicted by the trained SVM. With our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all three measures are higher than those of the SVMpsi and SVMfreq methods. This indicates that considering these physicochemical features improves protein secondary structure prediction.
Year: 2013 PMID: 23766688 PMCID: PMC3666292 DOI: 10.1155/2013/347106
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1. System architecture.
Figure 2. Process of feature extraction.
Conformation parameters for each amino acid in a data set.
| Amino acids | H | E | C |
|---|---|---|---|
| A | 0.49 | 0.16 | 0.35 |
| R | 0.42 | 0.19 | 0.39 |
| N | 0.27 | 0.13 | 0.6 |
| D | 0.31 | 0.11 | 0.58 |
| C | 0.26 | 0.29 | 0.45 |
| E | 0.49 | 0.15 | 0.36 |
| Q | 0.46 | 0.16 | 0.38 |
| G | 0.16 | 0.14 | 0.7 |
| H | 0.3 | 0.22 | 0.48 |
| I | 0.35 | 0.37 | 0.28 |
| L | 0.45 | 0.24 | 0.31 |
| K | 0.4 | 0.17 | 0.43 |
| M | 0.44 | 0.23 | 0.33 |
| F | 0.35 | 0.3 | 0.35 |
| P | 0.18 | 0.09 | 0.74 |
| S | 0.28 | 0.19 | 0.54 |
| T | 0.25 | 0.27 | 0.48 |
| W | 0.37 | 0.29 | 0.35 |
| Y | 0.34 | 0.3 | 0.36 |
| V | 0.3 | 0.41 | 0.29 |
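
The record does not say how the conformation parameters are computed. Each row of the table sums to roughly 1, which suggests relative frequencies of the three states per residue type. The sketch below works under that assumption; the function name and the toy sequences are hypothetical and are not taken from CB513.

```python
from collections import Counter, defaultdict


def conformation_parameters(sequences, structures):
    """Return {residue: {'H': p, 'E': p, 'C': p}} as relative frequencies.

    Assumption: a conformation parameter is the fraction of a residue's
    occurrences that are labelled with each secondary structure state.
    """
    counts = defaultdict(Counter)
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for residue, state in zip(seq, ss):
            counts[residue][state] += 1

    params = {}
    for residue, state_counts in counts.items():
        total = sum(state_counts.values())
        params[residue] = {s: state_counts[s] / total for s in "HEC"}
    return params


# Toy input for illustration only (not real CB513 data).
toy_sequences = ["AAGA", "GAV"]
toy_structures = ["HHCH", "CEE"]
params = conformation_parameters(toy_sequences, toy_structures)
print(params["A"])
```

Applied to a full labelled data set, the same counting would yield one H/E/C triple per amino acid, in the shape of the table above.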
Net charge of amino acids.
| Amino acids | Net charge |
|---|---|
| A | 0 |
| R | +1 |
| N | 0 |
| D | −1 |
| C | 0 |
| E | −1 |
| Q | 0 |
| G | 0 |
| H | +1 |
| I | 0 |
| L | 0 |
| K | +1 |
| M | 0 |
| F | 0 |
| P | 0 |
| S | 0 |
| T | 0 |
| W | 0 |
| Y | 0 |
| V | 0 |
Hydrophobic values of amino acids.
| Amino acids | Hydrophobicity |
|---|---|
| A | 1.8 |
| R | −4.5 |
| N | −3.5 |
| D | −3.5 |
| C | 2.5 |
| E | −3.5 |
| Q | −3.5 |
| G | −0.4 |
| H | −3.2 |
| I | 4.5 |
| L | 3.8 |
| K | −3.9 |
| M | 1.9 |
| F | 2.8 |
| P | −1.6 |
| S | −0.8 |
| T | −0.7 |
| W | −0.9 |
| Y | −1.3 |
| V | 4.2 |
Figure 3. Basic structure of amino acids.
Side chain mass of amino acids.
| Amino acids | Mass |
|---|---|
| A | 15.0347 |
| R | 100.1431 |
| N | 58.0597 |
| D | 59.0445 |
| C | 47.0947 |
| E | 73.0713 |
| Q | 72.0865 |
| G | 1.0079 |
| H | 81.0969 |
| I | 57.1151 |
| L | 57.1151 |
| K | 72.1297 |
| M | 75.1483 |
| F | 91.1323 |
| P | 41.0725 |
| S | 31.0341 |
| T | 45.0609 |
| W | 130.1689 |
| Y | 107.1317 |
| V | 43.0883 |
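
The four tables above can be combined into a single lookup that maps a residue to its six physicochemical values (three conformation parameters, net charge, hydrophobicity, side chain mass). This is a minimal sketch: the values are copied from the tables, while the names `PHYSCHEM` and `physchem_features` and the ordering of the six values are illustrative assumptions. Any scaling applied before SVM training is not described in this record and is not shown.

```python
# Per residue: (P(H), P(E), P(C), net charge, hydrophobicity, side chain mass),
# copied from the four tables above.
PHYSCHEM = {
    "A": (0.49, 0.16, 0.35, 0, 1.8, 15.0347),
    "R": (0.42, 0.19, 0.39, 1, -4.5, 100.1431),
    "N": (0.27, 0.13, 0.60, 0, -3.5, 58.0597),
    "D": (0.31, 0.11, 0.58, -1, -3.5, 59.0445),
    "C": (0.26, 0.29, 0.45, 0, 2.5, 47.0947),
    "E": (0.49, 0.15, 0.36, -1, -3.5, 73.0713),
    "Q": (0.46, 0.16, 0.38, 0, -3.5, 72.0865),
    "G": (0.16, 0.14, 0.70, 0, -0.4, 1.0079),
    "H": (0.30, 0.22, 0.48, 1, -3.2, 81.0969),
    "I": (0.35, 0.37, 0.28, 0, 4.5, 57.1151),
    "L": (0.45, 0.24, 0.31, 0, 3.8, 57.1151),
    "K": (0.40, 0.17, 0.43, 1, -3.9, 72.1297),
    "M": (0.44, 0.23, 0.33, 0, 1.9, 75.1483),
    "F": (0.35, 0.30, 0.35, 0, 2.8, 91.1323),
    "P": (0.18, 0.09, 0.74, 0, -1.6, 41.0725),
    "S": (0.28, 0.19, 0.54, 0, -0.8, 31.0341),
    "T": (0.25, 0.27, 0.48, 0, -0.7, 45.0609),
    "W": (0.37, 0.29, 0.35, 0, -0.9, 130.1689),
    "Y": (0.34, 0.30, 0.36, 0, -1.3, 107.1317),
    "V": (0.30, 0.41, 0.29, 0, 4.2, 43.0883),
}


def physchem_features(residue):
    """Return the six physicochemical values for a one-letter residue code."""
    return list(PHYSCHEM[residue])


print(physchem_features("K"))
```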
Figure 4. Schematic diagram for filtering 9INSb.
Structures of the CB513 data set.
| Structures | H | E | C | Total |
|---|---|---|---|---|
| Residues | 29090 | 17950 | 37053 | 84093 |
Optimal parameters for different window sizes.
| Window sizes | Features | Best C | Best γ | Accuracy (%) |
|---|---|---|---|---|
| 7 | 146 | $2^{0}$ | $2^{-3}$ | 76.3203 |
| 9 | 186 | $2^{1}$ | $2^{-4}$ | 76.7935 |
| 11 | 226 | $2^{0}$ | $2^{-4}$ | 77.4464 |
| 13 | 266 | $2^{1}$ | $2^{-4}$ | 78.0029 |
| 15 | 306 | $2^{1}$ | $2^{-4}$ | 77.7806 |
| 17 | 346 | $2^{1}$ | $2^{-5}$ | 77.6549 |
| 19 | 386 | $2^{1}$ | $2^{-4}$ | 77.5796 |
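
The feature counts in the table grow by 20 per window position with a constant offset of 6, that is, 20 × window size + 6. One reading consistent with the abstract is 20 PSSM scores per position plus the six physicochemical values of the central residue. The sketch below encodes that reading; the decomposition, the zero padding at the sequence ends, and all names are assumptions rather than details confirmed by this record.

```python
PSSM_COLS = 20      # one PSSM score per amino acid type
PHYSCHEM_DIMS = 6   # 3 conformation parameters + charge + hydrophobicity + mass


def feature_count(window):
    """Number of features for a given window size under the assumed layout."""
    return PSSM_COLS * window + PHYSCHEM_DIMS


def window_features(pssm, physchem, center, window):
    """Build one feature vector for the residue at index `center`.

    pssm:     list of rows, each with PSSM_COLS scores
    physchem: list of rows, each with PHYSCHEM_DIMS values
    Positions beyond either end of the sequence are zero padded (assumption).
    """
    half = window // 2
    features = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0.0] * PSSM_COLS)
    features.extend(physchem[center])
    return features


# Toy three-residue protein with constant values, for shape checking only.
toy_pssm = [[1.0] * PSSM_COLS for _ in range(3)]
toy_physchem = [[0.5] * PHYSCHEM_DIMS for _ in range(3)]
vector = window_features(toy_pssm, toy_physchem, center=1, window=13)
print(len(vector), feature_count(13))
```

Under this layout, window size 13, which has the highest accuracy in the table, corresponds to 266 features per residue.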
Confusion matrix without filtering.
| Actual / Predicted | H | E | C | Recall (%) |
|---|---|---|---|---|
| H | 22976 | 931 | 5183 | 78.98 |
| E | 1044 | 11569 | 5337 | 64.45 |
| C | 3451 | 3059 | 30543 | 82.43 |
| Precision (%) | 83.64 | 74.36 | 74.38 | 77.40 |
Confusion matrix with filtering.
| Actual / Predicted | H | E | C | Recall (%) |
|---|---|---|---|---|
| H | 22372 | 818 | 5900 | 76.91 |
| E | 432 | 11776 | 5742 | 65.60 |
| C | 1514 | 2819 | 32720 | 88.31 |
| Precision (%) | 92.00 | 76.40 | 73.76 | 79.52 |
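
The recall, precision, and overall accuracy figures in the two confusion matrices above can be recomputed from the raw counts. In this short sketch rows are actual classes and columns are predicted classes; the function name `summarize` is illustrative. The overall accuracy, the bottom-right cell of each table, is the three-state accuracy Q3.

```python
def summarize(matrix, labels=("H", "E", "C")):
    """Return (recall, precision, q3) in percent from a confusion matrix.

    matrix[i][j] is the number of residues of actual class i predicted as j.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    recall = {
        labels[i]: 100.0 * matrix[i][i] / sum(matrix[i]) for i in range(n)
    }
    precision = {
        labels[j]: 100.0 * matrix[j][j] / sum(matrix[i][j] for i in range(n))
        for j in range(n)
    }
    q3 = 100.0 * sum(matrix[i][i] for i in range(n)) / total
    return recall, precision, q3


# Counts copied from the two confusion matrices above.
without_filter = [[22976, 931, 5183], [1044, 11569, 5337], [3451, 3059, 30543]]
with_filter = [[22372, 818, 5900], [432, 11776, 5742], [1514, 2819, 32720]]

for name, matrix in (("without", without_filter), ("with", with_filter)):
    recall, precision, q3 = summarize(matrix)
    print(name, "filtering: Q3 = %.2f" % q3)
```

Comparing the two matrices this way shows the effect of filtering: coil recall and helix precision go up, while helix recall goes down.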
Comparisons between ours and other methods.
| Methods | Q3 | SOV94 | SOV99 | R(H) | R(E) | R(C) |
|---|---|---|---|---|---|---|
| PHD (RS126) [ | 70.8 | 73.5 | — | 72.0 | 66.0 | 72.0 |
| SVMfreq (RS126) [ | 71.2 | 74.6 | — | 73.0 | 58.0 | 73.0 |
| SVMfreq (CB513) [ | 73.5 | 76.2 | — | 75.0 | 60.0 | 79.0 |
| PMSVM (CB513) [ | 75.2 | 80.0 | — | 80.4 | 71.5 | 72.8 |
| SVMpsi (RS126) [ | 76.1 | 79.6 | 72.0 | 77.2 | 63.9 | 81.5 |
| SVMpsi (CB513) [ | 76.6 | 80.1 | 73.5 | 78.1 | 65.6 | 81.1 |
| Ours without filtering (CB513) | 77.40 | 90.20 | 71.10 | 78.98 | 64.45 | 82.43 |
| Ours with filtering (CB513) | 79.52 | 86.10 | 74.60 | 76.91 | 65.60 | 88.31 |