Hang Chen, Fei Gu, Zhengge Huang.
Abstract
BACKGROUND: Protein secondary structure prediction is a fundamental component of the analytical study of protein structure and function. Prediction techniques have been developed over several decades. The Chou-Fasman algorithm, one of the earliest methods, has been applied successfully to this task. However, the method is limited by low accuracy, unreliable parameters, and overprediction. Recent developments in protein folding type-specific structure propensities and wavelet transformation make it possible to overcome these shortcomings of the Chou-Fasman method.
Year: 2006 PMID: 17217506 PMCID: PMC1780123 DOI: 10.1186/1471-2105-7-S4-S14
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Hydrophobicity sequence of protein 3dfr.
Hydrophobicity values of the 20 amino acids
| amino acid | value | amino acid | value |
| Gly | 0.00 | Cys | 1.52 |
| Gln | 0.00 | Lys | 1.64 |
| Ser | 0.07 | Met | 1.67 |
| Thr | 0.07 | Val | 1.87 |
| Asn | 0.09 | Leu | 2.17 |
| Asp | 0.66 | Tyr | 2.76 |
| Glu | 0.67 | Pro | 2.77 |
| Arg | 0.85 | Phe | 2.87 |
| Ala | 0.87 | Ile | 3.15 |
| His | 0.87 | Trp | 3.77 |
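As an illustration of how a hydrophobicity sequence such as the one in Figure 1 could be built, the sketch below maps each residue of a one-letter amino acid string to its value from the table above. The function name `hydrophobicity_sequence` and the one-letter input convention are assumptions made for this example; the source does not specify an implementation.

```python
# Hydrophobicity values copied from the table above (three-letter codes).
HYDROPHOBICITY = {
    "Gly": 0.00, "Gln": 0.00, "Ser": 0.07, "Thr": 0.07, "Asn": 0.09,
    "Asp": 0.66, "Glu": 0.67, "Arg": 0.85, "Ala": 0.87, "His": 0.87,
    "Cys": 1.52, "Lys": 1.64, "Met": 1.67, "Val": 1.87, "Leu": 2.17,
    "Tyr": 2.76, "Pro": 2.77, "Phe": 2.87, "Ile": 3.15, "Trp": 3.77,
}

# Standard one-letter to three-letter amino acid abbreviations.
ONE_TO_THREE = {
    "G": "Gly", "Q": "Gln", "S": "Ser", "T": "Thr", "N": "Asn",
    "D": "Asp", "E": "Glu", "R": "Arg", "A": "Ala", "H": "His",
    "C": "Cys", "K": "Lys", "M": "Met", "V": "Val", "L": "Leu",
    "Y": "Tyr", "P": "Pro", "F": "Phe", "I": "Ile", "W": "Trp",
}


def hydrophobicity_sequence(sequence):
    """Map a one-letter amino acid string to a list of hydrophobicity values.

    Hypothetical helper for illustration; raises KeyError on unknown residues.
    """
    return [HYDROPHOBICITY[ONE_TO_THREE[residue]] for residue in sequence.upper()]
```

Calling `hydrophobicity_sequence("GAW")` looks up Gly, Ala, and Trp in turn, giving one numeric value per residue that can then be treated as a signal.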
Figure 2. Plot of the Morlet wavelet transform of the amino acid hydrophobic free energy sequence at scales from 1 to 64 (dark represents the minimum coefficient value, light the maximum).
Figure 3. Plot of CWT coefficients at scale 9.
Figure 4. Algorithm flowchart with prediction and evaluation.
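A continuous wavelet transform (CWT) of the kind plotted in Figures 2 and 3 can be sketched as follows. This is a minimal pure-Python version under stated assumptions: it uses a common real-valued form of the Morlet wavelet, exp(-t²/2)·cos(ω₀t) with ω₀ = 5, and a 1/√scale normalisation. The exact wavelet parametrisation used by the authors is not given in this section, and the names `morlet` and `cwt_morlet` are hypothetical.

```python
import math


def morlet(t, omega0=5.0):
    """Real-valued Morlet wavelet; the form and omega0 = 5 are assumptions."""
    return math.exp(-0.5 * t * t) * math.cos(omega0 * t)


def cwt_morlet(signal, scales):
    """Return {scale: [coefficient at each position]} by direct summation.

    W(a, b) = (1 / sqrt(a)) * sum_k signal[k] * morlet((k - b) / a)
    """
    n = len(signal)
    coefficients = {}
    for scale in scales:
        norm = 1.0 / math.sqrt(scale)
        row = []
        for b in range(n):
            total = 0.0
            for k in range(n):
                total += signal[k] * morlet((k - b) / scale)
            row.append(norm * total)
        coefficients[scale] = row
    return coefficients


# Toy input: arbitrary hydrophobicity values, NOT the 3dfr sequence.
toy_signal = [0.87, 2.17, 0.00, 3.15, 1.64, 0.07, 2.87, 0.66]
coeffs = cwt_morlet(toy_signal, range(1, 65))  # scales 1 to 64, as in Figure 2
scale_9 = coeffs[9]                            # one row, as in Figure 3
```

The direct double loop costs O(n²) per scale, which is acceptable for a single protein; an FFT-based convolution or a dedicated wavelet library would be the usual choice for larger inputs.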
Comparison of the traditional CFM with four current methods
| Method | Q3 (%) | QH (%) | QE (%) | QHPRE (%) | QEPRE (%) | SOV (%) | SOVH (%) | SOVE (%) |
| CFM | 46.88 | 55.64 | 60.77 | 49.22 | 34.40 | 36.26 | 42.63 | 43.95 |
| DSC | 69.10 | 63.65 | 54.95 | 73.70 | 71.31 | 66.19 | 63.14 | 60.86 |
| NNSSP | 72.31 | 64.93 | 55.28 | 80.42 | 73.34 | 67.32 | 66.07 | 63.40 |
| PHD | 72.60 | 65.38 | 68.59 | 77.98 | 63.55 | 69.94 | 65.74 | 72.05 |
| PREDATOR | 69.58 | 62.21 | 54.50 | 75.33 | 70.14 | 64.85 | 64.23 | 60.57 |
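The per-residue indices in the table above can be computed from an observed and a predicted state string. The sketch below assumes the commonly used definitions, which this section does not spell out: Q3 is the fraction of residues predicted correctly, QH and QE are the fractions of observed helix and strand residues predicted correctly, and QHPRE and QEPRE are the fractions of predicted helix and strand residues that are correct. The segment overlap measure (SOV) is more involved and is not covered here. The function name and the H/E/C alphabet are assumptions for illustration.

```python
def per_residue_accuracy(observed, predicted):
    """Compute Q3, QH, QE, QHpre and QEpre (in percent) for H/E/C strings."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")

    def percent(numerator, denominator):
        # Return None when the ratio is undefined (no residues in that state).
        return 100.0 * numerator / denominator if denominator else None

    pairs = list(zip(observed, predicted))
    correct = sum(1 for o, p in pairs if o == p)
    result = {"Q3": percent(correct, len(pairs))}
    for state in ("H", "E"):
        hits = sum(1 for o, p in pairs if o == p == state)
        n_observed = sum(1 for o, _ in pairs if o == state)
        n_predicted = sum(1 for _, p in pairs if p == state)
        result["Q" + state] = percent(hits, n_observed)          # coverage
        result["Q" + state + "pre"] = percent(hits, n_predicted)  # precision
    return result
```

Reading the two families of indices side by side helps interpret the table: a method can score a high QH or QE simply by predicting that state often, which then shows up as a low QHPRE or QEPRE.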
Results with improved nucleation
| Method | Q3 (%) | QH (%) | QE (%) | QHPRE (%) | QEPRE (%) | SOV (%) | SOVH (%) | SOVE (%) |
| CFM | 46.88 | 55.64 | 60.77 | 49.22 | 34.40 | 36.26 | 42.63 | 43.95 |
| Improved nucleation | 48.10 | 52.14 | 58.30 | 49.54 | 34.71 | 40.82 | 44.97 | 47.82 |
Results with improved propensities
| Method | Q3 (%) | QH (%) | QE (%) | QHPRE (%) | QEPRE (%) | SOV (%) | SOVH (%) | SOVE (%) |
| CFM | 46.88 | 55.64 | 60.77 | 49.22 | 34.40 | 36.26 | 42.63 | 43.95 |
| Improved propensity | 54.56 | 57.14 | 60.14 | 69.70 | 55.48 | 40.41 | 43.93 | 46.52 |
Results with improved Chou-Fasman rules
| Method | Q3 (%) | QH (%) | QE (%) | QHPRE (%) | QEPRE (%) | SOV (%) | SOVH (%) | SOVE (%) |
| CFM | 46.88 | 55.64 | 60.77 | 49.22 | 34.40 | 36.26 | 42.63 | 43.95 |
| Improved rules | 44.09 | 57.72 | 72.21 | 45.86 | 31.11 | 36.51 | 46.06 | 51.86 |
Degree of improvement from each of the three steps of our method
| Step | Q3 | QPRE | SOV |
| Step 1 | No difference | No difference | Distinctly better |
| Step 2 | No difference | Much better | A little better |
| Step 3 | No difference | A little worse | Distinctly better |
Extension thresholds for the four protein classes
| Protein class | Helix extension threshold | Strand extension threshold |
| All alpha | 0.98 | No statistic |
| All beta | No statistic | 1.01 |
| Alpha and beta (α/β) | 1 | 1.02 |
| Alpha or beta (α+β) | 0.99 | 0.98 |
Note that reference 20 provides no statistics for the strand extension threshold in the all-alpha class or for the helix extension threshold in the all-beta class.
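To show how class-specific thresholds like these might be used, the sketch below extends a nucleated segment in both directions in the style of the classic Chou-Fasman extension step: the segment grows by one residue as long as the mean propensity of the four-residue window at its edge stays at or above the threshold. The exact extension rule used by the authors is not given in this section, so the window size of 4, the function name `extend_segment`, and the dictionary keys are assumptions, and the propensity values in the usage lines are made up for illustration.

```python
# Extension thresholds copied from the table above; None marks "No statistic".
EXTENSION_THRESHOLDS = {
    "all alpha": {"helix": 0.98, "strand": None},
    "all beta": {"helix": None, "strand": 1.01},
    "alpha/beta": {"helix": 1.00, "strand": 1.02},
    "alpha+beta": {"helix": 0.99, "strand": 0.98},
}


def extend_segment(propensities, start, end, threshold, window=4):
    """Extend the half-open segment [start, end) in both directions.

    A residue is added while the mean propensity of the `window` residues
    at the growing edge (including the new residue) is >= threshold.
    """
    def window_mean(lo, hi):
        values = propensities[lo:hi]
        return sum(values) / len(values)

    # Grow to the right: the window ends at the candidate residue.
    while (end < len(propensities)
           and window_mean(max(0, end + 1 - window), end + 1) >= threshold):
        end += 1
    # Grow to the left: the window starts at the candidate residue.
    while (start > 0
           and window_mean(start - 1, min(len(propensities), start - 1 + window)) >= threshold):
        start -= 1
    return start, end


# Made-up helix propensities for a 12-residue toy sequence (not real data).
toy_propensities = [0.5, 0.6, 1.2, 1.3, 1.4, 1.1, 1.2, 1.3, 0.9, 0.7, 0.5, 0.4]
helix_threshold = EXTENSION_THRESHOLDS["alpha/beta"]["helix"]
segment = extend_segment(toy_propensities, 3, 7, helix_threshold)
```

Keeping the thresholds in a lookup keyed by protein class makes the missing entries explicit: a caller has to handle `None` for strands in all-alpha proteins and for helices in all-beta proteins rather than silently falling back to a default.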
Results with all three improvements
| Method | Q3 (%) | QH (%) | QE (%) | QHPRE (%) | QEPRE (%) | SOV (%) | SOVH (%) | SOVE (%) |
| CFM | 46.88 | 55.64 | 60.77 | 49.22 | 34.40 | 36.26 | 42.63 | 43.95 |
| With all improvements | 56.10 | 72.86 | 68.17 | 67.17 | 53.35 | 51.14 | 60.89 | 57.46 |
Comparison of our method with four current methods
| Method | Q3 (%) | QH (%) | QE (%) | QHPRE (%) | QEPRE (%) | SOV (%) | SOVH (%) | SOVE (%) |
| Our method | 56.10- | 72.86* | 68.17* | 67.17+ | 53.35- | 51.14- | 60.89+ | 57.46+ |
| DSC | 69.10 | 63.65 | 54.95 | 73.70 | 71.31 | 66.19 | 63.14 | 60.86 |
| NNSSP | 72.31 | 64.93 | 55.28 | 80.42 | 73.34 | 67.32 | 66.07 | 63.40 |
| PHD | 72.60 | 65.38 | 68.59 | 77.98 | 63.55 | 69.94 | 65.74 | 72.05 |
| PREDATOR | 69.58 | 62.21 | 54.50 | 75.33 | 70.14 | 64.85 | 64.23 | 60.57 |
The markers '*', '+', and '-' indicate that the accuracy of our method on that index was better than, close to, or worse than that of the other four methods, respectively.