Yuedong Yang, Jianzhao Gao, Jihua Wang, Rhys Heffernan, Jack Hanson, Kuldip Paliwal, Yaoqi Zhou.
Abstract
Protein secondary structure prediction began in 1951, when Pauling and Corey predicted helical and sheet conformations for the protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82-84%, a number unthinkable just a few years ago. These improvements came from increasingly large databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we approach the theoretical limit of three-state prediction (88-90%), alternatives to secondary structure prediction (prediction of backbone torsion angles and of Cα-atom-based angles and torsion angles) not only have more room for further improvement but also allow direct prediction of three-dimensional fragment structures with steadily improving accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins are predicted by SPIDER2 to within 6 Å root-mean-squared distance of their native conformations. More powerful deep learning methods with improved capability of capturing long-range interactions are beginning to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction.
Year: 2018 PMID: 28040746 PMCID: PMC5952956 DOI: 10.1093/bib/bbw129
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. New methods continue to be developed for secondary structure prediction. The number of publications on protein secondary structure prediction per year and their cumulative total.
Figure 2. Conservation of secondary structure in homologous sequences. The average consistency of secondary structure between homologous sequences, either at a given sequence identity between the two sequences compared or over all compared sequences above a given sequence identity (cumulative from high sequence identity).
Method comparison based on Q3 using newly released structures (TS115) and CASP12 targets (15 proteins) for secondary structure prediction
| Method | TS115 Q3 | TS115 P-value (a) | CASP12 Q3 | CASP12 P-value (a) | Server location |
|---|---|---|---|---|---|
| Jpred4 (b) | 0.771 | 0.0007 | 0.751 | 0.04 | |
| SPINE X | 0.801 | 0.0002 | 0.769 | 0.006 | |
| PSIPRED 3.3 | 0.802 | 0.12 | 0.780 | 0.19 | |
| SCORPION | 0.817 | 0.45 | 0.805 | 0.44 | Stand-alone version from |
| SPIDER2 | 0.819 | NA | 0.798 | NA | |
| PORTER 4.0 | 0.820 | 0.17 | 0.798 | 0.67 | |
| DeepCNF | 0.823 | 0.01 | 0.821 | 0.14 | |
Note.
(a) Paired t-test against SPIDER2.
(b) Jpred only predicts sequences of <800 residues. TS115 contains one sequence (5hdtA) with 1085 residues; 5hdtA was divided into two chains of 800 and 285 residues, respectively.
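Q3, the metric reported in the table above, is the fraction of residues whose predicted three-state label (H, E or C) matches the observed one. The following is a minimal sketch of that calculation; the function name and the toy strings are hypothetical, and pooling over residues (rather than averaging per protein) is an assumption, since the record does not say which was used.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("inputs must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical toy strings, not data from the study.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
print(f"Q3 = {q3_accuracy(predicted, observed):.3f}")
```

The same function applies unchanged to eight-state strings, in which case it yields Q8.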
Overall misclassification errors among the H, E and C states, error rates in the interior and at the boundaries of secondary structure elements, and Q3 accuracy in the interior and boundary regions, for newly released structures (TS115) by SPIDER2 and DeepCNF
| Method | SPIDER2 (%) | DeepCNF (%) |
|---|---|---|
| H⇔E | 0.96 | 0.52 |
| H⇔C | 9.5 | 10.1 |
| E⇔C | 7.6 | 7.0 |
| H/E/C (internal) | 9.7/14.5/14.3 | 11.0/11.9/10.9 |
| H/E/C (boundary) | 37.9/38.3/24.3 | 43.1/37.8/23.4 |
| Q3 (internal) | 87.9 | 88.9 |
| Q3 (boundary) | 68.6 | 67.9 |
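The three pairwise errors in the table add up to roughly 100% minus Q3 (for SPIDER2, 0.96 + 9.5 + 7.6 = 18.06%, against a Q3 of 0.819 on TS115), which suggests they are expressed as percentages of all residues. A minimal sketch of such a breakdown follows, under that assumption; the function name, the pair labels and the toy strings are hypothetical, and the boundary/internal split is not reproduced because the record does not define it.

```python
from collections import Counter

PAIRS = (("H", "E"), ("H", "C"), ("E", "C"))


def pair_confusions(predicted: str, observed: str) -> dict:
    """Percentage of all residues confused between each unordered pair of states."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Count (observed, predicted) pairs for misclassified residues only.
    counts = Counter((o, p) for o, p in zip(observed, predicted) if o != p)
    total = len(observed)
    return {
        f"{a}<->{b}": 100.0 * (counts[(a, b)] + counts[(b, a)]) / total
        for a, b in PAIRS
    }


# Hypothetical toy strings, not data from the study.
errors = pair_confusions(predicted="HHHCEEHECC", observed="HHHHEEEECC")
print(errors)
print("total error (%):", sum(errors.values()))
```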
Figure 3. The dependence of accuracies and misclassifications on solvent accessibility. (A) The accuracy of predicting helices (QH), sheets (QE) and coils (QC) and the overall accuracy (Q3), and (B) the misclassifications of helices to coils and sheets, sheets to coils and helices, and coils to helices and sheets, as a function of solvent accessibility for TS115 by SPIDER2.
Figure 4. The dependence of accuracy on non-local contacts. The secondary structure accuracy as a function of the number of non-local contacts (|i − j| > 19) for the independent test set (TS1199) by SPINE X and SPIDER2.
Method comparison based on Q8 using newly released structures (TS115) and CASP12 targets (15 proteins) for eight-state secondary structure prediction
| Method | TS115 Q8 | TS115 P-value (a) | CASP12 Q8 | CASP12 P-value (a) | Server location |
|---|---|---|---|---|---|
| SSPRO8 | 0.68 | 3E-9 | 0.69 | 0.014 | |
| DeepCNF | 0.72 | NA | 0.73 | NA | |
(a) Paired t-test against DeepCNF.
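Eight-state labels can be collapsed to three states so that Q8 and Q3 predictions are scored against the same reference. The sketch below uses a widely used reduction of DSSP codes (H, G, I to H; E, B to E; everything else to C); this mapping is an assumption, since the record does not state which reduction the compared methods use, and the names are hypothetical.

```python
# Assumed reduction of DSSP eight-state codes to three states.
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "-": "C",   # turns, bends and the rest become coil
}


def reduce_to_three_states(dssp8: str) -> str:
    """Map an eight-state string to H/E/C; unknown codes are treated as coil."""
    return "".join(DSSP8_TO_3.get(state, "C") for state in dssp8)


# Hypothetical toy string, not data from the study.
print(reduce_to_three_states("--HHHHGGGTT-EEEEB-SS"))
```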
Method comparison based on mean absolute error (MAE) using newly released structures (TS115) and CASP12 targets (15 proteins) for prediction of backbone angles (ϕ, ψ, θ and τ)
| Method | TS115 ϕ (°) | TS115 ψ (°) | CASP12 ϕ (°) | CASP12 ψ (°) | Server location |
|---|---|---|---|---|---|
| SPINE-X | 19.4 | 32.9 | 19.1 | 33.2 | |
| SPIDER2 | 18.2 | 29.3 | 17.8 | 28.1 | |

| Method | TS115 θ (°) | TS115 τ (°) | CASP12 θ (°) | CASP12 τ (°) | Server location |
|---|---|---|---|---|---|
| SPIDER2 | 7.89 | 30.8 | 8.31 | 31.1 | |
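Because torsion angles are periodic, a prediction of 179° for an observed angle of −179° is off by 2°, not 358°. The sketch below computes a mean absolute error with that wrap-around; treating the errors this way is an assumption about how the values in the table were obtained, and the function name and toy values are hypothetical.

```python
def angular_mae(predicted, observed, period=360.0):
    """Mean absolute error between angles in degrees, taking the shorter way around the circle."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, o in zip(predicted, observed):
        diff = abs(p - o) % period
        total += min(diff, period - diff)  # wrap-around: never more than period / 2
    return total / len(observed)


# Hypothetical toy values, not data from the study.
print(angular_mae([179.0, -60.0], [-179.0, -50.0]))
```

For an angle confined to 0-180°, such as a bond-angle-like quantity, the wrap-around never triggers and the function reduces to an ordinary MAE.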
Figure 5. Direct prediction of three-dimensional structure from predicted angles. The structure (dark colour) constructed directly from predicted ϕ/ψ angles, compared with the native structure (light colour), for residues 24–63 of PDB 5fdy chain A.
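Building coordinates from torsion angles comes down to repeatedly placing one atom from the three atoms before it, given a bond length, a bond angle and a torsion. The sketch below shows one common way to do a single placement (often called NeRF); the record does not say how the structure in the figure was built, and the function names, coordinates, bond length and angles here are illustrative assumptions.

```python
import math


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    norm = math.sqrt(dot(u, u))
    return (u[0] / norm, u[1] / norm, u[2] / norm)


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Return d with |c-d| = bond, angle b-c-d = angle_deg and dihedral a-b-c-d = torsion_deg."""
    angle = math.radians(angle_deg)
    torsion = math.radians(torsion_deg)
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))   # normal to the a-b-c plane
    m = cross(n, bc)                 # in-plane axis pointing towards a's side
    x = -bond * math.cos(angle)
    y = bond * math.sin(angle) * math.cos(torsion)
    z = bond * math.sin(angle) * math.sin(torsion)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))


def dihedral(a, b, c, d):
    """Signed dihedral angle a-b-c-d in degrees."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2), dot(n1, n2)))


# Illustrative coordinates and geometry, not values from the study.
a, b, c = (1.2, -0.3, 0.5), (0.0, 0.0, 0.0), (1.5, 0.2, 0.0)
d = place_atom(a, b, c, bond=1.33, angle_deg=116.0, torsion_deg=-57.0)
print(d, dihedral(a, b, c, d))
```

Recomputing the dihedral from the placed atom is a quick check that the sign convention of the placement matches the one used to measure the angles.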