Mirko Torrisi, Manaz Kaleel, Gianluca Pollastri.
Abstract
Protein Secondary Structure (SS) prediction has been a central topic of research in Bioinformatics for decades. In spite of this, even the most sophisticated ab initio SS predictors are not able to reach the theoretical limit of three-state prediction accuracy (88-90%), while only a few predict more than the 3 traditional Helix, Strand and Coil classes. In this study we present tests on different models trained on both single-sequence and evolutionary profile-based inputs, and develop a new state-of-the-art system, Porter 5. Porter 5 is composed of ensembles of cascaded Bidirectional Recurrent Neural Networks and Convolutional Neural Networks, incorporates new input encoding techniques and is trained on a large set of protein structures. Porter 5 achieves 84% accuracy (81% SOV) when tested on 3 classes and 73% accuracy (70% SOV) on 8 classes on a large independent set. In our tests Porter 5 is 2% more accurate than its previous version and outperforms or matches the most recent predictors of secondary structure we tested. When Porter 5 is retrained on SCOPe-based sets that eliminate homology between training and testing samples, we obtain similar results. Porter is available as a web server and standalone program at http://distilldeep.ucd.ie/porter/ alongside all the datasets and alignments.
Year: 2019 PMID: 31451723 PMCID: PMC6710256 DOI: 10.1038/s41598-019-48786-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Performances of single models of different NN architectures on the validation set.
| Method | profile-less | plain profiles | deep profiles |
|---|---|---|---|
| Baseline | 45.19% | 60.9% | 61.33% |
| FFNN | 69.85% | 80.02% | 80.45% |
| CBRCNN | 71.33% | 82.32% | 83.1% |
Performances of single CBRCNN trained with different approaches relying on both PSI-BLAST and HHblits.
| Training | Refining | Average | Union | Intersection | Concatenation |
|---|---|---|---|---|---|
| Q3 Accuracy | 83.41% | 83.79% | 83.41% | 82.81% | 83.77% |
Assessment on the 2017_test set of three-state ensembles trained on either five-fold cross-validation or full set.
| Training strategy | PSI-BLAST | HHblits | Concatenation | PSI-BLAST and HHblits | All the previous |
|---|---|---|---|---|---|
| Five-fold cross-validation | 83.55% | 83.55% | 83.98% | 84.18% | 84.19% |
| Full set (Porter 5) | 83.42% | 83.39% | 83.49% | 84.13% | 84.19% |
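The ensembles compared above combine several trained models into one prediction. A minimal sketch of one common way to do this follows; it assumes the ensemble averages per-residue class probabilities and picks the most probable class, which is not spelled out here. The function name `ensemble_predict` and the toy probabilities are hypothetical.

```python
def ensemble_predict(model_outputs, classes="HEC"):
    """Average per-residue class probabilities over models, then take the argmax.

    model_outputs: one entry per model; each entry is a list (one item per
    residue) of probabilities ordered like `classes`.
    Assumption: plain averaging; the actual combination rule may differ.
    """
    n_models = len(model_outputs)
    length = len(model_outputs[0])
    prediction = []
    for i in range(length):
        # Mean probability of each class at residue i across all models.
        mean = [sum(m[i][k] for m in model_outputs) / n_models
                for k in range(len(classes))]
        prediction.append(classes[mean.index(max(mean))])
    return "".join(prediction)


# Two hypothetical models, three residues, probabilities ordered as H, E, C.
model_a = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.2, 0.7]]
model_b = [[0.6, 0.1, 0.3], [0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]
print(ensemble_predict([model_a, model_b]))
```

In the toy input the two models disagree on the second residue; averaging resolves the disagreement in favour of the class with the higher mean probability.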
Q3/Q8 accuracy and SOV score per AA on the full test set.
| Method | Q3 | SOV’99 | SOV_refine | Q8 | SOV8’99 | SOV8_refine |
|---|---|---|---|---|---|---|
| Porter 5 (HHblits and PSI-BLAST) | 83.49% | 80.17% | 75.64% | 71.94% | 69.03% | 71.45% |
| Porter 5 (PSI-BLAST only) | 83.42% | 80.41% | 75.8% | 72.11% | 69.28% | 71.56% |
| Porter 5 (HHblits only) | 83.39% | 80.19% | 75.59% | 71.8% | 68.87% | 71.16% |
| SSpro 5.1 with templates | 82.62% | 79% | 74.58% | 71.91% | 68.68% | 70.72% |
| PSIPRED 4.01 | 82.06% | 77.83% | 72.95% | N.A. | N.A. | N.A. |
| RaptorX-Property | 82.04% | 78.57% | 73.66% | 70.74% | 67.59% | 69.65% |
| Porter 4 | 82% | 78.85% | 73.89% | N.A. | N.A. | N.A. |
| DeepCNF | 81% | 76.96% | 71.84% | 69.76% | 66.42% | 68.5% |
| SSpro 5.1 ab initio | 80.7% | 76.85% | 72% | 68.85% | 65.33% | 67.54% |
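The Q3 and Q8 columns are per-residue accuracies over three and eight classes respectively. The sketch below illustrates the computation on a toy example, using the three-state grouping from the amino acid composition table (G, H, I as helix; E, B as strand; C, S, T as coil). The sequences are made up, and deriving the three-state prediction by collapsing an eight-state one is a simplification for illustration only; the SOV scores are not covered.

```python
# Three-state grouping of the eight classes, as in the composition table.
EIGHT_TO_THREE = {"G": "H", "H": "H", "I": "H",
                  "E": "E", "B": "E",
                  "C": "C", "S": "C", "T": "C"}


def accuracy(predicted, observed):
    """Percentage of residues whose predicted class equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def to_three_state(labels):
    """Collapse an eight-state label string to three states."""
    return "".join(EIGHT_TO_THREE[c] for c in labels)


# Hypothetical observed and predicted eight-state strings for one protein.
observed = "CHHHHGGTEEEC"
predicted = "CHHHHHHSEEBC"

q8 = accuracy(predicted, observed)
q3 = accuracy(to_three_state(predicted), to_three_state(observed))
print(f"Q8 = {q8:.2f}%, Q3 = {q3:.2f}%")
```

Every mistake in the toy prediction stays within the same three-state group (G predicted as H, T as S, E as B), so Q3 is higher than Q8, which is the general pattern in the table above.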
Performances on the smaller 2017_test set for which Spider3 generates predictions, sorted by Q3 accuracy.
| Method | Q3 per AA | SOV’99 per AA | SOV_refine per AA | Q3 per protein | SOV’99 per protein | SOV_refine per protein |
|---|---|---|---|---|---|---|
| Spider3 | 83.15% | 79.43% | 74.68% | 83.42% | 79.79% | 75.07% |
| Porter 5 (HHblits only) | 83.06% | 79.49% | 74.71% | 83.68% | 80.26% | 75.58% |
| SSpro 5.1 with templates | 82.58% | 78.54% | 74.02% | 83.94% | 80.29% | 76.15% |
| PSIPRED 4.01 | 81.88% | 77.36% | 72.33% | 82.48% | 78.22% | 73.31% |
| RaptorX-Property | 81.86% | 78.08% | 72.99% | 82.57% | 78.99% | 74.03% |
| Porter 4 | 81.66% | 78.05% | 72.89% | 82.29% | 78.61% | 73.55% |
| SSpro 5.1 ab initio | 81.17% | 76.87% | 72.03% | 81.1% | 76.92% | 72.12% |
| DeepCNF | 81.04% | 76.74% | 71.47% | 81.16% | 76.99% | 71.7% |
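The table reports each score both per AA and per protein. A minimal sketch of the difference, assuming "per AA" pools all residues of the test set while "per protein" averages the scores of the individual proteins; the function names and the two toy proteins are hypothetical.

```python
def per_residue_accuracy(pairs):
    """Pool all residues of all proteins, then compute one accuracy."""
    correct = sum(sum(p == o for p, o in zip(pred, obs)) for pred, obs in pairs)
    total = sum(len(obs) for _, obs in pairs)
    return 100.0 * correct / total


def per_protein_accuracy(pairs):
    """Compute one accuracy per protein, then average over proteins."""
    scores = [100.0 * sum(p == o for p, o in zip(pred, obs)) / len(obs)
              for pred, obs in pairs]
    return sum(scores) / len(scores)


# (predicted, observed) three-state strings for two hypothetical proteins.
pairs = [("HHHHHHHHCC", "HHHHHHHHHH"),  # 10 residues, 8 correct
         ("EC", "EE")]                  # 2 residues, 1 correct

print(per_residue_accuracy(pairs))
print(per_protein_accuracy(pairs))
```

Per-residue scoring weights long proteins more heavily, while per-protein scoring gives every protein the same weight, so the two rankings need not agree.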
Assessment of Porter 5 on CASP13, i.e. 43 targets, and on the last 6 months of CAMEO, i.e. 463 proteins released from Dec 28, 2018 to Jun 22, 2019.
| Method | Q3 | SOV’99 | SOV_refine | Q8 | SOV8’99 | SOV8_refine |
|---|---|---|---|---|---|---|
| CAMEO | 85.48% | 82.08% | 78.08% | 74.99% | 72.36% | 74.81% |
| CASP13 | 82.99% | 78.36% | 73.39% | 71.08% | 66.95% | 69.27% |
Porter 5 on NMR vs X-ray crystallography proteins.
| Method | Q3 | SOV’99 | SOV_refine | Q8 | SOV8’99 | SOV8_refine |
|---|---|---|---|---|---|---|
| NMR | 81.61% | 76.64% | 70.55% | 67.52% | 62.41% | 63.86% |
Most recent predictors and Jpred4 assessed on 2019_test set of 618 proteins.
| Method | Q3 | SOV’99 | SOV_refine | Q8 | SOV8’99 | SOV8_refine |
|---|---|---|---|---|---|---|
| SPOT-1D | 82.13% | 76.65% | 71.37% | 69.69% | 65.52% | 67.18% |
| NetSurfP-2.0 | 81.3% | 75.64% | 70.3% | 67.93% | 62.77% | 64.66% |
| MUFOLD-SS | 81.09% | 75.28% | 69.87% | 68.21% | 64.3% | 66.33% |
Overview of AA composition of Training, 2017_test and 2019_test.
| 3-states | 8-states | Training Set (%) | Training Set (count) | 2017_test Set (%) | 2017_test Set (count) | 2019_test Set (%) | 2019_test Set (count) |
|---|---|---|---|---|---|---|---|
| Helices | G | 38% | 135,498 | 39.22% | 21,404 | 37.41% | 2,300 |
| | H | | 1,306,610 | | 233,961 | | 31,854 |
| | I | | 714 | | 177 | | 33 |
| Sheets | E | 22.15% | 800,297 | 20.9% | 129,425 | 17.87% | 15,411 |
| | B | | 41,026 | | 6,793 | | 916 |
| Coils | C | 39.85% | 764,391 | 39.88% | 133,183 | 44.72% | 21,605 |
| | S | | 331,075 | | 57,678 | | 9,804 |
| | T | | 417,815 | | 68,973 | | 9,452 |
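The three-state percentages in this table follow from the eight-state counts once the classes are grouped as shown (G, H, I into Helices; E, B into Sheets; C, S, T into Coils). A short sketch using the 2019_test counts from the table; the names `GROUPS` and `three_state_shares` are illustrative.

```python
# Eight-state residue counts of the 2019_test set, copied from the table above.
COUNTS_2019 = {"G": 2_300, "H": 31_854, "I": 33,
               "E": 15_411, "B": 916,
               "C": 21_605, "S": 9_804, "T": 9_452}

# Grouping of eight-state classes into three states, as laid out in the table.
GROUPS = {"Helices": "GHI", "Sheets": "EB", "Coils": "CST"}


def three_state_shares(counts):
    """Return the percentage of residues in each three-state group."""
    total = sum(counts.values())
    return {name: round(100.0 * sum(counts[c] for c in members) / total, 2)
            for name, members in GROUPS.items()}


shares = three_state_shares(COUNTS_2019)
print(shares)  # {'Helices': 37.41, 'Sheets': 17.87, 'Coils': 44.72}
```

The result matches the 2019_test percentage column of the table.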
Figure 1. Diagram of the BRCNN. The input sequence is processed by three stages, i.e. one BRNN and two CNN stages, in order to predict the SS. The final architecture of Porter 5 - the CBRCNN - is the (two) cascaded version of the above.
The hyperparameters of the models employed for Porter 5.
| Input | 3-state, PSI-BLAST or HHblits (1) | 3-state, PSI-BLAST or HHblits (2) | 3-state, PSI-BLAST or HHblits (3) | 3-state, Concatenated | 8-state, PSI-BLAST or HHblits (1) | 8-state, PSI-BLAST or HHblits (2) | 8-state, PSI-BLAST or HHblits (3) | 8-state, Concatenated |
|---|---|---|---|---|---|---|---|---|
| NF/B | 25 | 30 | 30 | 25 | 30 | 35 | 36 | 30 |
| NHF/B | 40 | 40 | 45 | 40 | 45 | 55 | 60 | 45 |
| NHY | 50 | 50 | 55 | 50 | 50 | 45 | 48 | 50 |
| CoF/B | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| Cseg | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| Cwin | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
A total of 7 models are ensembled in each of the 3-state and 8-state components. The PSI-BLAST-only and HHblits-only models share the same hyperparameters.