Lewis Moffat, David T Jones.
Abstract
MOTIVATION: Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved.
Year: 2021 PMID: 34213528 PMCID: PMC8570780 DOI: 10.1093/bioinformatics/btab491
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Plot showing reported test Q3 scores for a range of published secondary structure prediction methods over the previous three decades. Single-sequence methods (Asai; Aydin; Bidargaddi; Frishman and Argos, 1996; Heffernan; Schmidler) and homology methods (Cole; Cuff; Hanson; Jones, 1999; Li and Yu, 2016; Meiler and Baker, 2003; Mirabello and Pollastri, 2013; Rost and Sander, 1993) are plotted separately to illustrate how slowly single-sequence methods have improved over time compared with homology methods. This work, S4PRED, is included to show that it is a step upwards in accuracy. To avoid conflation with Rosetta ab initio, the name Rosetta + Neural Network (Rosetta+NN) is used in this figure to refer to the work of Meiler and Baker (2003).
Fig. 2. (A) Table showing the difference in final accuracy (Q3 score) between the improved S4PRED, the AWD-GRU benchmark, and the current version of PSIPRED-Single on the CB513 test set. (B) Table of classification metrics for the S4PRED model test set predictions. These are shown for each of the three predicted classes: α-helix, β-sheet and loop (or coil). The support is normalized across classes to 100 for clarity; there are a total of 84484 residue predictions in the test set. (C) Confusion matrix for the three classes in the S4PRED model test set predictions.
Fig. 3. (A) Histogram of Q3 scores on the CB513 test set showing the improved results of S4PRED over PSIPRED-Single (PSIPRED-S). (B) Example of S4PRED and PSIPRED-Single secondary structure predictions relative to the true structure for the C-terminal domain of pyruvate oxidase and decarboxylase (PDB ID 1POW).
Q3 scores and micro-averaged F1 scores achieved by S4PRED, SPIDER3-Single and PSIPRED-Single on two test sets: a test set of de novo designed proteins (labelled ‘Designed’) and a test set of orphan proteins (labelled ‘Orphans’).
| Method | Q3 (Orphans) | Q3 (Designed) | F1 (Orphans) | F1 (Designed) |
|---|---|---|---|---|
| | 73.3% | 89.4% | 0.733 | 0.890 |
| | 71.1% | 86.6% | 0.718 | 0.868 |
Note: Results in bold show the superior performance of S4PRED.
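The Q3 score, confusion matrix and micro-averaged F1 score referred to in the captions and table above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the single-letter labels `H`, `E` and `C` for helix, strand and coil, and the toy sequences are all assumptions made for illustration, and the scores are pooled over residues, which may differ from how the reported values were averaged.

```python
from collections import Counter

# Assumed three-state alphabet: H = alpha-helix, E = beta-strand, C = coil/loop.
CLASSES = ("H", "E", "C")


def q3(true, pred):
    """Fraction of residues whose three-state label is predicted correctly."""
    if len(true) != len(pred) or not true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def confusion_matrix(true, pred):
    """3x3 counts; rows are true classes, columns are predicted classes.

    Residues carrying a label outside CLASSES are ignored.
    """
    counts = Counter(zip(true, pred))
    return [[counts[(t, p)] for p in CLASSES] for t in CLASSES]


def micro_f1(true, pred):
    """F1 computed from true/false positives pooled over all three classes."""
    m = confusion_matrix(true, pred)
    n = len(CLASSES)
    tp = sum(m[i][i] for i in range(n))
    if tp == 0:
        return 0.0
    # A false positive for class i is any residue predicted i with true class j != i.
    fp = sum(m[j][i] for i in range(n) for j in range(n) if j != i)
    # A false negative for class i is any residue of true class i predicted j != i.
    fn = sum(m[i][j] for i in range(n) for j in range(n) if j != i)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical sequences, not data from the paper).
true_ss = "CHHHHCEEEC"
pred_ss = "CHHHCCEEHC"
print(q3(true_ss, pred_ss))
print(confusion_matrix(true_ss, pred_ss))
print(micro_f1(true_ss, pred_ss))
```

In this toy case eight of the ten residues are predicted correctly, so Q3 is 0.8; the confusion matrix shows one helix residue predicted as coil and one strand residue predicted as helix.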