Mirko Torrisi¹, Gianluca Pollastri¹, Quan Le².
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the 1960s, statistical methods, followed by increasingly complex Machine Learning and, more recently, Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and the essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and the basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations (PSA), from the simple statistical methods of the early days to the computationally intensive, highly sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this growth has affected our ability to leverage knowledge about evolution and co-evolution to improve predictions. We conclude by outlining the current role of Deep Learning techniques within the wider pipelines used to predict protein structures and by attempting to anticipate the challenges and opportunities that may arise next.
Keywords: Deep learning; Machine learning; Protein structure prediction
Year: 2020 PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
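The abstract separates evolutionary information (which residues appear at one alignment position) from co-evolution (whether two positions vary together). As a minimal sketch, assuming a gap-free toy alignment given as equal-length strings, the code below computes a per-column residue frequency profile and the plain mutual information between two columns. Mutual information is only one of the simplest co-evolution scores; it is not the method used by PSI-BLAST, HHblits or any predictor in the tables below, and the function names and alignment are hypothetical.

```python
import math
from collections import Counter

def profile(msa, i):
    """Residue frequencies at alignment column i: a toy per-position evolutionary profile."""
    col = [seq[i] for seq in msa]
    return {aa: count / len(col) for aa, count in Counter(col).items()}

def mutual_information(msa, i, j):
    """Plain mutual information (in nats) between alignment columns i and j.

    Toy co-evolution score: no sequence weighting, no pseudocounts and no
    special handling of gaps (hypothetical helper, not any tool's method).
    """
    n = len(msa)
    ci = [seq[i] for seq in msa]
    cj = [seq[j] for seq in msa]
    count_i, count_j = Counter(ci), Counter(cj)
    mi = 0.0
    for (a, b), count in Counter(zip(ci, cj)).items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

msa = ["ACDE", "ACKE", "GCDF", "GCKF"]          # hypothetical toy alignment
print(profile(msa, 0))                          # {'A': 0.5, 'G': 0.5}
print(round(mutual_information(msa, 0, 3), 4))  # 0.6931: columns 0 and 3 co-vary
print(mutual_information(msa, 0, 2))            # 0.0: columns 0 and 2 are independent
```

Column 1 is fully conserved, so it carries strong evolutionary (profile) information but no co-evolution signal with any other column.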
Fig. 1. A generic pipeline for ab initio Protein Structure Prediction, in which evolutionary information in the form of alignments, 1D PSA and 2D PSA are intermediate steps.
Fig. 2. Growth of known structures in the Protein Data Bank (left) and of known sequences in UniProt (right). The y-axis of the UniProt plot uses a logarithmic scale.
Fig. 3. Performance of secondary structure predictors over the years. “stat”: predictors based on statistical methods other than Neural Networks; “ML”: predictors based on shallow Neural Networks or Support Vector Machines; “DL-CNN”: Deep Learning methods based on Convolutional Neural Networks; “DL-RNN”: Deep Learning methods based on Recurrent Neural Networks. Data were extracted from the publications accompanying the predictors referenced in this article.
Fig. 4. Improvements in the quality of 3D predictions for free modelling (ab initio) targets between CASP9 and CASP13.
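Secondary structure predictors are most often compared by three-state accuracy (Q3): the fraction of residues whose predicted state (helix H, strand E, coil C) matches the observed one. The Fig. 3 caption does not name its metric, so treat Q3 here as an assumption. The sketch below also reduces 8-state DSSP labels to 3 states using one common mapping; other papers use slightly different mappings, and the helper names are hypothetical.

```python
# One common DSSP 8-to-3 state reduction (mappings vary between papers):
# H, G, I -> helix (H); E, B -> strand (E); everything else -> coil (C).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_to_three_states(dssp: str) -> str:
    """Map an 8-state DSSP string to H/E/C; unlisted states (T, S, '-', ...) become coil."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in dssp)

def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

observed = reduce_to_three_states("HHHHTEEB")  # -> "HHHHCEEE"
print(q3_accuracy("HHHCCEEE", observed))       # 0.875 (7 of 8 residues correct)
```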
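Deep Learning methods for 1D PSA prediction, with the PSA each one predicts, the model it adopts and the tools it uses to gather evolutionary information. The PSA are secondary structure (SS), solvent accessibility (SA), torsion angles (TA), contact density (CD) and disordered regions (DR). Model abbreviations: feed-forward neural network (FFNN), bidirectional recurrent neural network (BRNN), convolutional neural network (CNN) and bidirectional long short-term memory network (BLSTM).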
| Predictor | PSA | Model | Evolutionary Information |
|---|---|---|---|
| SPIDER2 | SS, SA | Multi-stage FFNN | PSI-BLAST |
| SSpro/ACCpro5 | SS, SA | BRNN-CNN | PSI-BLAST |
| Brewery | SS, SA, TA, CD | Multi-stage BRNN-CNN | PSI-BLAST, HHblits |
| SPIDER3 | SS, SA, TA, CD | BLSTM | PSI-BLAST, HHblits |
| RaptorX-Property | SS, SA, DR | CNF | PSI-BLAST, HHblits |
| NetSurfP-2.0 | SS, SA, TA, DR | BLSTM | HHblits or MMseqs2 |
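Feed-forward networks, such as the FFNN stages listed above, typically classify each residue from a fixed-size window of neighbouring positions, whereas BRNN and BLSTM models propagate context along the whole chain. A minimal sketch of such a window encoding, assuming one-hot inputs over the 20 standard amino acids (real predictors feed PSI-BLAST or HHblits profile columns instead) and zero padding past the chain ends; the function names and window size are illustrative, not those of any listed tool.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}

def one_hot(residue):
    """20-dimensional one-hot vector; unknown residues (e.g. X) map to all zeros."""
    vec = [0.0] * len(AMINO_ACIDS)
    if residue in INDEX:
        vec[INDEX[residue]] = 1.0
    return vec

def window_features(sequence, half_width=3):
    """One flat feature vector per residue: the one-hot codes of the
    2 * half_width + 1 positions centred on it, zero-padded past the chain ends."""
    length = len(sequence)
    padding = [0.0] * len(AMINO_ACIDS)
    features = []
    for i in range(length):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(one_hot(sequence[j]) if 0 <= j < length else padding)
        features.append(row)
    return features

feats = window_features("ACD", half_width=1)
print(len(feats), len(feats[0]))  # 3 60: three residues, 3 positions x 20 features each
```

Each row would then be fed independently to the network, which is why window size bounds the context a plain FFNN can see.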
Modern and Deep Learning methods for 2D PSA prediction, with the PSA each one predicts, the model it adopts and the tools it uses to gather evolutionary information. The PSA are contact maps (CM), multi-class CM and distance maps (DM).
| Predictor | PSA | Model | Evolutionary Information |
|---|---|---|---|
| MetaPSICOV2 | CM | Multi-stage FFNN | HHblits, JackHMMER |
| DeepCDpred | multi-class CM | Multi-stage FFNN | HHblits |
| RaptorX-Contact | multi-class CM | Residual CNN | HHblits |
| DNCON2 | CM | Multi-stage CNN | HHblits, JackHMMER |
| DeepContact | CM | Residual CNN | HHblits, JackHMMER |
| DeepCov | CM | CNN | HHblits |
| Pconsc4 | CM | CNN | HHblits |
| SPOT-Contact | CM | Residual CNN + 2D-BLSTM | HHblits, PSI-BLAST |
| TripletRes | CM | Multi-stage residual CNN | HHblits, JackHMMER, HMMER |
| AlphaFold | DM | Residual CNN | HHblits, PSI-BLAST |
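The three 2D PSA in this table are closely related: a binary contact map (CM) thresholds a residue distance map (DM), and a multi-class CM bins the same distances. The sketch below assumes the usual CASP convention that residues i and j are in contact when their Cβ–Cβ distance (Cα for glycine) is below 8 Å. The bin edges for the multi-class map are hypothetical, since each predictor (e.g. DeepCDpred, RaptorX-Contact) chooses its own, and the coordinates are made up.

```python
import math
from bisect import bisect_right

def distance_map(coords):
    """DM: pairwise Euclidean distances (in Å) between per-residue coordinates, e.g. C-beta atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_map(dmap, threshold=8.0):
    """Binary CM: 1 where the distance is below the threshold (8 Å by the usual CASP convention)."""
    return [[1 if d < threshold else 0 for d in row] for row in dmap]

def multiclass_map(dmap, edges=(8.0, 12.0, 16.0)):
    """Multi-class CM: class k means the distance falls in the k-th interval defined by the
    (hypothetical) bin edges, giving len(edges) + 1 classes; class 0 matches the binary contacts."""
    return [[bisect_right(edges, d) for d in row] for row in dmap]

coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]  # hypothetical C-beta positions
dm = distance_map(coords)  # [[0.0, 5.0, 12.0], [5.0, 0.0, 13.0], [12.0, 13.0, 0.0]]
print(contact_map(dm))     # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
print(multiclass_map(dm))  # [[0, 0, 2], [0, 0, 2], [2, 2, 0]]
```

In practice, predicted contacts are usually evaluated only for residue pairs separated by at least 6 positions along the sequence, so the trivial diagonal and near-diagonal contacts are ignored.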