Subash C Pakhrin¹, Bikash Shrestha², Badri Adhikari², Dukka B Kc¹.
Abstract
Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinnings of biology. Although recent advances in experimental approaches have greatly enhanced our ability to determine protein structures experimentally, the gap between the number of known protein sequences and known protein structures keeps growing. Computational protein structure prediction is one way to fill this gap. The field has recently seen major advances driven by Deep Learning (DL)-based approaches, as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progress in protein structure prediction due to DL-based methods, as observed in CASP experiments. We describe advances in various steps of the protein structure prediction pipeline, viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment (QA)/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction. Additionally, because there have been recent DL-based advances in protein structure determination using cryo-electron microscopy (Cryo-EM), we also highlight some of the important progress in that field. Finally, we provide an outlook and possible future research directions for DL-based approaches in protein structure prediction.
Keywords: deep learning; protein contact map prediction; protein distance prediction; protein quality assessment; protein structure prediction
Year: 2021 PMID: 34074028 PMCID: PMC8197379 DOI: 10.3390/ijms22115553
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
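The abstract names three kinds of pipeline output: contact maps, distograms, and real-valued distance maps. As a hedged illustration of how these representations relate, the sketch below derives all three from known residue coordinates. It assumes one representative atom per residue (for example Cβ), the commonly used 8 Å contact threshold, and, purely for illustration, distogram bins of 0.5 Å from 2 to 20 Å plus one overflow bin for longer distances. The function names are hypothetical; this is not code from any of the reviewed tools.

```python
import math

def distance_map(coords):
    """Real-valued distance map: Euclidean distance (in Å) between every residue pair."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_map(dmap, threshold=8.0, min_sep=1):
    """Binary contact map: 1 where the distance is below `threshold` Å and the two
    residues are at least `min_sep` positions apart in sequence (excludes the diagonal)."""
    n = len(dmap)
    return [[1 if abs(i - j) >= min_sep and dmap[i][j] < threshold else 0
             for j in range(n)] for i in range(n)]

def distogram_bin(d, lo=2.0, hi=20.0, width=0.5):
    """Discretize one distance into a bin index (illustrative bin settings).

    Distances below `lo` fall into bin 0; distances >= `hi` go to one extra
    overflow bin, so there are (hi - lo) / width + 1 bins in total.
    """
    n_finite = int(round((hi - lo) / width))
    if d >= hi:
        return n_finite
    if d < lo:
        return 0
    return int((d - lo) // width)

def distogram_labels(dmap, **bin_kwargs):
    """True distogram as an L x L matrix of bin indices (one-hot targets when training)."""
    return [[distogram_bin(d, **bin_kwargs) for d in row] for row in dmap]
```

A predicted distogram replaces each bin index with a probability distribution over bins, while a real-valued predictor outputs the distance directly; the contact map is the coarsest of the three views.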
Figure 1. General schematic of a template-free protein structure prediction pipeline. Most successful existing pipelines for protein structure prediction include these key steps: (i) generation of a multiple sequence alignment (MSA), (ii) contact map prediction, distogram prediction, or real-valued distance prediction, (iii) structure/fragment assembly, and (iv) QA/refinement.
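Step (ii) is typically fed by co-evolution signals computed from the MSA of step (i). As a hedged, pre-Deep-Learning illustration of such a signal (not the method of any tool discussed in this article), the sketch below scores residue pairs by the mutual information (MI) between their alignment columns and then applies the average product correction (APC). It assumes a toy MSA given as equal-length strings with at least two columns; gap characters, if present, are treated as an ordinary symbol, and the sequence weighting and pseudocounts that practical feature pipelines usually add are omitted. All function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(msa, i, j):
    """MI (in nats) between alignment columns i and j; gaps count as a normal symbol."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in p_ij.items():
        f_ab = n_ab / n
        mi += f_ab * math.log(f_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

def mi_matrix(msa):
    """Symmetric L x L MI matrix with a zero diagonal."""
    L = len(msa[0])
    m = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            m[i][j] = m[j][i] = mutual_information(msa, i, j)
    return m

def apc_correct(m):
    """Average product correction: MI(i,j) - mean_i * mean_j / overall_mean,
    where all means are taken over off-diagonal entries."""
    L = len(m)
    row_mean = [sum(m[i][j] for j in range(L) if j != i) / (L - 1) for i in range(L)]
    overall = sum(row_mean) / L
    out = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            if i != j:
                correction = row_mean[i] * row_mean[j] / overall if overall > 0 else 0.0
                out[i][j] = m[i][j] - correction
    return out
```

High APC-corrected scores flag column pairs that co-vary more than their overall variability explains; DL contact predictors consume far richer versions of this kind of pairwise signal as input features.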
Figure 2. Growth of EMD density maps in EMDB from 2002 to 2020.
Figure 3. From the perspective of Deep Learning method development, the problem of protein distogram or real-valued distance prediction (bottom row) is similar to the ‘depth prediction problem’ in computer vision (top row). In all these problems, the input to the Deep Learning model is a volume (3D tensor). In the case of computer vision, 2D images become volumes because of their RGB or HSV channels. Similarly, in the case of distance prediction, predicted 1D and 2D features are transformed and packed into a 3D volume with many channels of inter-residue information.
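The packing of 1D and 2D features into a 3D volume described in Figure 3 can be sketched as follows. This assumes one common scheme, in which the 1D feature vectors of residues i and j are concatenated with the 2D feature vector of pair (i, j); the function name and feature contents are hypothetical placeholders, and real tools differ in which features they use and how they transform them.

```python
def pack_features(feat_1d, feat_2d):
    """Build an L x L x C volume from per-residue and per-pair features.

    feat_1d: list of L vectors (length d1), e.g. predicted secondary structure
    feat_2d: L x L grid of vectors (length d2), e.g. co-evolution scores
    Cell (i, j) receives feat_1d[i] + feat_1d[j] + feat_2d[i][j],
    so the channel count is C = 2 * d1 + d2.
    """
    L = len(feat_1d)
    if len(feat_2d) != L or any(len(row) != L for row in feat_2d):
        raise ValueError("feat_2d must be an L x L grid matching feat_1d")
    return [[list(feat_1d[i]) + list(feat_1d[j]) + list(feat_2d[i][j])
             for j in range(L)] for i in range(L)]
```

The resulting L x L x C volume plays the same role as an image with C channels, which is why image architectures such as ResNets can be applied to it directly.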
Summary of tools: category, architecture, strength/uniqueness, and availability of the tools described in this article.
| Category | Tool | Architecture | Strength | Code/Web Server |
|---|---|---|---|---|
| End-to-end structure prediction | AlQuraishi’s end-to-end model | Recurrent geometric network (RGN) | Predicted novel folds without co-evolutionary data and achieved state-of-the-art accuracy | |
| | NEMO | DL | First end-to-end Deep Learning-based approach | NA |
| | AlphaFold 2 | Transformers (attention mechanism) | Evolutionarily related sequences and the MSA are fed into transformers to accurately predict protein 3D structure | NA |
| Real-valued distance prediction | PDNET | ResNet | A fully open-source and lightweight framework for distance, contact, and distogram prediction | |
| | GAN-based method | GAN + ResNet | One of the initial efforts to predict real-valued distance maps, using GANs | |
| | Xu’s method | ResNet | Predicts not only real-valued distances but also the mean and deviation of each distance for folding | NA |
| | REALDIST | ResNet | Highly accurate distance prediction method focusing only on real-valued distance map prediction and distance-guided 3D modeling | |
| | DeepDist | ResNet | Predicts both distograms and real-valued distances and delivers high-accuracy distance maps | |
| Distogram prediction | RaptorX | ResNet | The original RaptorX method upgraded to predict distograms | |
| | ProSPr | ResNet | An open-source protein distance prediction network inspired by the AlphaFold implementation | |
| | trRosetta | ResNet | A fully TensorFlow-based open-source implementation to predict distograms; demonstrated to outperform AlphaFold | |
| | DeepH3 | ResNet | Predicts inter-residue distances and orientations from antibody heavy and light chain sequences | |
| | AttentiveDist | ResNet with attention | Uses MSAs generated with different E-values to increase the co-evolutionary information provided to the model | |
| | DISTEVAL | | A tool and web server for evaluating predicted real-valued distances, distograms, and contacts | |
| Contact map prediction | QDeep | ResNets | Distance-based single-model protein quality estimation method based on residue-level ensemble error classifications | |
| | ResPRE | Deep residual convolutional neural network | Outperforms methods built on co-evolution coupling analyses or on a meta-server-based neural network | |
| | MapPred | Deep ResNet | Covariance features derived from the MSA are used to predict contact maps, distance maps, and distance distributions | |
| | DEEPCON | ResNet, U-Net, and FCN | Compares various deep learning architectures for protein contact prediction | |
| | DeepECA | CNN with ResNet | Structures predicted by DeepECA from contacts and secondary structure (SS) are more accurate than those from existing evolutionary coupling analysis methods | |
| | ContactGAN | GAN | GAN-based denoising framework to push the limit of protein contact prediction | |
| | InterPretContactMap | Attention-based CNN | Attention mechanisms were used to improve the interpretability of deep learning contact map prediction | |
| | TripletRes | ResNet | Takes raw co-evolutionary features as input and predicts high-accuracy contact maps | |
| Overall protein structure prediction pipeline | AlphaFold | Deep neural network | Accurately predicts distances between pairs of residues, which convey more information about the structure than contact predictions | |
| | trRosetta | ResNet | A fully TensorFlow-based open-source implementation to predict distograms; demonstrated to outperform AlphaFold | |
| | RaptorX | ResNet | The original RaptorX method upgraded to predict distograms | |
| | MULTICOM | Deep convolutional neural network | Predicts protein structure, secondary structure, solvent accessibility, disordered regions, and contact maps | |
| | C-I-TASSER and C-QUARK | Deep residual CNN | C-I-TASSER is derived from I-TASSER for high-accuracy protein structure and function prediction | |
| Quality Assessment (QA) and refinement | QDeep | ResNets | A distance-based single-model protein quality estimation method based on residue-level ensemble error classifications | |
| | ResNetQA | ResNet | A single-model QA method for both local and global quality assessment | |
| | DeepAccNet | 3D and 2D convolutions | Estimates per-residue accuracy and residue–residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement | |
| Single-particle picking | PIXER | Deep neural network | A fully automated particle-selection method that produces accurate results under low-SNR conditions within minutes | |
| | AutoCryoPicker | Unsupervised ML algorithm | Recognizes particle-like objects in noisy Cryo-EM micrographs without labeled training data, making it useful for Cryo-EM protein structure determination | |
| | MicrographCleaner | U-Net | Automatically discriminates between micrograph regions that are suitable for particle picking and those that are not | |
| | CASSPER | InceptionV4, | The first particle picking tool to implement the Residual Network architecture for efficient pixel-wise classification | |
| Structure prediction in Cryo-EM etc. | Dong Si’s method | Cascade CNN | Predicts secondary structure elements, backbone structure, and Cα atoms, combining the results into a complete prediction map | |
| | Emap2sec | CNN | Identifies protein secondary structures in electron microscopy maps at resolutions between 5 and 10 Å | |
| | DeepTracer | Convolutional network | Determines the all-atom structure of a protein complex from a Cryo-EM map and its amino acid sequence | |
| | DEFMap | 3D convolution | Directly extracts the dynamics associated with atomic fluctuations hidden in Cryo-EM density maps | |
| Cryo-EM | EMRefiner | Monte Carlo | A Monte Carlo-based method for protein structure refinement and determination using a Cryo-EM density map | |
| | DEMO-EM | Deep neural network | Assembles multi-domain protein structures from Cryo-EM density maps | |
| | SuperEM | GAN | SuperEM-processed maps capture protein structure information more effectively than raw Cryo-EM maps | |
| Multi-domain protein structures | FUpred | ResNet | Predicts domain boundaries better than threading-based and machine-learning-based methods | |
W: web server.
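Contact predictions, such as those produced by the contact map tools summarized above, are commonly evaluated by the precision of their most confident predictions. A minimal sketch of the top-L/k precision for long-range contacts follows. It assumes the CASP convention that long-range pairs are at least 24 residues apart in sequence, takes a predicted contact-probability matrix and a binary true contact matrix as inputs, and uses a hypothetical function name; it is not DISTEVAL's implementation.

```python
def top_lk_precision(pred, true, k=5, min_sep=24):
    """Fraction of the top L/k most confident predicted pairs that are true contacts.

    pred: L x L matrix of predicted contact probabilities
    true: L x L binary matrix of true contacts
    Only pairs (i, j) with j - i >= min_sep are ranked (long-range by default).
    """
    L = len(pred)
    candidates = [(pred[i][j], i, j) for i in range(L) for j in range(i + min_sep, L)]
    if not candidates:
        return 0.0  # protein too short to contain any pair at this separation
    candidates.sort(key=lambda t: t[0], reverse=True)
    top = candidates[: max(1, L // k)]
    return sum(true[i][j] for _, i, j in top) / len(top)
```

Setting `k=5` gives the frequently reported top-L/5 precision; smaller `min_sep` values score medium- and short-range contacts instead.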