Yiyu Hong, Juyong Lee, Junsu Ko.
Abstract
BACKGROUND: The accuracy of protein 3D structure prediction has improved dramatically thanks to advances in deep learning. In the recent CASP14, DeepMind demonstrated that their new version of AlphaFold (AF) produces highly accurate 3D models that are close to experimental structures. The success of AF shows that the multiple sequence alignment of a sequence contains rich evolutionary information, which leads to accurate 3D models. Despite the success of AF, only the prediction code is open, and training a similar model requires a vast amount of computational resources. Thus, developing a lighter prediction model is still necessary.
Keywords: Deep learning; Multiple sequence alignment; Protein language model; Protein structure prediction
Year: 2022 PMID: 35296230 PMCID: PMC8925138 DOI: 10.1186/s12859-022-04628-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Pipeline of the proposed protein 3D structure prediction method. An MSA is input to the MSA Transformer to extract MSA features and row attention maps. Then, the MSA features corresponding to the query sequence and the row attention maps are combined into 2D feature maps by a set of transformations. Next, after dimension reduction, the 2D feature maps are input to a Dilated ResNet that outputs inter-residue geometries, which are in turn input to the trRosetta protein structure modeling step to output a predicted protein 3D structure
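The caption does not spell out the "set of transformations" that turns per-residue features and row attention maps into 2D feature maps. The sketch below shows one common way to do this, under stated assumptions: outer concatenation of the query features for residues i and j, followed by symmetrized attention values. The function name `pair_features`, the symmetrization step, and the toy shapes are illustrative assumptions, not the authors' implementation.

```python
def pair_features(query_feats, attn_maps):
    """Combine per-residue features and attention maps into per-pair features.

    query_feats: L rows of D floats (hypothetical query-sequence embedding).
    attn_maps:   H maps, each an L x L nested list (hypothetical row attention).
    Returns an L x L grid whose entries are lists of length 2 * D + H.
    """
    length = len(query_feats)
    grid = []
    for i in range(length):
        row = []
        for j in range(length):
            # Outer concatenation: features of residue i, then of residue j.
            pair = list(query_feats[i]) + list(query_feats[j])
            # Assumed step: symmetrize each map so (i, j) and (j, i) agree.
            pair += [0.5 * (a[i][j] + a[j][i]) for a in attn_maps]
            row.append(pair)
        grid.append(row)
    return grid


# Toy input: L = 3 residues, D = 2 feature channels, H = 1 attention map.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
attn = [[[0.0, 0.2, 0.4], [0.6, 0.0, 0.8], [1.0, 0.2, 0.0]]]
maps = pair_features(feats, attn)
```

Each entry `maps[i][j]` is a channel vector for the residue pair, which is the kind of L x L x C input a 2D convolutional network such as a dilated ResNet consumes.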
Contact precision on CASP13 FM and FM/TBM targets corresponding to 43 domains (results are adapted from DeepDist [34])
| Group | Top L | Top L/2 | Top L/5 |
|---|---|---|---|
| TripletRes | 0.451 | 0.587 | 0.700 |
| AlphaFold | 0.497 | 0.629 | 0.742 |
| RaptorX-Contact | 0.481 | 0.612 | 0.744 |
| trRosetta | 0.506 | 0.652 | 0.751 |
| DeepDist | 0.517 | 0.661 | 0.793 |
| A-Prot (w BFD) | | | |
| A-Prot (w/o BFD) | 0.540 | 0.681 | 0.780 |
The highest score of each column is highlighted in bold
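For readers unfamiliar with the Top L, Top L/2, and Top L/5 columns, the following is a minimal sketch of how such a contact precision is usually computed: residue pairs are ranked by predicted contact probability, the top L/k pairs are kept, and the fraction that are true contacts is reported. The function name `top_k_precision` and the default minimum sequence separation of 24 (a common convention for long-range contacts) are assumptions; the source does not state the exact protocol.

```python
def top_k_precision(prob, contact, length, frac=1, min_sep=24):
    """Precision of the top (length // frac) predicted contacts.

    prob:    length x length nested list of predicted contact probabilities.
    contact: length x length nested list of booleans (true contacts).
    min_sep: minimum sequence separation |i - j| (assumed convention).
    """
    pairs = [
        (prob[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Rank candidate pairs from most to least confident.
    pairs.sort(key=lambda t: t[0], reverse=True)
    top = pairs[: max(1, length // frac)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if contact[i][j])
    return hits / len(top)


# Toy example with 5 residues and a reduced separation of 2.
n = 5
prob = [[0.0] * n for _ in range(n)]
contact = [[False] * n for _ in range(n)]
for (i, j), p in {(0, 2): 0.9, (0, 3): 0.8, (0, 4): 0.1,
                  (1, 3): 0.7, (1, 4): 0.6, (2, 4): 0.5}.items():
    prob[i][j] = p
for i, j in [(0, 2), (1, 3), (2, 4)]:
    contact[i][j] = True

top_l = top_k_precision(prob, contact, n, frac=1, min_sep=2)   # 3 of 5 correct
top_l2 = top_k_precision(prob, contact, n, frac=2, min_sep=2)  # 1 of 2 correct
top_l5 = top_k_precision(prob, contact, n, frac=5, min_sep=2)  # 1 of 1 correct
```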
The average TM-score and lDDT of the model structures of 25 CASP14 FM, FM/TBM, and TBM-hard domains
| Server group | TM-score | p-value | lDDT | p-value |
|---|---|---|---|---|
| FEIG-S | 0.461 | 0.0007 | 0.413 | 0.0172 |
| BAKER-ROSETTASERVER | 0.517 | 0.0391 | 0.455 | 0.2636 |
| Zhang-Server | | 0.6684 | 0.489 | 0.8510 |
| QUARK | 0.588 | 0.6208 | 0.484 | 0.7408 |
| A-Prot | 0.576 | | | |
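As a reminder of what the TM-score column measures, here is a minimal sketch of the standard TM-score formula evaluated for a single, already superposed model. The real TM-score additionally searches for the superposition that maximizes this sum, which is omitted here. The function name is hypothetical, and the sketch rejects targets of 15 residues or fewer, where the length-dependent scale d0 is not defined by this formula.

```python
def tm_score_fixed_superposition(distances, target_length):
    """TM-score term for one fixed superposition (no optimization step).

    distances:     distances in angstroms between aligned residue pairs.
    target_length: number of residues in the native (target) structure.
    """
    if target_length <= 15:
        raise ValueError("this sketch needs target_length > 15")
    # Length-dependent scale that makes scores comparable across sizes.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalizing by the target length penalizes unmodeled residues.
    return total / target_length


perfect = tm_score_fixed_superposition([0.0] * 100, 100)
half_covered = tm_score_fixed_superposition([0.0] * 50, 100)
```

A perfect model of a 100-residue target scores 1.0, while a perfect model that covers only half of the residues scores 0.5.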
Performance of A-Prot and trRosetta, both using our MSAs, on 25 CASP14 FM, FM/TBM, and TBM-hard domains. (Inter-residue distance and angle predictions are evaluated by the Pearson correlation between the index of the predicted bin with maximum probability and the ground-truth bin index; Top L is the long-range contact precision; TMS is the TM-score)
| Method | | | | | Top L | TMS | lDDT |
|---|---|---|---|---|---|---|---|
| trRosetta | 0.539 | 0.498 | 0.459 | 0.362 | 0.363 | 0.524 | 0.457 |
| A-Prot | 0.604 | 0.578 | 0.528 | 0.464 | 0.424 | 0.576 | 0.499 |
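The distance and angle scores in the table above are described as Pearson correlations between the index of the highest-probability predicted bin and the ground-truth bin. A minimal sketch of that evaluation follows. The helper names and the toy distributions are assumptions, and the binning scheme itself is not specified in this excerpt.

```python
import math


def argmax_bins(prob_rows):
    """Index of the most probable bin for each residue pair."""
    return [max(range(len(p)), key=p.__getitem__) for p in prob_rows]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy predicted distributions over 4 bins for 4 residue pairs.
predicted = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.2, 0.5, 0.2],
    [0.0, 0.1, 0.2, 0.7],
]
true_bins = [0, 2, 1, 3]

pred_bins = argmax_bins(predicted)
r = pearson(pred_bins, true_bins)
```

Correlating bin indexes rather than probabilities rewards predictions that land near the correct bin even when they miss it by one.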
Fig. 2 TM-score and lDDT on 25 CASP14 FM, FM/TBM, and TBM-hard domains, compared with BAKER-ROSETTASERVER
Fig. 3 Comparison of four high-quality CASP14 models generated by our method with their native structures. Brown: native structure; Blue: model