Wenze Ding, Wenzhi Mao, Di Shao, Wenxuan Zhang, Haipeng Gong.
Abstract
Information on residue-residue contacts is essential for understanding the mechanism of protein folding, and has been successfully applied as special topological restraints to simplify conformational sampling in de novo protein structure prediction. Prediction of protein residue contacts has progressed rapidly, with prediction accuracy approaching high levels in the past two years. In this work, we introduce the second version of our residue contact predictor, DeepConPred2, which exhibits substantially improved performance and markedly reduced running time after model re-optimization and feature updates. When tested on the CASP12 free modeling targets, our program reaches at least the same level of prediction accuracy as the best contact predictors to date and provides information complementary to other state-of-the-art methods in contact-assisted folding.
Keywords: Contact-assisted folding; Machine learning; Protein structure prediction; Residue contact prediction; Web server
Year: 2018 PMID: 30505403 PMCID: PMC6247404 DOI: 10.1016/j.csbj.2018.10.009
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. The schematic layout of DeepConPred2. The three modules are framed with dashed lines and labeled in red italics.
ResNet models in the third module.
| Model | Number of layers | Activation | F1-score |
|---|---|---|---|
| Model 1 | 50 | Pre-activation | 0.5419 |
| Model 2 | 60 | Pre-activation | 0.5527 |
| Model 3 | 70 | Pre-activation | 0.5495 |
| Model 4 | 80 | Pre-activation | 0.5561 |
| Model 5 | 80 | Post-activation | 0.5508 |
| Ensemble | Average of above 5 models | | 0.5711 |
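The ensemble row can be illustrated with a minimal sketch. It assumes that "average" means an element-wise mean of the per-model contact probabilities and that the F1-score is computed after thresholding at 0.5; neither detail is stated in the table, and the function names `average_ensemble` and `f1_score` are hypothetical.

```python
from typing import List, Sequence


def average_ensemble(prob_maps: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of per-model contact probabilities (assumed ensemble rule)."""
    n_models = len(prob_maps)
    return [sum(column) / n_models for column in zip(*prob_maps)]


def f1_score(probs: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """F1 = harmonic mean of precision and recall; the 0.5 threshold is an assumption."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted = p >= threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif not predicted and y:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: two models scoring four residue pairs (made-up numbers).
model_outputs = [[0.9, 0.7, 0.2, 0.4],
                 [0.7, 0.9, 0.0, 0.2]]
native_contacts = [1, 0, 1, 0]
ensemble = average_ensemble(model_outputs)
print(f1_score(ensemble, native_contacts))
```

Averaging tends to cancel uncorrelated errors of the individual networks, which is consistent with the ensemble F1-score in the table exceeding that of every single model.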
Performance evaluation on the independent test set.
| Range | Version | | | | |
|---|---|---|---|---|---|
| Short | New | 0.7539 | 0.6667 | 0.5134 | 0.3622 |
| | Old | – | – | – | – |
| Medium | New | 0.6926 | 0.6176 | 0.4871 | 0.3806 |
| | Old | – | – | – | – |
| Long | New | 0.7411 | 0.7067 | 0.6294 | 0.5378 |
| | Old | 0.5517 | 0.4061 | 0.3414 | 0.2993 |
Performance evaluation on the CASP11 set.
| Range | Version | | | | |
|---|---|---|---|---|---|
| Short | New | 0.8345 | 0.7479 | 0.5680 | 0.3978 |
| | Old | – | – | – | – |
| Medium | New | 0.7881 | 0.7314 | 0.6135 | 0.4617 |
| | Old | – | – | – | – |
| Long | New | 0.7315 | 0.7126 | 0.6619 | 0.5695 |
| | Old | 0.5306 | 0.4335 | 0.3813 | 0.2935 |
Performance evaluation on the CASP12 set.
| Range | Version | | | | |
|---|---|---|---|---|---|
| Short | New | 0.7075 | 0.6689 | 0.5152 | 0.3600 |
| | Old | – | – | – | – |
| Medium | New | 0.6966 | 0.6568 | 0.5211 | 0.3860 |
| | Old | – | – | – | – |
| Long | New | 0.7039 | 0.6960 | 0.6225 | 0.5310 |
| | Old | 0.5157 | 0.4400 | 0.3138 | 0.2579 |
Fig. 2. Comparison of contact maps for a specific target (PDB ID: 1A3A). (a) The contact map predicted by our previous version. (b) The contact map predicted by the new version. (c) The contact map of the native structure.
Fig. 3. Computational time consumed by the different versions on 102 CASP11 targets.
Comparison of prediction precisions on 22 CASP12 FM targets.
| Range | Method | | | | |
|---|---|---|---|---|---|
| Short | DNCON2 | 0.5219 | 0.5145 | 0.3879 | 0.2807 |
| | RaptorX-Contact | 0.6871 | 0.5720 | 0.3583 | 0.2257 |
| | SPOT-Contact | 0.7220 | 0.6146 | 0.3994 | 0.2421 |
| | DeepConPred2 | 0.6257 | 0.5849 | 0.4434 | 0.3190 |
| Medium | DNCON2 | 0.4698 | 0.4682 | 0.3837 | 0.2859 |
| | RaptorX-Contact | 0.6104 | 0.5341 | 0.3608 | 0.2364 |
| | SPOT-Contact | 0.7120 | 0.6195 | 0.4182 | 0.2699 |
| | DeepConPred2 | 0.6091 | 0.5687 | 0.4263 | 0.3078 |
| Long | DNCON2 | 0.5864 | 0.5349 | 0.4241 | 0.3378 |
| | RaptorX-Contact | 0.6765 | 0.5855 | 0.5115 | 0.3950 |
| | SPOT-Contact | 0.6758 | 0.6318 | 0.5269 | 0.4393 |
| | DeepConPred2 | 0.6100 | 0.5756 | 0.4916 | 0.4127 |
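Precision values of this kind are commonly computed on the top-ranked predictions within each sequence-separation range. The sketch below shows one such calculation. The range boundaries (short 6 to 11, medium 12 to 23, long 24 or more residues apart) follow a common CASP-style convention and are an assumption here, as are the names `RANGES`, `in_range`, and `top_k_precision`; the cutoffs behind the table's columns are not preserved in this record, so `k` is left as a free parameter.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[int, int]

# Assumed sequence-separation bins (inclusive); not defined in the table itself.
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, None),
}


def in_range(i: int, j: int, range_name: str) -> bool:
    """True if residues i and j are separated by a distance inside the named bin."""
    low, high = RANGES[range_name]
    separation = abs(i - j)
    return separation >= low and (high is None or separation <= high)


def top_k_precision(scores: Dict[Pair, float], native: Set[Pair],
                    range_name: str, k: int) -> float:
    """Fraction of the k highest-scoring pairs in the bin that are native contacts."""
    candidates = [(score, pair) for pair, score in scores.items()
                  if in_range(pair[0], pair[1], range_name)]
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in native)
    return hits / len(top)


# Toy example with made-up scores; a real evaluation would set k from the
# protein length L (for example L // 10 or L // 5).
predicted = {(1, 30): 0.9, (2, 40): 0.8, (3, 50): 0.2, (1, 8): 0.95}
true_contacts = {(1, 30), (3, 50)}
print(top_k_precision(predicted, true_contacts, "long", k=2))
```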
P-values of the paired t-test and Levene test on CONFOLD RMSD groups from different methods.
| Paired t-test/Levene test | DNCON2 (12.50) | RaptorX-Contact (11.82) | SPOT-Contact (10.43) |
|---|---|---|---|
| DeepConPred2 (11.60) | 0.07238/0.9463 | 0.8031/0.6597 | 0.1145/0.8913 |
P-values of the paired t-test and Levene test are listed before and after the slash, respectively. Numbers in brackets are the average RMSD values of the 22 CASP12 FM targets for corresponding algorithms.
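As a rough illustration of the two tests, the following sketch computes their test statistics in plain Python. The paired t-test compares per-target RMSD differences between two methods, and Levene's test compares the spread of the RMSD groups. The helper names are hypothetical, and the Levene variant shown centers on the group mean, which is an assumption (SciPy's `scipy.stats.levene`, for instance, centers on the median by default). Converting the statistics into p-values requires the t and F distributions, available for example through `scipy.stats.ttest_rel` and `scipy.stats.levene`.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))


def levene_w(*groups: Sequence[float]) -> float:
    """Levene's W statistic using absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of every value from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Made-up RMSD values (in angstroms) for four targets and two methods.
rmsd_method_a = [10.2, 12.5, 8.9, 14.1]
rmsd_method_b = [11.0, 12.1, 9.8, 15.3]
print(paired_t_statistic(rmsd_method_a, rmsd_method_b))
print(levene_w(rmsd_method_a, rmsd_method_b))
```

A large paired t-test p-value means the mean RMSD difference between two methods is not statistically distinguishable, and a large Levene p-value means the same for their variances.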
Fig. 4. RMSD comparison between CONFOLD results of models generated using our program and the three other methods on 22 CASP12 FM targets. Each point denotes a protein target, with colors indicating protein size: red for small domains (length < 120), blue for medium domains (120 ≤ length < 180), and black for large domains (length ≥ 180). The lime green dashed lines and fuchsia dotted lines denote the results of Deming regression and Passing-Bablok regression, respectively.
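Deming regression, unlike ordinary least squares, allows for measurement error in both variables, which suits a comparison of RMSD values from two methods. The sketch below uses the standard closed-form slope; the function name `deming_fit` and the default error-variance ratio `delta = 1.0` (orthogonal regression) are assumptions, since the caption does not state the ratio used.

```python
from math import sqrt
from statistics import mean
from typing import Sequence, Tuple


def deming_fit(x: Sequence[float], y: Sequence[float],
               delta: float = 1.0) -> Tuple[float, float]:
    """Return (slope, intercept) of the Deming regression line.

    delta is the ratio of the y-error variance to the x-error variance;
    delta = 1.0 (an assumption here) gives orthogonal regression.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # Closed-form slope; undefined when the sample covariance s_xy is zero.
    slope = (s_yy - delta * s_xx
             + sqrt((s_yy - delta * s_xx) ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up RMSD pairs (angstroms): x from one method, y from another.
slope, intercept = deming_fit([8.0, 10.5, 12.0, 15.5], [8.4, 10.1, 12.9, 15.0])
print(slope, intercept)
```

A fitted slope near 1 with an intercept near 0 would indicate that the two methods yield comparable RMSD values across targets.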