Giacomo Janson¹, Alessandro Grottesi², Marco Pietrosanto³, Gabriele Ausiello³, Giulia Guarguaglini⁴, Alessandro Paiardini¹.
Abstract
The most frequently used approach for protein structure prediction is currently homology modeling. The 3D model building phase of this methodology is critical for obtaining an accurate and biologically useful prediction. The most widely employed tool to perform this task is MODELLER. This program implements the "modeling by satisfaction of spatial restraints" strategy, and its core algorithm has not been altered significantly since the early 1990s. In this work, we have explored the idea of modifying MODELLER with two effective, yet computationally light, strategies to improve its 3D modeling performance. Firstly, we have investigated how strongly the accuracy of the estimated structural variability between a target protein and its templates, expressed as σ values, influences 3D modeling. We show that the σ values produced by MODELLER are, on average, only weakly correlated with the true level of structural divergence between target-template pairs. We also show that increasing this correlation greatly improves the program's predictions, especially in multiple-template modeling. Secondly, we have examined how incorporating statistical potential terms (such as the DOPE potential) into MODELLER's objective function affects 3D modeling quality. It provides a small but consistent improvement in metrics such as GDT-HA and lDDT, and a large increase in stereochemical quality. Python modules to harness this second strategy are freely available at https://github.com/pymodproject/altmod. In summary, we show that there is ample room for improving MODELLER in terms of 3D modeling quality, and we propose strategies that could be pursued to further increase its performance.
Year: 2019 PMID: 31846452 PMCID: PMC6938380 DOI: 10.1371/journal.pcbi.1007219
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
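The abstract centers on σ values, so it helps to see where σ enters. In its simplest form, a homology-derived distance restraint can be pictured as a Gaussian density centered on the template distance with standard deviation σ. The objective function is then a sum of negative log densities. The following Python sketch rests on that simplification alone: one Gaussian per restraint, no multi-Gaussian forms and no stereochemical terms. The function name is hypothetical, and the code is not taken from MODELLER. It shows that a 1 Å deviation from the template distance is penalized far more heavily when σ is much smaller than that deviation.

```python
import math


def gaussian_restraint_penalty(d: float, mean: float, sigma: float) -> float:
    """Return -ln N(d; mean, sigma) for one single-Gaussian distance restraint.

    Hypothetical, simplified stand-in for a homology-derived restraint:
    multi-Gaussian restraint forms used by MODELLER are not modeled here.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (d - mean) / sigma
    # Quadratic deviation term plus the normalization term ln(sigma * sqrt(2*pi)).
    return 0.5 * z * z + math.log(sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    template_distance = 6.0  # Å, hypothetical template distance (restraint mean)
    model_distance = 7.0  # Å, hypothetical distance in the model being optimized
    for sigma in (0.25, 0.5, 1.0, 2.0):
        penalty = gaussian_restraint_penalty(model_distance, template_distance, sigma)
        print(f"sigma = {sigma:4.2f} Å -> penalty = {penalty:7.3f}")
```

Because of the ln σ normalization term, a large σ is not free either. For a fixed deviation Δ, the density gives that deviation its highest probability when σ = |Δ|, so in the example σ = 1 Å yields the lowest penalty.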
Fig 2. Modeling with uniform σ values.
Average GDT-HA (A) and lDDT (B) scores of the AS models as a function of the uniform σ value (ranging from 0.01 to 7.0 Å) applied to their HDDRs. The horizontal dashed lines represent the average scores obtained with the original σ values.
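The curvature of a Gaussian restraint helps explain why uniform σ values change the outcome. The second derivative of -ln N(d; μ, σ) with respect to d is 1/σ², so σ sets how stiffly each distance is held relative to the others. The sketch below assumes single-Gaussian restraints and uses hypothetical σ values and helper names. It compares the normalized stiffness produced by per-restraint σ values with the stiffness obtained when every HDDR receives the same σ, as in this experiment.

```python
from __future__ import annotations


def relative_stiffness(sigmas: list[float]) -> list[float]:
    """Normalized 1/sigma**2 values.

    For a single-Gaussian restraint, 1/sigma**2 is the curvature of
    -ln N(d; mean, sigma) in d, i.e. how stiffly the distance is held.
    """
    if any(s <= 0.0 for s in sigmas):
        raise ValueError("all sigma values must be positive")
    inverse_variances = [1.0 / (s * s) for s in sigmas]
    total = sum(inverse_variances)
    return [v / total for v in inverse_variances]


def uniform_sigmas(n_restraints: int, sigma: float) -> list[float]:
    """Hypothetical helper: give every restraint the same sigma value."""
    return [sigma] * n_restraints


if __name__ == "__main__":
    per_restraint = [0.5, 1.0, 2.0]  # Å, hypothetical per-restraint sigma values
    print("per-restraint:", [round(w, 3) for w in relative_stiffness(per_restraint)])
    print("uniform 1.0 Å:", [round(w, 3) for w in relative_stiffness(uniform_sigmas(3, 1.0))])
```

With a uniform σ, every restraint has the same relative stiffness. The per-restraint σ values distinguish conserved distances from variable ones, and a uniform σ discards that distinction.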
Fig 3. The use of optimal parameters for HDDRs improves 3D modeling quality.
(A) and (B) GDT-HA and lDDT scores of the AS models built with the original σ values (x-axis) and with the optimal |Δd| values (y-axis). (C) and (D) GDT-HA and lDDT scores of the AM models obtained with MODELLER-generated (x-axis) and optimal (y-axis) HDDRs.
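Optimal HDDRs of this kind can only be built in a benchmark setting, where the experimental structure of the target is known. The caption suggests that each σ is replaced by the observed |Δd| = |d_target - d_template| of the restrained distance. Assuming that, the computation can be sketched as follows. Several choices here are hypothetical: the helper name, the index-matched coordinate lists that stand in for the target-template alignment, and the 0.01 Å floor that keeps σ positive.

```python
from __future__ import annotations

import math


def optimal_sigmas(
    template_coords: list[tuple[float, float, float]],
    target_coords: list[tuple[float, float, float]],
    pairs: list[tuple[int, int]],
    floor: float = 0.01,
) -> list[float]:
    """Return |d_target - d_template| for each atom pair (i, j), floored at `floor` Å.

    Hypothetical helper: assumes template and target coordinates are already
    index-matched through the target-template alignment.
    """
    sigmas = []
    for i, j in pairs:
        d_template = math.dist(template_coords[i], template_coords[j])
        d_target = math.dist(target_coords[i], target_coords[j])
        # A distance reproduced exactly would give sigma = 0, so clip it.
        sigmas.append(max(abs(d_target - d_template), floor))
    return sigmas


if __name__ == "__main__":
    template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]  # toy Cα trace
    target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.0, 0.0, 0.0)]  # toy Cα trace
    print(optimal_sigmas(template, target, [(0, 1), (0, 2), (1, 2)]))
```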
Fig 1. Distribution of |Δd| and σ values.
Distributions of the |Δd| (A) and σ (B) values observed in the AS models for the four HDDR groups of MODELLER. The mean value of each group is reported next to its name.
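The per-group means are plain averages over all restraints in a group. The following minimal sketch assumes each restraint is available as a (group, |Δd|, σ) record. The group labels and values in the example are hypothetical placeholders, not MODELLER identifiers or data from the paper.

```python
from __future__ import annotations

from collections import defaultdict


def group_means(records):
    """records: iterable of (group, abs_delta_d, sigma) tuples.

    Returns {group: (mean |Δd|, mean sigma)}.
    """
    # Per group: running sum of |Δd|, running sum of sigma, count.
    sums: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for group, abs_dd, sigma in records:
        acc = sums[group]
        acc[0] += abs_dd
        acc[1] += sigma
        acc[2] += 1
    return {g: (acc[0] / acc[2], acc[1] / acc[2]) for g, acc in sums.items()}


if __name__ == "__main__":
    records = [("CA-CA", 1.0, 2.0), ("CA-CA", 3.0, 2.0), ("N-O", 0.5, 1.5)]  # hypothetical
    print(group_means(records))
```

Comparing the two means of a group shows whether σ over- or underestimates |Δd| on average. It does not show how well the two are correlated restraint by restraint.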
Fig 5. Effect of |Δd| perturbation on 3D modeling.
(A) and (B) Average GDT-HA and lDDT scores of the AS models as a function of their average Cα-Cα PCC values (see the “Methods” section). (C) and (D) Similar data obtained for the multiple-template AM models. In (A) through (D), the dashed horizontal lines represent the average quality scores obtained with the default MODELLER.
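This figure relates model quality to a Pearson correlation coefficient (PCC). Assume, for illustration, that the PCC is computed between the σ values used for modeling and the true |Δd| values of the Cα-Cα restraints. The actual perturbation procedure is described in the paper's Methods section and is not reproduced here. The sketch below instead multiplies each true |Δd| by log-normal noise of increasing spread and reports the resulting PCC. The function names, the noise model and the 0.01 Å floor are hypothetical.

```python
from __future__ import annotations

import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def perturbed_sigmas(abs_delta_d: list[float], noise_sd: float,
                     floor: float = 0.01, seed: int = 0) -> list[float]:
    """Hypothetical perturbation: sigma = max(|Δd|, floor) * exp(N(0, noise_sd))."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [max(v, floor) * math.exp(rng.gauss(0.0, noise_sd)) for v in abs_delta_d]


if __name__ == "__main__":
    abs_dd = [0.1, 0.4, 0.8, 1.5, 2.3, 3.0]  # Å, hypothetical true deviations
    for noise in (0.0, 0.5, 1.0, 2.0):
        sig = perturbed_sigmas(abs_dd, noise)
        print(f"noise sd = {noise:.1f}  PCC = {pearson(abs_dd, sig):.3f}")
```

Multiplicative noise keeps every σ strictly positive without clipping many values to the floor. With zero noise, σ equals |Δd| (the optimal case) and the PCC is 1.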
Fig 6. Average quality scores of the models of the analysis set as a function of the weight w with which the DOPE potential is included in the objective function of MODELLER.
(A) to (C) Quality scores of the AS models. (D) to (F) Quality scores of the AM models. In (A) through (F), the horizontal dashed lines correspond to the scores obtained when modeling with MODELLER-generated (blue) or optimal (orange) HDDRs without the use of DOPE.
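Including a statistical potential with weight w adds a term to the objective function, and that term can change which conformation the optimizer prefers. The toy sketch below assumes the simplest additive form, F = F_restraints + w * E_stat, and uses made-up scores for two candidate conformations. It is not the altmod implementation. Real restraint scores and DOPE energies are on different scales, so these numbers say nothing about which w works best.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    restraint_score: float  # hypothetical -ln(pdf) from the spatial restraints
    statistical_energy: float  # hypothetical DOPE-like statistical energy


def objective(c: Candidate, w: float) -> float:
    """Simplified, hypothetical objective: restraint score + w * statistical energy."""
    if w < 0.0:
        raise ValueError("w must be non-negative")
    return c.restraint_score + w * c.statistical_energy


def preferred(candidates: list[Candidate], w: float) -> str:
    """Name of the candidate with the lowest objective value for weight w."""
    return min(candidates, key=lambda c: objective(c, w)).name


if __name__ == "__main__":
    pool = [Candidate("A", 10.0, -5.0), Candidate("B", 12.0, -8.0)]
    for w in (0.0, 0.5, 1.0):
        print(f"w = {w}: preferred conformation {preferred(pool, w)}")
```

With w = 0 and w = 0.5, the conformation that better satisfies the restraints (A) wins. At w = 1, the lower statistical energy of B outweighs its worse restraint score.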
3D modeling qualities of the AS single-template models built with optimal HDDRs and alignments.
| Strategy | GDT-HA | lDDT | MolProbity score |
|---|---|---|---|
| MODELLER | 0.6014 (-) | 0.6563 (-) | 3.0104 (-) |
| OPTIMAL | 0.6377 (+6.0%) | 0.6842 (+4.2%) | 3.0311 (+0.7%) |
| MODELLER-SLOW | 0.6036 (+0.4%) | 0.6594 (+0.5%) | 2.8512 (-5.3%) |
| OPTIMAL-SLOW | 0.6377 (+6.0%) | 0.6853 (+4.4%) | 2.9039 (-3.5%) |
| MODELLER-TMalign | 0.6383 (+6.1%) | 0.6951 (+5.9%) | 3.0411 (+1.0%) |
| OPTIMAL-TMalign | 0.6805 (+13.2%) | 0.7259 (+10.6%) | 3.0870 (+2.5%) |
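The “GDT-HA”, “lDDT” and “MolProbity score” columns report the average values of those metrics. Percent changes (in parentheses) are computed with respect to the scores of the default MODELLER (first row). Higher GDT-HA and lDDT values are better, whereas lower MolProbity scores indicate better stereochemical quality.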
ᵃ The “MODELLER” prefix indicates that the strategy employs HDDRs generated by MODELLER.
ᵇ The “OPTIMAL” prefix indicates the use of optimal HDDRs.
ᶜ The “SLOW” suffix indicates the use of the slow MDSA protocol instead of the default very_fast one.
ᵈ The “TMalign” suffix indicates the use of target-template alignments built with TM-align.
*Asterisks denote a statistically significant difference (according to a Wilcoxon signed-rank test with a significance level of 0.05) between the scores of a strategy and the scores of the default MODELLER. See for a full list of the numerical p-values.
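The percentages in parentheses are relative changes with respect to the default MODELLER row. The minimal sketch below reproduces the arithmetic using two averages from the table above. Recomputing from the rounded four-decimal averages reproduces most printed values. A few entries can differ in the last digit, though: the lDDT change of OPTIMAL comes out as +4.3% rather than the printed +4.2%, presumably because the published percentages were computed from unrounded averages.

```python
def percent_change(value: float, baseline: float) -> float:
    """Relative change of `value` with respect to `baseline`, in percent."""
    if baseline == 0.0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline


def format_change(value: float, baseline: float) -> str:
    """Format a percent change with an explicit sign and one decimal, as in the tables."""
    return f"{percent_change(value, baseline):+.1f}%"


if __name__ == "__main__":
    # Averages from the AS single-template table; default MODELLER is the baseline.
    print(format_change(0.6377, 0.6014))  # OPTIMAL, GDT-HA -> +6.0%
    print(format_change(2.8512, 3.0104))  # MODELLER-SLOW, MolProbity score -> -5.3%
```

For the MolProbity score, a negative change is an improvement.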
3D modeling qualities of the AM multiple-template models built with optimal HDDRs and alignments.
| Strategy | GDT-HA | lDDT | MolProbity score |
|---|---|---|---|
| MODELLER | 0.6287 (-) | 0.6819 (-) | 3.0725 (-) |
| OPTIMAL | 0.8733 (+38.9%) | 0.8106 (+18.9%) | 3.1478 (+2.4%) |
| MODELLER-SLOW | 0.6310 (+0.4%) | 0.6850 (+0.5%) | 2.9143 (-5.2%) |
| OPTIMAL-SLOW | 0.8747 (+39.1%) | 0.8133 (+19.3%) | 3.0475 (-0.8%) |
| OPTIMAL-U | 0.7438 (+18.3%) | 0.7427 (+8.9%) | 3.1744 (+3.3%) |
| MODELLER-ST | 0.6168 (-1.9%) | 0.6683 (-2.0%) | 3.0231 (-1.6%) |
| OPTIMAL-ST | 0.6557 (+4.3%) | 0.6986 (+2.5%) | 3.0398 (-1.1%) |
| MODELLER-TMalign | 0.6645 (+5.7%) | 0.7165 (+5.1%) | 3.0529 (-0.6%) |
| OPTIMAL-TMalign | 0.9222 (+46.7%) | 0.8498 (+24.6%) | 3.1044 (+1.0%) |
See the notes of the preceding table for the description of contents, columns and most modeling strategy names.
ᵃ The “U” suffix indicates the use of uniform template weights for multiple-template HDDRs.
ᵇ The “ST” suffix indicates that only the top template for each target was used (thus resulting in single-template modeling).
*Asterisks denote a statistically significant difference (according to a Wilcoxon signed-rank test with a significance level of 0.05) between the scores of a strategy and the scores of the default MODELLER. See for a full list of the numerical p-values.
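The significance marks refer to a paired Wilcoxon signed-rank test between the per-target scores of a strategy and those of the default MODELLER. For orientation, the compact sketch below uses the large-sample normal approximation. It discards zero differences and applies no tie or continuity correction. Its p-values can therefore differ from those of a full implementation such as scipy.stats.wilcoxon, which should be preferred for real analyses. The example scores are hypothetical.

```python
from __future__ import annotations

import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(strategy: list[float], baseline: list[float]) -> tuple[float, float]:
    """Two-sided paired signed-rank test (normal approximation). Returns (W+, p)."""
    if len(strategy) != len(baseline):
        raise ValueError("paired samples must have equal length")
    diffs = [s - b for s, b in zip(strategy, baseline) if s != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_two_sided


if __name__ == "__main__":
    default_scores = [0.50, 0.61, 0.47, 0.72, 0.55, 0.66]  # hypothetical per-target GDT-HA
    new_scores = [0.54, 0.63, 0.49, 0.75, 0.54, 0.70]  # hypothetical per-target GDT-HA
    w, p = wilcoxon_signed_rank(new_scores, default_scores)
    print(f"W+ = {w}, p = {p:.3f}, significant at 0.05: {p < 0.05}")
```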
3D modeling qualities of the AS single-template models built by including DOPE in the objective function of MODELLER.
| Strategy | GDT-HA | lDDT | MolProbity score |
|---|---|---|---|
| MODELLER | 0.6014 (-) | 0.6563 (-) | 3.0104 (-) |
| OPTIMAL | 0.6377 (+6.0%) | 0.6842 (+4.2%) | 3.0311 (+0.7%) |
| MODELLER-DOPE-0.5 | 0.6089 (+1.3%) | 0.6692 (+2.0%) | 2.1138 (-29.8%) |
| MODELLER-SLOW-DOPE-0.5 | 0.6112 (+1.6%) | 0.6746 (+2.8%) | 2.0344 (-32.4%) |
| MODELLER-DOPE-3.5 | 0.5631 (-6.4%) | 0.6397 (-2.5%) | 2.9977 (-0.4%) |
| OPTIMAL-DOPE-0.5 | 0.6549 (+8.9%) | 0.7029 (+7.1%) | 2.2960 (-23.7%) |
| OPTIMAL-DOPE-3.5 | 0.6885 (+14.5%) | 0.7158 (+9.1%) | 2.6280 (-12.7%) |
See the notes of the preceding tables for the description of contents, columns and most modeling strategy names.
ᵃ The “DOPE-X.X” suffix indicates the use of DOPE with a weight w of X.X.
*Asterisks denote a statistically significant difference (according to a Wilcoxon signed-rank test with a significance level of 0.05) between the scores of a strategy and the scores of the default MODELLER. See for a full list of the numerical p-values.
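The strategy labels in these tables follow a compositional naming scheme: a prefix for the type of HDDRs, followed by optional suffixes. The sketch below parses a label into its components. The field names are hypothetical; their meanings come from the footnotes of the tables above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical field names; meanings follow the table footnotes.
SUFFIX_FLAGS = {
    "SLOW": "slow_refinement",  # slow MDSA protocol instead of very_fast
    "TMalign": "tmalign_alignment",  # target-template alignment built with TM-align
    "U": "uniform_template_weights",  # uniform template weights (multiple templates)
    "ST": "single_template",  # only the top template for each target
}


@dataclass(frozen=True)
class Strategy:
    hddrs: str  # "MODELLER" (program-generated) or "OPTIMAL"
    slow_refinement: bool = False
    tmalign_alignment: bool = False
    uniform_template_weights: bool = False
    single_template: bool = False
    dope_weight: Optional[float] = None  # weight w of the DOPE term, if used


def parse_strategy(label: str) -> Strategy:
    """Decode a label such as 'MODELLER-SLOW-DOPE-0.5' into a Strategy."""
    head, *rest = label.split("-")
    if head not in ("MODELLER", "OPTIMAL"):
        raise ValueError(f"unknown HDDR prefix in {label!r}")
    flags: dict[str, bool] = {}
    dope_weight = None
    i = 0
    while i < len(rest):
        token = rest[i]
        if token == "DOPE":
            # "DOPE" is followed by its weight, which split("-") separates out.
            if i + 1 >= len(rest):
                raise ValueError(f"missing DOPE weight in {label!r}")
            dope_weight = float(rest[i + 1])
            i += 2
        elif token in SUFFIX_FLAGS:
            flags[SUFFIX_FLAGS[token]] = True
            i += 1
        else:
            raise ValueError(f"unknown suffix {token!r} in {label!r}")
    return Strategy(hddrs=head, dope_weight=dope_weight, **flags)


if __name__ == "__main__":
    for name in ("MODELLER", "OPTIMAL-TMalign", "MODELLER-SLOW-DOPE-0.5", "OPTIMAL-U"):
        print(name, "->", parse_strategy(name))
```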
3D modeling qualities of the AM multiple-template models built by including DOPE in the objective function of MODELLER.
| Strategy | GDT-HA | lDDT | MolProbity score |
|---|---|---|---|
| MODELLER | 0.6287 (-) | 0.6819 (-) | 3.0725 (-) |
| OPTIMAL | 0.8733 (+38.9%) | 0.8106 (+18.9%) | 3.1478 (+2.4%) |
| MODELLER-DOPE-0.5 | 0.6327 (+0.6%) | 0.6926 (+1.6%) | 2.2086 (-28.1%) |
| MODELLER-SLOW-DOPE-0.5 | 0.6347 (+1.0%) | 0.6971 (+2.2%) | 2.1152 (-31.2%) |
| MODELLER-DOPE-3.5 | 0.5646 (-10.2%) | 0.6453 (-5.4%) | 3.1267 (+1.8%) |
| OPTIMAL-DOPE-0.5 | 0.8736 (+39.0%) | 0.8229 (+20.7%) | 2.5635 (-16.6%) |
| OPTIMAL-DOPE-3.5 | 0.8519 (+35.5%) | 0.8061 (+18.2%) | 2.7520 (-10.4%) |
See the notes of the preceding tables for the description of contents, columns and most modeling strategy names.
*Asterisks denote a statistically significant difference (according to a Wilcoxon signed-rank test with a significance level of 0.05) between the scores of a strategy and the scores of the default MODELLER. See for a full list of the numerical p-values.
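Higher GDT-HA and lDDT scores are better but lower MolProbity scores are better, so deciding whether a strategy improves on the default needs a direction-aware comparison. The sketch below applies one to the averages of the table above. It compares averages only. The significance asterisks were not preserved in this copy of the tables, so the sketch says nothing about statistical significance.

```python
# Average scores copied from the AM multiple-template table with DOPE.
METRICS = ("GDT-HA", "lDDT", "MolProbity score")
HIGHER_IS_BETTER = {"GDT-HA": True, "lDDT": True, "MolProbity score": False}
BASELINE = (0.6287, 0.6819, 3.0725)  # default MODELLER
ROWS = {
    "OPTIMAL": (0.8733, 0.8106, 3.1478),
    "MODELLER-DOPE-0.5": (0.6327, 0.6926, 2.2086),
    "MODELLER-SLOW-DOPE-0.5": (0.6347, 0.6971, 2.1152),
    "MODELLER-DOPE-3.5": (0.5646, 0.6453, 3.1267),
    "OPTIMAL-DOPE-0.5": (0.8736, 0.8229, 2.5635),
    "OPTIMAL-DOPE-3.5": (0.8519, 0.8061, 2.7520),
}


def improves_all(scores, baseline=BASELINE):
    """True if every metric average is strictly better than the baseline average."""
    for metric, value, base in zip(METRICS, scores, baseline):
        better = value > base if HIGHER_IS_BETTER[metric] else value < base
        if not better:
            return False
    return True


if __name__ == "__main__":
    winners = [name for name, scores in ROWS.items() if improves_all(scores)]
    print(winners)
    # ['MODELLER-DOPE-0.5', 'MODELLER-SLOW-DOPE-0.5', 'OPTIMAL-DOPE-0.5', 'OPTIMAL-DOPE-3.5']
```

OPTIMAL alone fails the check only because its MolProbity score is higher, that is worse, than the default.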