Xiaoyong Cao1, Pu Tian1,2.
Abstract
Free energy is arguably the most important property of molecular systems. Despite great progress in both its efficient estimation by scoring functions/potentials and its more rigorous computation based on extensive sampling, we remain far from accurately predicting and manipulating biomolecular structures and their interactions. Fundamental limitations remain to be tackled, including the accuracy of interaction descriptions and the difficulty of sampling in high-dimensional space. Computational graphs underlie major artificial intelligence platforms and have been proven to facilitate training, optimization and learning. Combining autodifferentiation, coordinate transformation and generalized solvation free energy theory, we construct a computational graph infrastructure that seamlessly integrates a fully trainable local free energy landscape with end-to-end differentiable iterative free energy optimization. This new framework drastically improves efficiency by replacing local sampling with differentiation. Its specific implementation in protein structure refinement achieves superb efficiency and competitive accuracy when compared with state-of-the-art all-atom mainstream methods. This journal is © The Royal Society of Chemistry.
Year: 2021 PMID: 35423805 PMCID: PMC8697515 DOI: 10.1039/d1ra01455b
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 3.361
Fig. 2 Schematic representation of GSFE-refinement training and optimization. (A) Illustration of LFEL training based on LMLA-GSFE (see ref. 8 for details). (B) Illustration of the fully end-to-end differentiable optimization pipeline. Small solid black arrows represent the forward-pass computation of the approximate free energy; small solid red arrows represent the backpropagation (BP) operation for taking derivatives. The empty arrow represents the input of decoys. The final result is delivered as output after a preset number of iterations N.
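The iterative pipeline in panel (B) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the quadratic toy energy stands in for the trained LFEL network, its hand-written gradient stands in for the derivatives that backpropagation would supply, and the names `refine`, `lr`, `lam` and `n_iter` are hypothetical. The restraint term uses the standard smooth L1 form (quadratic for deviations below a threshold `beta`, linear above it) and pulls the coordinates back toward the starting decoy.

```python
def smooth_l1_grad(d, beta=1.0):
    """Derivative of the smooth L1 loss with respect to the deviation d."""
    if abs(d) < beta:
        return d / beta              # quadratic region: 0.5 * d**2 / beta
    return 1.0 if d > 0 else -1.0    # linear region: |d| - 0.5 * beta


def toy_energy_grad(x, target):
    """Gradient of the toy energy 0.5 * sum((x_i - t_i)**2).

    Assumption: this stands in for the gradient of the learned free energy.
    """
    return [xi - ti for xi, ti in zip(x, target)]


def refine(x0, target, lr=0.1, lam=0.0, n_iter=100):
    """Run n_iter gradient-descent steps starting from the decoy x0."""
    x = list(x0)
    for _ in range(n_iter):
        # "Forward + backward" step: gradients of the energy and the restraint.
        g_energy = toy_energy_grad(x, target)
        g_restraint = [smooth_l1_grad(xi - x0i) for xi, x0i in zip(x, x0)]
        # Update the single shared set of coordinates.
        x = [xi - lr * (ge + lam * gr)
             for xi, ge, gr in zip(x, g_energy, g_restraint)]
    return x


if __name__ == "__main__":
    print(refine([0.0, 2.0], [1.0, 1.0], lam=0.0, n_iter=200))
    print(refine([0.0, 2.0], [1.0, 1.0], lam=10.0, n_iter=200))
```

In this toy setting, `lam=0` lets the coordinates move all the way to the energy minimum, while a large `lam` keeps them close to the starting decoy. The sketch only shows the direction of the trade-off between the restraint and the energy term.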
Fig. 1 Illustration of implicitly mediated global correlations and the effective larger cutoff in GSFE-refinement. The two dashed circles mark two “local” regions for LFELs centered at solute units B and D, respectively. As a comprising unit of the LFELs centered on both B and D, unit C experiences an effective force from unit A as mediated by B, and an effective force from E as mediated by D. In fact, each unit experiences effective forces mediated by the LFELs defined by all of its solvent units, resulting in an effective cutoff that is approximately twice the radius of the dashed circle used to define an LFEL. All mediated global correlations are, in essence, the equality of shared states for overlapping DOFs belonging to different LFELs. Since only one set of coordinates is used in the whole optimization process, these correlations are naturally maintained.
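The doubling of the effective cutoff can be checked on a toy geometry. The following is a hypothetical one-dimensional sketch, assuming five equally spaced units A to E and a local radius equal to the spacing; the function names `neighbors` and `mediated` are illustrative and do not come from the paper.

```python
def neighbors(positions, i, radius):
    """Indices of units inside the local region centered on unit i."""
    return {j for j, p in enumerate(positions)
            if j != i and abs(p - positions[i]) <= radius}


def mediated(positions, i, radius):
    """Units outside unit i's own region that share a local region with it.

    Unit i belongs to the local region of each direct neighbor j, so every
    other member of j's region is coupled to i through j.
    """
    direct = neighbors(positions, i, radius)
    second = set()
    for j in direct:
        second |= neighbors(positions, j, radius)
    return second - direct - {i}


if __name__ == "__main__":
    names = "ABCDE"
    positions = [0.0, 1.0, 2.0, 3.0, 4.0]  # assumed 1D layout, spacing 1
    radius = 1.0                            # assumed local-region radius
    c = names.index("C")
    print(sorted(names[j] for j in neighbors(positions, c, radius)))
    print(sorted(names[j] for j in mediated(positions, c, radius)))
```

With this layout, the direct neighbors of C are B and D, and the mediated partners are A and E, which sit at exactly twice the local radius from C.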
Summary for the best of top 5 models with various combinations for LR/λ/W on 3DRobot dataset
| LR/λ/W | Avg-ΔGDT-HA | GDT-HA-num | Avg-ΔRMSD | RMSD-num |
|---|---|---|---|---|
| 0.001/0/0 | −1.38 | 95/322 | 0.0022 | 130/322 |
| 0.0005/0/0 | −0.18 | 134/322 | −0.0175 | 158/322 |
| 0.0001/0/0 | 0.2 | 182/322 | −0.0099 | 201/322 |
| 0.0005/0.1/0 | −0.19 | 137/322 | −0.0168 | 166/322 |
| 0.0005/1.2/0 | −0.02 | 162/322 | −0.0147 | 188/322 |
| 0.0005/10.0/0 | 0.24 | 211/322 | −0.0167 | 257/322 |
| 0.0005/0.1/1 | 0.02 | 163/322 | −0.0136 | 191/322 |
| 0.0005/1.2/1 | 0.26 | 214/322 | −0.0178 | 275/322 |
| 0.0005/10.0/1 | 0.27 | 210/322 | −0.019 | 291/322 |
LR/λ/W: LR is the learning rate, λ is the coefficient of the smooth_l1 loss for conformation restraints, and W is the AA weight (see eqn (5) and (6); 1 means on and 0 means off).
Avg-ΔGDT-HA: the average value of ΔGDT-HA for all decoys.
GDT-HA-num: the average number of decoys with ΔGDT-HA > 0 for all 36 decoy sets.
Avg-ΔRMSD: the average value of ΔRMSD for all decoys.
RMSD-num: the average number of decoys with ΔRMSD < 0 for all 36 decoy sets.
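The summary columns above can be reproduced from per-decoy scores with a few lines of code. The sketch below is illustrative only: `gdt_ha` uses the usual GDT-HA definition (the mean percentage of residues within 0.5, 1, 2 and 4 Å of the reference) but assumes the per-residue distances come from a single fixed superposition, whereas the standard evaluation searches over many superpositions. The function names and the sample numbers are hypothetical.

```python
def gdt_ha(distances):
    """GDT-HA (0 to 100) from per-residue distances in angstroms.

    Assumption: distances come from one fixed superposition of the model
    onto the reference structure.
    """
    cutoffs = (0.5, 1.0, 2.0, 4.0)
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def summarize(start_scores, refined_scores):
    """Average score change and number of decoys whose score increased."""
    deltas = [r - s for s, r in zip(start_scores, refined_scores)]
    avg_delta = sum(deltas) / len(deltas)
    n_improved = sum(d > 0 for d in deltas)
    return avg_delta, n_improved


if __name__ == "__main__":
    print(gdt_ha([0.3, 0.8, 1.5, 3.0]))
    # Made-up starting and refined GDT-HA scores for three decoys.
    print(summarize([50.0, 60.0, 70.0], [52.0, 59.0, 70.0]))
```

For RMSD the same `summarize` helper applies, except that improvement corresponds to a negative change, so the count would use `d < 0` instead.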
Fig. 3 Box plots for ΔGDT-HA with different LR/λ/W combinations for the 3DRobot dataset. Effects of variation are exhibited for (A) learning rates, (B) structural restraints and (C) weights for approximating local priors. More box plots of ΔRMSD and ΔGDT-HA are available in ESI (Fig. S1–S3).†
Fig. 4 Scatter plots of ΔGDT-HA as a function of start GDT-HA score for the best of top 5 models from GSFE-refinement for the 3DRobot dataset. The corresponding LR/λ/W combination is noted on top of each plot. Effects of variation are exhibited for (A–C) learning rates, (D–F) structural restraints and (G–I) weights for approximating local priors. More scatter plots of ΔRMSD and ΔGDT-HA are available in ESI (Fig. S4–S6).†
Summary of GSFE-refinement and other refinement methods on the 150-target refineD dataset (results for other methods are taken from ref. 22)
| Method | Avg. top 1 | Avg. best of 5 | GDT-HA-num |
|---|---|---|---|
| refineD-C | 0.6365 | 1.3109 | 121/150 |
| refineD-NC | −1.2403 | 1.5343 | 104/150 |
| FG-MD | 0.5597 | 0.5597 | — |
| FastRelax | −3.4317 | −0.1999 | — |
| FastRelax-0.5 Å | −0.3411 | 0.8811 | 90/150 |
| FastRelax-2.0 Å | −1.2120 | 0.8223 | 77/150 |
| FastRelax-4.0 Å | −2.5471 | 0.0751 | 67/150 |
| ModRefiner-0 | −0.8400 | −0.8400 | — |
| ModRefiner-100 | 0.1491 | 0.1491 | — |
| GSFE-refinement | 0.0800 | 0.4400 | 112/150 |
More details for the top 1 and the best of top 5 models for the refineD dataset are available in ESI (Tables S1 and S2).
Fig. 5 ΔGDT-HA and ΔRMSD of the best of top 5 models as a function of start GDT-HA score obtained from GSFE-refinement of the CASP11 (A and B) and CASP12 (C and D) datasets. Corresponding plots for top 1 models are presented in ESI (Fig. S7).†
Fig. 6 GSFE-refinement performance in CASP14. Success rates (percentage of improved targets for the selected indicators) of GSFE-refinement and the CASP14 average are shown. (A) Distribution of ΔGDT-HA. (B) Distribution of ΔRMSD.