Ryan E Pavlovicz, Hahnbeom Park, Frank DiMaio.
Abstract
Highly coordinated water molecules are frequently an integral part of protein-protein and protein-ligand interfaces. We introduce an updated energy model that efficiently captures the energetic effects of these ordered water molecules on the surfaces of proteins. A two-stage method is developed in which polar groups arranged in geometries suitable for water placement are first identified, then a modified Monte Carlo simulation allows highly coordinated waters to be placed on the surface of a protein while simultaneously sampling amino acid side chain orientations. This "semi-explicit" water model is implemented in Rosetta and is suitable for both structure prediction and protein design. We show that our new approach and energy model yield significant improvements in native structure recovery of protein-protein and protein-ligand docking discrimination tests.
Year: 2020 PMID: 32956350 PMCID: PMC7529342 DOI: 10.1371/journal.pcbi.1008103
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Implicit and explicit treatment of water in Rosetta.
Implicit water score function potentials, panels A-D. Potential plots were generated by orienting the N-H and C=O groups of two ALA residues along the same axis with an H-O distance of 1.3 Å (origin). The donor residue is then shifted ±7 Å to generate a planar cut of the solvation potentials between the N and O atoms. All plots have units of kcal/mol [13, 14]. (A) fa_sol term: isotropic desolvation penalty implemented in Rosetta using the Lazaridis-Karplus model. (B) lk_ball term: anisotropic correction for polar atom types, first introduced into the REF2015 score function. (C) lk_bridge term: anisotropic solvation reward introduced into the Rosetta-ICO score function. (D) Composite of panels A-C, using the finalized Rosetta-ICO score term weights. Explicit water placement, panels E-H. (E) Initial possible solvation sites (blue) are based on statistics of water positions around backbone polar atoms, in addition to sites around side chain polar atoms considering all possible non-clashing rotamers. Pictured is the interface of PDB ID 1P57, between the N-terminal (pink) and catalytic (teal) domains of hepsin, with crystallographic waters in transparent grey. (F) After an initial stage of Monte Carlo packing of both the possible water sites and the surrounding protein side chains, a cutoff is applied based on the water occupancy of each site over the simulation (blue = 0% occupancy, green = 25%, red = 50%). (G) Remaining water sites are clustered, and a second cumulative dwell time cutoff is applied. (H) The final predicted water sites are converted into three-atom water molecules, and their orientations are reoptimized together with nearby side chain conformations using the Rosetta all-atom energy function.
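The occupancy cutoff described for panel F can be illustrated with a minimal sketch. This is not the Rosetta implementation: the function names `site_occupancy` and `filter_sites`, the representation of the trajectory as one set of occupied site labels per recorded step, and the cutoff value of 0.5 are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def site_occupancy(trajectory: List[Set[str]], site_ids: List[str]) -> Dict[str, float]:
    """Return the fraction of recorded Monte Carlo steps in which each site held a water.

    `trajectory` is a hypothetical record: one set of occupied site labels per step.
    """
    n_steps = len(trajectory)
    if n_steps == 0:
        raise ValueError("trajectory is empty")
    return {s: sum(s in step for step in trajectory) / n_steps for s in site_ids}


def filter_sites(occupancy: Dict[str, float], cutoff: float) -> List[str]:
    """Keep the sites whose occupancy is at or above the cutoff."""
    return [s for s, occ in occupancy.items() if occ >= cutoff]


# Toy trajectory of four recorded steps over three candidate sites.
trajectory = [{"w1", "w2"}, {"w1"}, {"w1", "w3"}, {"w1", "w2"}]
occ = site_occupancy(trajectory, ["w1", "w2", "w3"])
kept = filter_sites(occ, cutoff=0.5)  # cutoff value is an assumption, not from the source
```

In this toy case site `w3` is occupied in only one of four steps and is discarded; the surviving sites would then go on to the clustering and dwell time stage described for panel G.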
Classification of predicted native waters (test set of 123).
| Type¹ | Subset size | % recovered² | % precision³ |
|---|---|---|---|
| All | 2815 | 17.7 (0.08) | 17.7 (0.08) |
| Exposed | 630 | 6.0 (0.13) | 4.7 (0.1) |
| Partially Buried | 1803 | 19.5 (0.39) | 21.7 (0.5) |
| Buried | 382 | 28.3 (1.19) | 27.5 (1.3) |
| 1 protein coord | 770 | 6.3 (0.12) | 5.0 (0.2) |
| 2 protein coord | 1077 | 27.2 (0.24) | 25.3 (0.3) |
| 3 protein coord | 399 | 31.8 (0.43) | 26.2 (0.4) |
| BB only | 330 | 50.0 (1.24) | 23.1 (0.4) |
| SC only | 333 | 7.8 (0.65) | 18.1 (1.1) |
| BB+SC | 440 | 27.6 (0.18) | 26.6 (0.3) |
¹ Predicted water molecules are categorized in three ways. First, waters are classified by "buriedness", based on the number of amino acid neighbors (nCβ) with Cβ within 10 Å. Exposed: nCβ ≤ 15; partially buried: 15 < nCβ ≤ 25; buried: nCβ > 25. Second, waters are classified by the number of protein coordination partners (1, 2, or 3) within 3.2 Å. Finally, waters with at least two coordinating protein atoms within 3.2 Å of the water O atom are classified by the type of those atoms: backbone only (BB only), side chain only (SC only), or a mix of backbone and side chain coordination (BB+SC).
²,³ Percent of specific types of waters recovered, using the recovery criteria described in Methods, averaged over three runs with standard deviations in parentheses.
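The classification thresholds in the first footnote can be written out as a short sketch. The function names and the `"BB"`/`"SC"` labels are hypothetical, and the handling of waters with fewer than two coordinating atoms (returning `None`) is one reading of the footnote, not something the source states explicitly.

```python
from typing import List, Optional


def classify_buriedness(n_cbeta: int) -> str:
    """Classify a water by the number of Cβ atoms within 10 Å (nCβ)."""
    if n_cbeta <= 15:
        return "exposed"
    if n_cbeta <= 25:
        return "partially buried"
    return "buried"


def classify_coordination(partners: List[str]) -> Optional[str]:
    """Classify by the type of coordinating protein atoms within 3.2 Å.

    `partners` holds one label per coordinating atom: "BB" or "SC".
    Waters with fewer than two partners get no label (an assumption).
    """
    if len(partners) < 2:
        return None
    kinds = set(partners)
    if kinds == {"BB"}:
        return "BB only"
    if kinds == {"SC"}:
        return "SC only"
    return "BB+SC"
```

For example, a water with nCβ = 20 coordinated by one backbone and one side chain atom would be labeled "partially buried" and "BB+SC".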
Performance of solvation schemes on protein-protein and protein-small molecule docking discrimination.
| Metric | REF2015 | Rosetta-ICO¹ | Rosetta-ECO² |
|---|---|---|---|
| Protein-ligand | | | |
| discrimination score³ | 0.749 ± 0.003 | 0.807 ± 0.002 | 0.873 ± 0.003 |
| percent correct⁴ | 77.1 ± 2.1 | 77.8 ± 1.8 | 94.1 ± 1.1 |
| run time, normalized⁵ | 1.00 | 1.09 | 1.52 |
| Protein-protein | | | |
| discrimination score³ | 0.628 ± 0.014 | 0.739 ± 0.006 | 0.794 ± 0.004 |
| percent correct⁴ | 63.6 ± 0.9 | 74.9 ± 0.9 | 79.9 ± 2.3 |
| run time, normalized⁵ | 1.00 | 1.25 | 2.59 |
¹ Implicit consideration of coordinated water molecules.
² Inclusion of well-ordered explicit water molecules.
³ Reported are the average Boltzmann-weighted discrimination scores ± 1σ, averaged over three independent runs for 46 protein-ligand and 53 protein-protein docking cases.
⁴ The percentage of cases in which the lowest scoring model is within 1.0 Å of the native conformation for protein-ligand docking and 2.0 Å for protein-protein docking, averaged over three independent runs.
⁵ Run time, normalized to baseline, is the sum of individual run times to calculate ΔGbind for each near-native and decoy conformation.
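The footnotes name the two metrics but do not give their formulas. As a rough illustration only, the sketch below computes a simple Boltzmann-weighted near-native fraction and the "lowest scoring model is near-native" test for a single docking case. The functional form, the function names, and the temperature factor `kT = 1.0` are assumptions and may differ from the definition used in the paper.

```python
import math
from typing import Sequence


def boltzmann_near_native_fraction(
    energies: Sequence[float],
    rmsds: Sequence[float],
    rmsd_cutoff: float,
    kT: float = 1.0,  # assumed temperature factor, not taken from the source
) -> float:
    """Fraction of total Boltzmann weight carried by models within rmsd_cutoff."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    near = sum(w for w, r in zip(weights, rmsds) if r <= rmsd_cutoff)
    return near / sum(weights)


def lowest_energy_is_near_native(
    energies: Sequence[float], rmsds: Sequence[float], rmsd_cutoff: float
) -> bool:
    """True if the lowest scoring model lies within rmsd_cutoff of the native."""
    best = min(range(len(energies)), key=lambda i: energies[i])
    return rmsds[best] <= rmsd_cutoff


# Toy case: three models with hypothetical ΔGbind values and RMSDs (Å).
energies = [-10.0, -8.0, -9.0]
rmsds = [0.5, 6.0, 0.9]
score = boltzmann_near_native_fraction(energies, rmsds, rmsd_cutoff=1.0)
correct = lowest_energy_is_near_native(energies, rmsds, rmsd_cutoff=1.0)
```

Averaging `correct` over all cases in a benchmark set would give a "percent correct" figure of the kind reported in the table, with a cutoff of 1.0 Å for protein-ligand and 2.0 Å for protein-protein docking as stated in the footnote.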
Fig 2. Protein-protein docking results.
(A) Scatter plot comparing results of 53 cases between REF2015 and Rosetta-ECO. Values are the average Boltzmann-weighted discrimination score ± 1σ from three independent runs. (B) Energy funnels for PDB ID 1E6E, adrenodoxin reductase bound to adrenodoxin (red data point in 2A), plotting computed ΔGbind vs. RMSD from the native binding conformation for three different scoring methods. Discrimination scores for each distribution are noted in the bottom right of each plot. (C) Explicitly solvated near-native docking pose (RMSD = 0.14 Å; pink data point in 2B) with the reductase in grey and adrenodoxin in rainbow (N- to C-terminus colored blue to red). (D) Coordination of some predicted interface waters.
Fig 3. Protein-ligand docking results.
(A) Scatter plot comparing results of 46 cases between baseline (REF2015) and Rosetta-ECO. Values are the average Boltzmann-weighted discrimination score ± 1σ from three independent runs. (B) Energy funnels, similar to Fig 2, for PDB ID 1X8X, tyrosyl-tRNA synthetase bound to tyrosine (red data point in 3A). (C) Explicitly solvated, near-native docking pose in pink (RMSD = 0.43 Å; pink data point in 3B) with the native ligand in transparent blue. (D) Explicitly solvated decoy binding pose (RMSD = 6.57 Å; yellow data point in 3B). (E-H) A comparison of recovered waters (red) to high-resolution crystallographic waters (green spheres) from PDB ID 1N2J (panels E-G) and PDB ID 1U4D (panel H).