Literature DB >> 35936397

A Closed-Form, Analytical Approximation for Apparent Surface Charge and Electric Field of Molecules.

Dan E Folescu1, Alexey V Onufriev1,2,3.   

Abstract

Closed-form, analytical approximations for electrostatic properties of molecules are of unique value as these can provide computational speed, versatility, and physical insight. Here, we have derived a simple, closed-form formula for the apparent surface charge (ASC) as well as for the electric field generated by a molecular charge distribution in aqueous solution. The approximation, with no fitted parameters, was tested against numerical solutions of the Poisson equation, where it has produced a significant speed-up. For neutral small molecules, the hydration free energies estimated from the closed-form ASC formula are within 0.8 kcal/mol RMSD from the numerical Poisson reference; the electric field at the surface is in quantitative agreement with the reference. Performance of the approximation was also tested on larger structures, including a protein, a DNA fragment, and a viral receptor-target complex. For all structures tested, a near-quantitative agreement with the numerical Poisson reference was achieved, except in regions of high negative curvature, where the new approximation is still qualitatively correct. A unique efficiency feature of the proposed "source-based″ closed-form approximation is that the ASC and electric field can be estimated individually at any point or surface patch, without the need to obtain the full global solution. An open-source software implementation of the method is available: https://people.cs.vt.edu/~onufriev/CODES/aasc.zip.
© 2022 The Authors. Published by American Chemical Society.

Entities:  

Year:  2022        PMID: 35936397      PMCID: PMC9352323          DOI: 10.1021/acsomega.2c01484

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

Accurate and efficient modeling of solvation effects at the atomistic level is a critical component of modern efforts to understand molecular structure and function.[1−5] Analysis and visualization of electrostatic properties of biomolecules, including the electric field and surface charge generated by the molecular charge distribution, have made an impact on qualitative reasoning about biomolecules.[1,6] There are two broad approaches to the modeling of molecular electrostatics and solvation effects: explicit and implicit solvation methods.[7] Arguably the most widely used model of solvation is that for which individual solvent molecules are treated explicitly, on the same footing with the target molecule. However, accuracy of the explicit solvent representation comes at a high price computationally, limiting the practical utility of atomistic simulations in many areas. The implicit, continuum solvation approach—treating solvent as a continuum with the dielectric and nonpolar properties of water—can offer much greater effective simulation speeds compared to the explicit solvent models.[8−18] The Poisson equation[9,10,19−23] of classical electrostatics[24] provides an exact formalism—within the continuum, local, linear-response dielectric approximation of solvent in the absence of mobile ions—for computing the electrostatic potential V(r) produced by a molecular charge distribution ρ(r) characterizing the solutewhere ϵ(r) is the dielectric constant. Once V(r) is obtained, the electrostatic part of the solvation free energy is easily computed.[24] The problem of finding V(r) is mathematically equivalent[25,26] to finding a continuous charge density, σ, on the dielectric boundary (DB), such thatwhere ρ(r) is the discrete charge distribution, formed by n point charges q1, ···, q, and σ(s) is the apparent surface charge (ASC) associated with each surface patch s. The second term in eq represents the so-called reaction field potential.[27−29] Conceptually, once the ASC, σ(s), is found, all of the solvation effects, at the level of the Poisson equation, can be computed. The reformulation of the Poisson problem via eq has a number of technical advantages made apparent over the years, especially in quantum mechanical (QM) calculations.[8,32] A great variety of practical widely used methods, including multiple modern derivatives of PCM[25] and COSMO,[30] utilize this general idea—the apparent surface charge (ASC) formalism, see refs (31, 32) for comprehensive reviews. A number of ASC-based methods yield numerically exact solutions to the Poisson equation, in the sense that the exact solution can, at least in principle, be approximated to arbitrary precision. Formally exact linear-scaling implementations of numerical ASC methods based on a conjugate gradient or domain decomposition exist.[32] Concerns related to the computational cost of numerically exact approaches[31] have led to the development of approximate ASC-based methods, such as the widely used COSMO,[30] GCOSMO,[33] and C-PCM.[34] These methods rely on approximations to eq . Still, even these approximate ASC-based methods employ a significant numerical component, such as numerical matrix inversion, which may carry appreciable computational overhead, especially as the structure size grows.[35,36] Therefore, in applications where computational efficiency and algorithmic simplicity is paramount, numerical ASC-based methods may not be as competitive as approximations to the Poisson equation based purely on closed-form analytical expressions.[27] Among the fully analytical approximations to the Poisson equation, the generalized Born (GB) model[8,13,37−62] is arguably the most widely used, especially in atomistic simulations.[63−70] However, despite its multiple documented success stories,[71] the GB model does not have the versatility of eq and the associated benefits of an ASC-based formulation of biomolecular electrostatics. Here, we aim to fill the gap by deriving an analytical, closed-form approximation to the Poisson equation for the ASC and the (normal) electric field around an arbitrarily shaped molecule. Standard numerical solutions of the Poisson equation are used as the reference. We refrain from comparisons with well-established, optimized numerical implementations of ASC methods in this initial investigation. The outline of the paper is as follows: Section describes computational testing materials and methodology. Section is focused on our analytical ASC approximation: we first derive an exact analytical ASC reference on a sphere (3.1) and present our approximate form of the ASC for arbitrary molecular geometries (3.2). We test the approximate ASC against the exact analytical ASC reference using a spherical test case (3.3), which simulates relevant electrostatic configurations that we will encounter in later sections. Application of our model to solvation energy calculations is presented in Section . Numerical performance and accuracy analysis along with an example applications are presented in Section , where we first examine the computational speed of our method (4.1) before testing its accuracy on small and large molecules (4.2.1 and 4.2.2). Finally, we showcase our analytical ASC approximation on a presently relevant biomolecular complex: that of the human ACE2 receptor and SARS-CoV-2 spike glycoprotein (4.3).

Methods

Structures

For testing, a set of 173 neutral small molecules from version 0.52 of the FreeSolv database[72,73] was used. The original set of nearly 650 molecules was narrowed to include only those molecules containing hydrogen, oxygen, nitrogen, and carbon atoms. The small molecules under consideration are all rigid—having small conformational variability as seen in molecular dynamic (MD) simulations.[74] The choice of rigid molecules has allowed us to focus on the physics of solvation while mitigating the uncertainty related to conformational sampling. ambpdb76 was used to generate PQR format files from AMBER format coordinate and topology files.[72] Additionally, two larger biomolecules were used: a 25 bp poly-A B′-form dsDNA[6] and the hen-egg lysozyme (PDB: 2LZT).[76] Our method was also tested on a portion of the ACE2/SARS-CoV-2 complex (PDB: 6M0J)[77] receptor-binding domain (RBD). 6M0J RBD residues were determined through A/E chain contacts within 3.8 Å. In each chain, residues within 1.5 Å of the contacts were also included. The H++ server[78] was used to generate protonated PQR format files. PQR format files for small molecules and the SARS-CoV-2 complex along with RBD contact residue lists are provided in the accompanying code package.

Dielectric Boundary Representations

Just like the numerical Poisson solvers, ASC-based methods rely on molecular boundary representations.[79−81] These representations of the actual molecular shape are crucial to the accuracy and, to some extent, the efficiency of modern implicit solvation models. The question of which definition of the dielectric boundary (DB) is most appropriate is nontrivial; there is no universal default.[82−84] Here, our main goal is to assess the accuracy of the new ASC approximation against its primary accuracy metric, the numerical Poisson–Boltzmann (NPB). For this purpose, both methods can utilize whichever DB one considers most appropriate for the given application. Our secondary goal is to compare the new ASC approximation to a fast analytical GB model, which dictates the choice of the DB representation chosen here. Within our ASC method, we approximate the solute–solvent interface—the DB—using the solvent excluded surface (SES),[79,85] with Bondi[86] atomic radii and a water probe of 1.4 Å. This DB is triangulated with the open-source package NanoShaper.[80] In each relevant test, the NanoShaper grid spacing was matched with the grid spacing used by the NPB reference; 0.1 Å was used for small-molecule accuracy comparisons, 0.25 Å for fair speed comparisons, and 0.5 Å for large-molecule electric field normal comparisons. The two existing reference methods employed in this work, the NBP and GB (see Sections and 2.4), utilized matching dielectric boundaries based on the same set of Bondi radii with a 1.4 Å water probe radius (unless otherwise specified), for consistency.

Numerical Poisson–Boltzmann Reference

For NPB reference calculations, the macroscopic electrostatics with atomic detail (MEAD) package[87] was used. MEAD is a volumetric, finite-difference solver that can compute potential maps and hydration energies, utilizing an SES DB representation. The package was chosen primarily because it interfaces well with the visualization and analysis utility GEM,[36] used here to process the NPB-generated potential maps. Using GEM, visualization of the electric field around the solute can be achieved at any given distance from the DB. Convergence[23] of MEAD-generated hydration free energies was verified (Figure ), supporting its usage as a reference when a fine enough grid resolution was taken. A modern, 2nd-order method MIBPB[88] was used as the accuracy reference for MEAD.
Figure 1

Convergence of hydration free energies computed by MEAD. The convergence is assessed against reference values computed by MIBPB,[22] as RMSD over our small molecule set. MIBPB hydration free energies are computed with a 1.4 Å water probe radius at a 0.1 Å grid spacing. MEAD hydration free energies are computed with identical parameters, except for the variable grid spacing. MIBPB small-molecule electrostatic hydration free energies ranged from −0.05 to −14.37 kcal/mol, with an average magnitude of 6.68 kcal/mol.

Convergence of hydration free energies computed by MEAD. The convergence is assessed against reference values computed by MIBPB,[22] as RMSD over our small molecule set. MIBPB hydration free energies are computed with a 1.4 Å water probe radius at a 0.1 Å grid spacing. MEAD hydration free energies are computed with identical parameters, except for the variable grid spacing. MIBPB small-molecule electrostatic hydration free energies ranged from −0.05 to −14.37 kcal/mol, with an average magnitude of 6.68 kcal/mol. Based in part on the convergence analysis, potential-map calculations were performed at a grid spacing of 0.1 Å and hydration energy calculations at an inner and outer bounding-box grid spacing of 0.1 and 0.5 Å, respectively, for small-molecule accuracy comparisons. NPB reference solvation energy calculations utilize inner and outer dielectrics of 1 and 80, respectively. The water probe radius is 1.4 Å, unless otherwise stated. For fair speed comparisons, potential-map calculations were performed at a grid spacing of 0.25 Å and hydration energy calculations at a single bounding-box grid spacing of 0.25 Å, considered standard for finite-difference NPB calculations. For larger molecules, potential-map calculations were performed at a grid spacing of 0.5 Å and hydration energy calculations at an inner and outer bounding-box grid spacing of 0.5 Å and 1.0 Å, respectively. The total number of NPB grid points was determined by setting a volumetric bounding-box side length slightly larger (+1 Å) than the maximal intramolecular distance. The electric field normal at a point r was numerically approximated by a two-point stencil:where r + hn̂ and r – hn̂ are two sampling point distance ±h from the DB along the surface normal n̂; see ref (36) for additional details of the sampling protocol. Here, h was chosen to minimize the distance between sampled points, while still being large enough so that the sampled points are distinct. Notice that if h were too small, the sampling protocol would sample the exact same points of the cubic lattice used in the NPB reference calculations. The largest possible distance between two such grid points is the diagonal of the cubic grid, which determines the minimum h. To illustrate this numerical constraint, h must be larger than Å for members of the small molecule dataset, whose potential grids were computed with the NPB reference at a grid spacing of 0.1 Å. Additionally, to avoid numerical artifacts of the NPB reference near the DB and mitigate possible effects of minor differences between the internal representations of the SES computed by our NPB reference and NanoShaper, the field computed a distance p > h from the DB (Figure ): doing so ensures that we do not accidentally sample grid points inside the molecule. The need to use a non-zero projection distance in the NPB calculations makes it necessary to consider the electric field normal values near the DB as the numerical reference for assessing the accuracy of the analytical ASC, rather than the apparent surface charge itself (which, up to a prefactor, is essentially the normal component of the field right at the DB, see eq ).
Figure 4

Two dielectric problem for an arbitrary molecule S with smooth boundary ∂S. The boundary separates the inner (blue) and outer (white) dielectric regions, with constants ϵ and ϵ, respectively. For a non-spherical DB, the distance r from the spherical center, Figure a, to the sampling point r is replaced with the generalized expression r = A + p. Here, A is the so-called electrostatic size of S, which characterizes the dimensions of the molecule,[91] and p denotes the projection distance from r to ∂S along the surface normal, n⃗. q is the source charge under consideration, with d denoting the distance between q and r. When p > 0, the electric field normal E(r) of S at r is computed via eq . When p = 0, the ASC, σ(r), is computed via eq .

Generalized-Born Solvation Free Energy Reference

The IGB5[51] GB model from AMBER[75] was utilized. This model was parameterized against reference NPB hydration free energies calculated based on the SES surface, Bondi radii, and a 1.4 Å water probe radius.

Accuracy Metrics Used

The ASC approximation was tested against the exact Poisson–Boltzmann (EPB, see Section ) reference, with additional tests undertaken against the NPB and GB references. Our comparison with the EPB reference was used directly for the apparent surface charge, employing the two test charge configuration in Figure b. Per-vertex electric field normal values were compared against the NPB reference, averaging over each vertex in a given biomolecule, and over each biomolecule in the comparison set. Electrostatic hydration free energies were compared against the NPB and GB references with inner and outer dielectrics ϵ = 1 and ϵ = 80, unless otherwise specified. All results will be in e/Å2 for apparent surface charge, kcal/( mol · e · Å) for electric field normals, and kcal/mol for electrostatic hydration free energies.
Figure 2

Geometry and charge settings for the Kirkwood multipolar expansion(a) and for the two-charge test case (b). These notations and geometries are used in eqs and 6. In each case, a perfectly spherical, sharp DB is utilized, with ϵ and ϵ denoting inner and outer dielectric constants, respectively. The angle θ (resp. π – θ) is subtended by the lines connecting the point of observation, r, and the charge(s) to the spherical center. In panel (a), a single point charge, q, is located r away from the center of a spherical boundary with radius A. In panel (b), two point charges, q1 and q2, are located on the vertical diameter of the sphere. The charges are of equal distance, r1 = r2, from the spherical center. The electrostatic potential V(r) is computed at the point of observation r.

Geometry and charge settings for the Kirkwood multipolar expansion(a) and for the two-charge test case (b). These notations and geometries are used in eqs and 6. In each case, a perfectly spherical, sharp DB is utilized, with ϵ and ϵ denoting inner and outer dielectric constants, respectively. The angle θ (resp. π – θ) is subtended by the lines connecting the point of observation, r, and the charge(s) to the spherical center. In panel (a), a single point charge, q, is located r away from the center of a spherical boundary with radius A. In panel (b), two point charges, q1 and q2, are located on the vertical diameter of the sphere. The charges are of equal distance, r1 = r2, from the spherical center. The electrostatic potential V(r) is computed at the point of observation r.

Computer Specifications

All computations and visualizations were completed on a commodity desktop computer with an Intel Core i7 (or equivalent) processor, using a maximum of 32 GB of memory.

Theory and Results

Preliminaries: Analytical Poisson–Boltzmann Reference

In the context of implicit solvation, the simplest scenario is that of a solute with a sharp, spherical DB, Figure a. For such a spherical boundary, as in Figure a, Kirkwood[89] gave the exact, analytical solution of eq for the potential V at the DB due to a single charge q inside the boundary. Here, we use the solution valid on or exterior to the spherical DB,[92] without consideration for mobile ions. At r = A The apparent surface charge σ is related to the normal component of the electric field at (just outside) the boundary via[24,25] From this and eq , we obtain an exact, analytical expression for the apparent surface charge on the spherical DB:where the summation is over all of the enclosed charges. Equation will provide a key check for our analytical ASC approximation.

Convergence Analysis of σKW

The presence of the indexing term, (l + 1), is notable in its effects on the convergence characteristics of eq , by increasing the number of terms necessary to obtain a converged, accurate reference. As the ratio r/A approaches 1—that is, as the charge approaches the DB—slow convergence of the approximate solution manifests itself. In Figure b, we see that, even for , it is possible to achieve both qualitatively and quantitatively reasonable results. The analytical reference appears well-converged, Figure , for our purposes, at M = 30 terms. Hence, eq was numerically approximated by truncating to the first M = 30 terms, calling the resulting expression the essentially exact Poisson–Boltzmann (EPB) reference.
Figure 3

Apparent surface charge (ASC, per unit area) computed using the truncated, infinite series analytical solution of the Poisson problem on a sphere, eq . The convergence tests are conducted on the dual-positive test case, Figure b. The ASC is sampled at the spherical boundary A = 10 Å away from the center an angle θ from q1. (a) r1 = r2 = 6 Å; (b) r1 = r2 = 8 Å; see Figure b. Partial sums of the infinite series solution, eq , with M = 10, 20, 30, and 40 terms are examined. The corresponding truncated sums are shown with blue, orange, green, and red lines, respectively.

Apparent surface charge (ASC, per unit area) computed using the truncated, infinite series analytical solution of the Poisson problem on a sphere, eq . The convergence tests are conducted on the dual-positive test case, Figure b. The ASC is sampled at the spherical boundary A = 10 Å away from the center an angle θ from q1. (a) r1 = r2 = 6 Å; (b) r1 = r2 = 8 Å; see Figure b. Partial sums of the infinite series solution, eq , with M = 10, 20, 30, and 40 terms are examined. The corresponding truncated sums are shown with blue, orange, green, and red lines, respectively.

Main Result: Analytical Apparent Surface Charge

To derive our ASC approximation, we begin with the previously derived closed-form approximation for the electrostatic potential around (outside) an arbitrary molecular shape,[90] see Figure : Two dielectric problem for an arbitrary molecule S with smooth boundary ∂S. The boundary separates the inner (blue) and outer (white) dielectric regions, with constants ϵ and ϵ, respectively. For a non-spherical DB, the distance r from the spherical center, Figure a, to the sampling point r is replaced with the generalized expression r = A + p. Here, A is the so-called electrostatic size of S, which characterizes the dimensions of the molecule,[91] and p denotes the projection distance from r to ∂S along the surface normal, n⃗. q is the source charge under consideration, with d denoting the distance between q and r. When p > 0, the electric field normal E(r) of S at r is computed via eq . When p = 0, the ASC, σ(r), is computed via eq . One significant drawback of the approximation in eq that will be overcome here is that it only provides a simple expression for the electrostatic potential in the outside (solvent) space. Thus, the computation of many quantities of interest remain out of reach, e.g., of the solvation free energy, which requires knowledge of the potential inside the solute. To proceed, we utilize the polar orthonormal frame, , to take its derivative, for use in eq . The derivative vanishes in the direction of eθ, yielding Exploiting the geometry in Figure , we relate cos (φ) as a dot product of the surface unit normal, n̂, and the vector from E⊥(r) to q, which we denote d⃗: cos (φ) = (n̂ · d⃗)/d. Applying eq to eq 8 and summing over the charge distribution (Figure ), we arrive at:where we have made the substitution r = A + p described in Figure . At the dielectric boundary, p = 0, applying eq to eq gives the following closed-form, analytical approximation for the apparent surface charge: These two equations are the main analytical result of this work. Equations , 9, and 10 rely on approximating the term of eq as for l > 0, which allows the infinite Kirkwood series to be summed without truncation. That approach proves critical to both accuracy and computational efficiency[90] of the closed-form approximations, based on the Kirkwood solution. The value of α in eq was rigorously derived previously[92] to minimize RMSD to the exact Kirkwood solution for electrostatic potential, eq , assuming a random, discrete charge distribution inside a perfect sphere. Here, we have attempted to re-optimize the value of α in the context of eq , aiming at best agreement with the reference for hydration free energies of our small molecule set. The effort led to only a very minor improvement in accuracy (not shown), so we have decided to retain the original[92] α = 0.580127 for use in our ASC approximation.

A Self-Consistency Check

Arguably the simplest self-consistency check of analytical ASC is that the total surface charge produced by eq should be zero for any of the neutral small molecules making up our main test set: In the discrete DB case, as the triangulation density is increased, we expect the numerical approximation of the total charge integral, eq , to approach zero. Thus, the same simple check automatically tests both the analytical ASC and the DB discretization (triangulation) used. As seen from Figure , our ASC implementation follows the expected trend. We stress that the only purpose of this simple test is a “sanity check” of our code implementation; the accuracy of the derived approximation was tested thoroughly in the sequel.
Figure 5

Self-consistency and surface triangulation convergence check of our ASC method. The exact result for the total surface charge is zero; as NanoShaper grid density is increased, the molecular charge estimated via our implementation also tends to zero. An average over the entire set of small neutral molecules is shown. NanoShaper inverse grid spacings are given in Å, while average total molecular charges are given in |e|.

Self-consistency and surface triangulation convergence check of our ASC method. The exact result for the total surface charge is zero; as NanoShaper grid density is increased, the molecular charge estimated via our implementation also tends to zero. An average over the entire set of small neutral molecules is shown. NanoShaper inverse grid spacings are given in Å, while average total molecular charges are given in |e|.

Accuracy against the Analytical PB Reference

From Figure , our approximation matches the essentially exact Kirkwood solution (EPB), the truncation of eq , quite well on the two-charge test distribution, with only a slight, and expected, drop in accuracy closest to the point charges. For θ ∈ [0, π], an RMSD of 0.00124 and 0.00117 e/Å2 was achieved between our ASC approximation and the EPB reference, on the geometries described in panels (a) and (b) of Figure , respectively.
Figure 6

Apparent surface charge of our ASC approximation (dashed red) and the EPB reference (solid black) on the two-charge test distribution, shown in Figure b. (a) Surface charge due to two point charges, 1.5 Å away from the boundary of a dielectric sphere of radius 3 Å, with q1 = q2 = – 0.65. These specific parameters are intended to mimic two oxygen atoms in a small molecule, with respect to the distance between a highly charged “surface” atom and the DB. (b) The same charge distribution as in panel (a) but with q1 = – 0.65, q2 = + 0.65. Points were sampled from 0 to π in 0.0001 radian steps.

Apparent surface charge of our ASC approximation (dashed red) and the EPB reference (solid black) on the two-charge test distribution, shown in Figure b. (a) Surface charge due to two point charges, 1.5 Å away from the boundary of a dielectric sphere of radius 3 Å, with q1 = q2 = – 0.65. These specific parameters are intended to mimic two oxygen atoms in a small molecule, with respect to the distance between a highly charged “surface” atom and the DB. (b) The same charge distribution as in panel (a) but with q1 = – 0.65, q2 = + 0.65. Points were sampled from 0 to π in 0.0001 radian steps.

Electrostatic Solvation Free Energy with ASC

The apparent surface charge formulation allows us to gain insights into a variety of solvation effects,[28,93] including the estimate of hydration free energy, which we used extensively here to evaluate the accuracy of our new approach against the accepted reference. Within the implicit solvation framework,[8] the hydration free energy of a molecule is often approximated[94] as the sum of polar (electrostatic) and nonpolar components: Of the two components in eq , the electrostatic solvation free energy, ΔG, often contributes the most to the total in polar solvents such as water, especially for macromolecules. Highly approximate yet computationally efficient ways to estimate the nonpolar component, ΔG, are widely used; by comparison, ΔG is relatively expensive to estimate computationally with commonly used numerical methods such as NPB.[7] Here, our focus is ΔG. For a discrete charge density indexed by i, we can compute ΔG as[7]where V(r) and V(r) are the electrostatic potentials due to the given charge distribution in the solvent and in vacuum, respectively, sampled at each point charge q located at r. In the special case when the inner and outer dielectric constants, ϵ and ϵ, are equal to 1 and 80, respectively, we call ΔG the electrostatic hydration free energy. We use eqs and 13 to write Though eq is valid for any choice of DB, the surface integral is nontrivial to compute. We approximate the surface integral using a specific triangulation of the DB, see Section . This discrete representation approximates eq aswhere σ, A, and r are the apparent surface charge, area, and center of the triangle T (to express ΔG in kcal/mol, which is often convenient, eq is multiplied by 166, while using atomic units of length Å and charge |e|). The ASC on T is found by averaging the ASC at its comprising vertices. The triangular center is simply the centroid of T, with its area calculated using Heron’s formula:where a, b, and c are the side lengths of T.

Numerical Applications and Results

Analytical ASC Computational Speed

Here, we present general running time descriptions for each tested method, rather than exact time values. In this way, we can differentiate between each method, without worrying about particular optimizations and expert parameter setups that can be found across a variety of implementations.[95] In algorithmic time complexity, the three methods we compared in Table are very different. GB methods, such as the IGB5 reference, scaled quadratically in the number of atoms O(K2), while our method grew linearly O(KN) in the number of atoms K and surface elements N. Volumetric methods, similar to the NPB reference, scale cubically in the number of grid points per side of a corresponding bounding box, itself a function of grid density and the maximum intramolecular distance. The impact of these asymptotic time complexities can be clearly seen when we focus on hen-egg lysozyme (2LZT) and double-stranded DNA wall running times. Though the 2LZT structure has about 400 more atoms than the DNA structure, the intramolecular width of the DNA structure is almost double that of 2LZT. This means that the DNA structure has both a larger total surface area and requires a bigger volumetric bounding box; as seen in Table , we found longer running times for our ASC approximation and the NPB reference for 2LZT as compared to DNA, but not for the IGB5 reference. Hence, this contrasting algorithmic complexity affects computational timings between each model for structures of different sizes and characteristics. A principal consequence is the following: the best case scenario for the efficiency of our ASC approximation is for structures having many atoms but a comparatively low surface area; in terms of the derivation of our model, it is coincidental that in three dimensions, the shape maximizing total inner “volume” (number of atoms) and minimizing outer surface area is that of a sphere.
Table 1

Running Time Expectations for Computed Electrostatic Solvation Free Energiesa

methodsmall molecules (average of 16 atoms)2LZT (1958 atoms)DNA (1598 atoms)
IGB5(AMBER)milliseconds∼ a second∼ half a second
analytical ASC approximation∼100 mstens of seconds∼ half a minute
numerical PBtens of secondsminutestens of minutes

Times are given per-molecule (averaged over the entire set in the small molecule case).

Times are given per-molecule (averaged over the entire set in the small molecule case). Though the efficiency of our analytical ASC implementation is not at the level of the IGB5 reference, it occupies a different niche: its main purpose is the estimation of the ASC and the electric field. It is worth noting that the surface integration required for eq is trivially parallelizable since eqs and 10 can be computed independent of adjacent surface elements on the DB. This is in addition to the more fundamental parallelism present in eqs and 10, due to the independence of per-charge contributions to the electric field and ASC, respectively. Improvements in the efficiency of our implementation would greatly improve performance as surface resolution is increased, or in “worst-case” scenarios such as those seen in the DNA fragment, Table .

Analytical ASC Accuracy with Respect to the NPB Reference

Small Molecules

We first examine how our ASC approximation compares to the NPB reference qualitatively followed by multiple quantitative tests. For qualitative, visual comparisons, our motivation is the well-recognized utility[6,9,77] of visualizing electrostatic characteristics of molecules, including mapping them onto a molecular surface. Our main metric, in this section, is qualitative similarity, or the lack thereof, between visualizations produced by our analytical ASC method and the NPB reference. Figure exemplifies the qualitative match between our method and the NPB reference on a subset of the small molecule dataset. Apart from very small irregularities attributable to the discrete sampling of the NPB potential map, the electric field normals computed with our ASC approximation are, visually, almost indistinguishable from those computed with the NPB reference.
Figure 7

Electric field normals computed on a selection of small molecules by our ASC approximation (top row) and the NPB reference (bottom row), with visualization by GEM.[36] From left to right, the three molecules shown (chosen to represent various shapes) are 1,2-ethanediol (a,d), methane (b,e), and 1-butoxybutane (c,f). All calculations are made 0.7 Å from the DB, with a water probe radius of 1.4 Å. Our ASC approximation and the NPB reference use a 0.1 Å triangulation density/grid spacing. The color range for the analytical ASC and the NPB reference is the same for each vertical pair of panels.

Electric field normals computed on a selection of small molecules by our ASC approximation (top row) and the NPB reference (bottom row), with visualization by GEM.[36] From left to right, the three molecules shown (chosen to represent various shapes) are 1,2-ethanediol (a,d), methane (b,e), and 1-butoxybutane (c,f). All calculations are made 0.7 Å from the DB, with a water probe radius of 1.4 Å. Our ASC approximation and the NPB reference use a 0.1 Å triangulation density/grid spacing. The color range for the analytical ASC and the NPB reference is the same for each vertical pair of panels. The near quantitative agreement between the analytical ASC and the NPB reference, Figure is nontrivial, and goes beyond the visual match of the “reds” and the “blues” with the NPB reference. Notice that, when α is set to 0 in eqs and 9, the analytical ASC reduces to the so-called Coulomb field approximation (CFA), which is often used, and can be considered a null model here. Figure clearly indicates the decreased magnitude of CFA-generated electric field normals, when compared to our ASC approximation; on average, the CFA produced an electric field that was approximately 36% weaker than our analytical ASC method, which reproduces the NPB reference closely. The significant deviation of the CFA surface charge from the reference translates into its poor accuracy in the estimation of the hydration free energy, see Figure .
Figure 8

Electric field normals computed via our analytical ASC method (a), the Coulomb field approximation (CFA) (b), and the NPB reference (c) on a member of our small molecule dataset, pyrrole. While both the analytical ASC and the CFA identify the positive and negative surface charge patches, the CFA underestimates their intensity significantly. Between the analytical ASC method and the CFA, minimum, maximum, and average electric field normals on the DB are −0.441,0.314, and −0.030 kcal/(mol · e · Å) and −0.281,0.200, and −0.019 kcal/(mol · e · Å), respectively. The color range used to visualize the field is the same in all the panels.

Figure 9

Analytical ASC (red circles), the GB (green stars), and CFA (blue squares) hydration free energies of rigid small molecules relative to the NPB reference values. Linear regression lines are plotted using colors corresponding to the representative data points. The dotted blue line represents a perfect match between an approximation and the NPB reference. R2 values for IGB5, our analytical approximation, and the CFA are 0.932, 0.961, and 0.777, respectively. Hydration free energies are shown in kcal/mol.

Electric field normals computed via our analytical ASC method (a), the Coulomb field approximation (CFA) (b), and the NPB reference (c) on a member of our small molecule dataset, pyrrole. While both the analytical ASC and the CFA identify the positive and negative surface charge patches, the CFA underestimates their intensity significantly. Between the analytical ASC method and the CFA, minimum, maximum, and average electric field normals on the DB are −0.441,0.314, and −0.030 kcal/(mol · e · Å) and −0.281,0.200, and −0.019 kcal/(mol · e · Å), respectively. The color range used to visualize the field is the same in all the panels. Analytical ASC (red circles), the GB (green stars), and CFA (blue squares) hydration free energies of rigid small molecules relative to the NPB reference values. Linear regression lines are plotted using colors corresponding to the representative data points. The dotted blue line represents a perfect match between an approximation and the NPB reference. R2 values for IGB5, our analytical approximation, and the CFA are 0.932, 0.961, and 0.777, respectively. Hydration free energies are shown in kcal/mol.

Quantitative Assessment of ASC Accuracy

Next, we examine how our ASC approximation compares quantitatively to the NPB reference. Though electric field normals (or linearly related surface charges) are not the most intuitive accuracy metrics, these quantities are those that our method directly computes, and so a direct comparison with the NPB is in order. In Section , we discuss a physical interpretation of the deviation of these quantities from the reference. Relative to the NPB reference, our ASC approximation achieved an average RMSD and absolute difference of 0.14 and 0.11 kcal/( mol · e · Å) on calculated electric field normal values, respectively. Relative to direct comparisons of molecular ASC or its proxy, the normal electric field, tests featuring the calculation of electrostatic solvation free energies are valuable in the sense that they provide an intuitive accuracy metric, directly relevant to experiments. Here, IGB5[51] is an example of what can be expected from a very fast GB model on small molecule datasets,[96,97] in terms of accuracy and running time. We tested how our ASC approximation and the IGB5 method compare to calculation of electrostatic solvation free energies by the NPB reference. From our findings (see Table ), our ASC approximation had an approximately 20% reduced RMSD to NPB reference hydration free energies than the IGB5 reference. This accuracy gain is encouraging, particularly from the perspective of design: GB models are designed to analytically approximate the electrostatic solvation free energy obtained from solving the Poisson equation. That our model can estimate solvation energies more accurately than a widely used GB model suggests that the approximation of the ASC and electric field normals by the model is reasonably accurate to be considered for practical applications. Additionally, the results in Table give another encouraging conclusion, with respect to running times efficiencies. To achieve a deviation from the NPB reference of just slightly above kT, we do not require an overly fine triangulation density for the analytical ASC. When the grid resolution is set to what was used in the timing section (see Table ), RMSD against the NPB reference differs only in nonsignificant digits. Thus, our analytical ASC can achieve a very similar accuracy, without incurring a heavy 1–2 order of magnitude time penalty, as seen with the NPB reference at this fine grid resolution.
Table 2

Accuracy of Electrostatic Hydration Free Energies against the NPB Reference, Estimated by our Analytical ASC Approximation, IGB5, and the CFAa

methodRMSD to NPB reference (kcal/mol)
analytical ASC approximation0.77
GB (IGB5,AMBER)0.98
CFA2.73

Analytical ΔG is computed via eq , using the analytical ASC approximation given in eq . Small-molecule electrostatic hydration free energies range from −0.01 to −14.71 kcal/mol.

Analytical ΔG is computed via eq , using the analytical ASC approximation given in eq . Small-molecule electrostatic hydration free energies range from −0.01 to −14.71 kcal/mol.

Proteins and DNA

In analyzing the performance of our model on structures of increased size and complexity, we examine a fragment of double-stranded DNA and the hen-egg lysozyme. Structures of this type—with regions of the DB having deep, negative curvature “pockets”—present some of the toughest tests for our model due to certain theoretical considerations we touch on in section 4.2.3.2.

Double-Stranded DNA

First, we examined our analytical approximation on a double-stranded DNA fragment. Qualitatively, Figure shows that our analytical approximation reproduced the NPB reference quite well, although some discrepancies are clearly seen in the grooves, that is, in regions of high negative curvature.
Figure 10

Electric field normals computed on the double-stranded DNA snapshot by our ASC approximation (a) and the NPB reference (b) with visualization by GEM.[36] The field is estimated to be 1.5 Å from the DB, obtained with the water probe of radius of 2 Å. The larger probe radius is used here to achieve a better visualization of negative curvature regions. Our ASC approximation and the NPB reference use a 0.5 Å triangulation density/grid spacing.

Electric field normals computed on the double-stranded DNA snapshot by our ASC approximation (a) and the NPB reference (b) with visualization by GEM.[36] The field is estimated to be 1.5 Å from the DB, obtained with the water probe of radius of 2 Å. The larger probe radius is used here to achieve a better visualization of negative curvature regions. Our ASC approximation and the NPB reference use a 0.5 Å triangulation density/grid spacing.

Triclinic Hen Egg White Lysozyme

Next, we compared our analytical ASC to the NPB reference on the triclinic hen egg white lysozyme. We see in Figure that our analytical approximation accurately reproduced the NPB reference, outside of the hen-egg lysozyme’s binding cleft. Within the binding cleft, quantitative deviations in electric field normal magnitudes from the NPB reference become apparent, though our approximation still produces a qualitatively reasonable picture. Our approximation (Figure a,b) qualitatively reproduced the reference (Figure c,d) in visualizing the substantial electrostatic effect of Asp 52 and Glu 35 in the enzymatic pocket as well as the corresponding changes due to the change in the charge states of these two residues under mildly acidic conditions (pH 4.5). Hence, both the NPB reference and our model are able to visualize the behavior of Asp 52 and Glu 35, under a pH change.[98−100]
Figure 11

Electric field normals computed on the hen-egg lysozyme by our ASC approximation (top row) and the NPB reference (bottom row), with visualization by GEM.[36] (a,c) Structure at pH 4.5. (b,d) Structure at pH 6.5. All calculations are made 1.5 Å from the DB, with a water probe radius of 2 Å. The larger probe radius is used to achieve a better visualization of negative curvature regions. Our ASC approximation and the NPB reference use a 0.5 Å triangulation density/grid spacing.

Electric field normals computed on the hen-egg lysozyme by our ASC approximation (top row) and the NPB reference (bottom row), with visualization by GEM.[36] (a,c) Structure at pH 4.5. (b,d) Structure at pH 6.5. All calculations are made 1.5 Å from the DB, with a water probe radius of 2 Å. The larger probe radius is used to achieve a better visualization of negative curvature regions. Our ASC approximation and the NPB reference use a 0.5 Å triangulation density/grid spacing. With qualitative tests complete, we finish the analysis with a quantitative comparison between our analytical approximation and the NPB reference. As expected, quantitative performance deficiencies exist for larger molecules with prominent regions of negative curvature. Although the double-stranded DNA and hen-egg lysozyme have similar numbers of atoms, average RMSD values in Table are, relatively, inconsistent. On the double-stranded DNA snapshot (∼1600 atoms), the average RMSD against the NPB reference is about 2.6 times larger than on small molecules (Section , Table ). Comparatively, the average RMSD of the hen-egg lysozyme (∼2000 atoms) is only about 1.4 times larger than on small molecules. On hydration free energies, relative errors between our analytical approximation and the NPB reference are quite small on the DNA snapshot, ∼4%, but more than double, ∼12%, on the hen-egg lysozyme. These findings might have to do with the DNA’s proportion of negative curvature regions with respect to the whole.
Table 3

Electric Field Normal Comparisons between Our Analytical Approximation and the NPB Reference, on Double-Stranded DNA and the Protonated/Unprotonated Hen-Egg Lysozymea

 double-stranded DNA2LZT pH 4.52LZT pH 6.5
absolute difference0.270.150.15
average RMSD0.370.190.20

All values are in kcal/(mol · e · Å).

All values are in kcal/(mol · e · Å).

Discussion

Surface Charge Distribution and Electrostatic Solvation Free Energy

To put the results of Section and Table in a better context, it may be helpful to consider a hypothetical situation. Suppose a biomolecule of interest has a constant electric field strength near its dielectric boundary. When compared to the reference, there is, on average, a ∼0.4 kcal/( mol · e · Å) RMSD error in the electric field normal values. If a unit electric charge is moved 1 Å away from the biomolecular boundary, along the surface normal, the error in the total work done by the electric field would be less than ∼0.4 kcal / mol—small when compared to the “gold-standard” 1 kcal/mol difference against reference. Several caveats are due here. First, the estimate is based on the RMS error; it is possible that the errors are larger in some spots near the DB. That concern is mitigated to an extent by the fact that the electric field strength is inversely related to the square of distance from the surface and would, in reality, decrease away from the surface, making any discrepancy with the reference smaller. Second, this test is free from the danger of a specific error cancellation that may affect the commonly used error assessment based on solvation free energies. As the history of the GB model development demonstrates,[41] very large errors in individual contributions may cancel out to produce seemingly accurate total solvation energies.

Structural Considerations

A notable feature, seen prominently in both the double-stranded DNA snapshot and the hen-egg lysozyme, but not generally in small molecules, is the presence of distinct, and fairly deep, negative curvature pockets on the DB. Our analytical ASC model is derived from an exact solution of the Poisson problem on a spherical DB (Figure a), having positive curvature throughout. Negative curvature regions, such as the main groove in Figure and the binding cleft in Figure , do not occur on a sphere; this is where our model is not expected to perform well. A resulting loss in performance had been noted previously[36] for the approximate electrostatic potential (eq ). It is therefore surprising that a qualitative agreement with the NPB reference is still seen, even in these regions. More shallow negative curvature regions are also present in the small molecules, Figure , but apparently these do not present a significant challenge to the new model. Confounding this effect, deep negative curvature regions on the DB can restrict water molecule conformational freedom, making nearby solvent behave less similarly to that of the bulk.[101,102] It has been shown that regions of this type can significantly modify interactions between small molecule inhibitors and their target proteins,[103] prompting investigations related to their identification.[104,105] In our context, this change in the behavior of water bulk has negative implications on the performance of our model, but the same is true for the NPB reference, which is also based on the continuum solvent. Because both of these models are expected to deviate from the correct physical behavior within these regions of negative curvature, we argue that a qualitative agreement with the NPB reference may be acceptable here, in place of a strong quantitative agreement.

Computational Considerations

Though not implemented in this work, one very important consequence of our analytical approach to the ASC confers a key additional theoretical benefit—the trivially parallel nature of our solution. Equation is computed over our discrete DB representation and is totally independent from surrounding surface elements. Coupled with a similarly parallelizable expression for the computation of electrostatic solvation free energy, eq , our method holds significant potential for the computationally efficient ASC treatment of large biomolecules.

Testing on a Large Biomolecule

For a real-world application, we examine our approximation on a much larger (∼6500 atom) complex, with important relevance today—the ACE2/SARS–CoV-2 complex (PDB ID: 6M0J). Recently, a comparison has been made between SARS-CoV and SARS-CoV-2, examining various mutations and their effects on respective binding strengths with the ACE2 agonist.[77] The focus of Figure is on, what Wang et al.[77] termed, the “CR2” receptor binding domain; the visualized ASC shows how our approximation reproduced the electrostatic complementary of surfaces charges between two residues, ASP30 (D12), Figure a, and LYS 417 (K85), Figure b, thought to contribute to the formation of a salt bridge. This salt bridge improves both stability and binding strength between the ACE2 receptor and SARS-CoV-2 spike protein, when compared to the SARS-CoV spike protein.
Figure 12

Apparent surface charge computed on the receptor binding domain (RBD) of ACE2 receptor/SARS-CoV-2 spike glycoprotein complex, with visualization by GEM.[36] The range of the ASC values is ±0.008 e/Å2, corresponding to the red-blue color map. The white area—a majority of the molecular surface outside of the RBD—is excluded from the calculation, which reduces the computational time significantly. To mimic the color convention used in Figure B of Wang et al.[77] the sign of eq is reversed for this calculation; that is, the negative of the σ is shown. (a,b) ACE2 receptor and SARS-CoV-2 spike glycoprotein RBDs, respectively. All calculations are made 0.7 Å from the DB with a water probe radius of 2 Å. A NanoShaper triangulation density of 0.5 Å is used.

Apparent surface charge computed on the receptor binding domain (RBD) of ACE2 receptor/SARS-CoV-2 spike glycoprotein complex, with visualization by GEM.[36] The range of the ASC values is ±0.008 e/Å2, corresponding to the red-blue color map. The white area—a majority of the molecular surface outside of the RBD—is excluded from the calculation, which reduces the computational time significantly. To mimic the color convention used in Figure B of Wang et al.[77] the sign of eq is reversed for this calculation; that is, the negative of the σ is shown. (a,b) ACE2 receptor and SARS-CoV-2 spike glycoprotein RBDs, respectively. All calculations are made 0.7 Å from the DB with a water probe radius of 2 Å. A NanoShaper triangulation density of 0.5 Å is used. These large-scale visualizations of the ASC (or of the normal component of the electric field) has already been shown useful;[6] our analytical approximation might be a useful tool in understanding complex protein–protein interactions at an atomistic scale, including SARS-CoV-2 mutants of concern,[106] especially in high throughput studies. Our method can potentially be useful in this area due to its targeted, source-based approach to computation of ASC, where only a small portion the entire DB, and hence a small subset of the surface elements, is included in the computation, as demonstrated in Figure . As a result, the computational time is reduced dramatically—see the discussion on time complexity in section 4.2.3.3.

Conclusions

In this work, we have derived a closed-form, analytical approximation for biomolecular apparent surface charge (ASC) and the normal component of the electric field outside the molecule. To the best of our knowledge, this is the first such fully analytical approximation. The approximation was constructed to closely reproduce the exact infinite series solution for a perfect spherical boundary; more importantly, the approximation yields a reasonably close agreement with the standard numerical PB for realistic molecular structures. Specifically, quantitative reproduction of results from the standard numerical PB reference was achieved on most of the tested molecules, except within prominent regions of negative curvature, where the new approximation is still qualitatively correct. Comparisons with a popular fast GB model in AMBER (IGB5) shows that our method is more accurate in reproducing the hydration free energy, albeit at higher computational expense, which may be expected of a proof-of-concept code package that is not highly optimized. At the same time, standard numerical PB is still 1–2 orders of magnitude slower than the proposed approximation, which puts it “in-between” fast analytical GB and numerical PB. We stress that solvation free energy estimates are used here as a common and convenient accuracy metric and is not where we believe the potential benefits of the proposed analytical ASC may be. These potential benefits stem from the unique features of the method. In our view, the main advantage of ASC compared to the GB is its versatility: in contrast to the GB, ASC allows one to estimate pretty much any electrostatic quantity. There are at least two features of the new approximation absent from the GB: the ability to estimate the apparent surface charge (and, hence, the potential everywhere) and the ability to estimate the normal component of the electric field. Another noteworthy feature of the approach sets it apart from other existing approximations that can estimate ASC, including those aimed at computing ASC directly—the fact that the new approximation is “source-based”. This means that the normal electric field and the ASC can be estimated at any individual point or surface patch, without the need for self-consistent computation over the entire surface or volume. This feature is in contrast to “field-based” methods such as numerical solutions of the Poisson equation or DPCM. As an illustration, we showed that the “source-based” feature of our ASC approximation allows a rapid examination of the ACE2/SARS-CoV-2 RBD electrostatics, reproducing conditions posited to contribute to the spike protein’s high binding strength. An area which, in our view, can benefit the most from the proposed analytical ASC is the development of new implicit solvation methods that require fast estimates of local polarization charges and/or fields. We also believe that the new approach may have the potential to compete with existing ASC-based approaches in QM applications, especially where computational efficiency is key; further extensive testing and analysis beyond the scope of the proof-of-concept work will be necessary to explore the potential of the method in this area. As it stands, the proposed method has several limitations. First, it does not yet include salt effects explicitly. However, in the future, it should be relatively easy to add into the model salt dependence at the Debye–Huckel level, following an approach outlined in ref (36). Another limitation of the model is its qualitative nature in the regions of high negative curvature, at least relative to the standard NPB reference. Overcoming this specific limitation will require a significant extension of the underlying theory and extensive testing on biomolecular structures. A careful and detailed comparison of our proof-of-concept approximation within the broader category of existing, optimized implementations ASC methods has not been performed and is warranted in the future; nonetheless, promising results so far have pointed to the potential of our approach in forming the basis of novel implicit models of solvation.
  68 in total

1.  Protein molecular dynamics with the generalized Born/ACE solvent model.

Authors:  N Calimet; M Schaefer; T Simonson
Journal:  Proteins       Date:  2001-11-01

2.  Efficient Computation of the Total Solvation Energy of Small Molecules via the R6 Generalized Born Model.

Authors:  Boris Aguilar; Alexey V Onufriev
Journal:  J Chem Theory Comput       Date:  2012-06-08       Impact factor: 6.006

3.  The Amber biomolecular simulation programs.

Authors:  David A Case; Thomas E Cheatham; Tom Darden; Holger Gohlke; Ray Luo; Kenneth M Merz; Alexey Onufriev; Carlos Simmerling; Bing Wang; Robert J Woods
Journal:  J Comput Chem       Date:  2005-12       Impact factor: 3.376

4.  Refinement of triclinic lysozyme: II. The method of stereochemically restrained least squares.

Authors:  M Ramanadham; L C Sieker; L H Jensen
Journal:  Acta Crystallogr B       Date:  1990-02-01

5.  Theoretical studies of enzymic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme.

Authors:  A Warshel; M Levitt
Journal:  J Mol Biol       Date:  1976-05-15       Impact factor: 5.469

6.  A strategy for reducing gross errors in the generalized Born models of implicit solvation.

Authors:  Alexey V Onufriev; Grigori Sigalov
Journal:  J Chem Phys       Date:  2011-04-28       Impact factor: 3.488

7.  Generalized born model with a simple smoothing function.

Authors:  Wonpil Im; Michael S Lee; Charles L Brooks
Journal:  J Comput Chem       Date:  2003-11-15       Impact factor: 3.376

8.  Accuracy of continuum electrostatic calculations based on three common dielectric boundary definitions.

Authors:  Alexey V Onufriev; Boris Aguilar
Journal:  J Theor Comput Chem       Date:  2014-05       Impact factor: 0.939

9.  Exploring protein native states and large-scale conformational changes with a modified generalized born model.

Authors:  Alexey Onufriev; Donald Bashford; David A Case
Journal:  Proteins       Date:  2004-05-01

10.  H++: a server for estimating pKas and adding missing hydrogens to macromolecules.

Authors:  John C Gordon; Jonathan B Myers; Timothy Folta; Valia Shoja; Lenwood S Heath; Alexey Onufriev
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.