Dan E Folescu1, Alexey V Onufriev1,2,3. 1. Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States. 2. Department of Physics, Virginia Tech, Blacksburg, Virginia 24061, United States. 3. Center for Soft Matter and Biological Physics, Virginia Tech, Blacksburg, Virginia 24061, United States.
Abstract
Closed-form, analytical approximations for electrostatic properties of molecules are of unique value as these can provide computational speed, versatility, and physical insight. Here, we have derived a simple, closed-form formula for the apparent surface charge (ASC) as well as for the electric field generated by a molecular charge distribution in aqueous solution. The approximation, with no fitted parameters, was tested against numerical solutions of the Poisson equation, where it has produced a significant speed-up. For neutral small molecules, the hydration free energies estimated from the closed-form ASC formula are within 0.8 kcal/mol RMSD from the numerical Poisson reference; the electric field at the surface is in quantitative agreement with the reference. Performance of the approximation was also tested on larger structures, including a protein, a DNA fragment, and a viral receptor-target complex. For all structures tested, a near-quantitative agreement with the numerical Poisson reference was achieved, except in regions of high negative curvature, where the new approximation is still qualitatively correct. A unique efficiency feature of the proposed "source-based″ closed-form approximation is that the ASC and electric field can be estimated individually at any point or surface patch, without the need to obtain the full global solution. An open-source software implementation of the method is available: https://people.cs.vt.edu/~onufriev/CODES/aasc.zip.
Closed-form, analytical approximations for electrostatic properties of molecules are of unique value as these can provide computational speed, versatility, and physical insight. Here, we have derived a simple, closed-form formula for the apparent surface charge (ASC) as well as for the electric field generated by a molecular charge distribution in aqueous solution. The approximation, with no fitted parameters, was tested against numerical solutions of the Poisson equation, where it has produced a significant speed-up. For neutral small molecules, the hydration free energies estimated from the closed-form ASC formula are within 0.8 kcal/mol RMSD from the numerical Poisson reference; the electric field at the surface is in quantitative agreement with the reference. Performance of the approximation was also tested on larger structures, including a protein, a DNA fragment, and a viral receptor-target complex. For all structures tested, a near-quantitative agreement with the numerical Poisson reference was achieved, except in regions of high negative curvature, where the new approximation is still qualitatively correct. A unique efficiency feature of the proposed "source-based″ closed-form approximation is that the ASC and electric field can be estimated individually at any point or surface patch, without the need to obtain the full global solution. An open-source software implementation of the method is available: https://people.cs.vt.edu/~onufriev/CODES/aasc.zip.
Accurate and efficient
modeling of solvation effects at the atomistic
level is a critical component of modern efforts to understand molecular
structure and function.[1−5] Analysis and visualization of electrostatic properties of biomolecules,
including the electric field and surface charge generated by the molecular
charge distribution, have made an impact on qualitative reasoning
about biomolecules.[1,6]There are two broad approaches
to the modeling of molecular electrostatics
and solvation effects: explicit and implicit solvation methods.[7] Arguably the most widely used model of solvation
is that for which individual solvent molecules are treated explicitly,
on the same footing with the target molecule. However, accuracy of
the explicit solvent representation comes at a high price computationally,
limiting the practical utility of atomistic simulations in many areas.
The implicit, continuum solvation approach—treating solvent
as a continuum with the dielectric and nonpolar properties of water—can
offer much greater effective simulation speeds compared to the explicit
solvent models.[8−18] The Poisson equation[9,10,19−23] of classical electrostatics[24] provides
an exact formalism—within the continuum, local, linear-response
dielectric approximation of solvent in the absence of mobile ions—for
computing the electrostatic potential V(r) produced by a molecular charge distribution ρ(r) characterizing the solutewhere ϵ(r) is the dielectric constant. Once V(r) is obtained, the electrostatic part of the solvation free energy
is easily computed.[24]The problem
of finding V(r) is mathematically
equivalent[25,26] to finding a continuous charge
density, σ, on the dielectric boundary (DB), such thatwhere ρ(r) is the discrete charge distribution, formed by n point charges q1, ···, q, and σ(s) is the apparent
surface charge (ASC) associated with each surface patch s. The second term in eq represents the so-called reaction field potential.[27−29] Conceptually, once the ASC, σ(s), is found, all
of the solvation effects, at the level of the Poisson equation, can
be computed. The reformulation of the Poisson problem via eq has a number of technical
advantages made apparent over the years, especially in quantum mechanical
(QM) calculations.[8,32] A great variety of practical
widely used methods, including multiple modern derivatives of PCM[25] and COSMO,[30] utilize
this general idea—the apparent surface charge (ASC) formalism,
see refs (31, 32) for comprehensive
reviews.A number of ASC-based methods yield numerically exact
solutions
to the Poisson equation, in the sense that the exact solution can,
at least in principle, be approximated to arbitrary precision. Formally
exact linear-scaling implementations of numerical ASC methods based
on a conjugate gradient or domain decomposition exist.[32] Concerns related to the computational cost of
numerically exact approaches[31] have led
to the development of approximate ASC-based methods, such as the widely
used COSMO,[30] GCOSMO,[33] and C-PCM.[34] These methods rely
on approximations to eq .Still, even these approximate ASC-based methods employ a
significant
numerical component, such as numerical matrix inversion, which may
carry appreciable computational overhead, especially as the structure
size grows.[35,36] Therefore, in applications where
computational efficiency and algorithmic simplicity is paramount,
numerical ASC-based methods may not be as competitive as approximations
to the Poisson equation based purely on closed-form analytical expressions.[27]Among the fully analytical approximations
to the Poisson equation,
the generalized Born (GB) model[8,13,37−62] is arguably the most widely used, especially in atomistic simulations.[63−70] However, despite its multiple documented success stories,[71] the GB model does not have the versatility of eq and the associated benefits
of an ASC-based formulation of biomolecular electrostatics. Here,
we aim to fill the gap by deriving an analytical, closed-form approximation
to the Poisson equation for the ASC and the (normal) electric field
around an arbitrarily shaped molecule. Standard numerical solutions
of the Poisson equation are used as the reference. We refrain from
comparisons with well-established, optimized numerical implementations
of ASC methods in this initial investigation.The outline of
the paper is as follows: Section describes computational testing materials
and methodology. Section is focused on our analytical ASC approximation: we first
derive an exact analytical ASC reference on a sphere (3.1) and present our approximate form
of the ASC for arbitrary molecular geometries (3.2). We test the approximate ASC against
the exact analytical ASC reference using a spherical test case (3.3), which simulates relevant
electrostatic configurations that we will encounter in later sections.
Application of our model to solvation energy calculations is presented
in Section . Numerical
performance and accuracy analysis along with an example applications
are presented in Section , where we first examine the computational speed of our method
(4.1) before testing
its accuracy on small and large molecules (4.2.1 and 4.2.2). Finally, we showcase our analytical
ASC approximation on a presently relevant biomolecular complex: that
of the human ACE2 receptor and SARS-CoV-2 spike glycoprotein (4.3).
Methods
Structures
For testing, a set of
173 neutral small molecules from version 0.52 of the FreeSolv database[72,73] was used. The original set of nearly 650 molecules was narrowed
to include only those molecules containing hydrogen, oxygen, nitrogen,
and carbon atoms. The small molecules under consideration are all
rigid—having small conformational variability as seen in molecular
dynamic (MD) simulations.[74] The choice
of rigid molecules has allowed us to focus on the physics of solvation
while mitigating the uncertainty related to conformational sampling. ambpdb76 was used to generate PQR format files
from AMBER format coordinate and topology files.[72] Additionally, two larger biomolecules were used: a 25 bp
poly-A B′-form dsDNA[6] and the hen-egg
lysozyme (PDB: 2LZT).[76] Our method was also tested on a portion
of the ACE2/SARS-CoV-2 complex (PDB: 6M0J)[77] receptor-binding
domain (RBD). 6M0J RBD residues were determined through A/E chain
contacts within 3.8 Å. In each chain, residues within 1.5 Å
of the contacts were also included. The H++ server[78] was used to generate protonated PQR format files. PQR format
files for small molecules and the SARS-CoV-2 complex along with RBD
contact residue lists are provided in the accompanying code package.
Dielectric Boundary Representations
Just like the numerical Poisson solvers, ASC-based methods rely on
molecular boundary representations.[79−81] These representations
of the actual molecular shape are crucial to the accuracy and, to
some extent, the efficiency of modern implicit solvation models. The
question of which definition of the dielectric boundary (DB) is most
appropriate is nontrivial; there is no universal default.[82−84] Here, our main goal is to assess the accuracy of the new ASC approximation
against its primary accuracy metric, the numerical Poisson–Boltzmann
(NPB). For this purpose, both methods can utilize whichever DB one
considers most appropriate for the given application. Our secondary
goal is to compare the new ASC approximation to a fast analytical
GB model, which dictates the choice of the DB representation chosen
here.Within our ASC method, we approximate the solute–solvent
interface—the DB—using the solvent excluded surface
(SES),[79,85] with Bondi[86] atomic
radii and a water probe of 1.4 Å. This DB is triangulated with
the open-source package NanoShaper.[80] In
each relevant test, the NanoShaper grid spacing was matched with the
grid spacing used by the NPB reference; 0.1 Å was used for small-molecule
accuracy comparisons, 0.25 Å for fair speed comparisons, and
0.5 Å for large-molecule electric field normal comparisons. The
two existing reference methods employed in this work, the NBP and
GB (see Sections and 2.4), utilized matching dielectric boundaries
based on the same set of Bondi radii with a 1.4 Å water probe
radius (unless otherwise specified), for consistency.
Numerical Poisson–Boltzmann Reference
For NPB reference calculations, the macroscopic electrostatics
with atomic detail (MEAD) package[87] was
used. MEAD is a volumetric, finite-difference solver that can compute
potential maps and hydration energies, utilizing an SES DB representation.
The package was chosen primarily because it interfaces well with the
visualization and analysis utility GEM,[36] used here to process the NPB-generated potential maps. Using GEM,
visualization of the electric field around the solute can be achieved
at any given distance from the DB. Convergence[23] of MEAD-generated hydration free energies was verified
(Figure ), supporting
its usage as a reference when a fine enough grid resolution was taken.
A modern, 2nd-order method MIBPB[88] was
used as the accuracy reference for MEAD.
Figure 1
Convergence of hydration
free energies computed by MEAD. The convergence
is assessed against reference values computed by MIBPB,[22] as RMSD over our small molecule set. MIBPB hydration
free energies are computed with a 1.4 Å water probe radius at
a 0.1 Å grid spacing. MEAD hydration free energies are computed
with identical parameters, except for the variable grid spacing. MIBPB
small-molecule electrostatic hydration free energies ranged from −0.05
to −14.37 kcal/mol, with an average magnitude of 6.68 kcal/mol.
Convergence of hydration
free energies computed by MEAD. The convergence
is assessed against reference values computed by MIBPB,[22] as RMSD over our small molecule set. MIBPB hydration
free energies are computed with a 1.4 Å water probe radius at
a 0.1 Å grid spacing. MEAD hydration free energies are computed
with identical parameters, except for the variable grid spacing. MIBPB
small-molecule electrostatic hydration free energies ranged from −0.05
to −14.37 kcal/mol, with an average magnitude of 6.68 kcal/mol.Based in part on the convergence analysis, potential-map
calculations
were performed at a grid spacing of 0.1 Å and hydration energy
calculations at an inner and outer bounding-box grid spacing of 0.1
and 0.5 Å, respectively, for small-molecule accuracy comparisons.
NPB reference solvation energy calculations utilize inner and outer
dielectrics of 1 and 80, respectively. The water probe radius is 1.4
Å, unless otherwise stated. For fair speed comparisons, potential-map
calculations were performed at a grid spacing of 0.25 Å and hydration
energy calculations at a single bounding-box grid spacing of 0.25
Å, considered standard for finite-difference NPB calculations.
For larger molecules, potential-map calculations were performed at
a grid spacing of 0.5 Å and hydration energy calculations at
an inner and outer bounding-box grid spacing of 0.5 Å and 1.0
Å, respectively. The total number of NPB grid points was determined
by setting a volumetric bounding-box side length slightly larger (+1
Å) than the maximal intramolecular distance. The electric field
normal at a point r was numerically approximated by a
two-point stencil:where r + hn̂ and r – hn̂ are two sampling point distance ±h from the
DB along the surface normal n̂; see ref (36) for additional details
of the sampling protocol. Here, h was chosen to minimize
the distance between sampled points, while still being large enough
so that the sampled points are distinct. Notice that if h were too small, the sampling protocol would sample the exact same
points of the cubic lattice used in the NPB reference calculations.
The largest possible distance between two such grid points is the
diagonal of the cubic grid, which determines the minimum h. To illustrate this numerical constraint, h must
be larger than Å for members of the small molecule
dataset, whose potential grids were computed with the NPB reference
at a grid spacing of 0.1 Å.Additionally, to avoid numerical
artifacts of the NPB reference
near the DB and mitigate possible effects of minor differences between
the internal representations of the SES computed by our NPB reference
and NanoShaper, the field computed a distance p > h from the DB (Figure ): doing so ensures that we do not accidentally sample
grid points inside the molecule. The need to use a non-zero projection
distance in the NPB calculations makes it necessary to consider the
electric field normal values near the DB as the numerical reference
for assessing the accuracy of the analytical ASC, rather than the
apparent surface charge itself (which, up to a prefactor, is essentially
the normal component of the field right at the DB, see eq ).
Figure 4
Two dielectric problem
for an arbitrary molecule S with smooth boundary ∂S. The boundary separates
the inner (blue) and outer (white) dielectric regions, with constants
ϵ and ϵ, respectively. For a non-spherical DB, the distance r from the spherical center, Figure a, to the sampling point r is
replaced with the generalized expression r = A + p. Here, A is the
so-called electrostatic size of S, which characterizes
the dimensions of the molecule,[91] and p denotes the projection distance from r to ∂S along the surface normal, n⃗. q is the source charge under consideration,
with d denoting the distance between q and r. When p > 0, the
electric
field normal E(r) of S at r is computed via eq . When p = 0, the ASC, σ(r), is computed via eq .
Generalized-Born Solvation Free Energy Reference
The IGB5[51] GB model from AMBER[75] was utilized. This model was parameterized against
reference NPB hydration free energies calculated based on the SES
surface, Bondi radii, and a 1.4 Å water probe radius.
Accuracy Metrics Used
The ASC approximation
was tested against the exact Poisson–Boltzmann (EPB, see Section ) reference,
with additional tests undertaken against the NPB and GB references.
Our comparison with the EPB reference was used directly for the apparent
surface charge, employing the two test charge configuration in Figure b. Per-vertex electric
field normal values were compared against the NPB reference, averaging
over each vertex in a given biomolecule, and over each biomolecule
in the comparison set. Electrostatic hydration free energies were
compared against the NPB and GB references with inner and outer dielectrics
ϵ = 1 and ϵ = 80, unless otherwise specified. All results
will be in e/Å2 for apparent surface
charge, kcal/( mol · e · Å) for electric field normals, and kcal/mol for electrostatic
hydration free energies.
Figure 2
Geometry and charge settings for the Kirkwood
multipolar expansion(a)
and for the two-charge test case (b). These notations and geometries
are used in eqs and 6. In each case, a perfectly spherical, sharp DB is
utilized, with ϵ and ϵ denoting inner and outer dielectric constants,
respectively. The angle θ (resp. π – θ) is
subtended by the lines connecting the point of observation, r, and the charge(s) to the spherical center. In panel (a),
a single point charge, q, is located r away from the center of a spherical boundary
with radius A. In panel (b), two point charges, q1 and q2, are located
on the vertical diameter of the sphere. The charges are of equal distance, r1 = r2, from the
spherical center. The electrostatic potential V(r) is computed at the point of observation r.
Geometry and charge settings for the Kirkwood
multipolar expansion(a)
and for the two-charge test case (b). These notations and geometries
are used in eqs and 6. In each case, a perfectly spherical, sharp DB is
utilized, with ϵ and ϵ denoting inner and outer dielectric constants,
respectively. The angle θ (resp. π – θ) is
subtended by the lines connecting the point of observation, r, and the charge(s) to the spherical center. In panel (a),
a single point charge, q, is located r away from the center of a spherical boundary
with radius A. In panel (b), two point charges, q1 and q2, are located
on the vertical diameter of the sphere. The charges are of equal distance, r1 = r2, from the
spherical center. The electrostatic potential V(r) is computed at the point of observation r.
Computer Specifications
All computations
and visualizations were completed on a commodity desktop computer
with an Intel Core i7 (or equivalent) processor, using a maximum of
32 GB of memory.
In the context of implicit solvation, the simplest
scenario is that of a solute with a sharp, spherical DB, Figure a.For such
a spherical boundary, as in Figure a, Kirkwood[89] gave the exact,
analytical solution of eq for the potential V at the DB due to
a single charge q inside the boundary.
Here, we use the solution valid on or exterior to the spherical DB,[92] without consideration for mobile ions. At r = AThe apparent surface
charge σ is related to the normal component
of the electric field at (just outside) the boundary via[24,25]From this and eq , we obtain an exact, analytical
expression for the apparent surface
charge on the spherical DB:where the summation is over
all of the enclosed charges. Equation will provide a key check for our analytical ASC approximation.
Convergence Analysis of σKW
The presence of the indexing term, (l +
1), is notable in its effects on the convergence characteristics of eq , by increasing the number
of terms necessary to obtain a converged, accurate reference. As the
ratio r/A approaches
1—that is, as the charge approaches the DB—slow convergence
of the approximate solution manifests itself.In Figure b, we see that, even for , it is possible to achieve both qualitatively
and quantitatively reasonable results. The analytical reference appears
well-converged, Figure , for our purposes, at M = 30 terms. Hence, eq was numerically approximated
by truncating to the first M = 30 terms, calling
the resulting expression the essentially exact Poisson–Boltzmann
(EPB) reference.
Figure 3
Apparent surface charge (ASC, per unit area) computed
using the
truncated, infinite series analytical solution of the Poisson problem
on a sphere, eq . The
convergence tests are conducted on the dual-positive test case, Figure b. The ASC is sampled
at the spherical boundary A = 10 Å away from
the center an angle θ from q1. (a) r1 = r2 = 6 Å;
(b) r1 = r2 = 8 Å; see Figure b. Partial sums of the infinite series solution, eq , with M = 10,
20, 30, and 40 terms are examined. The corresponding truncated sums
are shown with blue, orange, green, and red lines, respectively.
Apparent surface charge (ASC, per unit area) computed
using the
truncated, infinite series analytical solution of the Poisson problem
on a sphere, eq . The
convergence tests are conducted on the dual-positive test case, Figure b. The ASC is sampled
at the spherical boundary A = 10 Å away from
the center an angle θ from q1. (a) r1 = r2 = 6 Å;
(b) r1 = r2 = 8 Å; see Figure b. Partial sums of the infinite series solution, eq , with M = 10,
20, 30, and 40 terms are examined. The corresponding truncated sums
are shown with blue, orange, green, and red lines, respectively.
Main Result: Analytical Apparent Surface Charge
To derive our ASC approximation, we begin with the previously derived
closed-form approximation for the electrostatic potential around (outside)
an arbitrary molecular shape,[90] see Figure :Two dielectric problem
for an arbitrary molecule S with smooth boundary ∂S. The boundary separates
the inner (blue) and outer (white) dielectric regions, with constants
ϵ and ϵ, respectively. For a non-spherical DB, the distance r from the spherical center, Figure a, to the sampling point r is
replaced with the generalized expression r = A + p. Here, A is the
so-called electrostatic size of S, which characterizes
the dimensions of the molecule,[91] and p denotes the projection distance from r to ∂S along the surface normal, n⃗. q is the source charge under consideration,
with d denoting the distance between q and r. When p > 0, the
electric
field normal E(r) of S at r is computed via eq . When p = 0, the ASC, σ(r), is computed via eq .One significant drawback of the approximation in eq that will be overcome
here is that
it only provides a simple expression for the electrostatic potential
in the outside (solvent) space. Thus, the computation of many quantities
of interest remain out of reach, e.g., of the solvation free energy,
which requires knowledge of the potential inside the solute.To proceed, we utilize the polar orthonormal frame, , to take its derivative, for use in eq . The derivative vanishes
in the direction of eθ, yieldingExploiting the geometry
in Figure , we relate
cos (φ) as a dot product of the
surface unit normal, n̂, and the vector from E⊥(r) to q, which we denote d⃗: cos (φ) = (n̂ · d⃗)/d. Applying eq to eq 8 and summing over the
charge distribution (Figure ), we arrive at:where we have made the substitution r = A + p described in Figure . At the dielectric
boundary, p = 0, applying eq to eq gives the following closed-form, analytical approximation
for the apparent surface charge:These two equations
are the main analytical result of this work. Equations , 9, and 10 rely on approximating the term of eq as for l > 0, which allows
the infinite Kirkwood series to be summed without truncation. That
approach proves critical to both accuracy and computational efficiency[90] of the closed-form approximations, based on
the Kirkwood solution. The value of α in eq was rigorously derived previously[92] to minimize RMSD to the exact Kirkwood solution
for electrostatic potential, eq , assuming a random, discrete charge distribution inside a
perfect sphere. Here, we have attempted to re-optimize the value of
α in the context of eq , aiming at best agreement with the reference for hydration
free energies of our small molecule set. The effort led to only a
very minor improvement in accuracy (not shown), so we have decided
to retain the original[92] α = 0.580127
for use in our ASC approximation.
A Self-Consistency Check
Arguably
the simplest self-consistency check of analytical ASC is that the
total surface charge produced by eq should be zero for any of the neutral small molecules
making up our main test set:In the discrete DB
case, as the triangulation density is increased, we expect the numerical
approximation of the total charge integral, eq , to approach zero. Thus, the same simple
check automatically tests both the analytical ASC and the DB discretization
(triangulation) used. As seen from Figure , our ASC implementation follows the expected
trend. We stress that the only purpose of this simple test is a “sanity
check” of our code implementation; the accuracy of the derived
approximation was tested thoroughly in the sequel.
Figure 5
Self-consistency and
surface triangulation convergence check of
our ASC method. The exact result for the total surface charge is zero;
as NanoShaper grid density is increased, the molecular charge estimated
via our implementation also tends to zero. An average over the entire
set of small neutral molecules is shown. NanoShaper inverse grid spacings
are given in Å, while average total molecular charges are given in
|e|.
Self-consistency and
surface triangulation convergence check of
our ASC method. The exact result for the total surface charge is zero;
as NanoShaper grid density is increased, the molecular charge estimated
via our implementation also tends to zero. An average over the entire
set of small neutral molecules is shown. NanoShaper inverse grid spacings
are given in Å, while average total molecular charges are given in
|e|.
Accuracy against the Analytical PB Reference
From Figure , our
approximation matches the essentially exact Kirkwood solution (EPB),
the truncation of eq , quite well on the two-charge test distribution, with only a slight,
and expected, drop in accuracy closest to the point charges. For θ ∈ [0, π], an RMSD of 0.00124 and 0.00117 e/Å2 was achieved between our ASC approximation
and the EPB reference, on the geometries described in panels (a) and
(b) of Figure , respectively.
Figure 6
Apparent
surface charge of our ASC approximation (dashed red) and
the EPB reference (solid black) on the two-charge test distribution,
shown in Figure b.
(a) Surface charge due to two point charges, 1.5 Å away from
the boundary of a dielectric sphere of radius 3 Å, with q1 = q2 = –
0.65. These specific parameters are intended to mimic two oxygen atoms
in a small molecule, with respect to the distance between a highly
charged “surface” atom and the DB. (b) The same charge
distribution as in panel (a) but with q1 = – 0.65, q2 = + 0.65. Points
were sampled from 0 to π in 0.0001 radian steps.
Apparent
surface charge of our ASC approximation (dashed red) and
the EPB reference (solid black) on the two-charge test distribution,
shown in Figure b.
(a) Surface charge due to two point charges, 1.5 Å away from
the boundary of a dielectric sphere of radius 3 Å, with q1 = q2 = –
0.65. These specific parameters are intended to mimic two oxygen atoms
in a small molecule, with respect to the distance between a highly
charged “surface” atom and the DB. (b) The same charge
distribution as in panel (a) but with q1 = – 0.65, q2 = + 0.65. Points
were sampled from 0 to π in 0.0001 radian steps.
Electrostatic Solvation Free Energy with ASC
The apparent surface charge formulation allows us to gain insights
into a variety of solvation effects,[28,93] including
the estimate of hydration free energy, which we used extensively here
to evaluate the accuracy of our new approach against the accepted
reference. Within the implicit solvation framework,[8] the hydration free energy of a molecule is often approximated[94] as the sum of polar (electrostatic) and nonpolar
components:Of the two components
in eq , the electrostatic
solvation free energy, ΔG, often
contributes the most to the total in polar solvents such as water,
especially for macromolecules. Highly approximate yet computationally
efficient ways to estimate the nonpolar component, ΔG, are widely used; by comparison, ΔG is relatively expensive to estimate computationally
with commonly used numerical methods such as NPB.[7] Here, our focus is ΔG. For a discrete charge density indexed by i, we
can compute ΔG as[7]where V(r) and V(r) are the electrostatic potentials due to the given charge distribution
in the solvent and in vacuum, respectively, sampled at each point
charge q located at r. In the special case when the inner and
outer dielectric constants, ϵ and
ϵ, are equal to 1 and 80, respectively,
we call ΔG the electrostatic hydration
free energy. We use eqs and 13 to writeThough eq is valid
for any choice of DB, the surface integral is nontrivial to compute.
We approximate the surface integral using a specific triangulation
of the DB, see Section . This discrete representation approximates eq aswhere σ, A, and r are the apparent surface charge, area,
and center of the triangle T (to express ΔG in kcal/mol, which is often convenient, eq is multiplied by 166,
while using atomic units of length Å and charge |e|). The ASC on T is found by averaging the ASC at
its comprising vertices. The triangular center is simply the centroid
of T, with its area calculated using Heron’s
formula:where a, b, and c are the side lengths of T.
Numerical Applications and Results
Analytical ASC Computational Speed
Here, we present general running time descriptions for each tested
method, rather than exact time values. In this way, we can differentiate
between each method, without worrying about particular optimizations
and expert parameter setups that can be found across a variety of
implementations.[95]In algorithmic
time complexity, the three methods we compared in Table are very different. GB methods,
such as the IGB5 reference, scaled quadratically in the number of
atoms O(K2), while our
method grew linearly O(KN) in the
number of atoms K and surface elements N. Volumetric methods, similar to the NPB reference, scale cubically
in the number of grid points per side of a corresponding bounding
box, itself a function of grid density and the maximum intramolecular
distance. The impact of these asymptotic time complexities can be
clearly seen when we focus on hen-egg lysozyme (2LZT) and double-stranded
DNA wall running times. Though the 2LZT structure has about 400 more
atoms than the DNA structure, the intramolecular width of the DNA
structure is almost double that of 2LZT. This means that the DNA structure
has both a larger total surface area and requires a bigger volumetric
bounding box; as seen in Table , we found longer running times for our ASC approximation
and the NPB reference for 2LZT as compared to DNA, but not for the
IGB5 reference. Hence, this contrasting algorithmic complexity affects
computational timings between each model for structures of different
sizes and characteristics. A principal consequence is the following:
the best case scenario for the efficiency of our ASC approximation
is for structures having many atoms but a comparatively low surface
area; in terms of the derivation of our model, it is coincidental
that in three dimensions, the shape maximizing total inner “volume”
(number of atoms) and minimizing outer surface area is that of a sphere.
Table 1
Running Time Expectations for Computed
Electrostatic Solvation Free Energiesa
method
small molecules
(average of 16 atoms)
2LZT (1958
atoms)
DNA (1598
atoms)
IGB5(AMBER)
milliseconds
∼ a second
∼ half a second
analytical ASC approximation
∼100 ms
tens of seconds
∼ half a minute
numerical PB
tens of seconds
minutes
tens of minutes
Times are given per-molecule (averaged
over the entire set in the small molecule case).
Times are given per-molecule (averaged
over the entire set in the small molecule case).Though the efficiency of our analytical ASC implementation
is not
at the level of the IGB5 reference, it occupies a different niche:
its main purpose is the estimation of the ASC and the electric field.
It is worth noting that the surface integration required for eq is trivially parallelizable
since eqs and 10 can be computed independent of adjacent surface
elements on the DB. This is in addition to the more fundamental parallelism
present in eqs and 10, due to the independence of per-charge contributions
to the electric field and ASC, respectively. Improvements in the efficiency
of our implementation would greatly improve performance as surface
resolution is increased, or in “worst-case” scenarios
such as those seen in the DNA fragment, Table .
Analytical ASC Accuracy with Respect to the
NPB Reference
Small Molecules
We first examine
how our ASC approximation compares to the NPB reference qualitatively
followed by multiple quantitative tests. For qualitative, visual comparisons,
our motivation is the well-recognized utility[6,9,77] of visualizing electrostatic characteristics
of molecules, including mapping them onto a molecular surface. Our
main metric, in this section, is qualitative similarity, or the lack
thereof, between visualizations produced by our analytical ASC method
and the NPB reference.Figure exemplifies the qualitative match between our method
and the NPB reference on a subset of the small molecule dataset. Apart
from very small irregularities attributable to the discrete sampling
of the NPB potential map, the electric field normals computed with
our ASC approximation are, visually, almost indistinguishable from
those computed with the NPB reference.
Figure 7
Electric field normals
computed on a selection of small molecules
by our ASC approximation (top row) and the NPB reference (bottom row),
with visualization by GEM.[36] From left
to right, the three molecules shown (chosen to represent various shapes)
are 1,2-ethanediol (a,d), methane (b,e), and 1-butoxybutane (c,f).
All calculations are made 0.7 Å from the DB, with a water probe
radius of 1.4 Å. Our ASC approximation and the NPB reference
use a 0.1 Å triangulation density/grid spacing. The color range
for the analytical ASC and the NPB reference is the same for each
vertical pair of panels.
Electric field normals
computed on a selection of small molecules
by our ASC approximation (top row) and the NPB reference (bottom row),
with visualization by GEM.[36] From left
to right, the three molecules shown (chosen to represent various shapes)
are 1,2-ethanediol (a,d), methane (b,e), and 1-butoxybutane (c,f).
All calculations are made 0.7 Å from the DB, with a water probe
radius of 1.4 Å. Our ASC approximation and the NPB reference
use a 0.1 Å triangulation density/grid spacing. The color range
for the analytical ASC and the NPB reference is the same for each
vertical pair of panels.The near quantitative agreement between the analytical
ASC and
the NPB reference, Figure is nontrivial, and goes beyond the visual match of the “reds”
and the “blues” with the NPB reference. Notice that,
when α is set to 0 in eqs and 9, the analytical ASC reduces to
the so-called Coulomb field approximation (CFA), which is often used,
and can be considered a null model here.Figure clearly
indicates the decreased magnitude of CFA-generated electric field
normals, when compared to our ASC approximation; on average, the CFA
produced an electric field that was approximately 36% weaker than
our analytical ASC method, which reproduces the NPB reference closely.
The significant deviation of the CFA surface charge from the reference
translates into its poor accuracy in the estimation of the hydration
free energy, see Figure .
Figure 8
Electric field normals computed via our analytical ASC method (a),
the Coulomb field approximation (CFA) (b), and the NPB reference (c)
on a member of our small molecule dataset, pyrrole. While both the
analytical ASC and the CFA identify the positive and negative surface
charge patches, the CFA underestimates their intensity significantly.
Between the analytical ASC method and the CFA, minimum, maximum, and
average electric field normals on the DB are −0.441,0.314,
and −0.030 kcal/(mol · e · Å)
and −0.281,0.200, and −0.019 kcal/(mol · e · Å), respectively. The color range used to
visualize the field is the same in all the panels.
Figure 9
Analytical ASC (red circles), the GB (green stars), and
CFA (blue
squares) hydration free energies of rigid small molecules relative
to the NPB reference values. Linear regression lines are plotted using
colors corresponding to the representative data points. The dotted
blue line represents a perfect match between an approximation and
the NPB reference. R2 values for IGB5,
our analytical approximation, and the CFA are 0.932, 0.961, and 0.777,
respectively. Hydration free energies are shown in kcal/mol.
Electric field normals computed via our analytical ASC method (a),
the Coulomb field approximation (CFA) (b), and the NPB reference (c)
on a member of our small molecule dataset, pyrrole. While both the
analytical ASC and the CFA identify the positive and negative surface
charge patches, the CFA underestimates their intensity significantly.
Between the analytical ASC method and the CFA, minimum, maximum, and
average electric field normals on the DB are −0.441,0.314,
and −0.030 kcal/(mol · e · Å)
and −0.281,0.200, and −0.019 kcal/(mol · e · Å), respectively. The color range used to
visualize the field is the same in all the panels.Analytical ASC (red circles), the GB (green stars), and
CFA (blue
squares) hydration free energies of rigid small molecules relative
to the NPB reference values. Linear regression lines are plotted using
colors corresponding to the representative data points. The dotted
blue line represents a perfect match between an approximation and
the NPB reference. R2 values for IGB5,
our analytical approximation, and the CFA are 0.932, 0.961, and 0.777,
respectively. Hydration free energies are shown in kcal/mol.
Quantitative Assessment of ASC Accuracy
Next, we examine how our ASC approximation compares quantitatively
to the NPB reference. Though electric field normals (or linearly related
surface charges) are not the most intuitive accuracy metrics, these
quantities are those that our method directly computes, and so a direct
comparison with the NPB is in order. In Section , we discuss a physical interpretation
of the deviation of these quantities from the reference. Relative
to the NPB reference, our ASC approximation achieved an average RMSD
and absolute difference of 0.14 and 0.11 kcal/( mol · e · Å) on calculated electric field normal values,
respectively.Relative to direct comparisons of molecular ASC
or its proxy, the normal electric field, tests featuring the calculation
of electrostatic solvation free energies are valuable in the sense
that they provide an intuitive accuracy metric, directly relevant
to experiments. Here, IGB5[51] is an example
of what can be expected from a very fast GB model on small molecule
datasets,[96,97] in terms of accuracy and running time. We
tested how our ASC approximation and the IGB5 method compare to calculation
of electrostatic solvation free energies by the NPB reference.From our findings (see Table ), our ASC approximation had an approximately 20% reduced
RMSD to NPB reference hydration free energies than the IGB5 reference.
This accuracy gain is encouraging, particularly from the perspective
of design: GB models are designed to analytically approximate the
electrostatic solvation free energy obtained from solving the Poisson
equation. That our model can estimate solvation energies more accurately
than a widely used GB model suggests that the approximation of the
ASC and electric field normals by the model is reasonably accurate
to be considered for practical applications. Additionally, the results
in Table give another
encouraging conclusion, with respect to running times efficiencies.
To achieve a deviation from the NPB reference of just slightly above kT, we do not require an overly fine triangulation density
for the analytical ASC. When the grid resolution is set to what was
used in the timing section (see Table ), RMSD against the NPB reference differs only in nonsignificant
digits. Thus, our analytical ASC can achieve a very similar accuracy,
without incurring a heavy 1–2 order of magnitude time penalty,
as seen with the NPB reference at this fine grid resolution.
Table 2
Accuracy of Electrostatic Hydration
Free Energies against the NPB Reference, Estimated by our Analytical
ASC Approximation, IGB5, and the CFAa
method
RMSD to NPB
reference (kcal/mol)
analytical
ASC approximation
0.77
GB (IGB5,AMBER)
0.98
CFA
2.73
Analytical ΔG is computed via eq , using the analytical ASC approximation given in eq . Small-molecule electrostatic
hydration free energies range from −0.01 to −14.71 kcal/mol.
Analytical ΔG is computed via eq , using the analytical ASC approximation given in eq . Small-molecule electrostatic
hydration free energies range from −0.01 to −14.71 kcal/mol.
Proteins and DNA
In analyzing the
performance of our model on structures of increased size and complexity,
we examine a fragment of double-stranded DNA and the hen-egg lysozyme.
Structures of this type—with regions of the DB having deep,
negative curvature “pockets”—present some of
the toughest tests for our model due to certain theoretical considerations
we touch on in section 4.2.3.2.
Double-Stranded DNA
First, we
examined our analytical approximation on a double-stranded DNA fragment.Qualitatively, Figure shows that our analytical approximation reproduced the NPB
reference quite well, although some discrepancies are clearly seen
in the grooves, that is, in regions of high negative curvature.
Figure 10
Electric
field normals computed on the double-stranded DNA snapshot
by our ASC approximation (a) and the NPB reference (b) with visualization
by GEM.[36] The field is estimated to be
1.5 Å from the DB, obtained with the water probe of radius of
2 Å. The larger probe radius is used here to achieve a better
visualization of negative curvature regions. Our ASC approximation
and the NPB reference use a 0.5 Å triangulation density/grid
spacing.
Electric
field normals computed on the double-stranded DNA snapshot
by our ASC approximation (a) and the NPB reference (b) with visualization
by GEM.[36] The field is estimated to be
1.5 Å from the DB, obtained with the water probe of radius of
2 Å. The larger probe radius is used here to achieve a better
visualization of negative curvature regions. Our ASC approximation
and the NPB reference use a 0.5 Å triangulation density/grid
spacing.
Triclinic Hen Egg White Lysozyme
Next, we compared our analytical ASC to the NPB reference on the
triclinic hen egg white lysozyme.We see in Figure that our analytical approximation
accurately reproduced the NPB reference, outside of the hen-egg lysozyme’s
binding cleft. Within the binding cleft, quantitative deviations in
electric field normal magnitudes from the NPB reference become apparent,
though our approximation still produces a qualitatively reasonable
picture. Our approximation (Figure a,b) qualitatively reproduced the reference (Figure c,d) in visualizing
the substantial electrostatic effect of Asp 52 and Glu 35 in the enzymatic
pocket as well as the corresponding changes due to the change in the
charge states of these two residues under mildly acidic conditions
(pH 4.5). Hence, both the NPB reference and our model are able to
visualize the behavior of Asp 52 and Glu 35, under a pH change.[98−100]
Figure 11
Electric field normals computed on the hen-egg lysozyme by our
ASC approximation (top row) and the NPB reference (bottom row), with
visualization by GEM.[36] (a,c) Structure
at pH 4.5. (b,d) Structure at pH 6.5. All calculations are made 1.5
Å from the DB, with a water probe radius of 2 Å. The larger
probe radius is used to achieve a better visualization of negative
curvature regions. Our ASC approximation and the NPB reference use
a 0.5 Å triangulation density/grid spacing.
Electric field normals computed on the hen-egg lysozyme by our
ASC approximation (top row) and the NPB reference (bottom row), with
visualization by GEM.[36] (a,c) Structure
at pH 4.5. (b,d) Structure at pH 6.5. All calculations are made 1.5
Å from the DB, with a water probe radius of 2 Å. The larger
probe radius is used to achieve a better visualization of negative
curvature regions. Our ASC approximation and the NPB reference use
a 0.5 Å triangulation density/grid spacing.With qualitative tests complete, we finish the analysis with a
quantitative comparison between our analytical approximation and the
NPB reference.As expected, quantitative performance deficiencies
exist for larger molecules with prominent regions of negative curvature.
Although the double-stranded DNA and hen-egg lysozyme have similar
numbers of atoms, average RMSD values in Table are, relatively, inconsistent. On the double-stranded
DNA snapshot (∼1600 atoms), the average RMSD against the NPB
reference is about 2.6 times larger than on small molecules (Section , Table ). Comparatively,
the average RMSD of the hen-egg lysozyme (∼2000 atoms) is only
about 1.4 times larger than on small molecules. On hydration free
energies, relative errors between our analytical approximation and
the NPB reference are quite small on the DNA snapshot, ∼4%,
but more than double, ∼12%, on the hen-egg lysozyme. These
findings might have to do with the DNA’s proportion of negative
curvature regions with respect to the whole.
Table 3
Electric Field Normal Comparisons
between Our Analytical Approximation and the NPB Reference, on Double-Stranded
DNA and the Protonated/Unprotonated Hen-Egg Lysozymea
double-stranded
DNA
2LZT pH 4.5
2LZT pH 6.5
absolute difference
0.27
0.15
0.15
average RMSD
0.37
0.19
0.20
All values are in kcal/(mol · e · Å).
All values are in kcal/(mol · e · Å).
Discussion
Surface Charge Distribution and Electrostatic
Solvation Free Energy
To put the results of Section and Table in a better context,
it may be helpful to consider a hypothetical situation. Suppose a
biomolecule of interest has a constant electric field strength near
its dielectric boundary. When compared to the reference, there is,
on average, a ∼0.4 kcal/( mol · e ·
Å) RMSD error in the electric field normal values. If a unit
electric charge is moved 1 Å away from the biomolecular boundary,
along the surface normal, the error in the total work done by the
electric field would be less than ∼0.4 kcal / mol—small
when compared to the “gold-standard” 1 kcal/mol difference
against reference. Several caveats are due here. First, the estimate
is based on the RMS error; it is possible that the errors are larger
in some spots near the DB. That concern is mitigated to an extent
by the fact that the electric field strength is inversely related
to the square of distance from the surface and would, in reality,
decrease away from the surface, making any discrepancy with the reference
smaller. Second, this test is free from the danger of a specific error
cancellation that may affect the commonly used error assessment based
on solvation free energies. As the history of the GB model development
demonstrates,[41] very large errors in individual
contributions may cancel out to produce seemingly accurate total solvation
energies.
Structural Considerations
A notable
feature, seen prominently in both the double-stranded DNA snapshot
and the hen-egg lysozyme, but not generally in small molecules, is
the presence of distinct, and fairly deep, negative curvature pockets
on the DB. Our analytical ASC model is derived from an exact solution
of the Poisson problem on a spherical DB (Figure a), having positive curvature throughout.
Negative curvature regions, such as the main groove in Figure and the binding cleft in Figure , do not occur
on a sphere; this is where our model is not expected to perform well.
A resulting loss in performance had been noted previously[36] for the approximate electrostatic potential
(eq ). It is therefore
surprising that a qualitative agreement with the NPB reference is
still seen, even in these regions. More shallow negative curvature
regions are also present in the small molecules, Figure , but apparently these do not
present a significant challenge to the new model.Confounding
this effect, deep negative curvature regions on the DB can restrict
water molecule conformational freedom, making nearby solvent behave
less similarly to that of the bulk.[101,102] It has been
shown that regions of this type can significantly modify interactions
between small molecule inhibitors and their target proteins,[103] prompting investigations related to their identification.[104,105] In our context, this change in the behavior of water bulk has negative
implications on the performance of our model, but the same is true
for the NPB reference, which is also based on the continuum solvent.
Because both of these models are expected to deviate from the correct
physical behavior within these regions of negative curvature, we argue
that a qualitative agreement with the NPB reference may be acceptable
here, in place of a strong quantitative agreement.
Computational Considerations
Though not implemented in this work, one very important consequence
of our analytical approach to the ASC confers a key additional theoretical
benefit—the trivially parallel nature of our solution. Equation is computed over
our discrete DB representation and is totally independent from surrounding
surface elements. Coupled with a similarly parallelizable expression
for the computation of electrostatic solvation free energy, eq , our method holds significant
potential for the computationally efficient ASC treatment of large
biomolecules.
Testing on a Large Biomolecule
For
a real-world application, we examine our approximation on a much larger
(∼6500 atom) complex, with important relevance today—the
ACE2/SARS–CoV-2 complex (PDB ID: 6M0J).Recently, a comparison
has been made between SARS-CoV and SARS-CoV-2, examining various mutations
and their effects on respective binding strengths with the ACE2 agonist.[77] The focus of Figure is on, what Wang et al.[77] termed, the “CR2” receptor binding domain;
the visualized ASC shows how our approximation reproduced the electrostatic
complementary of surfaces charges between two residues, ASP30 (D12), Figure a, and LYS 417
(K85), Figure b,
thought to contribute to the formation of a salt bridge. This salt
bridge improves both stability and binding strength between the ACE2
receptor and SARS-CoV-2 spike protein, when compared to the SARS-CoV
spike protein.
Figure 12
Apparent surface charge computed on the receptor binding
domain
(RBD) of ACE2 receptor/SARS-CoV-2 spike glycoprotein complex, with
visualization by GEM.[36] The range of the
ASC values is ±0.008 e/Å2, corresponding
to the red-blue color map. The white area—a majority of the
molecular surface outside of the RBD—is excluded from the calculation,
which reduces the computational time significantly. To mimic the color
convention used in Figure B of Wang et al.[77] the sign of eq is reversed for this
calculation; that is, the negative of the σ is shown. (a,b)
ACE2 receptor and SARS-CoV-2 spike glycoprotein RBDs, respectively.
All calculations are made 0.7 Å from the DB with a water probe
radius of 2 Å. A NanoShaper triangulation density of 0.5 Å
is used.
Apparent surface charge computed on the receptor binding
domain
(RBD) of ACE2 receptor/SARS-CoV-2 spike glycoprotein complex, with
visualization by GEM.[36] The range of the
ASC values is ±0.008 e/Å2, corresponding
to the red-blue color map. The white area—a majority of the
molecular surface outside of the RBD—is excluded from the calculation,
which reduces the computational time significantly. To mimic the color
convention used in Figure B of Wang et al.[77] the sign of eq is reversed for this
calculation; that is, the negative of the σ is shown. (a,b)
ACE2 receptor and SARS-CoV-2 spike glycoprotein RBDs, respectively.
All calculations are made 0.7 Å from the DB with a water probe
radius of 2 Å. A NanoShaper triangulation density of 0.5 Å
is used.These large-scale visualizations of the ASC (or
of the normal component
of the electric field) has already been shown useful;[6] our analytical approximation might be a useful tool in
understanding complex protein–protein interactions at an atomistic
scale, including SARS-CoV-2 mutants of concern,[106] especially in high throughput studies. Our method can potentially
be useful in this area due to its targeted, source-based approach
to computation of ASC, where only a small portion the entire DB, and
hence a small subset of the surface elements, is included in the computation,
as demonstrated in Figure . As a result, the computational time is reduced dramatically—see
the discussion on time complexity in section
4.2.3.3.
Conclusions
In this work, we have derived
a closed-form, analytical approximation
for biomolecular apparent surface charge (ASC) and the normal component
of the electric field outside the molecule. To the best of our knowledge,
this is the first such fully analytical approximation. The approximation
was constructed to closely reproduce the exact infinite series solution
for a perfect spherical boundary; more importantly, the approximation
yields a reasonably close agreement with the standard numerical PB
for realistic molecular structures. Specifically, quantitative reproduction
of results from the standard numerical PB reference was achieved on
most of the tested molecules, except within prominent regions of negative
curvature, where the new approximation is still qualitatively correct.
Comparisons with a popular fast GB model in AMBER (IGB5) shows that
our method is more accurate in reproducing the hydration free energy,
albeit at higher computational expense, which may be expected of a
proof-of-concept code package that is not highly optimized. At the
same time, standard numerical PB is still 1–2 orders of magnitude
slower than the proposed approximation, which puts it “in-between”
fast analytical GB and numerical PB. We stress that solvation free
energy estimates are used here as a common and convenient accuracy
metric and is not where we believe the potential benefits of the proposed
analytical ASC may be. These potential benefits stem from the unique
features of the method. In our view, the main advantage of ASC compared
to the GB is its versatility: in contrast to the GB, ASC allows one
to estimate pretty much any electrostatic quantity.There are
at least two features of the new approximation absent
from the GB: the ability to estimate the apparent surface charge (and,
hence, the potential everywhere) and the ability to estimate the normal
component of the electric field. Another noteworthy feature of the
approach sets it apart from other existing approximations that can
estimate ASC, including those aimed at computing ASC directly—the
fact that the new approximation is “source-based”. This
means that the normal electric field and the ASC can be estimated
at any individual point or surface patch, without the need for self-consistent
computation over the entire surface or volume. This feature is in
contrast to “field-based” methods such as numerical
solutions of the Poisson equation or DPCM. As an illustration, we
showed that the “source-based” feature of our ASC approximation
allows a rapid examination of the ACE2/SARS-CoV-2 RBD electrostatics,
reproducing conditions posited to contribute to the spike protein’s
high binding strength.An area which, in our view, can benefit
the most from the proposed
analytical ASC is the development of new implicit solvation methods
that require fast estimates of local polarization charges and/or fields.
We also believe that the new approach may have the potential to compete
with existing ASC-based approaches in QM applications, especially
where computational efficiency is key; further extensive testing and
analysis beyond the scope of the proof-of-concept work will be necessary
to explore the potential of the method in this area.As it stands,
the proposed method has several limitations. First,
it does not yet include salt effects explicitly. However, in the future,
it should be relatively easy to add into the model salt dependence
at the Debye–Huckel level, following an approach outlined in
ref (36). Another limitation
of the model is its qualitative nature in the regions of high negative
curvature, at least relative to the standard NPB reference. Overcoming
this specific limitation will require a significant extension of the
underlying theory and extensive testing on biomolecular structures.A careful and detailed comparison of our proof-of-concept approximation
within the broader category of existing, optimized implementations
ASC methods has not been performed and is warranted in the future;
nonetheless, promising results so far have pointed to the potential
of our approach in forming the basis of novel implicit models of solvation.
Authors: David A Case; Thomas E Cheatham; Tom Darden; Holger Gohlke; Ray Luo; Kenneth M Merz; Alexey Onufriev; Carlos Simmerling; Bing Wang; Robert J Woods Journal: J Comput Chem Date: 2005-12 Impact factor: 3.376
Authors: John C Gordon; Jonathan B Myers; Timothy Folta; Valia Shoja; Lenwood S Heath; Alexey Onufriev Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971