Nils J D Drechsel1, Christopher J Fennell2, Ken A Dill3, Jordi Villà-Freixa4. 1. Computational Biochemistry and Biophysics Laboratory, Research Unit on Biomedical Informatics, Universitat Pompeu Fabra , C/Doctor Aiguader, 88, 08003 Barcelona, Catalunya, Spain ; Laufer Center for Physical and Quantitative Biology, Stony Brook University , Stony Brook, New York 11794-5252, United States ; Department of Chemistry, Oklahoma State University , Stillwater, Oklahoma 74078, United States. 2. Department of Chemistry, Oklahoma State University , Stillwater, Oklahoma 74078, United States. 3. Laufer Center for Physical and Quantitative Biology and Departments of Physics and Chemistry, Stony Brook University , Stony Brook, New York 11794-5252, United States. 4. Computational Biochemistry and Biophysics Laboratory, Research Unit on Biomedical Informatics, Universitat Pompeu Fabra , C/Doctor Aiguader, 88, 08003 Barcelona, Catalunya, Spain ; Escola Politècnica Superior, Universitat de Vic-Universitat Central de Catalunya , C/de la Laura, 13, 08500 Vic, Catalunya, Spain.
Abstract
We present a new approach to the calculation of solvent-accessible surface areas of molecules with potential application to surface area based methods for determination of solvation free energies. As in traditional analytical and statistical approaches, this new algorithm, called TRIFORCE, reports both component areas and derivatives as a function of the atomic coordinates and radii. Unique to TRIFORCE are the rapid and scalable approaches for the determination of sphere intersection points and numerical estimation of the surface areas, derivatives, and other properties that can be associated with the surface area facets. The algorithm performs a special tessellation and semianalytical integration that uses a precomputed look-up table. This provides a simple way to balance numerical accuracy and memory usage. TRIFORCE calculates derivatives in the same manner, enabling application in force-dependent activities such as molecular geometry minimization. TRIFORCE is available free of charge for academic purposes as both a C++ library, which can be directly interfaced to existing molecular simulation packages, and a web-accessible application.
We present a new approach to the calculation of solvent-accessible surface areas of molecules with potential application to surface area based methods for determination of solvation free energies. As in traditional analytical and statistical approaches, this new algorithm, called TRIFORCE, reports both component areas and derivatives as a function of the atomic coordinates and radii. Unique to TRIFORCE are the rapid and scalable approaches for the determination of sphere intersection points and numerical estimation of the surface areas, derivatives, and other properties that can be associated with the surface area facets. The algorithm performs a special tessellation and semianalytical integration that uses a precomputed look-up table. This provides a simple way to balance numerical accuracy and memory usage. TRIFORCE calculates derivatives in the same manner, enabling application in force-dependent activities such as molecular geometry minimization. TRIFORCE is available free of charge for academic purposes as both a C++ library, which can be directly interfaced to existing molecular simulation packages, and a web-accessible application.
Molecular simulations
often involve balancing between complexity
and computational tractability. One common approximation that makes
macromolecular simulations, such as protein folding and association,
tractable is to use implicit rather than explicit solvent. Implicit
solvents trade the detailed accounting of water, lipid, or other surrounding
solvent molecules for an averaged effective solvent representation.
To do this, the many explicit degrees of solvent freedom are integrated
out of the system and replaced with free energies of solvation. Free
energies of solvation have been observed to be well-correlated with
the interfacial or solvent-accessible surface area (SASA) in the case
of nonpolar solutes.[1−4] Many methods for implicit solvation have been developed which take
advantage of this observation to provide quantitative predictions
for solvation free energies using the SASA,[5−14] though it should be noted that surface area based approaches are
not the only avenue for estimating such quantities.[15−17] To perform
simulations with implicit solvents that depend upon the SASA, we need
both per-atom SASAs and their derivatives. Several algorithms and
methods have been developed which can compute these quantities spanning
from analytically exact approaches[18−23] to statistical or numerically approximate approaches.[24−27]We are interested in performing molecular simulations in implicit
solvents that report solvation free energies with a high degree of
accuracy. Much of our motivation for the work presented within comes
as a result of our experiences applying numerical surface area methods
to these problems and encountering performance limitations with subsequent
numerical derivatives that make them, while accurate, undesirable
in general use. Analytical approaches available are also often performance
limiting, and potentially faster statistical approaches are often
marked by substantial nonuniform errors. Additionally, many methods
are physically limited in some way, such as the inability to treat
hydrogen atom contributions to the SASA. Thus, despite the large number
of surface area algorithms in the scientific literature, we found
there to be a surprising lack of readily accessible, working, and
robust approaches that provide both accurate surface areas and accurate
derivatives.Here, we present an alternative semianalytical
approach for computing
Lee–Richards SASAs and their derivatives from coarse-grained
or all-atom representations of molecular systems.[28] This method, which we refer to as TRIFORCE, employs a precomputed
look-up table that enables accelerated determination of component
areas and derivatives as a function of the principal angles defining
boundary interfaces. We refer to this as a semianalytical method because,
while the process for determining these principal angles is analytical,
we perform a numerical interpolation look-up on a discretized grid
of surface areas and their derivatives as a function of these angles.
We evaluate the correctness of this approach with comparisons to analytical
methods and show that the numerical accuracy is determined by the
density of this look-up table, with the primary cost being one of
memory size and memory access rather than compute cycles. Our intent
is for the TRIFORCE method to fill the apparent gap between the theoretical
landscape of surface area algorithms and the realm of practical usage.
Area Calculation
In classical molecular simulations,
molecules are modeled as a
set of intersecting spheres S with radii corresponding to their extended van der Waals (vdW)
surface. The radii for this extended vdW surface often include a solvent
probe radius term (traditionally 1.4 Å) to best align this extended
surface with the average distance to the surrounding solvent. This
extended surface is also called the Lee–Richards solvent-accessible
surface.[28] The solvent-accessible surface
area A of a molecule is then calculated by adding
up all solvent exposed areas of the set of spheres A( usingA( is a complex
shape that we represent as a sum of simpler component areas. These
component areas are triangular patches which are constructed by a
special tessellation of the exposed area of a sphere, as depicted
in Figure 1. Here, the exposed area of central
sphere S0 is composed of six triangular
patches, each having two sides which are great circle segments and
a third side which is a spherical arc.
Figure 1
Tessellation of the exposed
area of sphere S0 by component triangles.
The corners of the triangles are
two consecutive intersection points, denoted by , and the tessellation point, χ̇. are intersections of circular interfaces I on the surface of S0.
Tessellation of the exposed
area of sphere S0 by component triangles.
The corners of the triangles are
two consecutive intersection points, denoted by , and the tessellation point, χ̇. are intersections of circular interfaces I on the surface of S0.The boundary of the exposed area,
which we will refer to as the
exposed boundary, is composed of these spherical arcs, which are segments
of circular interfaces I between spheres S that intersect with S. For example, circular interface I is located in the plane of intersection between S and S where the hulls touch each other. An intersection
point exists between circular interfaces
whenever two interfaces intersect on the exposed boundary. Two consecutive
intersection points and on an exposed boundary, in conjunction with a tessellation
point χ̇, form the three points of a triangular patch A(. χ̇ is the
point on S where the
tessellation axis χ intersects. χ is an arbitrary vector for each atom fixed for the duration of the
calculations.Areas A( are calculated
on a unit sphere and therefore have to be multiplied with the squared
radius r of sphere S:which
results in area Â(. Here, Seg denotes the set of all interface
segments of the exposed boundaries
on S.Patches A( can be either
fully or fractionally part of the SASA, or the solvent excluded surface
area (SESA), which is complement to the SASA, i.e., the area that
is buried by neighbor spheres. Figure 2 shows
that, over the course of the algorithm, triangular patches contribute
positively if they are fully or fractionally part of the SASA (blue)
or contribute negatively if they are fully part of the SESA (red).
Figure 2
Illustration
of the tessellation of an exposed boundary when the
tessellation-point is (left) inside or (right) outside the exposed
boundary. When the tessellation point is outside the boundary, we
compute both positive (blue) and negative (red) areas, the sum of
which is equal to the result when the intersection point is inside
the boundary.
Illustration
of the tessellation of an exposed boundary when the
tessellation-point is (left) inside or (right) outside the exposed
boundary. When the tessellation point is outside the boundary, we
compute both positive (blue) and negative (red) areas, the sum of
which is equal to the result when the intersection point is inside
the boundary.In addition, the interface
segments of a patch A( can be either part of a convex or a concave
interface. Figure 3 illustrates the distinction
between these two types of interfaces. When a sphere with a smaller
radius moves toward and begins to intersect a sphere with a larger
radius, the sphere–sphere intersection which results is a convex
interface. Once half the small sphere is occluded, this interface
transitions to a concave interface. Concave interfaces can be seen
as inverted convex interfaces, and an area computed using a concave
interface will have opposite sign. Once patches A( are summed up to form Â(, this area will either be positive
or negative, depending on whether SASA or SESA has been counted, respectively.
If the area is a SESA, we compute the complement to convert it into
a SASA. This is done by adding the surface area of a sphere with radius r to this negative Â(:
Figure 3
Illustration
showing how an interface between two spheres transitions
from convex to concave with increasing sphere overlap.
Illustration
showing how an interface between two spheres transitions
from convex to concave with increasing sphere overlap.Figure 4 shows the principal
angles Φ,
ψ, and λ used in determining the areas of a triangular
patch. Each triangular patch A( is assembled from two subpatches split by the so-called tessellation
plane, i.e., the plane orthogonal to the cross product of χ and the intersphere vector. Here, Φ is the angle between the
tessellation plane and a vector from the center of I to a . The angle ψ is between the tessellation
axis and the center of I. Finally, λ is the opening angle
of the cone constructed from I and the center of sphere S.
Figure 4
Triangular patch formed by points , , and χ̇, formed from two subpatches,
defined by principal angles Φ, ψ, and λ. The area
of the subpatch is acquired from a simple look-up table T(Φ,ψ,λ) dependent on these principal angles.
Triangular patch formed by points , , and χ̇, formed from two subpatches,
defined by principal angles Φ, ψ, and λ. The area
of the subpatch is acquired from a simple look-up table T(Φ,ψ,λ) dependent on these principal angles.The area of each subpatch is looked-up
and interpolated from a
precomputed integration table T(Φ,ψ,λ).
Φ can either be positive or negative depending on whether its
corresponding intersection point is on the “right” or
“left” side of the tessellation plane, which results
in either a positive or negative area, respectively. The two subpatches
are subtracted from each other to result in the full area of patch A(.From now on, whenever
unambiguous, we omit the index (l) for simplicity.
Throughout the work we will use the symbol ·
to denote the dot product, symbol ⊗ to denote the outer vector
product, and symbol × to denote the cross-product. Whenever the
cross-product is used between a matrix and a vector, a columnwise
cross-product is assumed.Figure 5 highlights
the key angles and vectors
that factor into the calculation of the surface area by illustrating
both concave and convex sphere intersection interfaces. An interface I is determined by two quantities:
(1) the amount of penetration g* of S into S normalized to a unit sphere:where υ is a vector of length d between Cartesian coordinates x and x of spheres S and S. (2) Interface
type f, which is either
positive for convex or
negative for concave interfaces, is given byPreviously we stated that concave
interfaces
are treated as inverted convex interfaces. Concave interfaces will
have a negative g* which needs to be made positive:Angular calculations are performed on unit
vectors:which include
the radial distances between
two interfaces I and I,as well as the opening angle to the
interface,and the angle to the tessellation axis,Multiple circular interfaces will
intersect
if the sum of their conical opening angles λ and λ fall below their
radial distance ρ, and if one
does not entirely contain the other. The intersection between these
circular interfaces will then result in two intersection points and . The cases
in which opening angles exactly match their radial distance, or two
interfaces are identical, are treated as if there were no intersection
at all.
Figure 5
Sphere S intersects
with two neighboring spheres S and S. The intersections
result in one concave (red) and one convex (blue) circular interface,
respectively, to which normalized vectors μ and μ point, with ρ as the angle between these normalized
vectors. Vectors υ and υ connect S to the respective sphere
centers x and x, and these vectors have magnitude d and d. g and g are simply
the distances from the S center to interfaces I and I.
Sphere S intersects
with two neighboring spheres S and S. The intersections
result in one concave (red) and one convex (blue) circular interface,
respectively, to which normalized vectors μ and μ point, with ρ as the angle between these normalized
vectors. Vectors υ and υ connect S to the respective sphere
centers x and x, and these vectors have magnitude d and d. g and g are simply
the distances from the S center to interfaces I and I.The three look-up parameters are derived from these
circular interfaces.
Φ angles are calculated with respect to a specific interface,
which is illustrated in Figure 6. Here, Φ
angles for differ with respect to interface I or I. The curved arrows over the Φ indices represent the direction
of the intersection. Seen from I, is viewed as an incoming point (connecting I with I) and as an outgoing point (connecting I with I). The respective angles for interface I are an incoming angle Φ↶ and
an outgoing angle Φ↷.
Figure 6
Intersections between circular interfaces I, I, and I on sphere S resulting
in intersection
points , , , and . Φ
angles are calculated with respect to each interface’s tessellation
plane. Here, we show all of the Φ angles for interface I. Following the path indicated
by the arrows, the SASA will be on the right-hand side. This path
direction makes an incoming intersection point of I and an outgoing intersection point.
Intersections between circular interfaces I, I, and I on sphere S resulting
in intersection
points , , , and . Φ
angles are calculated with respect to each interface’s tessellation
plane. Here, we show all of the Φ angles for interface I. Following the path indicated
by the arrows, the SASA will be on the right-hand side. This path
direction makes an incoming intersection point of I and an outgoing intersection point.Figure 7 illustrates how
these incoming
and outgoing angles are calculated utilizing the previously introduced
quantities for circular interfaces. First, η is derived from the two opening angles λ and λ, and the
angle between the two interface vectors ρ:η is the
opening angle of the intersection between the interfaces. Next, a
normal vector n to the tessellation
plane is established:in which is the unnormalized
vector. This calculation
is followed by the computation of a vector normal to and , which
we will denote as the normal to the interfacial plane of I and I:in which again describes the unnormalized
vector. The angle between these two normal vectors determines the
rotation of I around I:
Figure 7
Intersections between circular interfaces I and I resulting in and . A tessellation plane is formed between μ and χ. The calculation
of Φ can be split into the calculation of angles ω and
η. ω denotes the rotation
of μ around μ with the tessellation
plane as reference. η is the opening
angle of the intersection of interfaces I and I.
Intersections between circular interfaces I and I resulting in and . A tessellation plane is formed between μ and χ. The calculation
of Φ can be split into the calculation of angles ω and
η. ω denotes the rotation
of μ around μ with the tessellation
plane as reference. η is the opening
angle of the intersection of interfaces I and I.Whether this rotation is to the
left or to the right depends on
the orientation of χ with respect to the interfacial
plane of I and I:resulting in the signed rotation:η and
ω need to be combined into both
outgoing Φ↷ and incoming Φ↶ angles:with γ↷ being either zero or π
and γ↶ being either zero or −π,
indicating whether the complement of ω needs to be calculated.
Both quantities depend on the rotation of I around I and the magnitude of η:Utilizing the preceding equations, we can
look-up the areas of two subpatches of a triangular patch A( from precomputed tables T(Φ↷,ψ,λ) and T(Φ↶,ψ,λ). As previously stated, the two subareas need to be subtracted
from each other. Furthermore, if the outgoing angle is smaller than
the incoming angle (indicated by q), the complementary area needs to be calculated:where M is the maximal area of a triangular patch for a
given interface I specified
by ψ and λ.
Exposed Boundary Segments
Each sphere
can possibly have multiple contributions to A( in the form of multiple
discontinuous segments of its circular interface. Each segment, described
by a tuple (i,j,k), can be uniquely identified by its corresponding intersection points and . Segments
containing belong to the exposed boundary if, generally speaking, they
are not covered by any other interface.To calculate the three
principal angles Φ, ψ, and λ
for each intersection point of the boundary of the exposed area, it
is necessary to find all segments (i,j,k) that contain these points. Each interface can
possibly contain multiple segments for which the calculation is performed
separately.For each interface a cyclic sorted list is established.
In the
course of the algorithm, Φ angles for intersection points of
all intersecting interfaces are added to the list. After each addition,
the list is pruned by removing occluded intersection points (and their
Φ angles). Figure 8 illustrates this
pruning process. Here, every node between two corresponding intersection
points is removed. Corresponding intersection points are either the
two points of the newly added interface (blue double-dashed area)
or two points already present in the cyclic list (red single-dashed).
At the conclusion of the algorithm, the list contains only intersection
points that are on the boundary of the exposed surface.
Figure 8
Addition of
interface I to the cyclic
list of interface I,
which already contains data from an intersection
with I. Intersections
Φ↷ and Φ↶ cause newly
added intersection Φ↶ to be deleted (red single-dashed
area). Intersections Φ↷ and Φ↶ cause old intersection Φ↶ to be deleted (blue double-dashed
area). Only intersections Φ↷ and Φ↶ remain.
Addition of
interface I to the cyclic
list of interface I,
which already contains data from an intersection
with I. Intersections
Φ↷ and Φ↶ cause newly
added intersection Φ↶ to be deleted (red single-dashed
area). Intersections Φ↷ and Φ↶ cause old intersection Φ↶ to be deleted (blue double-dashed
area). Only intersections Φ↷ and Φ↶ remain.
Discontinuities and Singularities
Some constellations
of spheres express extreme properties, or singular
behaviors, which can cause instabilities during force calculations.
These result in artificially large forces due to numerical limits
and force discontinuities due to the discrete nature of soft-sphere
and soft-circle intersections.
Contact Point between Two
Spheres
Imagine the following scenario for which key quantities
are shown
in Figure 9: two nonintersecting atoms S and S with radii of 2.0 Å approach one another.
In the moment when the hulls just touch, i.e., when their distance
is at precisely 4.0 Å, the derivative in the direction of the
axis of separation abruptly jumps from zero to some nonzero value
(illustrated by the continuous lines). Discontinuities of this type
will act as large force barriers during a minimization or lead to
numerical instabilities in the time-dependent evolution of the surface.[29] To compensate, we smooth the force and area
around the contact point with a logistic function Λ(t). Figure 9 illustrates the effect of Λ(t). The dashed line shows a smooth transition
in force space around the contact point. However, it also introduces
a small hill in the corresponding region in area space. The change
in area in this region is quite small, leading to only a minor perturbation
of the computed surface area values. We calculate Λ(t) using a term t which depends on the distance between
the spheres:Here, parameters p0 and p1 adjust
the shape of the smoothing
function. We set p0 = 0.15 and p1 = 8.0 in this implementation.
Figure 9
Illustration of the derivative
(top) and area (bottom) calculated
as a function of separation distance for two 2.0 Å radius spheres,
with and without a distance-dependent logistic smoother. Without the
smoother, derivatives rapidly jump from zero to some nonzero value
upon contact. With the smoother, this discontinuity is mitigated in
force space at the cost of adding a small hill in area space.
Illustration of the derivative
(top) and area (bottom) calculated
as a function of separation distance for two 2.0 Å radius spheres,
with and without a distance-dependent logistic smoother. Without the
smoother, derivatives rapidly jump from zero to some nonzero value
upon contact. With the smoother, this discontinuity is mitigated in
force space at the cost of adding a small hill in area space.
Interfacial
Angular Singularities
The derivatives of the principal angles
ψ, λ, and Φ
(which involves η and ω) lead to the calculation of fractional
terms in which the denominator vanishes once the angles approach either
0 or π. This gives rise to numerically anomalous very large
(or infinite) forces. λ and η are two angles that can
describe extreme geometrical constellations: a very small λ
refers to two spheres that just barely touch, a very small η
refers to two interfaces that just barely touch, and a very large
η refers to interfaces of identical size with nearly perfect
overlap. In all cases, we can use a threshold to circumvent calculation
in these extreme angular regions. Doing so creates a small discontinuity
which could be removed using a smoothing strategy similar to that
used for sphere–sphere contact-point discontinuities.ψ angle singularities are entirely artificial due to the arbitrary
nature of the tessellation axis χ. Singularities
can be prevented by bringing χ into general position,
i.e., choosing a χ which is not degenerate with
respect to any ψ angle.ω angle singularities can
be handled in the same way as ψ
angle singularities, i.e., by choosing a general position with respect
to ψ and ω. However, we have found this approach to be
computationally intensive. Instead, we remove the singularity via
a tessellation-axis transformation. If a degenerate ω angle
is detected for a given derivative, a new χ* is
chosen arbitrarily such that it is in a general position. We then
calculate a normal vector n* to the new tessellation
plane using this new axis. We calculate a transformed ω* with respect to the new axis using eqs 14–20. ω and ω* are related by the angle between the new and old tessellation plane
τ:The original ω can be reconstructed:where the relationship between the new and
old tessellation planes depend on the geometrical constellation of χ* and n. q0 indicates if the tessellation plane and the
interfacial plane of I and I are similarly
orientated with respect to the tessellation axis:q1 and q2 determine the angular relationship between
τ and ω*, and whether
the complement of ω* needs to be computed:The advantage
of eq 32 is that it does not
depend on the possibly singular angle between n and n. Instead, it depends on the guaranteed nonsingular angles ω* and τ for which partial derivatives
can be robustly calculated.A different kind of singularity
arises if an interface contains
the mirror point of the tessellation point on the opposite side of
the sphere. This would be the case if the sum of λ and ψ
is equal to π or if it exceeds π. During the calculation
of the integration tables, we map Φ angles to ϕ angles,
which represent longitudinal lines that pass through the same intersection
point (see the Appendix for a more detailed
explanation of ϕ angles). The former case would map one Φ
angle to an infinite number of ϕ angles. The latter would map
Φ angles to ϕ angles in such a way that close to the singularity
the look-up tables would be too sparse. This gives rise to imprecise
areas and derivatives. To handle this, we split each sphere into two
independently tessellated hemispheres. As a result, no Φ angles
close to the singularity need be used.
Results
Correctness of Areas and Derivatives
We show the correctness
of the area and derivatives by comparing
to both exact and approximate values for a set of proteins and integration
tables of different grid sizes. Here, we compare against the exact
output of http://curie.utmb.edu/getarea.html and the approximate
LCPO algorithm.[26] The set of proteins we
chose includes ubiquitin, five structures of increasing size that
were mentioned in ref (21), 14 structures from Decoys ‘Rʼ Us[30] that we have observed to be challenging for statistically
derived area calculation algorithms, and 22 structures from ref (31) that represent a set of
different folds. We list PDB codes for these structures in Table 1 and Table 2. We note that
all of these protein structures lack hydrogen atoms due to limitations
of both the exact GETAREA and approximate LCPO methods. TRIFORCE suffers
from no such limitation.
Table 1
Relative Unsigned
Error (RUE) of the
Per-Atom Surface Areas Calculated Using TRIFORCE, with Grid Dimensions
of 16 and 64, and LCPO Relative to Exact (GETAREA) Surface Areas for
42 Proteins
RUE of
area (Å2)
LCPO
TRIFORCE
PDB code
heavy atoms
grid 16
grid 64
1PLX
40
0.9602
0.0070
0.0001
1CBH
260
1.8208
0.0148
0.0021
1SP2
269
1.6250
0.0183
0.0018
5RXN
422
3.8093
0.0372
0.0020
1I6F
436
3.8105
0.0486
0.0003
4PTI
454
4.2465
0.0344
0.0012
1FAS
468
3.7621
0.0333
0.0058
1SN3
492
3.4444
0.0390
0.0008
1CSP
505
1.8924
0.0241
0.0012
2CRO
520
5.6744
0.0239
0.0006
1FVQ
545
5.7127
0.0465
0.0022
1SDF
550
5.5041
0.0374
0.0034
1UBQ
602
2.6297
0.0178
0.0008
1HIP
617
3.9922
0.0266
0.0011
1PHT
666
4.1957
0.0460
0.0017
1J5D
721
4.9532
0.0614
0.0031
2CDV
801
5.4789
0.0316
0.0020
1OPC
805
5.0891
0.0361
0.0039
1KTE
818
5.4651
0.0401
0.0021
1NSO
858
4.2843
0.0279
0.0019
2PAZ
932
2.6954
0.0303
0.0018
1CHN
966
3.7557
0.0273
0.0017
1K40
976
3.4175
0.0206
0.0004
1OOI
986
5.0779
0.0348
0.0019
1PDO
988
5.8762
0.0361
0.0025
6LYZ
1001
4.8848
0.0342
0.0026
1LIT
1045
3.3452
0.0219
0.0012
1BJ7
1208
3.1121
0.0254
0.0015
2I1B
1219
8.8207
0.0349
0.0023
1MBS
1223
7.6788
0.0372
0.0025
1EMR
1232
6.8879
0.0289
0.0037
1CZT
1311
5.4416
0.0344
0.0012
2PTN
1629
4.6474
0.0286
0.0013
5PAD
1655
4.8401
0.0304
0.0007
1SUR
1739
3.1552
0.0304
0.0020
2HVM
2087
3.2928
0.0287
0.0022
2CYP
2299
5.3554
0.0478
0.0021
1RHD
2319
7.9767
0.0455
0.0034
2TMN
2432
7.6176
0.0414
0.0031
2TS1
2457
3.8035
0.0333
0.0026
1FRG
3361
3.7585
0.0313
0.0015
1MCP
3401
4.0348
0.0284
0.0020
av RUE
4.5577
0.0331
0.0020
av RMSD
9.352
0.045
0.004
Table 2
Relative
Unsigned Error (RUE) of the
Per-Atom Derivatives Calculated Using TRIFORCE, with Grid Dimensions
of 16 and 64, and LCPO Relative to Exact (GETAREA) Derivatives for
42 Proteins
RUE for
grad (Å)
LCPO
TRIFORCE
PDB code
heavy atoms
grid 16
grid 64
1PLX
40
9.030
0.164
0.005
1CBH
260
4.558
0.113
0.013
1SP2
269
6.743
0.181
0.026
5RXN
422
5.543
0.108
0.011
1I6F
436
6.408
0.138
0.043
4PTI
454
7.469
0.154
0.009
1FAS
468
6.032
0.114
0.013
1SN3
492
7.871
0.125
0.031
1CSP
505
5.817
0.114
0.015
2CRO
520
5.209
0.067
0.009
1FVQ
545
4.990
0.068
0.006
1SDF
550
6.004
0.122
0.007
1UBQ
602
6.049
0.107
0.008
1HIP
617
7.415
0.101
0.009
1PHT
666
6.425
0.172
0.014
1J5D
721
6.140
0.101
0.010
2CDV
801
6.614
0.108
0.014
1OPC
805
6.407
0.104
0.019
1KTE
818
5.425
0.080
0.010
1NSO
858
5.330
0.117
0.027
2PAZ
932
5.677
0.097
0.010
1CHN
966
5.060
0.071
0.008
1K40
976
4.625
0.071
0.012
1OOI
986
5.930
0.088
0.009
1PDO
988
6.853
0.124
0.013
6LYZ
1001
5.903
0.096
0.010
1LIT
1045
5.047
0.081
0.012
1BJ7
1208
5.741
0.104
0.014
2I1B
1219
5.988
0.109
0.018
1MBS
1223
6.590
0.174
0.059
1EMR
1232
6.636
0.232
0.059
1CZT
1311
4.568
0.079
0.009
2PTN
1629
5.307
0.120
0.031
5PAD
1655
5.750
0.103
0.009
1SUR
1739
6.108
0.179
0.020
2HVM
2087
5.081
0.088
0.008
2CYP
2299
5.640
0.108
0.016
1RHD
2319
5.015
0.092
0.011
2TMN
2432
5.050
0.087
0.014
2TS1
2457
5.207
0.114
0.009
1FRG
3361
4.780
0.083
0.013
1MCP
3401
5.409
0.086
0.008
av RUE
5.815
0.112
0.016
av RMSD
4.367
0.166
0.071
Table 1 and Table 2 show the relative unsigned
error (RUE) of per-atom areas and derivatives,
respectively, for this set of 42 proteins. At the bottom of these
tables we also report the average RUE and average root-mean-square
deviation (RMSD) over this set. In general, TRIFORCE shows significantly
improved agreement with the exact per-atom areas and derivatives over
the approximate LCPO technique in these comparisons. Note that while
the listed proteins are sorted by the number of heavy atoms, TRIFORCE
is not affected by protein size. LCPO is also not greatly affected
by the number of heavy atoms, though there appears to be a greater
degree of variability of the predicted per-atom areas when using this
approach. When we use an interpolation grid detail of 64 rather than
16 with TRIFORCE, we observe a roughly 15 times reduction in area
RUE and a 7 times reduction in derivative RUE. These calculated per-atom
areas and derivatives are numerically invariant upon rotation of the
molecules and choice of the tesselation axis, as the measured fluctuations
when rotating the molecule and/or the tesselation axis are smaller
than the errors reported in Table 1.We note that the real differences in the derivatives are better
than those shown in Table 2. The reason is
not imprecise calculations by TRIFORCE but differences between the
algorithms of GETAREA and TRIFORCE due to limited numerical precision:
both algorithms utilize different methods to determine whether a sphere
contributes to the exposed boundary or not. As an example, observe
the large difference in the derivatives of molecule “1MBS”. Here, a
single atomistic constellation expresses extreme properties. In this
constellation there are two interfaces in which one is buried almost
completely in the other. This leads to an ambiguous situation in which
GETAREA sees no intersection between the interfaces while TRIFORCE
sees the intersection. Consequently, TRIFORCE will calculate forces
for the extra interface while it is invisible for GETAREA. Due to
the ambiguous nature of these cases, there is no right or wrong result.
Both are correct within the choices made in algorithm implementation.Figure 10 shows the average per-atom RUE
of both TRIFORCE and LCPO areas and derivatives relative to exact
(GETAREA) values. We show the TRIFORCE results as a function of interpolation
grid detail. With an extremely coarse grid of 3 (3 × 3 ×
3), the TRIFORCE areas and derivatives are already closer to the exact
values than those reported by the statistically derived LCPO method.
With finer grids, the TRIFORCE results converge on the exact values.
Aside from the time required to load a grid in the initialization
of TRIFORCE, there is, in principle, no computational performance
penalty in choosing a finer grid. The primary cost as a function of
grid detail is the memory required for storing the grid values. To
test this, we performed a series of TRIFORCE surface area calculations
on egg white lysozyme (6LYZ: 1001 heavy atoms, 129 residues) on a 2.6 GHz Intel
Core i7 processor with 6.25 MB of accessible cache. For grid details
up to 32, the averaged real world performance of the TRIFORCE integration
time was 60 ms. Grid details of 64 and 80 have average integration
times of 70 ms, a greater than 15% performance penalty. This penalty
is due to cache overflow, which forces an increasing number of system
memory calls. All grids with details smaller than 32 will fit entirely
on this processorʼs cache while larger grids do not. For example,
a quadratic interpolation scheme with a grid detail of 16 uses 328
KB of memory while a grid detail of 64 uses 21 MB. We can modulate
these memory requirements to varying degrees by implementing different
interpolation schemes; however, we expect increasingly large grids
that overflow the cache to suffer from an increasing number of system
memory calls. This will lead to an increasing performance penalty
up to a hardware specific maximum, where nearly all of the grid access
steps of integration are done with system memory calls rather than
cache look-ups.
Figure 10
Relative unsigned error of per-atom areas and derivatives
over
a set of 42 proteins calculated using TRIFORCE and LCPO relative to
the analytical GETAREA method. We see that TRIFORCE error decreases
with increasing interpolation grid density.
Relative unsigned error of per-atom areas and derivatives
over
a set of 42 proteins calculated using TRIFORCE and LCPO relative to
the analytical GETAREA method. We see that TRIFORCE error decreases
with increasing interpolation grid density.We note that performance analyses such as those stated above
are
very much hardware- and implementation-dependent. This makes it difficult
to perform rigorous performance comparisons between TRIFORCE, analytical,
and even numerical approaches. Instead of head-to-head timings, we
attempt to provide performance guidance relative to the GETAREA analytical
method through operation counts from the published equations. Based
on the relative numbers of floating point addition, multiplication,
division, and square root operations, TRIFORCE integration will exhibit
an approximately 15% reduced latency over GETAREA analytical integration
on current Intel Core i7 processors. It should be noted that integration does not consider the construction of the Gauss–Bonnet
path, which has an identical cost for all analytical surface area
methods that require it and which is approximately an order of magnitude
more costly than TRIFORCE integration. We are currently preparing
a manuscript detailing how to accelerate Gauss–Bonnet path
constructions with little loss in numerical accuracy, bringing performance
in-line with TRIFORCE integration.Finally, while not overly
apparent in Figure 10, there is a slight bump
in the per-atom derivative RUE for
a grid detail of 25. This bump comes not from the TRIFORCE algorithm
but from a numerical artifact in the construction of this specific
grid. The integration routine used in its construction reached numerically
unstable regions, preventing calculation of this grid to the same
precision as the others. As the performance differences are minimal
when using neighboring grid details, we recommend not using this specific
interpolation grid.
Minimization Test of Particle
Surface Areas
We performed both steepest descent and Broyden–Fletcher–Goldfarb–Shanno
(BFGS) minimizations on a fluorene molecule (C13H10) to test correctness of the derivatives. No atomic bonding, repulsive,
torsional, or angular forces were included; as such the minimization
was expected to reduce the molecule into a single sphere. While this
is somewhat unphysical, it provides a stringent test of the algorithm
as it has a known target surface area, that of an inflated sphere
(vdW + water distance) of one single carbon with radius of 3.09 Å.
Figure 11 shows the averaged course of the
steepest descent minimization as a function of TRIFORCE grid detail,
each over 1000 randomized runs. After 80 steepest descent steps, the
averaged total area is mostly level and the system is nearly completely
collapsed. The BFGS minimizations (not shown) tended to converge somewhat
sooner at around 50 steps. Complete collapse is not observed with
either minimizer due to how the minimization algorithms treat buried
spheres. In the late stages of minimization, all spheres with radii
smaller than 3.09 Å are entirely buried inside the larger spheres,
and all spheres with radii of 3.09 Å are mostly covered by each
other. Because of their lack of SASA, the completely buried spheres
do not experience forces and are therefore invisible to the minimizer.
As such, these spheres are prone to resurface constantly when their
enveloping spheres shift positions. Similarly, the principal angles
become numerically less well defined when spheres nearly overlap,
exaggerating this resurfacing behavior.
Figure 11
Solvent-accessible
surface area of a fluorene molecule (C13H10)
over the course of a steepest descent minimization
without bonding or repulsive forces between atoms when using TRIFORCE
with different grid details. The molecule minimizes down to a single
sphere of approximately radius 3.09 Å, corresponding to the radius
of the largest component atom and illustrated here with the gray line.
The shaded regions are the standard deviation envelopes over 1000
randomized runs. While all shown grid details minimize at much the
same rate, coarser grids exhibit more numerical noise as spheres begin
to overlap and the principal angles become less well defined.
Coarser grid details
have less accurate derivatives, leading to greater numerical variability
in this minimization process, denoted by the shaded standard deviation
envelop regions in Figure 11. We attempted
even lower resolution grid details, but the derivatives become so
poor at a grid detail of 3 (RUE of derivatives, 5.9) that the minimizations
would fail to converge. Here, the atoms would move in mostly random
directions, separating the molecule into individual atomic spheres.
It appears that we need a grid detail of 4, which has an RUE of the
per-atom derivatives of 3.8, to ensure convergence. It is unclear
from this test if the derivatives from approximate surface area methods
such as LCPO (RUE of derivatives, 5.8) are themselves sufficiently
accurate to ensure convergence to the expected minimum structure.Solvent-accessible
surface area of a fluorene molecule (C13H10)
over the course of a steepest descent minimization
without bonding or repulsive forces between atoms when using TRIFORCE
with different grid details. The molecule minimizes down to a single
sphere of approximately radius 3.09 Å, corresponding to the radius
of the largest component atom and illustrated here with the gray line.
The shaded regions are the standard deviation envelopes over 1000
randomized runs. While all shown grid details minimize at much the
same rate, coarser grids exhibit more numerical noise as spheres begin
to overlap and the principal angles become less well defined.
Discussion
TRIFORCE has several features which distinguish it from algorithms
that perform equivalent tasks. It has the ability to segment the exposed
surface efficiently into triangular patches. This ability allows TRIFORCE
to accumulate not just surface areas but any precomputed quantity.
For such a purpose, the algorithm seamlessly allows for two adjustments:
(1) the selection of an arbitrary tessellation axis χ, to adjust for geometrical requirements and (2) the option to add
additional dimensions to the integration tables, to allow the modeling
of a functional surface with higher complexity.In contrast
to some algorithms that rely on united atom molecular
representations, TRIFORCE is capable of handling all topologies, including
concave circular interfaces. This is crucial because such interfaces
frequently occur in molecules where hydrogen atoms are buried in large
van der Waals radii of bonded atoms.Its tabular integration
procedure can be efficiently implemented
onto GPUs (development is in progress) in which this task becomes
a simple texture look-up. The whole algorithm has been designed to
allow for parallel processing on the level of circular interfaces,
in contrast to an atomic level. The nature of the algorithm structure
gives TRIFORCE the potential to utilize the GPUs or other massively
parallel architectures to full capacity. Integration tables are stored
in an architecture-independent way, enabling usage of the library
on different platforms right away.All calculations can be made
arbitrarily precise by increasing
the grid detail, as Figure 10 shows. Doing
this will require additional memory but will not slow down the algorithm,
aside from the initial loading procedure, if the grid is still small
enough to remain in the processing unit’s cache. TRIFORCE can
be accessed and run on the web from http://dillgroup.io. A downloadable version of the source code is also available on
this Web site, free of charge for academic purposes.
Conclusion
We have presented a novel algorithm for the calculation
of the
SASA and its derivatives for arbitrary molecules. Although the method
is partly numerical, errors in both of these quantities are marginal,
and we see performance and flexibility gains over fully analytical
approaches. We tested its correctness through an extensive set of
calculations on a database of 42 proteins of various sizes and folds.
We evaluated its robustness by successfully applying TRIFORCE in minimization
calculations of the SASA. To compensate for discontinuities in the
derivatives upon sphere contact, we have introduced a logistical smoothing
function which is able to bridge the gap in force space between two
distant spheres and their intersecting counterparts. Future direction
for the development of TRIFORCE involves complete parallelization
of the algorithm through implementation on GPUs and other massively
parallel architectures.
Authors: Manuel Rueda; Carles Ferrer-Costa; Tim Meyer; Alberto Pérez; Jordi Camps; Adam Hospital; Josep Lluis Gelpí; Modesto Orozco Journal: Proc Natl Acad Sci U S A Date: 2007-01-10 Impact factor: 11.205