Literature DB >> 25221446

TRIFORCE: Tessellated Semianalytical Solvent Exposed Surface Areas and Derivatives.

Nils J D Drechsel¹, Christopher J Fennell², Ken A Dill³, Jordi Villà-Freixa⁴.

Abstract

We present a new approach to the calculation of solvent-accessible surface areas of molecules with potential application to surface area based methods for determination of solvation free energies. As in traditional analytical and statistical approaches, this new algorithm, called TRIFORCE, reports both component areas and derivatives as a function of the atomic coordinates and radii. Unique to TRIFORCE are the rapid and scalable approaches for the determination of sphere intersection points and numerical estimation of the surface areas, derivatives, and other properties that can be associated with the surface area facets. The algorithm performs a special tessellation and semianalytical integration that uses a precomputed look-up table. This provides a simple way to balance numerical accuracy and memory usage. TRIFORCE calculates derivatives in the same manner, enabling application in force-dependent activities such as molecular geometry minimization. TRIFORCE is available free of charge for academic purposes as both a C++ library, which can be directly interfaced to existing molecular simulation packages, and a web-accessible application.

Entities: CellLine Chemical Disease Gene

Year: 2014 PMID： 25221446 PMCID： PMC4159216 DOI： 10.1021/ct5002818

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Molecular simulations often involve balancing between complexity and computational tractability. One common approximation that makes macromolecular simulations, such as protein folding and association, tractable is to use implicit rather than explicit solvent. Implicit solvents trade the detailed accounting of water, lipid, or other surrounding solvent molecules for an averaged effective solvent representation. To do this, the many explicit degrees of solvent freedom are integrated out of the system and replaced with free energies of solvation. Free energies of solvation have been observed to be well-correlated with the interfacial or solvent-accessible surface area (SASA) in the case of nonpolar solutes.[1−4] Many methods for implicit solvation have been developed which take advantage of this observation to provide quantitative predictions for solvation free energies using the SASA,[5−14] though it should be noted that surface area based approaches are not the only avenue for estimating such quantities.[15−17] To perform simulations with implicit solvents that depend upon the SASA, we need both per-atom SASAs and their derivatives. Several algorithms and methods have been developed which can compute these quantities spanning from analytically exact approaches[18−23] to statistical or numerically approximate approaches.[24−27] We are interested in performing molecular simulations in implicit solvents that report solvation free energies with a high degree of accuracy. Much of our motivation for the work presented within comes as a result of our experiences applying numerical surface area methods to these problems and encountering performance limitations with subsequent numerical derivatives that make them, while accurate, undesirable in general use. Analytical approaches available are also often performance limiting, and potentially faster statistical approaches are often marked by substantial nonuniform errors. Additionally, many methods are physically limited in some way, such as the inability to treat hydrogen atom contributions to the SASA. Thus, despite the large number of surface area algorithms in the scientific literature, we found there to be a surprising lack of readily accessible, working, and robust approaches that provide both accurate surface areas and accurate derivatives. Here, we present an alternative semianalytical approach for computing Lee–Richards SASAs and their derivatives from coarse-grained or all-atom representations of molecular systems.[28] This method, which we refer to as TRIFORCE, employs a precomputed look-up table that enables accelerated determination of component areas and derivatives as a function of the principal angles defining boundary interfaces. We refer to this as a semianalytical method because, while the process for determining these principal angles is analytical, we perform a numerical interpolation look-up on a discretized grid of surface areas and their derivatives as a function of these angles. We evaluate the correctness of this approach with comparisons to analytical methods and show that the numerical accuracy is determined by the density of this look-up table, with the primary cost being one of memory size and memory access rather than compute cycles. Our intent is for the TRIFORCE method to fill the apparent gap between the theoretical landscape of surface area algorithms and the realm of practical usage.

Area Calculation

In classical molecular simulations, molecules are modeled as a set of intersecting spheres S with radii corresponding to their extended van der Waals (vdW) surface. The radii for this extended vdW surface often include a solvent probe radius term (traditionally 1.4 Å) to best align this extended surface with the average distance to the surrounding solvent. This extended surface is also called the Lee–Richards solvent-accessible surface.[28] The solvent-accessible surface area A of a molecule is then calculated by adding up all solvent exposed areas of the set of spheres A( using A( is a complex shape that we represent as a sum of simpler component areas. These component areas are triangular patches which are constructed by a special tessellation of the exposed area of a sphere, as depicted in Figure 1. Here, the exposed area of central sphere S0 is composed of six triangular patches, each having two sides which are great circle segments and a third side which is a spherical arc.

Figure 1

Tessellation of the exposed area of sphere S0 by component triangles. The corners of the triangles are two consecutive intersection points, denoted by , and the tessellation point, χ̇. are intersections of circular interfaces I on the surface of S0. The boundary of the exposed area, which we will refer to as the exposed boundary, is composed of these spherical arcs, which are segments of circular interfaces I between spheres S that intersect with S. For example, circular interface I is located in the plane of intersection between S and S where the hulls touch each other. An intersection point exists between circular interfaces whenever two interfaces intersect on the exposed boundary. Two consecutive intersection points and on an exposed boundary, in conjunction with a tessellation point χ̇, form the three points of a triangular patch A(. χ̇ is the point on S where the tessellation axis χ intersects. χ is an arbitrary vector for each atom fixed for the duration of the calculations. Areas A( are calculated on a unit sphere and therefore have to be multiplied with the squared radius r of sphere S:which results in area Â(. Here, Seg denotes the set of all interface segments of the exposed boundaries on S. Patches A( can be either fully or fractionally part of the SASA, or the solvent excluded surface area (SESA), which is complement to the SASA, i.e., the area that is buried by neighbor spheres. Figure 2 shows that, over the course of the algorithm, triangular patches contribute positively if they are fully or fractionally part of the SASA (blue) or contribute negatively if they are fully part of the SESA (red).

Figure 2

Illustration of the tessellation of an exposed boundary when the tessellation-point is (left) inside or (right) outside the exposed boundary. When the tessellation point is outside the boundary, we compute both positive (blue) and negative (red) areas, the sum of which is equal to the result when the intersection point is inside the boundary. In addition, the interface segments of a patch A( can be either part of a convex or a concave interface. Figure 3 illustrates the distinction between these two types of interfaces. When a sphere with a smaller radius moves toward and begins to intersect a sphere with a larger radius, the sphere–sphere intersection which results is a convex interface. Once half the small sphere is occluded, this interface transitions to a concave interface. Concave interfaces can be seen as inverted convex interfaces, and an area computed using a concave interface will have opposite sign. Once patches A( are summed up to form Â(, this area will either be positive or negative, depending on whether SASA or SESA has been counted, respectively. If the area is a SESA, we compute the complement to convert it into a SASA. This is done by adding the surface area of a sphere with radius r to this negative Â(:

Figure 3

Illustration showing how an interface between two spheres transitions from convex to concave with increasing sphere overlap.

Illustration showing how an interface between two spheres transitions from convex to concave with increasing sphere overlap. Figure 4 shows the principal angles Φ, ψ, and λ used in determining the areas of a triangular patch. Each triangular patch A( is assembled from two subpatches split by the so-called tessellation plane, i.e., the plane orthogonal to the cross product of χ and the intersphere vector. Here, Φ is the angle between the tessellation plane and a vector from the center of I to a . The angle ψ is between the tessellation axis and the center of I. Finally, λ is the opening angle of the cone constructed from I and the center of sphere S.

Figure 4

Triangular patch formed by points , , and χ̇, formed from two subpatches, defined by principal angles Φ, ψ, and λ. The area of the subpatch is acquired from a simple look-up table T(Φ,ψ,λ) dependent on these principal angles. The area of each subpatch is looked-up and interpolated from a precomputed integration table T(Φ,ψ,λ). Φ can either be positive or negative depending on whether its corresponding intersection point is on the “right” or “left” side of the tessellation plane, which results in either a positive or negative area, respectively. The two subpatches are subtracted from each other to result in the full area of patch A(. From now on, whenever unambiguous, we omit the index (l) for simplicity. Throughout the work we will use the symbol · to denote the dot product, symbol ⊗ to denote the outer vector product, and symbol × to denote the cross-product. Whenever the cross-product is used between a matrix and a vector, a columnwise cross-product is assumed. Figure 5 highlights the key angles and vectors that factor into the calculation of the surface area by illustrating both concave and convex sphere intersection interfaces. An interface I is determined by two quantities: (1) the amount of penetration g* of S into S normalized to a unit sphere:where υ is a vector of length d between Cartesian coordinates x and x of spheres S and S. (2) Interface type f, which is either positive for convex or negative for concave interfaces, is given byPreviously we stated that concave interfaces are treated as inverted convex interfaces. Concave interfaces will have a negative g* which needs to be made positive:Angular calculations are performed on unit vectors:which include the radial distances between two interfaces I and I,as well as the opening angle to the interface,and the angle to the tessellation axis,Multiple circular interfaces will intersect if the sum of their conical opening angles λ and λ fall below their radial distance ρ, and if one does not entirely contain the other. The intersection between these circular interfaces will then result in two intersection points and . The cases in which opening angles exactly match their radial distance, or two interfaces are identical, are treated as if there were no intersection at all.

Figure 5

Sphere S intersects with two neighboring spheres S and S. The intersections result in one concave (red) and one convex (blue) circular interface, respectively, to which normalized vectors μ and μ point, with ρ as the angle between these normalized vectors. Vectors υ and υ connect S to the respective sphere centers x and x, and these vectors have magnitude d and d. g and g are simply the distances from the S center to interfaces I and I. The three look-up parameters are derived from these circular interfaces. Φ angles are calculated with respect to a specific interface, which is illustrated in Figure 6. Here, Φ angles for differ with respect to interface I or I. The curved arrows over the Φ indices represent the direction of the intersection. Seen from I, is viewed as an incoming point (connecting I with I) and as an outgoing point (connecting I with I). The respective angles for interface I are an incoming angle Φ↶ and an outgoing angle Φ↷.

Figure 6

Intersections between circular interfaces I, I, and I on sphere S resulting in intersection points , , , and . Φ angles are calculated with respect to each interface’s tessellation plane. Here, we show all of the Φ angles for interface I. Following the path indicated by the arrows, the SASA will be on the right-hand side. This path direction makes an incoming intersection point of I and an outgoing intersection point. Figure 7 illustrates how these incoming and outgoing angles are calculated utilizing the previously introduced quantities for circular interfaces. First, η is derived from the two opening angles λ and λ, and the angle between the two interface vectors ρ:η is the opening angle of the intersection between the interfaces. Next, a normal vector n to the tessellation plane is established:in which is the unnormalized vector. This calculation is followed by the computation of a vector normal to and , which we will denote as the normal to the interfacial plane of I and I:in which again describes the unnormalized vector. The angle between these two normal vectors determines the rotation of I around I:

Figure 7

Intersections between circular interfaces I and I resulting in and . A tessellation plane is formed between μ and χ. The calculation of Φ can be split into the calculation of angles ω and η. ω denotes the rotation of μ around μ with the tessellation plane as reference. η is the opening angle of the intersection of interfaces I and I. Whether this rotation is to the left or to the right depends on the orientation of χ with respect to the interfacial plane of I and I:resulting in the signed rotation:η and ω need to be combined into both outgoing Φ↷ and incoming Φ↶ angles:with γ↷ being either zero or π and γ↶ being either zero or −π, indicating whether the complement of ω needs to be calculated. Both quantities depend on the rotation of I around I and the magnitude of η:Utilizing the preceding equations, we can look-up the areas of two subpatches of a triangular patch A( from precomputed tables T(Φ↷,ψ,λ) and T(Φ↶,ψ,λ). As previously stated, the two subareas need to be subtracted from each other. Furthermore, if the outgoing angle is smaller than the incoming angle (indicated by q), the complementary area needs to be calculated:where M is the maximal area of a triangular patch for a given interface I specified by ψ and λ.

Exposed Boundary Segments

Each sphere can possibly have multiple contributions to A( in the form of multiple discontinuous segments of its circular interface. Each segment, described by a tuple (i,j,k), can be uniquely identified by its corresponding intersection points and . Segments containing belong to the exposed boundary if, generally speaking, they are not covered by any other interface. To calculate the three principal angles Φ, ψ, and λ for each intersection point of the boundary of the exposed area, it is necessary to find all segments (i,j,k) that contain these points. Each interface can possibly contain multiple segments for which the calculation is performed separately. For each interface a cyclic sorted list is established. In the course of the algorithm, Φ angles for intersection points of all intersecting interfaces are added to the list. After each addition, the list is pruned by removing occluded intersection points (and their Φ angles). Figure 8 illustrates this pruning process. Here, every node between two corresponding intersection points is removed. Corresponding intersection points are either the two points of the newly added interface (blue double-dashed area) or two points already present in the cyclic list (red single-dashed). At the conclusion of the algorithm, the list contains only intersection points that are on the boundary of the exposed surface.

Figure 8

Addition of interface I to the cyclic list of interface I, which already contains data from an intersection with I. Intersections Φ↷ and Φ↶ cause newly added intersection Φ↶ to be deleted (red single-dashed area). Intersections Φ↷ and Φ↶ cause old intersection Φ↶ to be deleted (blue double-dashed area). Only intersections Φ↷ and Φ↶ remain.

Discontinuities and Singularities

Some constellations of spheres express extreme properties, or singular behaviors, which can cause instabilities during force calculations. These result in artificially large forces due to numerical limits and force discontinuities due to the discrete nature of soft-sphere and soft-circle intersections.

Contact Point between Two Spheres

Imagine the following scenario for which key quantities are shown in Figure 9: two nonintersecting atoms S and S with radii of 2.0 Å approach one another. In the moment when the hulls just touch, i.e., when their distance is at precisely 4.0 Å, the derivative in the direction of the axis of separation abruptly jumps from zero to some nonzero value (illustrated by the continuous lines). Discontinuities of this type will act as large force barriers during a minimization or lead to numerical instabilities in the time-dependent evolution of the surface.[29] To compensate, we smooth the force and area around the contact point with a logistic function Λ(t). Figure 9 illustrates the effect of Λ(t). The dashed line shows a smooth transition in force space around the contact point. However, it also introduces a small hill in the corresponding region in area space. The change in area in this region is quite small, leading to only a minor perturbation of the computed surface area values. We calculate Λ(t) using a term t which depends on the distance between the spheres:Here, parameters p0 and p1 adjust the shape of the smoothing function. We set p0 = 0.15 and p1 = 8.0 in this implementation.

Figure 9

Illustration of the derivative (top) and area (bottom) calculated as a function of separation distance for two 2.0 Å radius spheres, with and without a distance-dependent logistic smoother. Without the smoother, derivatives rapidly jump from zero to some nonzero value upon contact. With the smoother, this discontinuity is mitigated in force space at the cost of adding a small hill in area space.

Interfacial Angular Singularities

The derivatives of the principal angles ψ, λ, and Φ (which involves η and ω) lead to the calculation of fractional terms in which the denominator vanishes once the angles approach either 0 or π. This gives rise to numerically anomalous very large (or infinite) forces. λ and η are two angles that can describe extreme geometrical constellations: a very small λ refers to two spheres that just barely touch, a very small η refers to two interfaces that just barely touch, and a very large η refers to interfaces of identical size with nearly perfect overlap. In all cases, we can use a threshold to circumvent calculation in these extreme angular regions. Doing so creates a small discontinuity which could be removed using a smoothing strategy similar to that used for sphere–sphere contact-point discontinuities. ψ angle singularities are entirely artificial due to the arbitrary nature of the tessellation axis χ. Singularities can be prevented by bringing χ into general position, i.e., choosing a χ which is not degenerate with respect to any ψ angle. ω angle singularities can be handled in the same way as ψ angle singularities, i.e., by choosing a general position with respect to ψ and ω. However, we have found this approach to be computationally intensive. Instead, we remove the singularity via a tessellation-axis transformation. If a degenerate ω angle is detected for a given derivative, a new χ* is chosen arbitrarily such that it is in a general position. We then calculate a normal vector n* to the new tessellation plane using this new axis. We calculate a transformed ω* with respect to the new axis using eqs 14–20. ω and ω* are related by the angle between the new and old tessellation plane τ: The original ω can be reconstructed:where the relationship between the new and old tessellation planes depend on the geometrical constellation of χ* and n. q0 indicates if the tessellation plane and the interfacial plane of I and I are similarly orientated with respect to the tessellation axis:q1 and q2 determine the angular relationship between τ and ω*, and whether the complement of ω* needs to be computed: The advantage of eq 32 is that it does not depend on the possibly singular angle between n and n. Instead, it depends on the guaranteed nonsingular angles ω* and τ for which partial derivatives can be robustly calculated. A different kind of singularity arises if an interface contains the mirror point of the tessellation point on the opposite side of the sphere. This would be the case if the sum of λ and ψ is equal to π or if it exceeds π. During the calculation of the integration tables, we map Φ angles to ϕ angles, which represent longitudinal lines that pass through the same intersection point (see the Appendix for a more detailed explanation of ϕ angles). The former case would map one Φ angle to an infinite number of ϕ angles. The latter would map Φ angles to ϕ angles in such a way that close to the singularity the look-up tables would be too sparse. This gives rise to imprecise areas and derivatives. To handle this, we split each sphere into two independently tessellated hemispheres. As a result, no Φ angles close to the singularity need be used.

Results

Correctness of Areas and Derivatives

We show the correctness of the area and derivatives by comparing to both exact and approximate values for a set of proteins and integration tables of different grid sizes. Here, we compare against the exact output of http://curie.utmb.edu/getarea.html and the approximate LCPO algorithm.[26] The set of proteins we chose includes ubiquitin, five structures of increasing size that were mentioned in ref (21), 14 structures from Decoys ‘Rʼ Us[30] that we have observed to be challenging for statistically derived area calculation algorithms, and 22 structures from ref (31) that represent a set of different folds. We list PDB codes for these structures in Table 1 and Table 2. We note that all of these protein structures lack hydrogen atoms due to limitations of both the exact GETAREA and approximate LCPO methods. TRIFORCE suffers from no such limitation.

Table 1

Relative Unsigned Error (RUE) of the Per-Atom Surface Areas Calculated Using TRIFORCE, with Grid Dimensions of 16 and 64, and LCPO Relative to Exact (GETAREA) Surface Areas for 42 Proteins

		RUE of area (Å²)
		LCPO	TRIFORCE
PDB code	heavy atoms		grid 16	grid 64
1PLX	40	0.9602	0.0070	0.0001
1CBH	260	1.8208	0.0148	0.0021
1SP2	269	1.6250	0.0183	0.0018
5RXN	422	3.8093	0.0372	0.0020
1I6F	436	3.8105	0.0486	0.0003
4PTI	454	4.2465	0.0344	0.0012
1FAS	468	3.7621	0.0333	0.0058
1SN3	492	3.4444	0.0390	0.0008
1CSP	505	1.8924	0.0241	0.0012
2CRO	520	5.6744	0.0239	0.0006
1FVQ	545	5.7127	0.0465	0.0022
1SDF	550	5.5041	0.0374	0.0034
1UBQ	602	2.6297	0.0178	0.0008
1HIP	617	3.9922	0.0266	0.0011
1PHT	666	4.1957	0.0460	0.0017
1J5D	721	4.9532	0.0614	0.0031
2CDV	801	5.4789	0.0316	0.0020
1OPC	805	5.0891	0.0361	0.0039
1KTE	818	5.4651	0.0401	0.0021
1NSO	858	4.2843	0.0279	0.0019
2PAZ	932	2.6954	0.0303	0.0018
1CHN	966	3.7557	0.0273	0.0017
1K40	976	3.4175	0.0206	0.0004
1OOI	986	5.0779	0.0348	0.0019
1PDO	988	5.8762	0.0361	0.0025
6LYZ	1001	4.8848	0.0342	0.0026
1LIT	1045	3.3452	0.0219	0.0012
1BJ7	1208	3.1121	0.0254	0.0015
2I1B	1219	8.8207	0.0349	0.0023
1MBS	1223	7.6788	0.0372	0.0025
1EMR	1232	6.8879	0.0289	0.0037
1CZT	1311	5.4416	0.0344	0.0012
2PTN	1629	4.6474	0.0286	0.0013
5PAD	1655	4.8401	0.0304	0.0007
1SUR	1739	3.1552	0.0304	0.0020
2HVM	2087	3.2928	0.0287	0.0022
2CYP	2299	5.3554	0.0478	0.0021
1RHD	2319	7.9767	0.0455	0.0034
2TMN	2432	7.6176	0.0414	0.0031
2TS1	2457	3.8035	0.0333	0.0026
1FRG	3361	3.7585	0.0313	0.0015
1MCP	3401	4.0348	0.0284	0.0020

av RUE		4.5577	0.0331	0.0020
av RMSD		9.352	0.045	0.004

Table 2

Relative Unsigned Error (RUE) of the Per-Atom Derivatives Calculated Using TRIFORCE, with Grid Dimensions of 16 and 64, and LCPO Relative to Exact (GETAREA) Derivatives for 42 Proteins

		RUE for grad (Å)
		LCPO	TRIFORCE
PDB code	heavy atoms		grid 16	grid 64
1PLX	40	9.030	0.164	0.005
1CBH	260	4.558	0.113	0.013
1SP2	269	6.743	0.181	0.026
5RXN	422	5.543	0.108	0.011
1I6F	436	6.408	0.138	0.043
4PTI	454	7.469	0.154	0.009
1FAS	468	6.032	0.114	0.013
1SN3	492	7.871	0.125	0.031
1CSP	505	5.817	0.114	0.015
2CRO	520	5.209	0.067	0.009
1FVQ	545	4.990	0.068	0.006
1SDF	550	6.004	0.122	0.007
1UBQ	602	6.049	0.107	0.008
1HIP	617	7.415	0.101	0.009
1PHT	666	6.425	0.172	0.014
1J5D	721	6.140	0.101	0.010
2CDV	801	6.614	0.108	0.014
1OPC	805	6.407	0.104	0.019
1KTE	818	5.425	0.080	0.010
1NSO	858	5.330	0.117	0.027
2PAZ	932	5.677	0.097	0.010
1CHN	966	5.060	0.071	0.008
1K40	976	4.625	0.071	0.012
1OOI	986	5.930	0.088	0.009
1PDO	988	6.853	0.124	0.013
6LYZ	1001	5.903	0.096	0.010
1LIT	1045	5.047	0.081	0.012
1BJ7	1208	5.741	0.104	0.014
2I1B	1219	5.988	0.109	0.018
1MBS	1223	6.590	0.174	0.059
1EMR	1232	6.636	0.232	0.059
1CZT	1311	4.568	0.079	0.009
2PTN	1629	5.307	0.120	0.031
5PAD	1655	5.750	0.103	0.009
1SUR	1739	6.108	0.179	0.020
2HVM	2087	5.081	0.088	0.008
2CYP	2299	5.640	0.108	0.016
1RHD	2319	5.015	0.092	0.011
2TMN	2432	5.050	0.087	0.014
2TS1	2457	5.207	0.114	0.009
1FRG	3361	4.780	0.083	0.013
1MCP	3401	5.409	0.086	0.008

av RUE		5.815	0.112	0.016
av RMSD		4.367	0.166	0.071

Table 1 and Table 2 show the relative unsigned error (RUE) of per-atom areas and derivatives, respectively, for this set of 42 proteins. At the bottom of these tables we also report the average RUE and average root-mean-square deviation (RMSD) over this set. In general, TRIFORCE shows significantly improved agreement with the exact per-atom areas and derivatives over the approximate LCPO technique in these comparisons. Note that while the listed proteins are sorted by the number of heavy atoms, TRIFORCE is not affected by protein size. LCPO is also not greatly affected by the number of heavy atoms, though there appears to be a greater degree of variability of the predicted per-atom areas when using this approach. When we use an interpolation grid detail of 64 rather than 16 with TRIFORCE, we observe a roughly 15 times reduction in area RUE and a 7 times reduction in derivative RUE. These calculated per-atom areas and derivatives are numerically invariant upon rotation of the molecules and choice of the tesselation axis, as the measured fluctuations when rotating the molecule and/or the tesselation axis are smaller than the errors reported in Table 1. We note that the real differences in the derivatives are better than those shown in Table 2. The reason is not imprecise calculations by TRIFORCE but differences between the algorithms of GETAREA and TRIFORCE due to limited numerical precision: both algorithms utilize different methods to determine whether a sphere contributes to the exposed boundary or not. As an example, observe the large difference in the derivatives of molecule “1MBS”. Here, a single atomistic constellation expresses extreme properties. In this constellation there are two interfaces in which one is buried almost completely in the other. This leads to an ambiguous situation in which GETAREA sees no intersection between the interfaces while TRIFORCE sees the intersection. Consequently, TRIFORCE will calculate forces for the extra interface while it is invisible for GETAREA. Due to the ambiguous nature of these cases, there is no right or wrong result. Both are correct within the choices made in algorithm implementation. Figure 10 shows the average per-atom RUE of both TRIFORCE and LCPO areas and derivatives relative to exact (GETAREA) values. We show the TRIFORCE results as a function of interpolation grid detail. With an extremely coarse grid of 3 (3 × 3 × 3), the TRIFORCE areas and derivatives are already closer to the exact values than those reported by the statistically derived LCPO method. With finer grids, the TRIFORCE results converge on the exact values. Aside from the time required to load a grid in the initialization of TRIFORCE, there is, in principle, no computational performance penalty in choosing a finer grid. The primary cost as a function of grid detail is the memory required for storing the grid values. To test this, we performed a series of TRIFORCE surface area calculations on egg white lysozyme (6LYZ: 1001 heavy atoms, 129 residues) on a 2.6 GHz Intel Core i7 processor with 6.25 MB of accessible cache. For grid details up to 32, the averaged real world performance of the TRIFORCE integration time was 60 ms. Grid details of 64 and 80 have average integration times of 70 ms, a greater than 15% performance penalty. This penalty is due to cache overflow, which forces an increasing number of system memory calls. All grids with details smaller than 32 will fit entirely on this processorʼs cache while larger grids do not. For example, a quadratic interpolation scheme with a grid detail of 16 uses 328 KB of memory while a grid detail of 64 uses 21 MB. We can modulate these memory requirements to varying degrees by implementing different interpolation schemes; however, we expect increasingly large grids that overflow the cache to suffer from an increasing number of system memory calls. This will lead to an increasing performance penalty up to a hardware specific maximum, where nearly all of the grid access steps of integration are done with system memory calls rather than cache look-ups.

Figure 10

Relative unsigned error of per-atom areas and derivatives over a set of 42 proteins calculated using TRIFORCE and LCPO relative to the analytical GETAREA method. We see that TRIFORCE error decreases with increasing interpolation grid density. We note that performance analyses such as those stated above are very much hardware- and implementation-dependent. This makes it difficult to perform rigorous performance comparisons between TRIFORCE, analytical, and even numerical approaches. Instead of head-to-head timings, we attempt to provide performance guidance relative to the GETAREA analytical method through operation counts from the published equations. Based on the relative numbers of floating point addition, multiplication, division, and square root operations, TRIFORCE integration will exhibit an approximately 15% reduced latency over GETAREA analytical integration on current Intel Core i7 processors. It should be noted that integration does not consider the construction of the Gauss–Bonnet path, which has an identical cost for all analytical surface area methods that require it and which is approximately an order of magnitude more costly than TRIFORCE integration. We are currently preparing a manuscript detailing how to accelerate Gauss–Bonnet path constructions with little loss in numerical accuracy, bringing performance in-line with TRIFORCE integration. Finally, while not overly apparent in Figure 10, there is a slight bump in the per-atom derivative RUE for a grid detail of 25. This bump comes not from the TRIFORCE algorithm but from a numerical artifact in the construction of this specific grid. The integration routine used in its construction reached numerically unstable regions, preventing calculation of this grid to the same precision as the others. As the performance differences are minimal when using neighboring grid details, we recommend not using this specific interpolation grid.

Minimization Test of Particle Surface Areas

We performed both steepest descent and Broyden–Fletcher–Goldfarb–Shanno (BFGS) minimizations on a fluorene molecule (C13H10) to test correctness of the derivatives. No atomic bonding, repulsive, torsional, or angular forces were included; as such the minimization was expected to reduce the molecule into a single sphere. While this is somewhat unphysical, it provides a stringent test of the algorithm as it has a known target surface area, that of an inflated sphere (vdW + water distance) of one single carbon with radius of 3.09 Å. Figure 11 shows the averaged course of the steepest descent minimization as a function of TRIFORCE grid detail, each over 1000 randomized runs. After 80 steepest descent steps, the averaged total area is mostly level and the system is nearly completely collapsed. The BFGS minimizations (not shown) tended to converge somewhat sooner at around 50 steps. Complete collapse is not observed with either minimizer due to how the minimization algorithms treat buried spheres. In the late stages of minimization, all spheres with radii smaller than 3.09 Å are entirely buried inside the larger spheres, and all spheres with radii of 3.09 Å are mostly covered by each other. Because of their lack of SASA, the completely buried spheres do not experience forces and are therefore invisible to the minimizer. As such, these spheres are prone to resurface constantly when their enveloping spheres shift positions. Similarly, the principal angles become numerically less well defined when spheres nearly overlap, exaggerating this resurfacing behavior.

Figure 11

Solvent-accessible surface area of a fluorene molecule (C13H10) over the course of a steepest descent minimization without bonding or repulsive forces between atoms when using TRIFORCE with different grid details. The molecule minimizes down to a single sphere of approximately radius 3.09 Å, corresponding to the radius of the largest component atom and illustrated here with the gray line. The shaded regions are the standard deviation envelopes over 1000 randomized runs. While all shown grid details minimize at much the same rate, coarser grids exhibit more numerical noise as spheres begin to overlap and the principal angles become less well defined.

Coarser grid details have less accurate derivatives, leading to greater numerical variability in this minimization process, denoted by the shaded standard deviation envelop regions in Figure 11. We attempted even lower resolution grid details, but the derivatives become so poor at a grid detail of 3 (RUE of derivatives, 5.9) that the minimizations would fail to converge. Here, the atoms would move in mostly random directions, separating the molecule into individual atomic spheres. It appears that we need a grid detail of 4, which has an RUE of the per-atom derivatives of 3.8, to ensure convergence. It is unclear from this test if the derivatives from approximate surface area methods such as LCPO (RUE of derivatives, 5.8) are themselves sufficiently accurate to ensure convergence to the expected minimum structure. Solvent-accessible surface area of a fluorene molecule (C13H10) over the course of a steepest descent minimization without bonding or repulsive forces between atoms when using TRIFORCE with different grid details. The molecule minimizes down to a single sphere of approximately radius 3.09 Å, corresponding to the radius of the largest component atom and illustrated here with the gray line. The shaded regions are the standard deviation envelopes over 1000 randomized runs. While all shown grid details minimize at much the same rate, coarser grids exhibit more numerical noise as spheres begin to overlap and the principal angles become less well defined.

Discussion

TRIFORCE has several features which distinguish it from algorithms that perform equivalent tasks. It has the ability to segment the exposed surface efficiently into triangular patches. This ability allows TRIFORCE to accumulate not just surface areas but any precomputed quantity. For such a purpose, the algorithm seamlessly allows for two adjustments: (1) the selection of an arbitrary tessellation axis χ, to adjust for geometrical requirements and (2) the option to add additional dimensions to the integration tables, to allow the modeling of a functional surface with higher complexity. In contrast to some algorithms that rely on united atom molecular representations, TRIFORCE is capable of handling all topologies, including concave circular interfaces. This is crucial because such interfaces frequently occur in molecules where hydrogen atoms are buried in large van der Waals radii of bonded atoms. Its tabular integration procedure can be efficiently implemented onto GPUs (development is in progress) in which this task becomes a simple texture look-up. The whole algorithm has been designed to allow for parallel processing on the level of circular interfaces, in contrast to an atomic level. The nature of the algorithm structure gives TRIFORCE the potential to utilize the GPUs or other massively parallel architectures to full capacity. Integration tables are stored in an architecture-independent way, enabling usage of the library on different platforms right away. All calculations can be made arbitrarily precise by increasing the grid detail, as Figure 10 shows. Doing this will require additional memory but will not slow down the algorithm, aside from the initial loading procedure, if the grid is still small enough to remain in the processing unit’s cache. TRIFORCE can be accessed and run on the web from http://dillgroup.io. A downloadable version of the source code is also available on this Web site, free of charge for academic purposes.

Conclusion

We have presented a novel algorithm for the calculation of the SASA and its derivatives for arbitrary molecules. Although the method is partly numerical, errors in both of these quantities are marginal, and we see performance and flexibility gains over fully analytical approaches. We tested its correctness through an extensive set of calculations on a database of 42 proteins of various sizes and folds. We evaluated its robustness by successfully applying TRIFORCE in minimization calculations of the SASA. To compensate for discontinuities in the derivatives upon sphere contact, we have introduced a logistical smoothing function which is able to bridge the gap in force space between two distant spheres and their intersecting counterparts. Future direction for the development of TRIFORCE involves complete parallelization of the algorithm through implementation on GPUs and other massively parallel architectures.

18 in total

1. Effective energy function for proteins in solution.

Authors: T Lazaridis; M Karplus
Journal: Proteins Date: 1999-05-01

2. Decoys 'R' Us: a database of incorrect conformations to improve protein structure prediction.

Authors: R Samudrala; M Levitt
Journal: Protein Sci Date: 2000-07 Impact factor: 6.725

3. The SGB/NP hydration free energy model based on the surface generalized born solvent reaction field and novel nonpolar hydration free energy estimators.

Authors: Emilio Gallicchio; Linda Yu Zhang; Ronald M Levy
Journal: J Comput Chem Date: 2002-04-15 Impact factor: 3.376

TRIFORCE: Tessellated Semianalytical Solvent Exposed Surface Areas and Derivatives.

Introduction

Area Calculation

Exposed Boundary Segments

Discontinuities and Singularities

Contact Point between Two Spheres

Interfacial Angular Singularities

Results

Correctness of Areas and Derivatives

Minimization Test of Particle Surface Areas

Discussion

Conclusion

1. Effective energy function for proteins in solution.

2. Decoys 'R' Us: a database of incorrect conformations to improve protein structure prediction.

3. The SGB/NP hydration free energy model based on the surface generalized born solvent reaction field and novel nonpolar hydration free energy estimators.

4. Analytical approximation to the accessible surface area of proteins.

5. Assessing implicit models for nonpolar mean solvation forces: the importance of dispersion and volume terms.

6. A consensus view of protein dynamics.

7. Implicit nonpolar solvent models.

8. Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons.

9. Derivatives of molecular surface area and volume: simple and exact analytical formulas.

10. The AGBNP2 Implicit Solvation Model.

1. Computational Signaling Protein Dynamics and Geometric Mass Relations in Biomolecular Diffusion.

2. FreeSASA: An open source C library for solvent accessible surface area calculations.