Literature DB >> 24673391

Aligning experimental and theoretical anisotropic B-factors: water models, normal-mode analysis methods, and metrics.

Abstract

The strength of X-ray crystallography in providing the information for protein dynamics has been under appreciated. The anisotropic B-factors (ADPs) from high-resolution structures are invaluable in studying the relationship among structure, dynamics, and function. Here, starting from an in-depth evaluation of the metrics used for comparing the overlap between two ellipsoids, we applied normal-mode analysis (NMA) to predict the theoretical ADPs and then align them with experimental results. Adding an extra layer of explicitly treated water on protein surface significantly improved the energy minimization results and better reproduced the anisotropy of experimental ADPs. In comparing experimental and theoretical ADPs, we focused on the overlap in shape, the alignment of dominant directions, and the similarity in magnitude. The choices of water molecules, NMA methods, and the metrics for evaluating the overlap of ADPs determined final results. This study provides useful information for exploring the physical basis and the application potential of experimental ADPs.

Entities: Chemical Disease Species

Mesh：

Substances：
Proteins
Water

Year: 2014 PMID： 24673391 PMCID： PMC4397101 DOI： 10.1021/jp4124327

Source DB: PubMed Journal: J Phys Chem B ISSN： 1520-5207 Impact factor: 2.991

Introduction

X-ray crystallography sets up the gold standard for the determination of 3D atomic positions, but its potential in providing the information for protein dynamics has received less attention. In a typical protein PDB file, each atom occupies one line that contains five entries, with three for the XYZ coordinates, one for the occupancy, and one for the temperature or B-factor. The value of B-factor represents the magnitude of the electron density in a spherical and isotropic shape. However, in high-resolution protein structures, each atom occupies two lines and the additional line contains six integers that define a symmetric tensor for the anisotropic B-factor, which is also called anisotropic displacement parameter (ADP).[1−3] ADP represents an asymmetrical and multimodal distribution of electron density in the shape of an ellipsoid, which according to the Born–Oppenheimer approximation reflects the anisotropic movement of the atom nucleus.[4,5] In the current PDB database, the high-resolution structures (<1.2 Å) compose about 2.5% of the total deposits. The ADPs from those structures embed a trove of invaluable information for studying the intriguing relationship between protein structure and dynamics. The notion that structure, dynamics, and function are inseparable in the study of protein biophysics is being increasingly recognized. According to the vibration frequency and amplitude, protein dynamics can be arranged from the high-frequency local movements such as side chain tumbling, to the low-frequency correlated motions that involves every element in the whole molecule.[6−9] It is believed that the motions of the lowest vibrational frequency and also of the largest vibrational amplitude carry the most significance for protein function.[10−13] These molecular motions actually define the pathway that the protein molecule traces in fulfilling the biological function. The lowest frequency and the largest amplitude can be translated into maximal conformation changes with minimal energy costs. Two computational approaches have been used to study protein dynamics: normal-mode analysis (NMA), an analytical approach based on a harmonic approximation to the protein energy surface, and principal component analysis (PCA), a statistical approach based on the sampling of the conformation space by molecular dynamics (MD) or Monte Carlo simulations.[14−17] Classical NMA starts from a protein structure that is assumed to be at the local minimum on energy surface. The first step is to construct the Hessian Matrix, which contains the second derivative of the system potential energy or the effective force constant between each pair of atoms. Diagonalization of the Hessian matrix yields eigenvalues, corresponding to the vibration frequency (or the amplitude) of the collective molecular motions, and the corresponding eigenvectors, representing the direction of the motions.[18] For a nonlinear system containing N particles, the degree of freedom (DOF) related with vibration is 3N – 6, with the six DOF of the translation and rotation of the whole molecule removed. NMA is mathematically rigorous and produces 3N – 6 orthogonal eigenvectors paired with positive eigenvalues. However, the PCA based on MD simulation involves the construction of the covariance matrix based on an ensemble of protein conformations. Compared to NMA, major advantages of PCA include the consideration of the anharmonic behavior of proteins, which could be attributed to the proper treatment of temperature, pressure, and more importantly, solvent, by the MD simulation. Classical NMA is based on the presentation of the system by all-atom or united-atom force fields. Because of expensive costs of the physical memory to store the all-atom Hessian matrix (O(N2)) and the CPU time to diagonalize the matrix (O(N3)), the application of all-atom NMA (AANM) has been limited. Currently, the storage of Hessian matrix is not the major issue given technical advances in the handling of sparse matrixes, but the matrix diagonalization is still challenging. Iterative algorithms developed for this purpose include the diagonalization in a mixed basis method (DIMB)[19,20] and the Lanczos/Arnoldi factorization method[21] adopted in computational packages of CHARMM[22] and GROMACS,[23] respectively. These iterative methods are still very time-consuming and only yield a small fraction of the total eigenvectors. Therefore, AANM has been mainly applied to protein systems containing at most a few hundred residues and without explicitly treated water molecules. Various coarse-grained NMA approaches have been pursued to improve the computational efficiency and thus the applicability of NMA to large systems.[24−27] Methods rooted in the all-atom presentation of the system include the rotation translation block (RTB) (also called block normal mode or BNM), which presents the system with a series of rigid blocks,[28,29] and the method that partitions the matrix to relevant and nonrelevant parts (CGNM).[30−32] In addition, coarse-gained NMA methods based on much simplified force-fields have been developed, represented by the anisotropic network model (ANM).[33,34] ANM is the NMA method based on the elastic network model (ENM) (or Gaussian network model, GNM), which only considers C-α atoms and applies a unified force constant (1 kcal/mol/Å2).[35,36] Despite such dramatic simplifications, ENM effectively captures the essentials of the intrinsic connections between structure and dynamics and have been successfully applied to large macromolecules and assemblies that have been beyond the capability of traditional methods.[37−41] A major advantage of the NMA methods based on all-atom Hessian matrix, including the methods of AANM, RTB, and CGNM, is the incorporation of detailed chemical information embedded in the structure, including structural and surface water molecules.[30] However, because of the expensive computational cost, only a few studies have included explicitly treated water molecular and addressed their effects on protein dynamics.[30,42−46] These studies revealed that surface water, or the hydration layer, has significant impacts on protein structure and dynamics. Especially, having drastically different physical–chemical properties from the bulk water, the surface water contributes to the total atomic fluctuations, reduces the amplitude of protein fluctuations, and shifts the spectrum of molecular motions toward higher vibration frequency. These pieces of information, in conjunction with the MD simulations that explicitly treat both surface and bulk solvent molecules, provide more complete understanding of the unique contribution by surface water. For both AANM and coarse-grained NMA methods, it is essential to study the feature and applicability of each method. The information on protein dynamics provided by experimental approaches, such as the X-ray crystallography, NMR, and other spectroscopic methods, are useful references. Although the harmonic approximation adopted by NMA obliviates the anharmonic behavior of protein molecules, NMA has been confirmed to be able to capture the essentials of protein dynamics in the vicinity of a local minimum on protein energy surface.[32] Notably, the majority of X-ray data sets were obtained under the cryogenic temperature of 100 K, well below the “glass transition” temperature proposed for protein molecules (160–220 K), when the protein behavior starts to show more signs of anharmonicity.[47] Thus, NMA is especially suitable to reproduce the protein dynamics under cryogenic conditions, when the protein molecule behaves more harmonically and the bulk solvent has less influence over the intrinsic motions. At atomic level, protein dynamics is reflected in the amplitude and the direction of the thermal fluctuations of each atom. The isotropic thermal factors from X-ray crystal structures used to be the major criteria in evaluating the NMA results. Comparing the absolute amplitude of the B-factor of each atom is not very meaningful, as it is known that NMA tends to yield smaller atomic fluctuations. Most studies use the linear correlation coefficient (cciso) as the metric for the overall alignment between experimental and theoretical B-factors for a single protein, which varies between 1 (perfect correlation), 0 (no correlation), and −1 (perfect anticorrelation) (eq 11).[48] Extensive efforts and considerations have been devoted to improve the prediction results, including the choice between all-atom and coarse-grained methods, the inclusion of solvent molecules, the different setting of cutoff radius for ENM, the location of the residue relative to the whole molecule or certain structural features, the consideration of crystalline environment, etc. However, it has been puzzling that most of the cciso results fluctuate around 0.6.[37,49,50] Interestingly, ENM-based methods yielded better alignments with NMR data than with X-ray data, with the cciso above 0.7.[51,52] These facts manifest the limitation of the isotropic B-factor from X-ray data set as the reference in evaluating the NMA results, which could be attributed to the fact that the only useful information provided by isotropic B-factor is the relative flexibility along the primary sequence within each protein. When the resolution of the X-ray data set reached the level of 1.2 Å or above, it becomes possible to fit the atomic thermal fluctuation with more sophisticated multimodal distributions. In recent years, the ADPs from high resolution X-ray structures have increasingly been used as the reference to measure the NMA results. Compared to the conventional isotropic B-factor, ADP is superior because ADP carries not only the information about the amplitude of fluctuations but more importantly the spatially distributed fluctuations in electron density. Important information about protein dynamics, more specifically the directionality of collective molecular motions, is embedded in these anisotropic fluctuations at atomic level. Therefore, ADP, as defined by a symmetric 3 × 3 matrix for each atom, provides clearer and richer information than a single scalar value and thus is more useful in evaluating the NMA results. The topic of aligning the experimental and theoretical ADPs has been attempted numerous times, but a major technical hurdle still exists: the suitable metrics to compare the overlap in shape between two ADPs.[37,51,53,54] On the basis of the definition in X-ray crystallography for the correlation between two electron-density maps, Merritt developed the formulation for the correlation coefficient (cc) to quantify the overlap between two ellipsoids in real space (eq 7).[55] However, as noticed from the very beginning, cc is heavily influenced by the degree of anisotropy and biased toward the ADPs that are less anisotropic or close to a sphere in shape (Figure 1). Moreover, cc is sensitive to the amplitude of the isotropic components of two ellipsoids. These factors prompted the development of another two metrics, modified cc (eq 8)[37,50] and normalized cc (eq 10).[55] Modified cc involves the construction of a perfectly misaligned spheroid (keep the eigenvectors but using the eigenvalues, with the smallest and the largest switched, from the other spheroid). Modified cc varies between 1 (perfectly aligned) and 0 (perfectly misaligned). For normalized cc (cc normalized to the overlaps with isotropic sphere of both spheroids), the value will be greater than 1 if two ellipsoids are more similar to each other than the isotropic sphere and less than 1 if otherwise. Another two metrics, Kullback–Leibler (KL) distance and Pearson correlation, have also been used.[53,56,57] Although these different metrics have been repeatedly used in the literature, a detailed investigation of these metrics, especially with regard to the degree of anisotropy, the type, and the orientation of spheroids, is lacking.

Figure 1

Correlation coefficient (cc) for measuring the overlap between two spheroids. (A) Schematic drawings of a sphere, a prolate (cigar-shaped spheroid), and an oblate (pancake-shaped spheroid). The anisotropies for the three objects are 1.0, 0.2, and 0.2, respectively. (B) cc for the overlap between two prolates of the same anisotropy. Red, perfectly aligned prolates; black, averaged cc of randomly orientated prolates; green, perfectly misaligned prolates. Blue trace shows the overlap between a prolate and an isotropic sphere. (C) cc for the overlap between two oblates of the same anisotropy. Red, perfectly aligned oblates; black, averaged cc of randomly orientated oblates; green, perfectly misaligned oblates. Blue trace shows the overlap between an oblate and an isotropic sphere. (D) Averaged cc for randomly oriented prolate and oblate of the same anisotropy (red trace). The traces for prolate–prolate (green) and oblate–oblate (black) are from B and C, respectively.

In this study, we started from an in-depth evaluation of the metrics used for evaluating the overlap between two ellipsoids. We focused on the influence on the results by the degree of anisotropy, which could be affected by the choice of computational methods and the selection of the pool of eigenvectors in calculating the theoretical ADP. We compared different NMA methods in predicting ADP and used both all-atom force field based and coarse-grained methods. Moreover, we studied the effects by surface water and found that adding a layer of explicitly treated water molecules dramatically improves the energy minimization (EM) results and has certain impacts on the results.

Theoretical Methods

Treatment of Experimental ADP

The ADPs in high-resolution structures are listed as six integers in the order of U11, U22, U33, U12, U13, and U23. Each of these factors has been scaled up by a factor of 10 000, and the unit is in Å2. These six ADPs define a symmetric 3 × 3 tensor:which defines the shape of an ellipsoid for each atom. Each tensor/ellipsoid has three prominent or principal axes that represent the peaks of three Gaussian distributions. Similar to the isotropic B-factor, the length of each of the axes is inversely related with the electron density. The three columns (or rows) of the tensor represent the projections of three principal axes of an ellipsoid to the orthogonal XYZ coordinate system used to define the atomic positions. Using a simple matrix manipulation (diagonalization), the above 3 × 3 tensor can be transformed into a diagonal matrix:which contains three eigenvalues (W). The corresponding eigenvectors stored in the matrix R (or R) define the direction of three prominent axes of the ADP ellipsoid in the orthogonal XYZ coordinate system, and W corresponds to the distribution of the electron density along each of those prominent axes. The degree of anisotropy is defined as the ratio between W11 and W33 and varies between 0 and 1 (sphere). On the basis of the value of W, the equivalent isotropic B-factor can be calculated based on the following equation:

Energy Minimization of the Crystal Structure and Normal-Mode Analysis (NMA)

High resolution structures that contain experimental anisotropic B-factors were downloaded from PDB (see Table S1, Supporting Information, for the list). Water molecules within 3 Å of the protein molecule from the crystal structure were kept during the calculations. First, Modeler was used to fix the missing side chains and residues within the structure.[58] The changes to the overall structure through the Modeler step were minimal (RMSD of C-α atoms, 0.09 ± 0.007Å). The initial crystal structure was energy minimized at double precision. The methods used are steepest-descent (SD), conjugate-gradient (CG), and limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS).[23] The force field GROMOS96 (53a6) was used. During the SD energy minimization step, oxygen atoms of the structural water and heavy atoms of the protein were position-restrained. The electrostatic energy was described by a switch function with the distance for normal treatment set at 15 Å and the cutoff distance set at 18 Å.[59] Two programs from the GROMACS 4.6.3 package, mdrun and g_nmeig,[23] were used to produce and diagonalize the Hessian matrix, respectively. The eigenvalues and eigenvectors were saved as the results of the NMA based on all-atom (or united-atom) force fields (AANM). We also tested three different coarse-grained NMA: the NMA based on the elastic network model (ENM),[34] the block NMA (BNM) based on a representation of the whole molecule by rigid blocks,[29] and the NMA based on partitioning the all-atom Hessian matrix (CGNM).[30−32] For ENM, the coordinates of the C-α atoms from the original crystal structure were used. The force constant and the cutoff distance set as 1 kcal/mol/Å2 and 13 Å, respectively. For BNM, we used the Fortran code DIAGRTB (v2.52) with a minor modification of LRWORK, from 32,000,000 to 200,000,000, to accommodate large systems.[28,60] Details of Hessian matrix partition and the related coarse-grained NMA (CGNM) has been described previously in detail.[30] Briefly, the all-atom Hessian matrix was partitioned into four sections representing the interactions among relevant to relevant atoms, nonrelevant to nonrelevant atoms, relevant to nonrelevant atoms, and nonrelevant to relevant atoms:Then the effective force constant matrix for C-α atoms were extracted based on the following equation:[32]Diagonalization of the effective Hessian matrix for C-α atoms yielded the eigenvectors and the corresponding eigenvalues. On the basis of the eigenvectors and eigenvalues from the above NMA, the theoretical ADP of each atom was calculated using the following equation:where i and j are indexes for three orthogonal axes and k is the index for eigenvectors/eigenvalues. N represents the number of eigenvectors involved in the analysis.

Comparing Experimental and Theoretical ADPs

The correlation coefficient (cc) between two ADPs is calculated according to the following equation:where det represents the matrix determinant.[55] Determinant of the matrix can be calculated as the product of eigenvalues or directly from entries in the matrix. Other than the overlap factor or correlation coefficient (cc), several other metrics have been introduced to quatify the similaries in shape between two ellipsoids. Two representative metrics are 1. Modified cc.where ccmin corresponds to the minimal cc between U and V. V′ represents the perfect misalignment defined bywith the prominent (RU1) and the least prominent (RU3) directions being switched and the three eigenvalues being replaced by the values from tensor V.[50] 2. Normalized cc.where cc(U,V) is divided by the product of the cc between U or V with isotropic sphere.[51] To compare the distribution of the magnitude of the ADPs, we calculated the isotropic B-factor based on eq 3, normalized the value by the sum of all isotropic factors in that protein, and used the following equation to obtain the linear correlation coefficient (cciso) between experimental and theoretical B-factors for each protein:[50]

Notice about the Conversion of Units

In a previous publication, we have listed the conversion of units across different computational systems.[30] For NMA, the GROMACS program g_nmeig generated eigenvalue and eigenfrequency for each eigenvector. The unit of eigenfrequency is listed as wavenumber with the unit in cm–1. Eigenvalue corresponds to the square of the angular speed (ϕ2), with the unit in s–2, and requires a conversion factor (1024).[30] To convert the eigenvalue into eigenfrequency, the following factor is needed:where c represents the speed of light.

Systematic Survey of Spheroid Orientations in Euler Angle Space

To study the overlap between randomly positioned ellipsoid, we relied on the Haar measure for Euler angle to approach a uniform coverage of the angle space (α, β, γ), sin(β)ΔαΔβΔγ. The range of α, β, and γ are 0 to 2π, cos–1 (−1.0) to cos–1 (1.0), and 0 to 2π, respectively. We used the following Euler’s theorem to rotate the ellipsoid:The following equation was used to obtain the mean value of cc:

Results

Metrics for Comparing the Shape of ADP Ellipsoids

To compare the experimental and computational ADP tensors, it is necessary to study both the shape, the orientation, and the magnitude.[51] The orientation is represented by three eigenvectors of the tensor, which define the three dominant directions of the ADP ellipsoid. The three eigenvalues can be converted to the isotropic B-factor and used to compare the magnitude. In contrast, comparing the shape of two ellipsoids is complicated. The value of cc varies between 0 for misaligned and extremely anisotropic ellipsoids and 1 for perfectly aligned ellipsoids or two ellipsoids that are close to the sphere in shape. The degree of anisotropy is defined by the ratio between the shortest and longest radii of the ellipsoids (Figure 1A). The change in cc value correctly reflects the nature that the ellipsoids of higher anisotropy (close to sphere), tends to have larger overlap (Figure 1B). However, this close correlation between cc and anisotropy has a caveat: the cc results have a bias toward ADPs of high anisotropy. This is an important issue for theoretical ADP because the anisotropy is directly related with the computational model (all-atom force fields or coarse-grained methods, with or without surface water) and, more importantly, the number of eigenvectors involved in the analysis. Correlation coefficient (cc) for measuring the overlap between two spheroids. (A) Schematic drawings of a sphere, a prolate (cigar-shaped spheroid), and an oblate (pancake-shaped spheroid). The anisotropies for the three objects are 1.0, 0.2, and 0.2, respectively. (B) cc for the overlap between two prolates of the same anisotropy. Red, perfectly aligned prolates; black, averaged cc of randomly orientated prolates; green, perfectly misaligned prolates. Blue trace shows the overlap between a prolate and an isotropic sphere. (C) cc for the overlap between two oblates of the same anisotropy. Red, perfectly aligned oblates; black, averaged cc of randomly orientated oblates; green, perfectly misaligned oblates. Blue trace shows the overlap between an oblate and an isotropic sphere. (D) Averaged cc for randomly oriented prolate and oblate of the same anisotropy (red trace). The traces for prolate–prolate (green) and oblate–oblate (black) are from B and C, respectively. On the basis of the definition of cc, several other metrics were proposed to quantify the overlap between two ellipsoids including the normalized cc and the modified cc. Which metrics provide more straightforward and realistic comparison of two ADPs? To this end, we used the prolate or oblate (pancake- or cigar-shaped spheroid) as an example. First, we calculated the overlap between the spheroids of different anisotropy with isotropic sphere, cc(U,ISO).[55] Clearly, cc(U,ISO) depends on the type of spheroid because the cc trace of prolate is different from that of oblate, especially at low range of anisotropy (Figure 1B,C, blue traces). Next we calculated the cc of two randomly oriented spheroids with the same anisotropy through a systematic sampling of the Euler angle space (Figure 1B,C, black traces). As expected, the cc between two randomly positioned spheroids is lower than the overlap with the isotropic sphere and also increases along the increase in anisotropy. Noticeably, the overlap between two randomly oriented spheroids is shape-independent: the trace for two prolates and the trace for two oblates are identical (Figure 1D). To further confirm this, we tested the overlap between a prolate with a randomly orientated oblate and obtained the same results. Thus, cc is strongly influenced by the degree of anisotropy and largely insensitive to the specific shape of the ADPs (prolate or oblate). The correlation between cc and anisotropy is an import factor in evaluating experimental and theoretical ADPs. Next we studied the modified cc and the normalized cc as a function of anisotropy. Our results showed modified cc is not sensitive to anisotropy, which makes it very useful when the degree of anisotropy needs to be occluded from the analysis (Figure 2A,B, left). Furthermore, we calculated the overlap between two randomly oriented spheroids and found that modified cc is sensitive to the shape of the spheroid because the overlap between a prolate and an oblate (red trace) is drastically different from that of prolate–prolate (green, dashed trace) or oblate–oblate (black trace) (Figure 2C, left). However, the almost flat trace of modified cc (black traces in Figure 2A,B, left), even when the anisotropy approaches to 1, is at odds with the expectation that the overlap between two spheroids similar to a sphere should be close to 1. For the normalized cc, it is similar to cc and is also very sensitive to the degree of anisotropy (Figure 2A,B, right). However, to a certain extent, normalized cc is also sensitive to the shape of the spheroids. The traces for the pancake–pancake, cigar–cigar, and cigar–pancake are all different, especially when the anisotropy was much less than 1 (Figure 2C, right). A recent study reported that normalized cc is less useful when evaluating the theoretical ADP based on the segmented TLS model.[61] Taken together, these analyses provided useful information for applying three metrics to compare the overlap between two ADPs.

Figure 2

Modified cc and normalized cc for the overlap between a pair of spheroids. (A) Modified cc (left) and normalized cc (right) for measuring the overlap between two prolates of the same anisotropy. Red, perfectly aligned prolates; black, mean value for randomly oriented prolates; green, perfectly misaligned prolates. (B) Modified cc (left) and normalized cc (right) for measuring the overlap between two oblates of the same anisotropy. (C) Modified cc (left) and normalized cc (right) for randomly oriented prolate and oblate of the same anisotropy (red trace). The traces for prolate–prolate (green) and oblate–oblate (black) are from A and B, respectively. For modified cc (left), both traces are numerically identical.

Adding Surface Water Makes the Energy Minimized Structure Closer to Native Structure

EM of the crystal structure is a critical step for classical AANM because NMA is an analytical approach and based on the harmonic approximation of the protein energy surface. To remove the improper contacts in the structure and more importantly ensure the modeled conformation by the force field corresponding to a local minimum on the energy surface, it is essential to use different EM methods and perform the calculations at double precision. To study the effects of surface water, we separately tested three conditions: (1) protein only and without water; (2) protein plus the crystal water within 3 Å from protein, and (3) protein, the crystal water, and an extra layer (5 Å) of explicitly treated surface water (Figure 3A). After the EM steps, we calculated the root-mean-square deviation (rmsd) with respect to the original crystal structure (Figure 3B,C; Table S1, Supporting Information). The results were impressively consistent: among the three conditions, adding an extra layer of surface water always gave the lowest rmsd (0.65 ± 0.03 Å; n = 20). Oppositely, removing all water from the structure resulted in the highest rmsd, 1.56 ± 0.06 Å, and in the middle was the case with only crystal water (0.94 ± 0.04). To test whether this observation is force-field relevant, we repeated the above calculations using another force field, AMBER99, and obtained the same results.

Figure 3

Including crystal water and adding surface water improves EM results. (A) Structures of 1UG6 as an example. (clockwise from upper left corner) The original crystal structure including oxygen atoms of water molecules, energy minimized systems including surface and crystal water, no water, and crystal water only. (B) Overlay of all four structures shown in A. Cyan, crystal structure; red, surface + crystal water; green, crystal water only; blue, no water. (C) rmsd of C-α atoms vs the number of C-α atoms for the 20 proteins used in this study (see Table S1, Supporting Information, for the list). For each protein, three water models were tested: systems including surface and crystal water (black), crystal water only (red), and no water (blue).

Comparing the Overlap in Shape between Experimental and Theoretical ADPs

Next we asked how the treatment of water molecules and the choice of NMA methods affected the ADP results. In theory, for a system containing N particles, the number of the normal modes available for analysis is 3N – 6. The selection of eigenvectors in the calculation of theoretical ADP is crucial for the results.[37] Involving too many eigenvectors would increase the computational cost and diminish the meaning of NMA in terms of reducing the number of free parameters. However, involving too few eigenvectors would apparently affect the accuracy of the results. Since the motions of the lowest vibration frequency and of the highest vibration amplitude are of the most functional significance, we calculated the theoretical ADP based on the first 3, 6, 10, 30, 100, or 1000 (if applicable) eigenvectors of the lowest frequencies. The anisotropy of the theoretical ADP critically depends on the number of eigenvectors (Figure 4A). As a reference, the averaged anisotropy of the experimental ADPs is 0.496 ± 0.099 (n = 20; protein based; filled diamonds) or 0.509 ± 0.168 (n = 6351; residue based; standard deviation). For AANM, the model with explicitly treated surface water apparently outperformed the other two models in approaching the level of experimental anisotropy but with less eigenvectors (black trace in Figure 4A). With only 30 eigenvectors involved (AANM with surface water), the averaged anisotropy reached the level of 0.4, whereas the other two models required hundreds of eigenvectors to reach a similar level. For coarse-grained NMA, both BNM and CGNM methods outperformed the ENM method, which might be related with the all-atom treatment and the consideration of detailed chemical information embedded in the structure.

Figure 4

Averaged correlation coefficient (cc) based on the results of 20 proteins. (A) Averaged anisotropy of the theoretical ADP vs the number of eigenvectors involved in the calculation. Left, results of NMA based on all-atom force fields. Right, results of coarse-grained NMA. Averaged anisotropy of experimental ADPs is shown in the upper left corner (0.496 ± 0.22, n = 20). (B) Averaged cc vs the number of eigenvectors involved in the analysis. Left, results of NMA based on all-atom force fields. Right, results of coarse-grained NMA. (C) Left, averaged cc based on the first 10 eigenvectors. Right, averaged cc based on the first 100 eigenvectors. Does adding surface water lead to an improved overlap between experimental and theoretical ADPs? On the basis of the averaged cc, the answer seemed to be yes. We calculated the cc based on different pools of eigenvectors (Figure 4B). Corresponding to the increase in the number of involved eigenvectors, the cc increases steeply from the value around 0.5 to very close to 1. A replot of the cc values involving 10 or 100 eigenvectors in column format is shown in Figure 4C. It appears that for the theoretical ADP based on the model including surface water, the overlap with experimental ADP is apparently higher. Similarly, the BNM and CGNM methods outperformed the ENM method.

Negative Controls Are Critical in Evaluating the Overlap in Shape

To ensure the above conclusion was correct, we asked whether the increase in cc was due to the increase in the anisotropy of theoretical ADPs. To this end, we plotted the cc values that were based on 100 eigenvectors for all protein residues (n = 6351) as a function of the anisotropy of the experiment ADP (Figure S1, Supporting Information). Moreover, we scrambled the input of experimental ADPs for each protein by randomly choosing experimental ADPs from all other 19 proteins (Figure S2, Supporting Information). Choosing atoms from other proteins instead of shuffling the ADPs within each protein increases the randomness because subpopulation of atoms, such as the residues belonging to the same domain, share similarities in the atomic motions. Then we introduced the traces of negative controls to the cc plots shown in Figure 4B (Figure 5A). The close resemblance of the cc traces (solid lines with standard error bars) with the corresponding negative control traces (color-matched dashed lines) reinforced the necessity to involve other metrics to help evaluate the overlap between two ADPs.

Figure 5

Averaged overlap results based on 20 proteins and the corresponding negative controls. (A) Averaged cc based on 20 proteins. Dashed lines represent the negative controls based on scrambled experimental ADPs from the other 19 proteins as input. Left, AANM results. Right, coarse-grained results. (B) Modified cc results. Dashed lines represent the negative controls based on scrambled experimental ADPs as input. (C) Normalized cc results. Dashed lines represent the negative controls based on scrambled experimental ADPs as input. We calculated the modified cc and the normalized cc. In both cases, the clear separations from the negative control traces demonstrated a better separation between signal and noise than cc. However, two metrics gave opposite evaluations. For modified cc, the AANM with surface water produces better overlap, especially in the range with less than 30 eigenvectors, and the BNM and CGNM methods outperformed ENM (Figure 5B). In contrast, normalized cc revealed totally opposite trends (Figure 5C). This might be related with the fact that the normalized cc biases toward the ADPs of low anisotropy values (Figure 2). Again, these analyses showcase the complexity in interpreting the overlap results.

Comparing the Distribution of the Magnitude and the Orientation of Experimental and Theoretical ADPs

We started from comparing the magnitude of experimental and theoretical ADPs, using a linear correlation factor (cciso) for the isotropic B-factors within each protein (eq 11; Figure 6A).[50] The clear separation between the results (solid trace with standard error bar) from the negative control traces (dashed traces) demonstrated the strength of NMA in predicting the relative flexibility within each protein. With only 3 eigenvectors involved, the AANM method with surface water slightly outperformed other AANM methods and coarse-grained methods. Noticeably, regardless of the computational models, the results by all six methods are very similar, indicating that the isotropic B-factor might not be an effective metric in differentiating NMA results. To compare the orientation of ellipsoids represented by ADP tensors, we studied the eigenvector of the lowest eigenvalue (#1), corresponding to the highest electron density, and the eigenvector of the highest eigenvalue (#3), corresponding to the highest flexibility (Figure 6B,C). The results revealed two pieces of information: First, involving more eigenvectors does not lead to better results, especially for the model involving surface water, and second, ENM significantly outperforms the BNM and CGNM in predicting the orientation of the ADPs.

Figure 6

Comparing the magnitude and the orientation of theoretical and experimental ADPs. (A) Averaged correlation (cciso) between experimental and theoretical isotropic B-factors for 20 proteins. Dashed lines represent the negative controls based on scrambled experimental ADPs as input. Left, AANM results. Right, coarse-grained results. (B) Averaged dot product (absolute value) of the eigenvectors with the smallest eigenvalues. Results are averaged based on 20 proteins. Dashed lines represent the negative controls based on scrambled experimental ADPs as input. (C) Averaged dot product (absolute value) of the eigenvectors with the largest eigenvalue. Results are averaged based on 20 proteins. Dashed lines represent the negative controls based on scrambled experimental ADPs as input.

Discussion

The ADPs from high-resolution X-ray crystal structures embed rich information for protein dynamics; at the same time, they present an excellent test case for different computational approaches, mainly the NMA and the MD simulation. Apparently MD simulation has the advantage of incorporating a more realistic description of the protein close to the native environment. However, NMA, as an elegant analytical approach for studying protein dynamics, is still being widely used. Since NMA is based on a harmonic approximation of protein energy surface, it is perfectly suitable to reproduce the dynamic information embedded in the crystal structures because most of the X-ray data sets for solving the structure were collected at extremely low temperature (100 K). In this study, using the experimental ADPs from high-resolution structure as the reference, we compared different NMA models in predicting the theoretical ADP and investigated the effects of surface water. We found that including surface water significantly improved the EM results and outperformed other models in predicting the anisotropy of the ADP. It is well-known that the water near the protein surface has very different physical–chemical properties from the bulk water, such as the 5% increase in the density of water in hydration layer.[62,63] Both experimental studies, including the measurements by terahertz spectroscopy and neutron scattering,[64−66] and theoretical investigations[30,67,68] supported these differences between surface and bulk water. How does adding surface water improve the simulation results? Most likely this is due to the “cage”-like structure formed by surface water molecules, mediated by an extensive network of hydrogen bonds and intimate interactions with polar residues on protein surface.[30,42−46] Thus, including explicitly treated surface water should more faithfully reproduce the intimate interactions between the protein molecule and the solvent. Previously, we showed that including surface water shifts the spectrum of molecular motion generated by NMA toward higher frequencies, which had been reported by spectroscopic studies.[69−71] Here our results showed that the model involving surface water outperformed other models in reproducing the experimentally measured anisotropy. With surface water, much less eigenvectors (30–50) are needed to approach the averaged anisotropy of the experimental ADP. Since 50 eigenvectors only represent a very small fraction of the total eigenvectors (3N – 6) of the whole system, the computational cost could be drastically reduced, The metric for measuring the overlap between two ellipsoids is critical for interpreting the results. Using the prolate and oblate as examples we systematically surveyed the metrics proposed in the literature, including the original cc, modified cc, and normalized cc. We found that cc faithfully reflects the nature that the overlap between two ellipsoids depends on the degree of anisotropy. However, the close correlation between cc and anisotropy makes it difficult to distinguish the real overlaps from the contaminations with the overlap with ellipsoids similar to the sphere. To this end, negative control data sets, based on randomly selected ADPs from other proteins, should be very useful in judging the quality of the overlap results. In contrast, modified cc and normalized cc clearly separate signal from noise, but to different extent both show sensitivities to the shape and the anisotropy of the ellipsoid. Thus, our results show that to compare the overlap between ADPs, it is essential to consider anisotropy, which is related with the computational models and the number of eigenvectors, and more importantly, use different metrics in conjunction with properly designed negative controls to evaluate the results. In practical application, ADPs as negative control could be chosen from a pool of proteins that are totally irrelevant in both function and structure. Noticeably, alignment of the isotropic B-factors leads to the results around 0.5, regardless of the computational methods and force fields (Figure 6A). In the field, there is a transition from relying on ADP instead of isotropic B-factor as the metric to evaluate NMA results. This should be related with the fact that isotropic B-factors mainly report the relative flexibility within each protein, but however, ADPs provide much richer information for protein dynamics at atomic detail. Importantly, the collective molecular motions, which critically underlies the fulfillment of protein function, are embedded in the anisotropic thermal fluctuations. A comparison of different models and methods showed that the NMA based on ENM produced results comparable to the results by all-atom force field based methods, including AANM, BNM, and CGNM. ENM, related to GNM, is a knowledge-based model with a drastically simplified presentation of the system: only C-α atoms are being considered, and they are connected through a network of elastic springs with a unified force constant, 1 kcal/mol/Å2. Purely based on the positional definition of C-α atoms, ENM faithfully reproduces the atomic fluctuations determined by NMR and X-ray crystallography[52,72,73] and has been applied to large systems that are beyond the reach of canonical NMA methods.[37−41] Here our results confirmed the effectiveness of the NMA based on ENM, especially in predicting the orientation of the ADP. However, the NMA methods rooted in the all-atom presentation are capable of incorporating a more complex chemical nature and thus have the potential to better reproduce the experimental ADP. Among all the methods, the MD simulation at 300 K better reproduced the absolute amplitude of crystallographic B-factors. Previously, we investigated the effect of surface water by developing the CGNM method.[30] In that study, we reported that CGNM outperformed ENM-based method because the results of AANM instead of the experimental results were used as the reference for the purpose of methodology development. Here in the current study, we mainly used the experimentally determined ADP as the reference to compare different computational methods. Moreover, we included 20 proteins in the calculation instead of just one in a previous study. For isotropic fluctuations, the correlation coefficient by ENM is comparable to the results by other methods (Figure 6A). However, in terms of predicting the directionality of the ADP ellipsoids, ENM indeed outperformed other coarse-grained methods and matched AANM (Figure 6B). We thought this might be related with the fact that ENM is knowledge-based and rooted in the experimental observation of thermal fluctuations, especially by X-ray crystallography, which might explain the effectiveness of ENM in reproducing empirical results.

Conclusions

In summary, we compared the theoretical ADP based on NMA with the experimentally determined ADP embedded in high resolution structures. Incorporating surface water improved the EM results and the alignment with experimental ADP in certain aspects. With the number of the high resolution structures deposited in the PDB databank steadily increasing every year, the ADPs from those structures represent a trove of treasure for the study of protein dynamics but have not been intensively investigated. Further improving the alignment between experimental theoretical ADPs could be accomplished by incorporating the anharmonic behavior of the protein and the crystalline environment, although both only have weak influences over the intrinsic dynamics of the protein under cryogenic conditions.[53,56,74] From the aspect of experimental ADP, the quality of the data could be improved through careful exclusion of the whole-body movement of the crystal and standardization of the refinement procedure, as the methods of SHELX and REFMAC gives slightly different results.[37,75,76] Finally, it would be interesting to ask whether the ADPs within a protein follow certain patterns and to investigate the patterns in the context of structure and protein. Taken together, continuing study of the ADP will strengthen our understanding of the general relationship among structure, dynamics, and function and, retrogradely, would benefit the development of the methodology for refining medium-to-low resolution structures.[77,78]

65 in total

1. Is the first hydration shell of lysozyme of higher density than bulk water?

Authors: Franci Merzel; Jeremy C Smith
Journal: Proc Natl Acad Sci U S A Date: 2002-04-16 Impact factor: 11.205

Review 2. Normal mode analysis with simplified models to investigate the global dynamics of biological systems.

Authors: Florence Tama
Journal: Protein Pept Lett Date: 2003-04 Impact factor: 1.890

Review 3. Structure, dynamics and reactions of protein hydration water.

Authors: Jeremy C Smith; Franci Merzel; Ana-Nicoleta Bondar; Alexander Tournier; Stefan Fischer
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2004-08-29 Impact factor: 6.237

4. Escherichia coli adenylate kinase dynamics: comparison of elastic network model modes with mode-coupling (15)N-NMR relaxation data.

Authors: N Alpay Temiz; Eva Meirovitch; Ivet Bahar
Journal: Proteins Date: 2004-11-15

5. Validation of crystallographic models containing TLS or other descriptions of anisotropy.

Authors: Frank Zucker; P Christoph Champ; Ethan A Merritt
Journal: Acta Crystallogr D Biol Crystallogr Date: 2010-07-09

6. Protein structural variation in computational models and crystallographic data.

Authors: Dmitry A Kondrashov; Adam W Van Wynsberghe; Ryan M Bannen; Qiang Cui; George N Phillips
Journal: Structure Date: 2007-02 Impact factor: 5.006

7. An extended dynamical hydration shell around proteins.

Authors: Simon Ebbinghaus; Seung Joong Kim; Matthias Heyden; Xin Yu; Udo Heugen; Martin Gruebele; David M Leitner; Martina Havenith
Journal: Proc Natl Acad Sci U S A Date: 2007-12-19 Impact factor: 11.205

8. Normal-mode refinement of anisotropic thermal parameters for potassium channel KcsA at 3.2 A crystallographic resolution.

Authors: Xiaorui Chen; Billy K Poon; Athanasios Dousis; Qinghua Wang; Jianpeng Ma
Journal: Structure Date: 2007-08 Impact factor: 5.006