Literature DB >> 23894231

Polarized Protein-Specific Charges from Atoms-in-Molecule Electron Density Partitioning.

Louis P Lee¹, Daniel J Cole, Chris-Kriton Skylaris, William L Jorgensen, Mike C Payne.

Abstract

Atomic partial charges for use in traditional force fields for biomolecular simulation are often fit to the electrostatic potentials of small molecules and, hence, neglect large-scale electronic polarization. On the other hand, recent advances in atoms-in-molecule charge derivation schemes show promise for use in flexible force fields but are limited in size by the underlying quantum mechanical calculation of the electron density. Here, we implement the density derived electrostatic and chemical charges method in the linear-scaling density functional theory code ONETEP. Our implementation allows the straightforward derivation of partial atomic charges for systems comprising thousands of atoms, including entire proteins. We demonstrate that the derived charges are chemically intuitive, reproduce ab initio electrostatic potentials of proteins and are transferable between closely related systems. Simulated NMR data derived from molecular dynamics of three proteins using force fields based on the ONETEP charges are in good agreement with experiment.

Entities: Chemical Disease Gene Mutation Species

Year: 2013 PMID： 23894231 PMCID： PMC3719162 DOI： 10.1021/ct400279d

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Proteins are essential components of all organisms, carrying out tasks defined by the information encoded within genes, for example catalyzing biochemical reactions, mediating cell signaling, or providing structural rigidity. Computation plays an important role in the study of proteins—simulations range from elucidation of enzymatic reaction mechanisms, to the study of folding pathways, to design of therapeutic molecules against disease.[1] In biomolecular simulations such as these, molecular mechanics (MM) force fields are often used in which electrostatic interactions are described by atom-centered point charges. However, there is no unique method for partitioning the rigorously calculated quantum mechanical (QM) electron density among the individual atoms and different charge derivation schemes often lead to very different results. In commonly used force fields such as AMBER,[2] the MM partial charges of protein molecules are optimized by fitting them to reproduce the QM electrostatic potential (ESP) of small molecules.[3,4] These ESP charges are well-suited for MM force fields, as they reproduce ab initio multipole moments and electrostatic interactions between molecular fragments.[5] A disadvantage of such techniques is the neglect of polarization by the environment—indeed, a recent density functional theory (DFT) natural population analysis of an entire protein in water found that net charges of residues can vary by up to 0.5 e from their putative integer values.[6] While mean field approaches for charge fitting are the most appropriate for deriving transferable force field parameters, often, as in the example of the study of protein–ligand binding, we are only interested in sampling in the vicinity of the protein’s native state. In these cases, it would be ideal to incorporate electrostatic polarization that is specific to that native state into the charge fitting procedure. Recent studies have calculated atom-centered charges for entire proteins, accounting for native state polarization by including background point charges in a series of iterative fragment-based ESP fits. The resulting polarized protein-specific charges perform better than standard AMBER charges in determining free energies of ligand binding,[7] in pKa calculations,[8] and in comparisons with NMR data.[9,10] These results point to the potential improvements that can be made by using polarized protein-specific point charges. However, such an approach potentially requires a large number of QM calculations to iterate all charges in the system to self-consistency[8] and requires restraints[11] or conformational averaging[12,13] to treat buried atoms and to address the sensitivity of the charges to small conformational changes. An ideal charge derivation scheme should efficiently account for the surrounding environment, while the resulting charges should be chemically intuitive, reproduce ab initio electrostatic properties, be robust with respect to conformational changes, and be insensitive to buried atoms.[14] The charges should be derived from first principles, with no empirical parameters, applicable to a wide range of systems without requiring specialized treatments based on specific chemical knowledge of a particular molecule, and preferably computable from a single QM calculation of the whole system. Recently, there has been renewed interest in electronic density-based atoms-in-molecule (AIM) charge partitioning based on the Hirshfeld approach.[15−18] Such methods differ conceptually from ESP in that the net atomic charges are assigned by dividing a converged, QM electronic density into a union of overlapping atomic basins. The density derived electrostatic and chemical charges (DDEC) method, developed by Manz and Sholl,[18] combines two AIM approaches, iterative Hirshfeld (IH) and iterated stockholder atoms (ISA), to assign atomic charges from the electron density. The resulting charges have already been shown to be suitable for force field development.[19] The charges are chemically intuitive and insensitive to small conformational changes. They adapt to the atom’s environment, reproduce the ab initio electrostatic potential, and, where applicable, correlate well with X-ray diffraction and X-ray absorption near-edge spectroscopy data.[19,20] The method can be applied with no adjustable parameters to buried atoms and to either periodic or nonperiodic systems. DDEC charges have already been used to develop force fields for molecular adsorption inside metal–organic frameworks.[21,22] The DDEC method is implemented in a freely available code (http://ddec.sourceforge.net/), which is interfaced with codes such as VASP and Gaussian among others. Thus, DDEC charges are suitable for environment-specific, flexible force field development for biomolecular simulations but are limited by the unfavorable computational scaling of the underlying QM calculation to systems of a few hundred atoms. In this paper, we overcome this limitation by implementing the DDEC scheme in the ONETEP linear-scaling DFT code. ONETEP combines high basis set accuracy, comparable to that of plane-wave DFT methods, with a computational cost that scales linearly with the number of atoms, which allows for an accurate, fully QM description of systems of thousands of atoms,[23,24] including entire proteins.[25−27] We begin by outlining the various underlying AIM schemes, followed by a brief description of the linear-scaling DFT code ONETEP. We validate our internal implementation of the DDEC methodology against quantum chemistry calculations in Gaussian09[28] for a benchmark set of 25 representative small molecules and show that charges are derived with linear-scaling computational cost, allowing analysis of proteins comprising thousands of atoms from a single DFT calculation. We demonstrate that, for these large systems, the features of the DDEC scheme that make the charges desirable for flexible force field development are maintained. Namely, the charges (i) respond to their environment in a chemically intuitive manner; (ii) reproduce electrostatic properties of the DFT calculation; and (iii) are not overly sensitive to small conformational changes or the presence of buried atoms. Finally, we construct a MM force field based on the DDEC charges for three proteins and compare the results of our MM simulations with experimentally measured NMR dynamic observables and a standard biomolecular force field.

Computational Methodology

Density Derived Electrostatic and Chemical Charges

The DDEC method is based upon two recently developed extensions to the original Hirshfeld AIM scheme.[15] In the original formulation, an electronic density n(r) is divided into overlapping atomic basins nA(r) for each atom A according to the weighting formula:where nA0(r) is the atomic reference density, whose overlapping sum over all atoms in the system ∑BnB0(r) is termed the promolecular density. This form of stockholder partitioning, where the electronic density at each point r is distributed based on the proportional contribution from the reference densities of each atom at that point, has been shown to minimize the information distance FAIM (Kullback–Liebler entropy) between the real and promolecular density,[29] maximizing the information retained in the reference atomic states nA0(r) when transferring to the molecular environment:subject to the constraint that ∑AnA(r) = n(r); that is, the electronic density is completely partitioned. In other words, the stockholder partitioning exhaustively divides the real electronic density into a set of overlapping atomic densities nA(r) in such a way as to maximize the density distribution similarity to their respective reference density counterparts nA0(r). Shortcomings of the original Hirshfeld method included an arbitrariness in the choice of reference atomic densities nA0(r). Neutral, gas-phase atomic densities were often chosen as references, although these commonly led to atomic populations that were too close to zero.[17] Such problems are addressed in recently proposed iterative extensions to the Hirshfeld method,[16,17] in which reference densities are successively improved until self-consistency is achieved. In the iterative Hirshfeld (IH) scheme proposed by Bultinck et al.,[17] the IH reference densities nA0(i,r) are derived from the partitioned atomic density at iteration i, nA(i,r), by the following procedure. First, the (noninteger) electronic populations of each atom QA(i) are computed in the usual way from the partitioned atomic densities:Instead of choosing neutral reference states as in the original Hirshfeld method, new reference states are generated by linear interpolation between densities of free atoms or ions with the next lowest integer (τ = lint(QA(i)) and the next highest integer (τ + 1) number of electrons:The purpose of the interpolation is to obtain a suitable gas-phase reference density for a hypothetical ion comprising QA(i) electrons. The partitioned atomic densities for the next iteration i + 1 are then derived from the reference states generated at iteration i:The procedure is iterated until the changes in the set of IH charges fall below a specified threshold. The resulting charges have been shown to reproduce ab initio electrostatic properties of small polypeptides and to be relatively insensitive to small conformational changes.[14] An alternative approach by Lillestolen and Wheatley,[16] meanwhile, named iterated stockholder atoms (ISA), takes the spherical average of the partitioned atomic density nA(i,r) at iteration i as the (ISA) reference density that enters into eq 5, where ⟨···⟩A denotes spherical averaging about the center of atom A:In practice, the averaging is performed on a set of discrete radial shells up to a maximum radius rmax. The ISA scheme is argued to be less empirical than IH, as the latter still relies on a library of externally generated ionic densities. ISA also produces a better fit to the ESP due to the low-order multipoles possessed by the converged nA(r) resulting from the spherical-averaging procedure used to generate the reference densities.[18] The DDEC scheme by Manz and Sholl[18,19] combines the IH and ISA methods by minimizing a combined information entropy functional (eq 2):where FIH/ISA are constructed with IH/ISA reference densities and weighted by an adjustable parameter χ. Minimizing eq 7 with respect to nA(r) with the same constraint as eq 2 leads to a partitioning of the form:where nAIH/ISA(i,r) are the respective reference densities given by eqs 4 and 6. By allowing a fraction of FIH to contribute toward curvature in regions that otherwise have shallow optimization landscapes, for example, buried atoms, this technique alleviates the slow convergence of the ISA method for such regions while retaining the appealing attributes of the ISA scheme. An important addition to the DDEC scheme is the employment of charge-compensated reference IH densities, referred to as the DDEC/c2 scheme.[18] These densities are generated from DFT ground-state ionic calculations in the presence of a charge compensation sphere, akin to a conductor-like polarizable continuum model (CPCM) treatment with a solvent of infinite dielectric constant.[18] The justification for using such compensated densities instead of free ionic states, as in the original IH scheme, stems from the dielectric screening experienced by an ion embedded within a molecule, whereby its density profile can be modified by the effective dielectric constant. The charge compensation sphere acts to expand (contract) the reference density in the case of cations (anions).

ONETEP

ONETEP[30] is a linear-scaling DFT package based on a reformulation of conventional Kohn–Sham DFT in terms of the single-particle density matrix:where {ϕα(r)} are nonorthogonal generalized Wannier functions (NGWFs) that are localized in real space,[31] and Kαβ is a representation of the density matrix in the biorthogonal duals of the NGWFs. ONETEP achieves linear-scaling by exploiting the “nearsightedness” of the single-particle density matrix in nonmetallic systems.[32] In practice, linear-scaling arises from enforcing strict localization of the NGWFs onto atomic regions and through the optimization of the density kernel and NGWFs, subject to localization constraints. Optimizing the NGWFs in situ allows for a minimal number of atom-centered orbitals to be used while maintaining plane-wave accuracy. The NGWFs are represented in a basis of highly localized periodic cardinal sine (psinc) functions (otherwise known as Fourier–Lagrange mesh functions).[33] The psinc functions are related to plane waves via a Fourier transform, meaning that systematic improvement is possible through adjustment of the psinc grid spacing, analogous to converging the kinetic energy cutoff in traditional (N3) plane-wave DFT codes. In order to expedite optimization, the NGWFs can be initialized closer to their ground states by using an in-built pseudoatomic solver, which self-consistently solves the DFT Kohn–Sham equations for isolated atoms using the same pseudopotentials and exchange-correlation functional as the full calculation.[34] Implicit solvation, whose inclusion is essential both for an accurate description of the protein’s aqueous environment and to aid optimization of the density kernel,[35] is implemented within ONETEP. This is a minimal parameter, self-consistent model based on direct solution of the inhomogeneous Poisson equation for a solute cavity defined by the isosurface of the electron density.[36,37]

DDEC Implementation in ONETEP

We have implemented the DDEC method[18,19] within ONETEP as a postprocessing module to take advantage of its parallel and efficient algorithms for sparse matrix algebra and operations with localized orbitals and electron distributions. A single DFT calculation is performed on the system to obtain the ground-state electronic density, which is then processed to extract the DDEC net atomic charges. Reference densities for the IH part of the calculation are generated internally at run time using the same exchange-correlation functional, pseudopotentials, and NGWF cutoff radius as the full DFT calculation, with the c2 charge compensation scheme.[18] Specifically, using the pseudoatomic solver module,[34] as described in section 2.2, the Kohn–Sham equations for a set of atomic orbitals of an isolated atom are solved self-consistently on a radial grid with an additional spherical surface of compensation charge.[18] The effect of this charged surface is expressed equivalently via Gauss’ Law as an additional radial electrostatic potential term Vcomp in the Hamiltonian:where the first three operators are the kinetic energy, local and nonlocal potentials, and Vcomp = Qcomp/|r| for |r| > Rcomp, with Qcomp and Rcomp the charge and radius of the compensation sphere. Following Manz and Sholl,[18] for cations, the magnitude of the compensation charge (Qcomp) is chosen to neutralize the ion, with the compensation sphere radius (Rcomp) set to the average of ⟨ψnlm|r̂|ψnlm⟩ expectation values of all occupied orbitals {|ψnlm⟩} in the neutral species that are vacant in the cation. For anions, the compensation charge is chosen to make the electrostatic potential at an infinitesimal distance outside the compensation sphere zero. The radius of the compensation sphere is incrementally adjusted in 0.1 Bohr steps until the total energy of the system is minimized. Whole-molecule densities (total and promolecular) are stored on regular Cartesian grids, while spherically averaged promolecular density profiles for individual atoms are computed and stored on atom-centered sets of equally spaced radial shells up to a predefined maximum cutoff radius. Linear-scaling computation with respect to the number of atoms (for each iteration) is achieved at a modest cost of computational memory by computing and storing the total promolecular density (∑BnB0(i,r)) once per DDEC iteration, on the same grid as the molecular density. Figure 1a shows the computational scaling of our DDEC implementation for periodic supercells of bulk water of increasing size, up to around 2500 atoms (Supporting Information). Even for the largest system, the computation time of the DDEC analysis is under 20 min. The DFT calculation itself requires around 12 h on 160 cores, making the calculation of DDEC charges for systems comprising thousands of atoms feasible on a near-routine basis.

Figure 1

(a) Computation time required for DDEC postprocessing calculations of bulk water. Simulations were executed on 160 Intel Sandy Bridge cores. Numbers beside points indicate the number of DDEC iterations required for convergence. (b) DDEC atomic and molecular charge distributions of bulk water for three system sizes. Vertical dashed lines indicate MM TIP3P water charges. Importantly, the number of iterations required for charge convergence remains virtually constant for increasing numbers of atoms, ensuring that the total computational effort remains linear-scaling with respect to system size. Figure 1b reveals that the distribution of charges obtained for bulk water is independent of system size. The atomic charges distributions are centered on values close to those typically used in MM force fields (TIP3P),[38] while they show some spread as one might expect for atoms in a liquid with differing local environments. Recently, Manz and Sholl introduced an update to the DDEC method, named DDEC/c3.[19] The DDEC/c3 method improves convergence of charges for nonporous solids with short bond lengths between diffuse atoms by enforcing exponential decay of atomic electron densities and conditioning the reference IH densities nAIH(i,r) that enter eq 8 to the material of interest. In our implementation, we use the conditioned reference densities but do not enforce radial decay of the partitioned atomic densities nA(r). Our reasoning is that our studies are aimed at biomolecular systems, which are porous materials lacking compacted regions, and we anticipate that the partial weighting by IH reference densities during optimization will be sufficient to ensure the stability of atomic charges for embedded atoms. In addition, in our implementation, core charge handling has been completely excluded, as we employ norm-conserving pseudopotentials. Literature studies have found little dependence of IH charges on the treatment of core electrons.[39] Nevertheless, in the following section, we test both of these assumptions by validating ONETEP-calculated DDEC charges against the full DDEC/c3 method employing the all-electron code Gaussian09.[28]

Validation

The validity of our DDEC implementation was tested using a set of 25 diverse, neutral, small, organic molecules.[40] Benchmarking was performed against the standard DDEC method as implemented in the CHARGEMOL package (version 2.1 beta, obtained from http://ddec.sourceforge.net/), employing 100 radial shells (NRad) up to a maximum cutoff radius (rmax) of 5 Å, together with the supplied c3 charge-compensated reference density library (DDEC/CHARGEMOL).[18,19] Molecular geometries were optimized in vacuo with a 6-311G(d,p) basis set, and electronic densities were generated with an aug-cc-pVQZ all-electron basis set in Gaussian09.[28] The PBE exchange-correlation functional[41] was used throughout. For all DDEC analysis, the mixing parameter χ was set to 3/14, which has been shown to give the optimal balance between minimizing the atomic multipoles and maximizing chemical accuracy.[19] The set of net atomic charges was considered converged when the maximum absolute change for any atom for three successive iterations was less than 2 × 10–5e. ONETEP calculations were performed in vacuum using a cubic simulation cell of 30 Å, with the spherical cutoff Coulomb approach to avoid electrostatic interactions between molecules and their periodic images.[42] Interactions between electrons and nuclei were described by norm-conserving pseudopotentials. NGWFs were initialized as orbitals obtained from solving the Kohn–Sham equation for free atoms,[34] with a 1s configuration for H, a 2s2p configuration for C, N, O, and F, and a 3s3p configuration for S and Cl. The NGWFs were expanded in a psinc basis with an energy cutoff of 1000 eV, corresponding to a grid spacing of 0.45 Bohr, and were localized in real space with radii of 10 Bohr. Convergence of the atomic charges generated by our implementation of the DDEC method in ONETEP (referred to as DDEC/ONETEP) with respect to the ONETEP psinc grid spacing and NGWF cutoff radii, as well as the DDEC parameters rmax and NRad was investigated for nitroethane and discussed in Supporting Information Figure S1. Figure 2 shows the correlation between the DDEC/ONETEP and DDEC/CHARGEMOL charges for every atom in the benchmark set. Despite the different approaches used in obtaining the ground-state electronic densities and the subsequent charge analysis, the difference between the two charge sets is very small with a mean absolute deviation (MAD) of less than 0.02 e.

Figure 2

Correlation between DDEC/CHARGEMOL and DDEC/ONETEP charges for all atoms in the 25 molecule benchmark set. Also shown is the mean absolute deviation (MAD) between the two sets.

Correlation between DDEC/CHARGEMOL and DDEC/ONETEP charges for all atoms in the 25 molecule benchmark set. Also shown is the mean absolute deviation (MAD) between the two sets. Atomic charges for use in force fields should approximately reproduce the ab initio electrostatic potential outside the molecule’s electron density. We can measure the error ΔV in the Coulombic potential of the DDEC charges, compared with DFT, as[19]where the sum is performed for all points i lying within 1.4 and 2.0 times the van der Waals radii of the nuclei,[43] on the same grid mesh used to calculate the electron density. In order to remove the arbitrary vacuum level of the DFT potential VDFT(r), the potentials are displaced by the averaged difference over the included grid points. Our calculations of ΔV are performed for a test charge of 1 e in vacuum. Typical errors in, for example, interaction energies of neutral ligands with proteins in solution will tend to be lower than this error estimate.[12] Nevertheless, it is a good indicator of the ability of a method to reproduce the ab initio ESP. Figure 3 (bottom) shows that DDEC charges reliably reproduce the electrostatic potential around the molecule, with ΔV errors of approximately 1 kcal/mol for both DDEC/CHARGEMOL and DDEC/ONETEP. For neutral molecules, the dipole is the leading order term in the multipole expansion and, thus, is the most significant in determining intermolecular electrostatic interactions. In Figure 3 (top), we compare the DDEC vector dipole moments with those computed directly from their respective ab initio electronic densities. Again, ONETEP performs as well as CHARGEMOL. For comparison, we have also plotted the error in the dipole moment and ΔV (eq 11) for ESP charges. In agreement with previous charge comparisons,[14,18,44] the ESP charges reproduce the ab initio electrostatic potential and dipole moment more closely. Despite this better performance of ESP charges for small molecule electrostatics, DDEC charges will be suitable for polarized protein-specific charges if, as we shall investigate in the next section, they are able to reproduce the electrostatic properties of much larger molecules containing buried atoms better than standard MM force fields.

Figure 3

(top) Errors in the DDEC dipole moment vectors, |μDDEC – μDFT|, compared to ab initio dipole moments. (bottom) ΔV for the 25 molecule set. ESP refers to results from Merz–Kollman charges (calculated with Gaussian09). We conclude that, at least for organic systems comprising light atoms, our use of DDEC charge analysis within the pseudopotential approximation and without enforcing radial decay of the partitioned atomic densities is valid. It should be emphasized that the DDEC analysis can only be as accurate as the electron density output by the underlying QM calculation. In the current paper, we have used the PBE exchange-correlation functional in our ONETEP calculations, although future advances will allow alternative treatments of electron–electron interactions, such as B3LYP[45] and dynamical mean field theory.[46] For comparison, Supporting Information Figure S2 compares Gaussian09 DDEC charges calculated using PBE and B3LYP exchange-correlation functionals. The MAD is less than 0.02 e, implying that the PBE functional gives a reliable electron density for the calculation of DDEC charges.

Properties of DDEC Charges

DDEC Charges Respond to their Environment

In order to aid in the interpretation of QM simulations, atomic point charges should respond in a chemically intuitive manner to their environment. As an example, Figure 4, b and c, shows ONETEP/DDEC charges for the phenol molecule in two different environments: first in water and, second, in a small model cluster representing the negatively charged binding pocket of the L99A/M102E mutant of T4 lysozyme[47] (Supporting Information). The two environments are typical of those simulated in, for example, the optimization of small molecule inhibitors for drug design.[1] For comparison, Figure 4a shows the RESP charges for phenol, calculated in vacuum at the HF/6-31G* level to approximate aqueous polarization,[5] the same level of theory that has been used to parametrize the AMBER force field[4] that we employ as a benchmark in later sections. The DDEC charges in water agree with the RESP charges with a root-mean-square (RMS) deviation of 0.04 e. Similar results are obtained by comparing with RESP charges derived using a polarizable continuum model (PCM) to model solvation (Supporting Information Figure S3). There are no large changes in the DDEC charges of the phenol molecule on moving to the protein binding pocket. Charges on the aromatic ring change by up to 0.05 e in the more hydrophobic environment, while the −OH group, which is hydrogen-bonded to residue E102, becomes more strongly polarized.

Figure 4

(a) RESP charges for phenol calculated in vacuo at the HF/6-31G* level. (b) DDEC/ONETEP charges calculated in a 20 molecule water cluster using with identical simulation parameters as Section 2.4. (c) Same as in part b, but within a 86 atom cluster representing the T4 lysozyme L99A/M102E binding pocket. Charges for parts b and c have been averaged over five different conformations. Standard error of the mean is less than 0.01 e for the −OH group, and the maximum on any atom is 0.018 e. We have utilized the linear-scaling nature of our DDEC implementation to calculate DDEC charges of three proteins: ubiquitin, the SMN Tudor Domain, and hen egg lysozyme (PDB: 1UBQ, 1MHN, 6LYT). Initial structures were prepared by protonation using the AMBER11 tleap module[2] or by the MOLPROBITY software.[48] DFT calculations were performed using the minimal parameter continuum solvation model of ONETEP with a relative dielectric permittivity of 80 to represent water solvent.[36] A postprocessing calculation was performed on the converged electronic density to extract DDEC charges (Supporting Information). It should be emphasized that the entire charge set is derived from a single DFT calculation, on systems up to 1960 atoms, and thus naturally incorporates native state polarization. As we found for the small test systems, DDEC charges for the proteins are similar to the mean field AMBER ff99SB charges,[49] with a correlation coefficient of 0.96 (Figure 5a). However, the RMS deviation between the charge sets is 0.11 e, as expected since the AIMs were optimized based on the ab initio electronic density for atoms embedded in their local environment. Comparisons with the AMBER ff03[3] force field charges yield similar conclusions (Supporting Information Figure S4). For chemically identical atoms that share the same charges in AMBER, the DDEC charges span a wider range, which is indicated by the vertical spread of points. For example, DDEC charges for backbone O atoms of leucine vary between −0.72 and −0.56 e, with an average of −0.63 e, compared to the constant AMBER value of −0.57 e. The prominent outliers in Figure 5a (with absolute charge disagreement larger than 0.25 e) are mostly nonbackbone sp3 carbon atoms, most of which are close to neutral in AMBER, but adopt a range of values in DDEC, depending on the residue type and their particular local environments. For example, the Cγ atom in glutamine has a charge of −0.06 e in AMBER, but ranges from −0.41 to −0.37 e in DDEC. Similar deviations in C charges from AMBER values have been observed in conformationally averaged ESP charges derived from dipeptide fragments.[13] The largest discrepancy between the two charge sets is in the terminal nitrogen atom, which is positively charged in AMBER (0.16 e) but negatively charged in DDEC (−0.52 e). Several ESP studies[3,13] have indicated that our DDEC value is more appropriate for the charge of the N-terminus.

Figure 5

(a) Correlation between DDEC partial atomic charges and the AMBER ff99SB force field for ubiquitin. (b) Electrostatic contribution to the backbone hydrogen bond energies in the X-ray crystal structure of ubiquitin. The x-axis denotes the protein donor/acceptor residues involved. DDEC charges give slightly more negative hydrogen bond energies, indicating enhanced stabilization. Of particular interest is the observation that the charges of backbone–backbone hydrogen bonds calculated by DDEC/ONETEP are more polarized than in the AMBER force field. Following Ji et al.,[50] we define the electrostatic contribution to the hydrogen bond between backbone NH and CO groups as the Coulombic interaction between atoms carrying point charges q:Figure 5b reveals that the resulting dipole–dipole interaction is stronger for DDEC charges than AMBER. It is possible that the enhancement of the interaction is a systematic bias in the potential of the DDEC charges, though it does support previous hypotheses that proteins are stabilized in their native environment via electronic polarization.[10] We investigate this effect further in Section 4 by comparing the dynamics of backbone hydrogen bonds with experimental NMR observables.

DDEC Charges Reproduce Ab Initio Electrostatic Potentials

The correct treatment of electrostatics is vital in the accurate determination of molecular interactions in biological systems. In particular, if DDEC charges are to be useful as polarized protein-specific charges, they should reproduce ab initio electrostatics for large molecules better than standard force fields that are based on ESP charges. We have calculated the electrostatic potential and the dipole moments of the three proteins studied in the previous section with full DFT, the derived DDEC charges, and with a standard AMBER force field (Supporting Information Figure S4 gives the same information for the AMBER ff03 force field). For all charge distributions, the electrostatic potential was computed in vacuum, in order to better observe the discrepancy between ab initio and point charge values that would otherwise be dampened by dielectric screening. We note, however, that the ab initio, DDEC and AMBER charge distributions have already incorporated solvation effects in their calculation, and the electrostatic comparisons are, therefore, equivalent. Table 1 shows that DDEC/ONETEP AIMs reproduce the total ab initio potential and dipole moments μDFT for all three proteins, vastly outperforming the standard MM charges. Figure 6 shows the difference in electrostatic potential between the two charge sets and the DFT calculation. Although the standard MM charges are fit to closely reproduce the ESP of small molecules, they neglect large-scale electronic polarization that is present in protein-specific charges, and hence, the error in the computed ESP is large. The computed errors in the MM electrostatic potential are consistent with literature studies of errors in interaction energies between charged biotin ligands and the avidin protein in vacuum (11 kcal/mol on average).[12] For more realistic simulations, the same study found that the error was reduced to approximately 2 kcal/mol for neutral ligands and to 0.7 kcal/mol if solvent screening is also included.

Table 1

RMS Error (ΔV) between the Ab Initio DFT Potential and Coulombic Point Charge Potentials Derived from DDEC/ONETEP and AMBER ff99SB Charges for Three Proteins Calculated using Equation 11a

		ΔV (kcal/mol)		\|μ_DFT – μ\| (D)
protein	total charge (e)	DDEC/ONETEP	AMBER	DDEC/ONETEP	AMBER
1UBQ	0	1.30	7.56	2.2	34.2
1MHN	–4	1.54	5.49	3.6	11.8
6LYT	+8	1.35	6.99	0.9	32.2

Also shown are errors in the protein dipole moment vectors, which have been calculated relative to the center of mass of each protein to remove ambiguity for charged systems. The DDEC charges reproduce electrostatic properties of large molecules much better than do the mean field classical point charges.

Figure 6

Difference in calculated electrostatic potential between point charges and DFT for ubiquitin. The ESP difference is calculated at the solvent-accessible surface of the protein. Also shown as gray, blue, and red arrows are the calculated DFT, DDEC, and AMBER dipole moments, respectively. Also shown are errors in the protein dipole moment vectors, which have been calculated relative to the center of mass of each protein to remove ambiguity for charged systems. The DDEC charges reproduce electrostatic properties of large molecules much better than do the mean field classical point charges. We noted earlier that ISA charges are particularly good at reproducing the ESP, since the electron density is divided into spherically symmetric partitions. This point is reinforced by Figure 7, which reveals that the deviation from the ab initio dipole moment of ubiquitin decreases with decreasing χ (increasing ISA fraction). However, one of the motivations for including IH reference densities in the DDEC information entropy functional (eq 8) was to alleviate problematic convergence for AIM charges of embedded atoms due to the shallow optimization landscapes experienced in such regions.[18] We find that the DDEC method with χ = 3/14, as recommended for all systems by Manz and Sholl,[19] is a good compromise between reproducing accurately the ESP of the system, reducing the number of self-consistency iterations required for convergence (Figure 7) and, as we shall see in the next section, improving the transferability of the charges. Further comparisons between DDEC, IH, and ISA charges are presented in Supporting Information Figure S5.

Figure 7

Dependence of the DDEC dipole moment error on the mixing parameter χ. Increasing the ISA content reduces the dipole error but also increases the number of iterations required for charge convergence and, hence, the computational time.

DDEC Charges are Transferable

In order to be of use in parametrizing flexible force fields, atomic charges, even those of buried atoms, must not be overly sensitive to small conformational changes. To study conformational dependencies of the DDEC charges for large systems, we have performed DDEC analysis of nine experimental NMR conformers of bovine pancreatic trypsin inhibitor (BPTI, PDB: 1PIT) in implicit solvent (892 atoms). Each residue in Figure 8 is colored by the average of the standard deviation in the calculated atomic charges over the nine conformations. The first point to note is that the DDEC charges are much more stable to conformational change than the ISA charges. While some fluctuation due to the changing environment is expected, the protein conformers are all close to the native state and the DDEC charges reflect this. This is in stark contrast to RESP charges, for which a recent review of atomic charge models found strong fluctuations in partial charges for an ensemble of polypeptide chain conformations.[14] Second, there is no discernible difference between charge fluctuations in buried and exposed residues, as would be observed for shallow optimization landscapes. Finally, the width of the protein chains in Figure 8 reflects the average positional fluctuation of each residue. It is noticeable that DDEC charges vary more strongly in residues with a high degree of flexibility, especially when they form intramolecular contacts (between β-sheets, within α-helices, and in sections linked by disulfide bridges), as one might expect.

Figure 8

DDEC/ONETEP charge and conformation fluctuations for nine NMR models of BPTI. Average positional fluctuation per residue is represented by its putty width and average charge fluctuation (standard deviation) per residue by its color. ISA charge fluctuations have been clipped at 0.04 e for visual clarity (ten residues in the ISA method have values greater than 0.04 e, with a maximum of 0.06 e). ISA charges are DDEC/ONETEP charges derived with χ = 0 (eq 7). We showed in the previous section that DDEC charges are able to accurately reproduce the ESP around the structure that they are fit to. Here, we investigate whether average charges derived for an ensemble of native state conformations are able to accurately describe the electrostatic properties of each conformation close to that native state. Such a test is vital if the charges are to be used in the construction of force fields for simulating the dynamics of proteins. Figure 9 shows the discrepancy between dipole moments calculated via atomic point charges and DFT. To obtain the best agreement with the DFT dipole moment of a particular structure, it is preferable to fit the DDEC atomic point charges to that particular structure. However, encouragingly, there is very little degradation in performance if DDEC charges that have been averaged over the full ensemble are used instead. This reflects the transferability of DDEC charges between conformations close to the native state. The force field charges perform relatively well for this structure but not as well as the polarized protein-specific charges. This behavior is in agreement with previous work showing that ESP charges give the best fit to ab initio electrostatics for a given structure, but that DDEC charges are generally more transferable to different, similar structures.[19]

Figure 9

Deviation between charge model and ab initio dipole moment vectors for nine NMR conformers of BPTI. DDEC/ONETEP charges have been calculated separately for each structure (dark blue) and averaged over all structures (light blue). AMBER ff99SB charges are included for comparison. The net charge of the system is +6 e.

Protein Dynamics

As we have demonstrated in the previous section, DDEC charges derived from the DFT electron density of entire proteins have properties that make them desirable for use in flexible force fields. We now put this into practice by performing molecular dynamics (MD) simulations of the three proteins described previously (PDB: 1UBQ, 1MHN, 6LYT) using force fields based on our calculated DDEC charges. We have followed the procedure of Tong et al.[9] by taking the bonded and Lennard-Jones parameters directly from the AMBER ff99SB force field but replacing the atom-centered point charges by the DDEC/ONETEP charges. It should be emphasized that torsional and Lennard-Jones parameters are as important as the charges in determining protein dynamics and a large number of alternative force fields exist, many of which outperform the ff99SB force field in comparison with experiment.[51] Nevertheless, here we concentrate on demonstrating the feasibility of using AIM approaches to supplement an existing, commonly used force field. Figure 10 shows the backbone RMS deviations of the three proteins from their X-ray crystal structures over the course of 10 ns simulations, performed at 300 K in explicit water (Supporting Information). For both force fields, the experimental structure is stable over the course of the simulation.

Figure 10

Backbone Cα root-mean-square deviation (RMSd) with respect to the initial experimental PDB structure for 1UBQ, 1MHN, and 6LYT, with running averages taken over 50 ps windows. The first and last four residues have been excluded due to their high flexibility. To assess the quality of simulated MM dynamics, it is common to use, as metrics, experimental NMR-derived quantities. The square of the generalized order parameter S2, based on the Lipari–Szabo model-free approach to analyzing nuclear spin-relaxation measurements,[52] is the plateau value of the time-correlation function C2(t) of the second-order Legrendre polynomial of backbone N–H bond unit vectors e(t):where C2(t) is time-averaged over a trajectory τ.[53]S2 represents the spatial restriction of the motion of N–H bond vectors, and a lower S2 value corresponds to a more flexible residue. S2 values were calculated from our 10 ns MD simulations using the iRED method[54] as implemented in the AMBER11 ptraj module. In this method, a covariance matrix M is constructed from the N–H unit bond vector pairs e in a fixed reference frame and averaged over the entire trajectory:S2 of residue i is then obtained from solving the eigenvalue equation Mm = λm aswhere the sum is taken over N – 5 eigenvectors excluding five with the largest eigenvalues, and m is the ith component of eigenvector m.[53,55] Figure 11 compares simulated and experimental[56−58]S2 profiles for the standard AMBER ff99SB force field and the same force field supplemented by DDEC/ONETEP atomic charges. The only difference between the simulation protocols is in the point charges and so any improvement in the calculated order parameters is due to the inclusion of native state polarization in their calculation. Table 2 compares the agreement between simulated and experimental S2 data (for all residues where experimental data is available). It indicates that DDEC AIM charges perform better than mean field force field charges in providing a suitable electrostatic environment that maintains protein stability throughout the 10 ns trajectory while remaining dynamically consistent with experimental observations.

Figure 11

Table 2

RMS Difference, MAD, and Average S2 Ratio ⟨SMM2/SNMR2⟩ between MM and experimental order parameters in Figure 11a

	RMS S²		MAD S²		⟨S_MM²/S_NMR²⟩
protein	AMBER	DDEC/ONETEP	AMBER	DDEC/ONETEP	AMBER	DDEC/ONETEP
1UBQ	0.060	0.053	0.040	0.040	1.03 ± 0.11	1.03 ± 0.13
1MHN	0.118	0.087	0.074	0.060	0.95 ± 0.17	1.01 ± 0.14
6LYT	0.065	0.047	0.043	0.034	0.99 ± 0.09	1.00 ± 0.06

The agreement of the generalized order parameter with experiment is improved by using the DDEC/ONETEP protein-specific charges.

Comparison between the experimental and calculated backbone N–H order parameter S2 for 1UBQ, 1MHN, and 6LYT using AMBER ff99SB and DDEC atomic charges. Quantitative comparisons are given in Table 2. The agreement of the generalized order parameter with experiment is improved by using the DDEC/ONETEP protein-specific charges. In addition to backbone rigidity, we also compared the NMR scalar coupling JNC′ of the N—H···O=C backbone–backbone hydrogen bonds, which provides a measure of hydrogen bond dynamics within a protein. It was shown by Barfield[59] that JNC′ values, being dependent on the wave function overlap across the hydrogen bond, are strongly geometry dependent and can be parametrized as such by several formulas, with the simplest based on only the H···O distance rOH and H···O=C angle θ:[60]Equation 16 was used to compute the time-averaged JNC′ of backbone–backbone hydrogen bonds[60,61] using the AMBER ff99SB and DDEC/ONETEP MM snapshots over the 10 ns trajectories. The results are correlated with experimental measurements[61,62] in Figure 12. The RMS deviations between simulation and experiment (Figure 12, in brackets) once again indicate that DDEC/ONETEP charges perform at least as well as AMBER charges, illustrating that backbone N–H and C=O bond polarization is also suitably described by the electron density partitioning approach to charge derivation.

Figure 12

JNC′ comparison between simulation and experiment[61,62] for 1UBQ and 1MHN. RMS differences from experiment are given in brackets beside labels.

Conclusions

The DDEC atoms-in-molecule scheme for the assignment of atomic charges through partitioning of the ab initio electron density has a number of features that makes it desirable for charge derivation for flexible force fields.[18,19] However, its use in the analysis of realistic biomolecular systems has, until now, been limited by the computational expense of the underlying QM calculation. In this paper, we have implemented the DDEC scheme in the linear-scaling DFT code ONETEP. Strict localization of the NGWF basis in ONETEP onto atomic regions allows derivation of the electronic density of systems comprising thousands of atoms with affordable computational expense. Optimizing the NGWFs in situ allows for a minimal number of atom-centered orbitals to be used while maintaining near-complete basis set accuracy. Systematic improvement in accuracy is achievable via tuning of a very small number of parameters and, along with a minimal parameter solvation model, allows the user to derive charges for any molecular system with no prior knowledge of its chemical characteristics. The DDEC method is also suitable for implementation in other DFT codes, as well as for example fragment molecular orbital approaches to generating the electron density of large systems, for which we hope our implementation will provide an accuracy benchmark. We have validated our implementation of the DDEC method in ONETEP against the standard CHARGEMOL package with quantum chemistry calculations in Gaussian09. In agreement with previous observations, DDEC charges perform less well than ESP charges in reproducing the ab initio potential of small molecules.[18] Nevertheless, our interest here is in the calculation of polarized charges for entire proteins. Indeed, we have found that the errors in the electrostatic potential at the surfaces of three proteins are very similar to those for small molecules (errors of 1.3–1.5 kcal/mol). Furthermore, the system-specific DDEC charges outperform standard AMBER charges for these molecules (errors of 5–7 kcal/mol). We would expect similar performances for other force fields that have been fit to small molecules without incorporating large-scale polarization. Although the ESP method is unsuited to the direct calculation of atomic charges from a single QM calculation due to the sensitivity of the charges to buried atoms and small conformational changes,[14] a number of studies have determined protein-specific charges based on this method. By separating the system into a large number of dipeptide fragments, fitting ESP charges to the resulting structures and averaging them over several conformations of the protein, stable and transferable charges have been obtained.[12,13] By also including the effects of the environment, represented by point charges, in an iterative ESP fit, it should be possible to further derive polarized protein-specific charges that are stable with respect to small changes in conformation.[8] Although the ESP charges are preferable to DDEC for reproducing the ab initio potential of small molecules, it is not obvious that these conformation-averaged ESP charges are the optimum choice for large molecules. In fact, Manz and Sholl have shown that DDEC and conformation-averaged ESP charges give essentially the same agreement with the ab initio electrostatic potential of small molecules.[19] It would be interesting, in future work, to make this comparison for larger molecules. Further advantages of the DDEC charges are that, since they are derived from a single DFT calculation of entire biomolecules, environmental polarization is naturally included. Moreover, no prior, system-specific chemical knowledge or specialized treatment is required to ensure proper charge derivation, rendering the method extremely versatile. In agreement with previous observations of smaller systems, the charges are insensitive to small conformational changes and buried atoms require no special treatment. In this paper, we are interested in the application of DDEC charges to standard MM force fields. Therefore, we have restricted the DDEC expansion to monopole order in our electrostatic potential comparisons. As discussed by Manz and Sholl,[18] the electrostatic potential description of DDEC charges can be trivially improved by including higher order multipole terms in the partitioned AIM densities, without requiring a new fitting procedure. We have incorporated the DDEC charges of three proteins into a classical force field and run molecular mechanics simulations to compute NMR order parameters and scalar couplings. Our observations indicate that the DDEC AIM charges perform at least as well, if not better than AMBER ff99SB in replicating protein dynamics. The charges of backbone–backbone hydrogen bonds are more polarized in DDEC/ONETEP than in the standard AMBER force field. Nevertheless, they are able to replicate dynamics consistent with experiments when used in MM force fields. Our results are therefore consistent with previous hypotheses that electronic polarization is important in stabilization of a protein’s native state.[9] It should be noted that these DDEC AIMs were obtained without explicit fitting to replicate electrostatic effects nor parametrization to experimental properties. The potential applications of linear-scaling DDEC analysis span a wide range of problems in which accurate determination of electrostatics via atomic point charges is important. DDEC charges have already been used to study molecular adsorption inside metal–organic frameworks.[21,22] In the biomolecular sciences, the optimization of protein-inhibitor interactions for drug design, pKa calculation, study of ion channel conductivity and elucidation of enzymatic reaction mechanisms all depend critically on an accurate description of the electrostatic potential, for which DDEC perform markedly better than standard classical force field charges.

41 in total

1. Generalized Gradient Approximation Made Simple.

Authors:
Journal: Phys Rev Lett Date: 1996-10-28 Impact factor: 9.161

2. A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations.

Authors: Yong Duan; Chun Wu; Shibasish Chowdhury; Mathew C Lee; Guoming Xiong; Wei Zhang; Rong Yang; Piotr Cieplak; Ray Luo; Taisung Lee; James Caldwell; Junmei Wang; Peter Kollman
Journal: J Comput Chem Date: 2003-12 Impact factor: 3.376

3. Validation of Molecular Dynamics Simulations of Biomolecules Using NMR Spin Relaxation as Benchmarks: Application to the AMBER99SB Force Field.

Authors: Scott A Showalter; Rafael Brüschweiler
Journal: J Chem Theory Comput Date: 2007-05 Impact factor: 6.006

4. Nearsightedness of electronic matter.

Authors: E Prodan; W Kohn
Journal: Proc Natl Acad Sci U S A Date: 2005-08-08 Impact factor: 11.205

5. Comparison of multiple Amber force fields and development of improved protein backbone parameters.

Authors: Viktor Hornak; Robert Abel; Asim Okur; Bentley Strockbine; Adrian Roitberg; Carlos Simmerling
Journal: Proteins Date: 2006-11-15

6. Critical analysis and extension of the Hirshfeld atoms in molecules.

Authors: Patrick Bultinck; Christian Van Alsenoy; Paul W Ayers; Ramon Carbó-Dorca
Journal: J Chem Phys Date: 2007-04-14 Impact factor: 3.488

7. Electrostatic considerations affecting the calculated HOMO-LUMO gap in protein molecules.

Authors: Greg Lever; Daniel J Cole; Nicholas D M Hine; Peter D Haynes; Mike C Payne
Journal: J Phys Condens Matter Date: 2013-03-07 Impact factor: 2.333

8. High-resolution X-ray and NMR structures of the SMN Tudor domain: conformational variation in the binding site for symmetrically dimethylated arginine residues.

Authors: Remco Sprangers; Matthew R Groves; Irmgard Sinning; Michael Sattler
Journal: J Mol Biol Date: 2003-03-21 Impact factor: 5.469

9. Interrogation of the protein-protein interactions between human BRCA2 BRC repeats and RAD51 reveals atomistic determinants of affinity.

Authors: Daniel J Cole; Eeson Rajendra; Meredith Roberts-Thomson; Bryn Hardwick; Grahame J McKenzie; Mike C Payne; Ashok R Venkitaraman; Chris-Kriton Skylaris
Journal: PLoS Comput Biol Date: 2011-07-14 Impact factor: 4.475

10. MolProbity: all-atom structure validation for macromolecular crystallography.

Authors: Vincent B Chen; W Bryan Arendall; Jeffrey J Headd; Daniel A Keedy; Robert M Immormino; Gary J Kapral; Laura W Murray; Jane S Richardson; David C Richardson
Journal: Acta Crystallogr D Biol Crystallogr Date: 2009-12-21

6 in total

1. pH Dependence of Charge Multipole Moments in Proteins.

Authors: Anže Lošdorfer Božič; Rudolf Podgornik
Journal: Biophys J Date: 2017-10-03 Impact factor: 4.033

2. Numerical study on the partitioning of the molecular polarizability into fluctuating charge and induced atomic dipole contributions.

Authors: Ye Mei; Andrew C Simmonett; Frank C Pickard; Robert A DiStasio; Bernard R Brooks; Yihan Shao
Journal: J Phys Chem A Date: 2015-05-18 Impact factor: 2.781

3. Introducing DDEC6 atomic population analysis: part 4. Efficient parallel computation of net atomic charges, atomic spin moments, bond orders, and more.

Authors: Nidia Gabaldon Limas; Thomas A Manz
Journal: RSC Adv Date: 2018-01-11 Impact factor: 4.036

4. Biomolecular Force Field Parameterization via Atoms-in-Molecule Electron Density Partitioning.

Authors: Daniel J Cole; Jonah Z Vilseck; Julian Tirado-Rives; Mike C Payne; William L Jorgensen
Journal: J Chem Theory Comput Date: 2016-04-21 Impact factor: 6.006

5. New scaling relations to compute atom-in-material polarizabilities and dispersion coefficients: part 1. Theory and accuracy.

Authors: Thomas A Manz; Taoyi Chen; Daniel J Cole; Nidia Gabaldon Limas; Benjamin Fiszbein
Journal: RSC Adv Date: 2019-06-19 Impact factor: 4.036

6. Hybrid QM/MM study of FMO complex with polarized protein-specific charge.

Authors: Xiangyu Jia; Ye Mei; John Z H Zhang; Yan Mo
Journal: Sci Rep Date: 2015-11-27 Impact factor: 4.379

6 in total