Van Ngo1, Mauricio C da Silva2, Maximilian Kubillus3,4, Hui Li5, Benoît Roux5, Marcus Elstner3, Qiang Cui4, Dennis R Salahub2, Sergei Yu Noskov1. 1. Centre for Molecular Simulation and Department of Biological Sciences, University of Calgary , Calgary, Alberta, Canada T2N 1N4. 2. Centre for Molecular Simulation, Institute for Quantum Science and Technology and Department of Chemistry, University of Calgary , Calgary, Alberta, Canada T2N 1N4. 3. Institute of Physical Chemistry, Karlsruhe Institute of Technology , Kaiserstr. 12, 76021 Karlsruhe, Germany. 4. Department of Chemistry, University of Wisconsin-Madison , Madison, Wisconsin 53706, United States. 5. Department of Biochemistry and Molecular Biology, University of Chicago , Chicago, Illinois 60637, United States.
Abstract
Despite decades of investigations, the principal mechanisms responsible for the high affinity and specificity of proteins for key physiological cations K(+), Na(+), and Ca(2+) remain a hotly debated topic. At the core of the debate is an apparent need (or lack thereof) for an accurate description of the electrostatic response of the charge distribution in a protein to the binding of an ion. These effects range from partial electronic polarization of the directly ligating atoms to long-range effects related to partial charge transfer and electronic delocalization effects. While accurate modeling of cation recognition by metalloproteins warrants the use of quantum-mechanics (QM) calculations, the most popular approximations used in major biomolecular simulation packages rely on the implicit modeling of electronic polarization effects. That is, high-level QM computations for ion binding to proteins are desirable, but they are often unfeasible, because of the large size of the reactive-site models and the need to sample conformational space exhaustively at finite temperature. Several solutions to this challenge have been proposed in the field, ranging from the recently developed Drude polarizable force-field for simulations of metalloproteins to approximate tight-binding density functional theory (DFTB). To delineate the usefulness of different approximations, we examined the accuracy of three recent and commonly used theoretical models and numerical algorithms, namely, CHARMM C36, the latest developed Drude polarizable force fields, and DFTB3 with the latest 3OB parameters. We performed MD simulations for 30 cation-selective proteins with high-resolution X-ray structures to create ensembles of structures for analysis with different levels of theory, e.g., additive and polarizable force fields, DFTB3, and DFT. The results from DFT computations were used to benchmark CHARMM C36, Drude, and DFTB3 performance. The explicit modeling of quantum effects unveils the key electrostatic properties of the protein sites and the importance of specific ion-protein interactions. One of the most interesting findings is that secondary coordination shells of proteins are noticeably perturbed in a cation-dependent manner, showing significant delocalization and long-range effects of charge transfer and polarization upon binding Ca(2+).
Despite decades of investigations, the principal mechanisms responsible for the high affinity and specificity of proteins for key physiological cations K(+), Na(+), and Ca(2+) remain a hotly debated topic. At the core of the debate is an apparent need (or lack thereof) for an accurate description of the electrostatic response of the charge distribution in a protein to the binding of an ion. These effects range from partial electronic polarization of the directly ligating atoms to long-range effects related to partial charge transfer and electronic delocalization effects. While accurate modeling of cation recognition by metalloproteins warrants the use of quantum-mechanics (QM) calculations, the most popular approximations used in major biomolecular simulation packages rely on the implicit modeling of electronic polarization effects. That is, high-level QM computations for ion binding to proteins are desirable, but they are often unfeasible, because of the large size of the reactive-site models and the need to sample conformational space exhaustively at finite temperature. Several solutions to this challenge have been proposed in the field, ranging from the recently developed Drude polarizable force-field for simulations of metalloproteins to approximate tight-binding density functional theory (DFTB). To delineate the usefulness of different approximations, we examined the accuracy of three recent and commonly used theoretical models and numerical algorithms, namely, CHARMM C36, the latest developed Drude polarizable force fields, and DFTB3 with the latest 3OB parameters. We performed MD simulations for 30 cation-selective proteins with high-resolution X-ray structures to create ensembles of structures for analysis with different levels of theory, e.g., additive and polarizable force fields, DFTB3, and DFT. The results from DFT computations were used to benchmark CHARMM C36, Drude, and DFTB3 performance. The explicit modeling of quantum effects unveils the key electrostatic properties of the protein sites and the importance of specific ion-protein interactions. One of the most interesting findings is that secondary coordination shells of proteins are noticeably perturbed in a cation-dependent manner, showing significant delocalization and long-range effects of charge transfer and polarization upon binding Ca(2+).
Many cellular processes are critically
dependent on the tight control
of ionic homeostasis.[1−5] Metal ions play a variety of roles in cellular signaling, nutrients
and osmolyte transport, and enzymatic catalysis. Particularly, several
proteins have evolved with the ability to bind specific cations, while
efficiently excluding competing ions.[6−11] Various factors that alter the binding specificity or affinity of
enzymes, such as ion channels or ion-coupled transporters, are known
to be associated with a plethora of human diseases, ranging from cardiac
arrhythmias to renal dysfunctions to a multitude of neurological disorders.[12−15] Structural, thermodynamic, or functional studies of ion interactions
with metalloenzymes are staple items of modern physical biochemistry
via molecular simulations, which are playing an ever-increasing role
in understanding various mechanisms of cation recognition by protein
hosts. One of the key limitations that become more and more apparent
is the lack of in-depth understanding of the accuracy of different
approximations, which is required in order to define a realistic ion-protein
model. Below, we will summarize challenges and limitations relevant
to the different levels of theory chosen for this work.The
affinity and specificity of protein hosts are fundamentally
governed by interactions between the ions and protein ligands. Therefore,
the choice of method is pivotal in sampling the conformational space
of metalloproteins. The simplest and commonly accepted models are
often built on the pairwise-additive assumption used in all of the
major biomolecular simulation force fields (FFs)[16−18] (such as CHARMM,
NAMD, AMBER, GROMACS). In this assumption, atoms are described as
Lennard-Jones particles with simple point-charges. By the virtue of
parametrization strategies, it is assumed that quantum effects such
as Pauli exclusion, London interactions, etc., are captured implicitly.
The transferability of potential parameters developed is assumed.
Therefore, the parameter sets that reproduce the quantum mechanics
(QM) or experimental observables in a set of model compounds are expected
to provide accurate descriptions for biomolecules in different environments.
Additive force-fields have been widely successful in studies of biomolecular
structure and dynamics with Monte Carlo and molecular dynamics (MD)
simulations, many of which have reached millisecond time scales,[19,20] thus allowing a connection of simulations to the realm of biochemical
or biophysical experiments.Nevertheless, there are serious
limitations for the additive FFs
to accurately capture polarization effects, which are often critical
for descriptions of ion binding to protein reactive sites and solvation
phenomena, particularly for divalent ions such as Zn2+ or
Mg2+.[21−23] The additivity assumption breaks down, even for “harder”
cations such as Na+ and Ca2+ binding to a variety
of sites in metalloproteins.[24] This challenge
fueled efforts in the development of next-generation FFs, which approximate
more quantum effects, e.g., an extension of the Drude polarizable
FF to Na+, K+, and Ca2+ cations interacting
with metalloproteins, while maintaining the transferability, scalability,
and modest computational cost.[24] The use
of the Drude FF already allows for microsecond simulations with exhaustive
sampling of the relevant conformational space.[25] By virtue of the assumptions involved in the creation of
the Drude FF (auxiliary particle attached to the heavy atom), it is
limited in its ability to describe long-range partial charge transfers.[21] Possible alternatives include either building
a full QM model of the metalloprotein or resorting
to the mixed quantum mechanics/molecular mechanics (QM/MM) treatment
of the binding site and its environment.[7,26−30] Both methods are widely popular, but come with a substantial computational
cost, especially if the secondary shell ligands are included in the
QM region. A promising alternative could possibly be found in the
self-consistent-charge density–functional-based tight-binding
(SCC-DFTB) approach[31] with the latest extension
to include third-order contributions.[32,33] In DFTB3,
an electronic energy is perturbed by a series of density fluctuations
around a reference density, which is usually taken as a sum of neutral
atomic densities. Through a self-consistent solution of the density
fluctuations, charge transfers among atoms in a molecule are properly
described. A particular advantage of DFTB3 is its computational efficiency,
which allows routine sampling of biological systems at the nanoseconds
scale in a DFTB3/MM framework for hundreds of atoms.[34−36]The present study pursues two major goals:To establish the
performance of different
approximations (additive and polarizable FFs, DFTB3) in the description
of protein–cation interaction energies for a variety of selective
metalloenzymes, andTo identify the extent of polarization
and partial charge propagation in various binding sites accommodating
three major physiological cations (Na+, K+,
and Ca2+).To achieve our
first goal, we examined the interaction energies
for a benchmark set of 30 metalloproteins with available high-resolution
crystal structures.[24] The proteins perform
their function at finite temperatures, where conformational flexibility
of the cation binding site is an essential feature of the system.[37] Accurately modeling interactions between metal
ions and an ensemble of structures (vs single structure) is the first
fundamental step to enable reliable conformational sampling of ion-protein
complexes (e.g., via MD or QM/MM-MD schemes). We discuss the accuracy
of an additive FF (CHARMM C36), the Drude polarizable FF, and the
DFTB3 method in describing the ion–protein interactions in
a variety of ion-binding sites with various ligand compositions and
local electrostatic environments. Special emphasis will be placed
on the evaluation of the DFTB3/3OB parameters.[38] The parameters were developed for a benchmark set containing
many small organic molecules. Its applicability to the modeling of
ion binding to metalloproteins is yet to be established. The accuracy
of the three methods with the above-mentioned parameter sets has been
validated against density functional theory (DFT) results.Next,
we evaluated the electrostatic potentials and charge redistribution
effects for direct quantification of ion-induced partial-charge transfers
and local polarization effects. We chose two model enzymes with different
chemical compositions of the first and second coordination shells
and cation preferences for a detailed electrostatic analysis. In particular,
we focus on hen egg-white lysozyme (PDB: 193L), which is known to bind Na+ preferentially but allows K+ and Ca2+ binding,[39,40] and in α-amylase (PDB: 2AAA), which binds both Na+ and
Ca2+ to form a catalytic triad and is highly selective
for Ca2+.[41]Our results
indicate that the inclusion of explicit electronic
polarizability and the ability to model long-range partial charge
transfer is a necessity for modeling Ca2+ binding enzymes.
We observed significant differences in the computed interaction energies
and the electrostatic maps between detailed (DFT and DFTB3) and approximate
models of cation binding to proteins.
Methodologies
Generation
of Protein Conformations
We selected 30
crystal soluble proteins: 10 for each of the cations (K+, Na+, or Ca2+). The structural information
and PDB[42] entries are listed in Table S1 of the Supporting Information (SI).
To create ensembles of conformational states, we solvated the proteins
with TIP3P water molecules and 0.15 M of the corresponding salt. We
then used NAMD[43] and the CHARMM-36 FF to
minimize the structures stepwise and then performed 4-ns equilibration
simulations using NPT coupling at a pressure (P)
of 1 atm and a temperature (T) of 300 K. All sampled
structures display root-mean-square displacement (RMSD) values in
the expected range of standard thermal fluctuations.[24] The RMSD values were computed relative to the crystal structures.
Next, 20 different conformations of each binding site were extracted.
A binding site is defined with a truncation radius of 5.5 Å around
the bound ion to include the first (3.0–3.5 Å) and second
(5–6 Å) coordination shells of ions. The approach is common
in many QM studies of ion binding to proteins.[44] The truncated binding sites have ∼150–300
atoms with 1–6 ligands, several ions, and 2–5 water
molecules. The properties of the truncated binding sites in the first
coordination shell of 3.5 Å are provided in Table . These conformations of the
truncated binding sites were then used to compute ion-binding energies,
which are equal to the interaction energy between an ion and its surroundings,
using the C36, Drude, and DFTB3/3OB parameters. The C36 FF (shortened
as C36) was reported in refs (45−47); the Drude
FF (shortened as Drude) was recently developed in ref (24); and the DFTB3/3OB parameters
were also recently reported in ref (38).
Table 1
Properties of the
Ion-Binding Sites
Identified from the Crystal Structuresa
Types
of Oxygen Atomsb
PDB
OH
–C=O
COO–
H2O
extra ion
net charge
(|e|)
K+ Ion
1J5Y
0
4
0
3
0
1
1JF8
1
5
1
4
0
–1
1NI4
0
5
0
1
0
2
2BFD
1
6
0
1
0
0
1P36
1
1
1
4
0
–1
1LJL
1
5
1
3
0
–1
1TYY
0
5
1
3
0
1
1DTW
1
6
0
1
0
0
1V3Z
0
5
0
0
0
2
4LS7
1
6
1
2
0
1
Na+ Ion
193L
1
4
0
2
0
3
1E43
0
1
6
1
2Ca2+
0
1SFQ
0
2
0
5
0
2
1GEN
0
4
0
0
1Cl–
0
3N0U
0
4
0
3
0
3
1L5B
0
5
0
2
0
1
1QNJ
0
5
3
1
0
–1
1QUS
1
1
4
1
0
–3
1S36
0
5
1
0
1Cl–
0
1SK4
1
5
0
1
0
0
Ca2+ Ion
3LI3
0
4
3
4
0
–1
1BLI
0
1
5
1
1Na+
0
2UUY
0
3
2
2
0
–2
1A4V
0
2
3
2
0
–2
4KTS
0
3
3
2
0
–2
2AAA
0
2
4
3
0
–5
3TZ1
1
1
5
1
0
–2
1EXR
0
1
5
1
0
–1
1RWY
1
1
6
0
0
0
3ICB
0
5
2
1
0
–1
The total charges are summed
up over all atoms in the truncated binding sites. Each element (types
of oxygen atoms, H2O, and ions) is counted within a 3.5
Å sphere centered at the bound ion in the 5.5 Å truncated
binding sites.
OH, −C=O,
and COO– represent hydroxyl, carbonyl, and carboxylate
oxygen
atoms, respectively.
The total charges are summed
up over all atoms in the truncated binding sites. Each element (types
of oxygen atoms, H2O, and ions) is counted within a 3.5
Å sphere centered at the bound ion in the 5.5 Å truncated
binding sites.OH, −C=O,
and COO– represent hydroxyl, carbonyl, and carboxylateoxygen
atoms, respectively.
Quantum-Mechanics
Computations
In this work, we have
used density functional theory (DFT) to assess quantum effects for
reduced models of the binding sites. To evaluate an optimal level
of DFT for reference to compare the binding energies from the three
parameter sets, a quadruple-ζ basis set, 6-311++G(2d,2p),[48−50] and seven different functionals (PBE,[51,52] PW91,[53] BLYP,[54,55] TPSS,[56] B3LYP,[57] M11,[58] and B2PLYP[59]) were used for
two small protein-ion complexes, which are shown in Figure for 193L and 2AAA interacting with
Na+, K+, and Ca2+. Using these small
complexes, we intend to develop a strategy for identifying electronic
perturbations due to different cations in the binding sites. An analysis
of the performance of the various methods is provided in the SI to justify the choice of the DFT references.
The NWChem v-6.5 package[60] was used for
DFT calculations for small ion-protein complexes, while Gaussian 09
Rev. A2[61] was used for all of the larger
binding sites. We were interested in assessing an ensemble of structures
with single-point computations. To generate ensemble energies, we
considered more than 600 truncated structures. All DFT binding energies
were corrected for the basis set superposition errors (BSSEs).
Figure 1
Reduced representations of cation binding sites
in 193L and 2AAA with Na+, K+, and Ca2+. These simplified structures
enable inclusion of the largest basis set for DFT validation and electrostatic
analyses. More than 600 structures, which are used for evaluating
the different ion-protein models, are fully truncated from the conformational
sampling described in the Methodologies section.
To understand the electrostatic environments better, we computed
orbital densities, the electron localization function (ELF), and partial
charges including Mulliken,[62] ChelpG,[62] and Merz–Kollman[63] for the further-simplified complexes of 193L and 2AAA (Figure ), using B3LYP/6-31+G(d,p), B3LYP/CEP-121, and DFTB3/3OB.
The rationale of computing Mulliken charges is that we can directly
compare the results from DFT and DFTB calculations, and observe the
possible quantitative changes in protein atoms perturbed by the metal
ions. We used Gaussian 09 Rev. A2[61] to
generate cube files in association with Multiwfn[64] software to analyze the electronic properties. Jmol software
was used to prepare orbital densities, electrostatic maps, and ELF
images. All DFTB3/3OB calculations were done with DFTB+ v1.2.2,[65] using the Slater–Koster parameter files
recently developed by Elstner and colleagues.[32,66,67] For some small ion-protein complexes, electronic
smearing was performed to obtain the convergence of the self-consistent
charges (SCC) inside DFTB3/3OB. The Fermi-filling maximum temperature
factor for producing the smearing was 70 K, and the Mermin free energy
(Etot – TSele) was also used.Reduced representations of cation binding sites
in 193L and 2AAA with Na+, K+, and Ca2+. These simplified structures
enable inclusion of the largest basis set for DFT validation and electrostatic
analyses. More than 600 structures, which are used for evaluating
the different ion-protein models, are fully truncated from the conformational
sampling described in the Methodologies section.
Results and Discussion
On the
Choice of Reference for the Full DFT Method
A detailed analysis
for the choice of ab initio methods
is provided in the validation section of the SI; here, we briefly summarize the results. G4 computations[48,68] predict the 20 experimental atomization energies of the small complexes
containing Na+, K+, and Ca2+ with
an uncertainty of 2 kcal/mol; thus, G4 is considered to be a good
theoretical reference (see Table S7 in
the SI). Tables S8–S25 in the SI
show the same accuracy for different types of GGA, hybrid, meta-GGA,
and double-hybrid functionals. The DFTB3/3OB results, with respect
to those obtained with 6-31+G**, 6-31++G**, and CEP-121, span the
same energy range of almost ±20 kcal/mol (see Figures S14 and S15 in the SI). The basis set 6-311++G(2d,2p)
is the best for estimating atomization and binding energies, but the
computational cost shown in Figure S17 in
the SI is prohibitive for large systems. We note that the results
obtained with M11 and double-hybrid functionals have the same accuracy
as B3LYP; therefore, the choice of B3LYP for a total of more than
600 conformational structures of 150–300 atoms is reasonable,
in terms of accuracy and computational cost.The numerical results
for the binding energies for the two small complexes (Figure ) are collected in Tables S2 and S3. To check the effect of a larger
basis set, 6-311++G(2d,2p), interaction energies are compared with
the results reported by Li et al.[24] with
6-31++G(d,p) (Tables S2–S5). Table S6 collects the basis-set-dependent differences,
in terms of the mean values (μ), standard deviations (σ),
and min–max differences (δ). It shows that, for the Na+-complex, the binding energies are not noticeably different
between 6-31++G(d,p) and 6-311++G(2d,2p). The inclusion of extra diffuse
and polarization basis functions produces a modest effect at large
computational cost. However, for the Ca2+-complex, the
average binding energy for B3LYP/6-311++G(2d,2p) is −24.5 kcal
mol–1, which is higher than that from B3LYP/6-31++G(d,p)
(using M11, it was −15.5 kcal mol–1). This
suggests that the addition of extra diffuse and polarization basis
functions in the Ca2+-complex can make a critical contribution
to the absolute binding energies. The relative deviations and the
correlation on the binding energies of Ca2+-2AAA small
complex are presented in Figure . The similarity between the different levels of method/theory
and the reference electronic structures PBE, B3LYP, and M11, using
the largest basis set 6-311++G(2d,2p), can be seen in Figure .
Figure 2
Relative deviation and
correlated binding energies (ΔBindE) for Ca2+-2AAA small model
complexes (Figure ) for different levels of theory, using different electronic structure
references: (a) PBE/6-311++G(2d,2p), (b) B3LYP/6-311++G(2d,2p), and
(c) M11/6-311++G(2d,2p).
Relative deviation and
correlated binding energies (ΔBindE) for Ca2+-2AAA small model
complexes (Figure ) for different levels of theory, using different electronic structure
references: (a) PBE/6-311++G(2d,2p), (b) B3LYP/6-311++G(2d,2p), and
(c) M11/6-311++G(2d,2p).The long-range polarization effects are expected to be more
pronounced
in divalent-ion binding sites than in monovalent-ion ones. Therefore,
accurate models of divalent-cation binding may demand larger basis
sets. For example, we observe that electron densities display a significant
perturbation in the second coordination shell of Ca2+,
even at distances of 5–6 Å (see below), while the perturbations
due to Na+ and K+ are considerably smaller.
The long-range electrostatic effects induced by Ca2+ binding
could strongly influence the biophysical and biochemical properties
with charge perturbation reaching beyond the first coordination shell.
The DFTB3/3OB results are close to those obtained from all-electron
or pseudo-potential methods (CEP-121). The pseudo-potential (CEP-121)
approximation appears to be reasonable, compared to all-electron calculations
for Na+ or K+ binding sites. For Ca2+ binding sites, DFTB3/3OB or CEP-121 display larger deviations in
interactions energies, compared to the full electron calculations,
as shown in Tables S2, S3, and S6.The average deviation of the binding energies for the reduced model
of the Ca2+-2AAA complex from computations with DFTB3/3OB
and CEP-121 is 25.8 kcal mol–1 with the B3LYP functional,
but 10.4 kcal mol–1 with the M11 functional. The
differences between DFTB3/3OB or B3LYP/CEP-121 and all-electron calculations
using B3LYP/6-311++G(2d,2p) and M11/6-311++G(2d,2p) are 56.9 kcal
mol–1 and 47.4 kcal mol–1, respectively.
It is important to note that there is a significant variation in the
computed interaction energies among all-electron methods themselves.
The summary of the results for different all-electron DFT functionals
is presented in Figure a, suggesting that different functionals and basis sets can vary
the binding energies by up to 10%, e.g., between M11/CEP121 and B3LYP/6-311++G(2d,2p).
Henceforth, a case was considered as an outlier when exceeding this
variation of 10%.
Figure 3
(a) Binding energies
from all-electron DFT approximations (numerical
values given in Table S5). (b) Binding
energies computed by using B3LYP/CEP121, C36, Drude, and DFTB3/3OB
calculations for Na+, K+, and Ca2+ ions.
Performance of Three Ion-Protein Models
The protein
dynamics due to thermal fluctuations of ions and ligands is essential
for enzymatic functions;[37] hence, there
is an apparent need for sampling energy surfaces governing ion binding
to proteins. To lay a foundation for such sampling, a thorough examination
of the major ion–protein models is essential.(a) Binding energies
from all-electron DFT approximations (numerical
values given in Table S5). (b) Binding
energies computed by using B3LYP/CEP121, C36, Drude, and DFTB3/3OB
calculations for Na+, K+, and Ca2+ ions.The binding energies computed
from both Drude and DFTB3/3OB are
consistent with the DFT results given the deviations of ≤10%,
which returns 8–10 agreement cases for each ionic species.
This is not surprising, because the Drude parameter sets were trained
to match the DFT interactions.[24] Only two
K+-binding enzymes 1V3Z and 4LS7 in the Drude model display deviations
of 18% and 14% in computed interaction energies, relative to all-electron
DFT computations. The accuracy of the additive FF (C36) generally
becomes poorer when the binding energies are lower than −150
kcal/mol. Therefore, its performance deteriorates noticeably for all
of the studied Ca2+ binding sites. The interaction energies
computed with C36 display an average difference of ∼20%, relative
to the DFT results. This indicates serious challenges for reliable
simulations of Ca2+ binding and/or permeation with additive
FFs.On the other hand, the blind use of the recent DFTB3/3OB
parameters
developed for small molecules/protein mimetics provides an encouragingly
reasonable description for most of the interaction energies. For the
K+-binding proteins, two cases display large deviations
(12% for 1TYY and 16% for 4LS7). For the Na+-proteins, three cases have deviations of
12% (3N0U and 1E43) and 17% (1GEN). Notably, for all
of the Ca2+-binding proteins, DFTB3/3OB yields excellent
consistency (<8% deviations) with the DFT results. We found that,
in two systems (see Table ) having multiple cation-binding sites, both Drude and DFTB3/3OB
accurately reproduce the binding energies (deviations of 5%–10%
for 1BLI and
8%–12% for 1E43), with respect to the DFT results. This suggests that Drude and
DFTB3/3OB can describe multiple-ion interactions with protein ligands,
for example, in Na and Ca channels. However, DFTB3/3OB overestimates
the binding energy by 17%, with respect to the DFT results for 1GEN with a Cl– anion present in the structure, but yields an accurate result for 1S36. Drude yields more
accurate results than the DFTB3/3OB model for 1GEN, probably because
of the fact that the protein–Cl– interaction
was also optimized. In all studied multi-ion systems, the C36 FF yields
unreliable results, compared to DFT computations. This finding may
have major implications on the recent debate on multi-ionic permeation
mechanisms in K+ and Na+ ion channels.[7,69]To understand how different chemical groups affect the accuracies
of the methods, we analyze their performance for different binding
sites. Figure shows
the differences between the binding energies computed by the C36,
Drude, DFTB3, and all-electron DFT with the decomposition for the
three most common coordinating ligands (i.e., hydroxyls, carbonyls,
and carboxylates).
Figure 4
Binding energy difference between the methods and DFT
calculations
versus oxygen atoms in the binding sites. “OH”, “CO”,
and “COO” represent hydroxyl, carbonyl, and carboxylate
oxygen atoms, respectively. These numbers of oxygen atoms are counted
within a 3.5-Å sphere centered at the bound ions in the crystal
structures (see Table and Table S1 for numerical values).
Binding energy difference between the methods and DFT
calculations
versus oxygen atoms in the binding sites. “OH”, “CO”,
and “COO” represent hydroxyl, carbonyl, and carboxylateoxygen atoms, respectively. These numbers of oxygen atoms are counted
within a 3.5-Å sphere centered at the bound ions in the crystal
structures (see Table and Table S1 for numerical values).Figure a shows
that hydroxyloxygen atoms are unlikely to be the main cause for the
discrepancy between C36 and DFT results on the interaction energies.
The accurate modeling of the cation–carbonyl interactions already
requires some explicit treatment of the polarization effects or charge
transfer (Figure b).
This shows that the performance of the methods improves from C36 to
Drude to DFTB3. The need for explicit accounting for QM effects is
apparent for all sites with carboxylate groups directly coordinating
a bound cation. The performance of C36 deteriorates with the number
of carboxylates for all cases except PDB 1E43, which has two Ca2+ binding
sites. C36 clearly fails to describe the interactions between the
ions and the negatively charged carboxylate groups accurately. In
contrast, neither Drude nor DFTB3/3OB show a systematic dependence
on the types of oxygen atoms or the presence/absence of charged functional
groups. Note that an assessment of the actual accuracy with DFT as
a benchmark is a challenging task in its own right. For example, different
DFT approximations with all-electron basis sets or a pseudopotential,
one can get 10–40 kcal/mol differences between the two approximations
(see Figure ), especially
for calcium binding sites. Therefore, we are more interested in trends,
rather than matching absolute values.
The Extent of the Partial
Charge Transfer
Although
it represents an important advance, the Drude FF may also be limited
in its description of partial charge transfer and long-distance polarization
effects.[21] The latter has been proposed
to be an essential factor in determining the cation specificity of
enzymes.[70−72] Dudev and Lim[72] performed
a PDB survey for elucidation of the secondary shell ligands that affect
cation binding to metalloproteins. They noted that (i) many enzymes
have evolved to have tight coupling between first- and second-shell
ligands, and (ii) ion–ligand interactions with residues located
in secondary shells are often of the same magnitude as those with
the directly coordinating functional groups. To shed more light on
the local electrostatic environments, we computed the molecular electrostatic
potential (MEP) and the electron localization function (ELF) of one
binding site (193L), in which we probe local electrostatics in the binding pockets
by replacing Na+ with K+ and Ca2+. Figure shows that
the MEPs for Na+ and K+ complexes of lysozyme
are very similar. However, the MEP and ELF in the Ca2+ complex
is quite different from the others.
Figure 5
Molecular electrostatic potential (MEP)
in atomic units (a.u.)
for the simplified 193L complex, whose binding site is interchanged with Na+,
K+, and Ca2+.
Molecular electrostatic potential (MEP)
in atomic units (a.u.)
for the simplified 193L complex, whose binding site is interchanged with Na+,
K+, and Ca2+.The effect of the cation binding can be quantified further
by assessing
partial charges on the protein ligands. Partial charges were calculated
with two approximations (ChelpG and MK), which are shown in Figure . The perturbations
in the partial charges of O atoms caused by the binding of Na+ and K+ are almost identical and strictly localized
to the first-shell ligands. The perturbation of the charges due to
Ca2+ is very sizable and affects not only the nearest (first
coordination shell) atoms, but also significantly the second coordination
shells as found in, for instance, ion-sulfur proteins[73] and Ferritin.[74]
Figure 6
Absolute differences
in partial charges of the O atoms in 193L; the index value
of 1 indicates the O atom closest to the binding site, and the index
value of 6 indicates the O atom furthest from the binding site. Charges
were calculated with B3LYP/6-31+G(d,p).
Absolute differences
in partial charges of the O atoms in 193L; the index value
of 1 indicates the O atom closest to the binding site, and the index
value of 6 indicates the O atom furthest from the binding site. Charges
were calculated with B3LYP/6-31+G(d,p).To provide additional insight into the underpinnings of the
observed
perturbations due to the cation binding, we analyzed the highest occupied
molecular orbitals (HOMOs) of the complex due to the cation substitution
using B3LYP/6-31+G(d,p), B3LYP/CEP-121 and DFTB3/3OB. Both the lowest
unoccupied molecular orbital (LUMO) and the HOMO (Figures S8–S12 in the SI) obtained from a few other
binding sites are also provided in the SI.Figure shows
the
HOMO orbitals for the 193L complex using B3LYP/6-31+G**, B3LYP/CEP-121, and DFTB3/3OB,
which are localized near the ions and are more or less identical for
Na+ and K+-complexes. In sharp contrast, the
HOMO orbitals (and, thus, the overall electron densities shown in Figure S13 in the SI) computed by B3LYP for the
Ca2+-complex are remarkably different from the other complexes:
electrons are found to be transferred through the nearest ligands
to delocalize onto the second coordination shell. DFTB3/3OB seems
to capture a similar fraction of the electron delocalization[75,76] reported from the DFT calculations. Of course, the amount of charge
transfer or delocalization in the various methods is critically dependent
on the binding site.[77] For example, the
HOMO obtained from DFTB3/3OB in 2AAA (Figure S11) is not as dramatically different from the B3LYP HOMO as from those
in Figure and Figure S9. An ELF analysis of the Na+, K+, and Ca2+ model complexes also indicates
that the electronic densities of Na+ and K+ give
rise to almost the same partial charges of 1|e| when
using B3LYP with 6-31+G(d,p) and CEP-121 (Figures S1 and S2), but there is a charge transfer from the ligand
to Ca2+, resulting in a partial charge as small as 1.2
|e|.
Figure 7
Highest occupied molecular orbital (HOMO) for the 193L complex probed with
Na+, K+, and Ca2+ on its binding
site for DFT and DFTB3/3OB.
Highest occupied molecular orbital (HOMO) for the 193L complex probed with
Na+, K+, and Ca2+ on its binding
site for DFT and DFTB3/3OB.To illustrate how the electrostatic environments of an entire
protein
host affects the ion binding affinities, Figure shows the deviations between the binding
energies and DFT calculations against the net charges of the truncated
binding sites. Consistent with Figure c, the electrostatic environments (see Figure a) modeled by C36 induce overbinding
to the ions, since most of the data points have negative-energy deviations
(see Figure ). In
contrast, Figures b and 8c simply show random-like data for
energy deviations of <17% over many of the electrostatic environments
having a net charge of |q| ≤ 5|e|. In the neutral systems, DFTB3/3OB and Drude sometimes experience
difficulties in accurately calibrating the electron densities to have
an accuracy of <10%. Therefore, the efficiency of the DFTB3/3OB
and Drude FFs for simulating larger systems and longer time scales
may be very appealing, because, for most of the studied systems, it
appears to be a reasonable tradeoff with only a few outliers, having
deviations of 12%–17%, relative to the DFT computations.
Figure 8
Binding energy
deviation ratio (ΔE/E) versus
the net charge (|e|) of the binding
sites for C36, Drude, and DFTB3/3OB.
Binding energy
deviation ratio (ΔE/E) versus
the net charge (|e|) of the binding
sites for C36, Drude, and DFTB3/3OB.
Conclusions
In conclusion, we have benchmarked the
accuracy of the nonpolarizable
C36 force field (FF), the newly developed Drude polarizable FF, and
the DFTB3/3OB method against DFT calculations for effectively describing
interactions among 30 enzymatic metalloproteins and cations K+, Na+, and Ca2+. We found that, even
though C36 can reasonably describe the interactions between the monovalent
ions and some proteins (∼60% success rate), it fails to accurately
reproduce QM trends observed for interactions between Ca2+ and all 10 of the proteins. In contrast, the Drude FF and DFTB3/3OB
reproduce QM energetics for most of the studied binding sites with
various electrostatic environments. Therefore, taking account of polarization
effects is crucial for studies of Ca2+-binding sites and
is strongly recommended for divalent cations. It also captures the
main average trends for all conformations of the Ca2+-complexes
very well, although the absolute values of the binding energy may
vary somewhat (∼10%), compared to higher levels of theory.
We provide evidence that this may be due to the approximate treatments
of long-range electronic effects. Detailed examination of electron
densities shows that, while C36 and Drude FF are unable to sufficiently
capture the delocalization effect of electron clouds onto the second
coordination shells of divalent ions, DFTB3/3OB provides a reasonable
level of accuracy. That is, the electron densities of ligands around
Ca2+ can be delocalized and, therefore, additional measures
for charge-transfer and polarizable effects must be taken and long-range
interaction effects and polarization of the electron clouds could
be crucial to the accuracy of any biological models. This work is
a critical step toward future research that intends to determine the
optimal protein structures that result in the largest delocalization
effect in divalent-ion binding sites, and reveal how such structures
determine the functionalities of biological systems at multiple levels
of theory.
Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971
Authors: Robert B Best; Xiao Zhu; Jihyun Shim; Pedro E M Lopes; Jeetain Mittal; Michael Feig; Alexander D Mackerell Journal: J Chem Theory Comput Date: 2012-07-18 Impact factor: 6.006
Authors: Benoît Roux; Simon Bernèche; Bernhard Egwolf; Bogdan Lev; Sergei Y Noskov; Christopher N Rowley; Haibo Yu Journal: J Gen Physiol Date: 2011-05 Impact factor: 4.086
Authors: Feng Qiu; Adam Chamberlin; Briana M Watkins; Alina Ionescu; Marta Elena Perez; Rene Barro-Soria; Carlos González; Sergei Y Noskov; H Peter Larsson Journal: Proc Natl Acad Sci U S A Date: 2016-09-19 Impact factor: 11.205
Authors: Van Ngo; Hui Li; Alexander D MacKerell; Toby W Allen; Benoît Roux; Sergei Noskov Journal: J Chem Theory Comput Date: 2021-02-04 Impact factor: 6.006