Literature DB >> 25099338

Using quantum mechanical approaches to study biological systems.

Abstract

Conspectus Quantum mechanics (QM) has revolutionized our understanding of the structure and reactivity of small molecular systems. Given the tremendous impact of QM in this research area, it is attractive to believe that this could also be brought into the biological realm where systems of a few thousand atoms and beyond are routine. Applying QM methods to biological problems brings an improved representation to these systems by the direct inclusion of inherently QM effects such as polarization and charge transfer. Because of the improved representation, novel insights can be gleaned from the application of QM tools to biomacromolecules in aqueous solution. To achieve this goal, the computational bottlenecks of QM methods had to be addressed. In semiempirical theory, matrix diagonalization is rate limiting, while in density functional theory or Hartree-Fock theory electron repulsion integral computation is rate-limiting. In this Account, we primarily focus on semiempirical models where the divide and conquer (D&C) approach linearizes the matrix diagonalization step with respect to the system size. Through the D&C approach, a number of applications to biological problems became tractable. Herein, we provide examples of QM studies on biological systems that focus on protein solvation as viewed by QM, QM enabled structure-based drug design, and NMR and X-ray biological structure refinement using QM derived restraints. Through the examples chosen, we show the power of QM to provide novel insights into biological systems, while also impacting practical applications such as structure refinement. While these methods can be more expensive than classical approaches, they make up for this deficiency by the more realistic modeling of the electronic nature of biological systems and in their ability to be broadly applied. Of the tools and applications discussed in this Account, X-ray structure refinement using QM models is now generally available to the community in the refinement package Phenix. While the power of this approach is manifest, challenges still remain. In particular, QM models are generally applied to static structures, so ways in which to include sampling is an ongoing challenge. Car-Parrinello or Born-Oppenheimer molecular dynamics approaches address the short time scale sampling issue, but how to effectively use QM to study phenomenon covering longer time scales will be the focus of future research. Finally, how to accurately and efficiently include electron correlation effects to facilitate the modeling of, for example, dispersive interactions, is also a major hurdle that a broad range of groups are addressing The use of QM models in biology is in its infancy, leading to the expectation that the most significant use of these tools to address biological problems will be seen in the coming years. It is hoped that while this Account summarizes where we have been, it will also help set the stage for future research directions at the interface of quantum mechanics and biology.

Entities: Chemical Disease Gene Species

Mesh：

Substances：
Ligands
Proteins
Water

Year: 2014 PMID： 25099338 PMCID： PMC4165465 DOI： 10.1021/ar5001023

Source DB: PubMed Journal: Acc Chem Res ISSN： 0001-4842 Impact factor: 22.384

Introduction

Quantum mechanics (QM) has revolutionized our understanding of the structure and reactivity of small molecular systems. For example, QM based methods provide structural information in excellent agreement with experiment, match experimental barrier heights for chemical reactions, and provide chemically accurate interaction energies for hydrogen-bonded or dispersive systems.[1,2] Given the tremendous impact QM has had for systems of <100 atoms, it is attractive to believe that this impact could also be brought into the biological realm where systems of a few thousand atoms and beyond are standard. To achieve this goal, the bottlenecks of QM methods have to be addressed. Depending on the method employed, various steps in a QM calculation can be rate determining. In semiempirical methods, matrix diagonalization is rate limiting, while in density functional theory (DFT) or Hartree–Fock (HF) theory electron repulsion integral computation is rate-limiting.[3] These theories neglect the correlation energy, which is important to account for to obtain highly accurate results including barrier heights and interaction energies (especially dispersion dominated ones). In this case, the correlation treatment of choice is rate-limiting be it Møller–Plesset (MP) theory or coupled-cluster (CC) methods.[3] State-of-the-art linear-scaling algorithms, which linearize the computational cost with the system size, have attracted much attention.[4−6] Significant effort has been devoted to the development of linear-scaling methods to compute the total energy of large molecular systems at the HF or DFT level.[7−14] One of the challenges is to speed up the calculation of long-range Coulomb interactions when assembling the Fock matrix elements. Fast multipole based approaches have successfully reduced the scaling in system size to linear[11,12,15−17] and made HF and DFT calculations affordable for larger systems when small to moderate sized basis sets are utilized. There are also fragment-based methods for QM calculation of protein systems including the divide and conquer (D&C) method of Yang,[8] Yang and Lee,[9] Dixon and Merz,[18,19] Gogonea et al.,[20] Shaw and St-Amant,[21] and Nakai and co-workers,[22−25] the adjustable density matrix assembler (ADMA) approach method of Exner and Mezey,[13,26−28] the fragment molecular orbital (FMO) method of Kitaura and co-workers,[29−31] the X-pol model of Gao and Xie,[32] and the molecular fractionation with conjugate caps (MFCC) approach developed by Zhang and co-workers.[33,34] Most applications of these methods to protein systems have been largely limited to semiempirical, HF, and DFT calculations. Among these approaches, FMO has been applied to higher level ab initio calculations such as second-order Møller–Plesset perturbation theory (MP2)[35] and coupled cluster theory (CC).[36] Nakai and co-workers have described D&C-MP2[22,25,37] and D&C-CCSD[38] approaches and applied them to linear or near-linear systems. QM excels at the static modeling of a molecular system. However, in chemistry and biology, fluctuations are important to describe a broad range of effects. Thus, the ensemble picture of a molecular system is more appropriate, and while molecular dynamics (MD) methods are beginning to address dynamical issues using classical potentials, how to address this “sampling” issue when using more expensive QM based approaches will need to be addressed. Herein we briefly review work carried out in my laboratory over the past decade in exploiting fully QM methods in the study of biological systems (see Scheme 1). In particular, we will discuss the use of D&C techniques to address the matrix diagonalization step in semiempirical methods and in HF and DT approaches. After summarizing the technical aspects of how to compute the QM energy of many thousands of atoms, we will review several fully QM application studies carried out for the first time in my laboratory on biological systems. Finally, I will discuss the future outlook of using QM to solve biological problems and the ongoing challenges of using a fully QM model on large systems.

Scheme 1

Application of QM Methods to Biological Problems

Divide-and-Conquer Approach for Hartree–Fock Based Calculations

In biological systems, the D&C approach[8] takes advantage of chemical locality. Details of the implementations of this approach at the semiempirical,[9,18,19] HF and post-HF[21−25,39] levels of theory are given in the literature, so below we only give a brief overview. In this approximation, it is assumed that the atoms that are far away from the region of interest only weakly influence local regions of a protein. Hence, the entire system is divided into fragments called core regions (Coreα) with an associated buffer region (Bufferα) assigned to each core region to account for the local environmental effects. The combination of every core region and its buffer region constitutes a subsystem (Rα) (see Scheme 2). Local MOs of each subsystem are solved by the Roothaan–Hall equationwhere Fα and Sα are local Fock overlap matrices, respectively.

Scheme 2

D&C Subsetting (Fragmentation) Scheme

After the local MO coefficient matrices Cα are obtained, the total density matrix system is given bywhere Dμνα is the partition matrix,and pμνα is the local density matrix defined bywhere nα is a smooth approximation to the Heaviside step function:εF is determined through the normalization of the total number of electrons in the entire system.After the density matrix is converged, the total HF energy is given aswhere Hμνα is the local one-electron core Hamiltonian matrix defined similarly to the local Fock matrix (see eq 2). For HF calculations, the construction of the Coulomb matrix and exchange matrix along with the diagonalization of the Fock matrix are the three most time-consuming steps, while the diagonalization of the Hamiltonian matrix scales as O(N3). In the D&C scheme, the diagonalization calculation is performed for each subsystem, linearizing the diagonalization step as a function of the number of subsystems. However, it is important to realize that the D&C algorithm does not help to reduce the scale of computation of the Coulomb matrix and exchange matrix. Other approaches[18] provide ways to linearize these steps. However, for semiempirical methods where matrix diagonalization is rate-limiting, the D&C approach greatly enhances the capabilities of these methods.[9,18,19]

Applications of Fragment QM to Biological Problems

We have used our fragment QM methods (largely at the semiempirical level) to study biological problems with a focus on protein structure and solvation,[40−44] structure-based drug design[45,46] and biological structure refinement by NMR and X-ray spectroscopy.[47−64] Below we will briefly highlight work done in our laboratory using these methods to address specific biological problems.

A New View of the Protein–Water Interface

The interface between a biological molecule and water affects its stability and function.[65] The classical picture of this interface does not allow for the commingling of the electron density (the so-called charge transfer (CT) effect) between the protein and the surrounding aqueous environment. We examined the QM nature of a small protein (E. coli cold-shock protein A; CspA) in aqueous solution through the use of semiempirical D&C calculations over 100 snapshots. The surprising result was the observation that two units of charge were transferred from the protein surface to that of the water molecules. Thus, water molecules, at the protein interface, are perturbed not only by the rugged protein interface but also electronically. Via integration over the entire protein surface, the net results of numerous small net transfer of charge yield the observed value of 2 units of charge. Not surprisingly, the main participants in this charge transfer effect are charged amino acids at the protein–water interface (see Figure 1). The magnitude of the CT transfer effect on the energetics of hydrogen bond interactions has been debated,[44,66−68] but we reported[44] a value of ∼30% for a series of dimers using a Kitaura-Morokuma analysis[69] (this latter paper also reports a similar CT effect for the water dimer), a value later supported by more sophisticated calculations.[67] We carried out extensive validation studies of the nature of CT[44] because our pioneering studies employed semiempirical models. Through these studies, we concluded that in terms of the magnitude of the CT effect the semiempirical models employed were reliable. Subsequent work using ab initio FMO[70] and ab initio MD calculations[71] reconfirmed our initial predictions.

Figure 1

E. coli cold-shock protein A (CspA) surrounded by a water layer. The green water molecules are near Lys residues and experience a slight loss of electron density (∼0.06e in aggregate), while the orange water molecules are near Glu/Asp residues gain electron density (∼0.2e in aggregate). The magenta water molecules are not within 5 Å of the charged residues. In related work, charge transfer in receptor ligand interactions in the context of SBDD has been examined by my laboratory.[45] In a study of 165 noncovalent protein–ligand complexes, we found that 11% of the complexes had more than 0.1e of charge transferred from the protein to ligand. Not surprisingly, in 49 metalloenzyme complexes, on average, there is 0.6e transferred between the protein and the ligand. The direction of the CT effect depended on the nature of the protein–ligand complex. For example, in matrix metalloproteases (MMP), charge was transferred from the protein to the ligand, while for human carbonic anhydrase (HCA) and carboxypeptidases (CPA) charge was transferred n the opposite direction. The view that the interface between a protein, water, or a ligand is electronically altered via polarization and CT effects is now generally accepted. Indeed, the observation of CT at the protein–water interface highlights the need to include this effect in modern force field methods, and work is being pursued along these lines.[72,73] However, without the aid of QM based methods, this effect would not have been discovered and its importance might have gone unappreciated for some time.

QM in Structure-Based Drug Design (SBDD)

The docking and subsequent scoring of small molecules bound to receptors has been a mainstay of modern SBDD tools.[74,75] However, the reliability of these approaches has been debated.[75−77] Most tools in this class utilize approximate potentials to study the interaction between the protein and its ligand. We reasoned that the use of more advanced QM based methods might improve the outcome.[45,46] We reported the first application of linear-scaling methods to SBDD where we calculated the binding affinity of ligands bound to the HCA and CPA with reasonable accuracy.[46] The free energy of binding in solution was calculated using the following set of equations:where the free energy of binding in solution was calculated as the sum of the gas-phase interaction energy (ΔGbg) and a solvation correction (ΔGsolv terms). The gas-phase interaction energy consisted of enthalpic (semiempirical computed ΔHf’s) and entropic components. The electrostatic part of the enthalpic component was calculated using the DivCon program, employing semiempirical Hamiltonians.[46] A dispersion correction was lso utilized (the (1/R6)LJ term). The solvation correction was calculated as a difference of the solvation free energies of the protein–ligand complex (PL), the protein (P), and the ligand (L). The solvation free energy was calculated using a Poisson–Boltzmann (PB) based self-consistent reaction field (PB/SCRF) method.[78] A major advantage of PB is the internal dielectric of the protein need not be preset as in classical treatments.[78] Finally, an empirical entropy term depending on the change in solvent accessible area (ΔSA) and a ligand rotatable bond count. In further studies, we carried out a larger scale validation of this QM based scoring function for predicting binding affinity. We calculated interaction energies for a diverse range of protein–ligand complexes comprising of 165 noncovalent complexes and 49 metalloenzyme complexes.[45] For the 165 noncovalent complexes, the interaction energies, without any fitting, agreed with experimental binding affinity within 2.5 kcal/mol. When different parts of the scoring function were fit to experimental free energy of binding using regression methods, the agreement was within 2.0 kcal/mol. For metalloenzymes, the agreement with experiments without fitting was within 1.7 kcal/mol, and with fitting was within 1.4 kcal/mol. Overall, the correlation with experiment was satisfactory with R2 values of 0.69[46] and 0.55,[45] respectively. However, while the method performed reasonably it was not a “quantum” improvement over traditional methods Much further work has been done to explore the use of QM and QM/MM approaches to predict protein–ligand binding affinities[79,80] with the general conclusion that,while QM does help, it is not a panacea, at least in the current incarnation. Beyond simply having a better level of theory, many issues remain to be resolved to produce robust computed binding affinities including the inclusion of explicit water molecules[81] and incorporation of receptor flexibility. These effects plague both simpler and more sophisticated model Hamiltonians and is the subject of much on ongoing research.[82]

QM Based Structure Refinement

QM NMR Refinement of Protein/Ligand Complexes

NMR spectroscopy has proven itself to be a powerful and versatile tool for the study of protein–ligand interactions. The three-dimensional structures of protein–ligand complexes are determined by combining interproton distance restraints derived from nuclear Overhauser effect (NOE) with other restraints from J coupling constants, hydrogen bonds, and/or residual dipolar couplings. Since Fesik and co-workers introduced SAR (structure–activity relationship) by NMR,[83] many NMR-based screening methods have been developed to identify potential drug molecules in pharmaceutical research.[84−88] All these techniques take advantage of the fact that, upon ligand binding, significant perturbations can be observed in NMR parameters of either the receptor or the ligand. These perturbations can be utilized qualitatively to detect complex formation or quantitatively to measure the binding affinity. Structure of GPI. Among these NMR parameters, chemical shifts are exquisitely sensitive on the chemical environments of compounds. Therefore, theoretical calculation of chemical shift perturbations (CSP) upon ligand binding provides insights into protein–ligand interactions at the molecular level. There are two categories of computational approaches to calculate NMR chemical shifts: classical/empirical models and QM. The classical/empirical models[89−95] are parametrized to experimental data or DFT results. These approaches are fast so that they can be easily applied to proteins, but are less general being largely focused on protein chemical shifts. In this case, the generality of a QM based approach offers a significant advantage. We developed a fast and accurate approach[96] to calculate NMR chemical shifts using the D&C method at the semiempirical level of theory. Along with this approach, we have reported the automatic fragmentation QM/MM (AF-QM/MM) method that computes chemical shifts for large systems using ab initio methods.[54] This approach was first applied to the FKBP-GPI complex (see Figures 2 and 3).[48] By comparing calculated proton chemical shifts of the ligand to experimental data, it was possible to determine the best binding site pose. To further validate this approach, we generated several hundred poses of GPI using different docking programs and then scored them by calculating CSPs and then comparing them to experiment.[97] We have found that the deviation of the computed CSPs from experiment better differentiate decoy poses from native poses than scoring functions used in docking studies. This demonstrates that CSP based approaches can provide an accurate way in which to predict protein/ligand complex structure using in silico NMR approaches.

Figure 2

Structure of GPI.

Figure 3

By comparing CSP RMSDs the preferred pose for the FKBP-GPI complex can be determined. Left hand panel (a) shows 10 possible poses, while after analysis by CSP RMSDs between experimental and computed chemical shifts, one pose gives the best agreement between theory and experiment.

QM X-ray Refinement

We have also explored the use of QM based methods in X-ray refinement. We utilized D&C QM calculations to refine an entire protein,[98] and we have carried out several studies looking at the QM/MM refinement of small molecules.[59−62] One of the strengths of QM is the ability of these methods to provide high quality structural models of both the protein and the small molecule. Modern X-ray refinement of protein structures typically uses a pseudoenergy formalism where a chemical model (EChem) is combined with the X-ray signal (EX-ray) for which the gradients are obtained and the energy minimized.[99] In this formulation, the quality of the results depends on both the chemical model term and the quality of the experimental data. For proteins, highly parametrized expressions are available for the EChem term, but the quality of this term for small molecules is variable. However, QM models can treat both the protein and ligand equally well with no parametrization. Benzamidine with the amidine group indicated. As an example of how a QM (in this case a QM/MM refinement) can improve structure quality, I briefly describe our work on benzamidine ligands bound to a number of receptors.[55] In the case of benzamidines, the amidine group (see Figure 4) prefers to be out of the plane of the benzene ring by ∼ ±40°; however, many of the reported protein X-ray structures model this group as planar. Using QM/MM X-ray refinement approaches, we demonstrated that this moiety was indeed nonplanar and that the models employed in the earlier refinements did not accurately represent this functional group. Figure 5 shows an example of this for PDBID 1Y3X, where we show the result before and after QM/MM X-ray refinement. Moreover, the standard R, Rfree, and real-space R standard structure quality metrics all improved in the QM/MM X-ray refined structures.[55]

Figure 4

Benzamidine with the amidine group indicated.

Figure 5

Left-hand panel (a) shows the benzamidine derivative refined using standard approaches, while the right-hand panel (b) shows the results from QM/MM X-ray refinement. Note the out of plane twist of the amidine group.

The Future of QM in Biology

QM has already had a major impact on the study of biological systems primarily through its use to build sophisticated classical force field representations of biological macromolecules. Via QM/MM methods,[100] enzymatic catalysis can be routinely studied and this development was awarded the Nobel Prize in Chemistry in 2013. However, it is important to keep in mind that the computational study of biological systems at the molecular-level faces two daunting challenges: We must both accurately[101−106] calculate the energies and forces involved, but we must also sample all relevant states of a system. QM largely addresses the former, but how to extensively sample biological systems at the QM level of theory remains a challenging issue. Beyond the creation of faster and more accurate QM models, strategies to address the sampling issue will also have to be devised. This will likely be addressed via a combination of classical and QM models[82] and remains an active area of research.

70 in total

1. Biological water at the protein surface: dynamical solvation probed directly with femtosecond resolution.

Authors: Samir Kumar Pal; Jorge Peon; Ahmed H Zewail
Journal: Proc Natl Acad Sci U S A Date: 2002-02-12 Impact factor: 11.205

Review 2. Structure-based strategies for drug design and discovery.

Authors: I D Kuntz
Journal: Science Date: 1992-08-21 Impact factor: 47.728

3. Direct calculation of electron density in density-functional theory.

Authors:
Journal: Phys Rev Lett Date: 1991-03-18 Impact factor: 9.161

Review 4. Fragment-based error estimation in biomolecular modeling.

Authors: John C Faver; Kenneth M Merz
Journal: Drug Discov Today Date: 2013-08-27 Impact factor: 7.851

Review 5. Predicting protein druggability.

Authors: Philip J Hajduk; Jeffrey R Huth; Christin Tse
Journal: Drug Discov Today Date: 2005-12 Impact factor: 7.851

6. Accuracy of the three-body fragment molecular orbital method applied to Møller-Plesset perturbation theory.

Authors: Dmitri G Fedorov; Kazuya Ishimura; Toyokazu Ishida; Kazuo Kitaura; Peter Pulay; Shigeru Nagase
Journal: J Comput Chem Date: 2007-07-15 Impact factor: 3.376

7. Spatial localization of ligand binding sites from electron current density surfaces calculated from NMR chemical shift perturbations.

Authors: Mark A McCoy; Daniel F Wyss
Journal: J Am Chem Soc Date: 2002-10-02 Impact factor: 15.419

8. Extending the power of quantum chemistry to large systems with the fragment molecular orbital method.

Authors: Dmitri G Fedorov; Kazuo Kitaura
Journal: J Phys Chem A Date: 2007-05-19 Impact factor: 2.781

9. Extension of linear-scaling divide-and-conquer-based correlation method to coupled cluster theory with singles and doubles excitations.

Authors: Masato Kobayashi; Hiromi Nakai
Journal: J Chem Phys Date: 2008-07-28 Impact factor: 3.488

10. The effects of charge transfer on the aqueous solvation of ions.

Authors: Marielle Soniat; Steven W Rick
Journal: J Chem Phys Date: 2012-07-28 Impact factor: 3.488

14 in total

1. Metabolite Structure Assignment Using In Silico NMR Techniques.

Authors: Susanta Das; Arthur S Edison; Kenneth M Merz
Journal: Anal Chem Date: 2020-07-15 Impact factor: 6.986

2. Perspective: Quantum mechanical methods in biochemistry and biophysics.

Authors: Qiang Cui
Journal: J Chem Phys Date: 2016-10-14 Impact factor: 3.488

3. Enzymatic Kinetic Isotope Effects from Path-Integral Free Energy Perturbation Theory.

Authors: J Gao
Journal: Methods Enzymol Date: 2016-07-22 Impact factor: 1.600

4. Toward Closing the Gap: Quantum Mechanical Calculations and Experimentally Measured Chemical Shifts of a Microcrystalline Lectin.

Authors: Matthew Fritz; Caitlin M Quinn; Mingzhang Wang; Guangjin Hou; Xingyu Lu; Leonardus M I Koharudin; Tatyana Polenova; Angela M Gronenborn
Journal: J Phys Chem B Date: 2016-12-21 Impact factor: 2.991

5. Determination of accurate backbone chemical shift tensors in microcrystalline proteins by integrating MAS NMR and QM/MM.

Authors: Matthew Fritz; Caitlin M Quinn; Mingzhang Wang; Guangjin Hou; Xingyu Lu; Leonardus M I Koharudin; Jochem Struppe; David A Case; Tatyana Polenova; Angela M Gronenborn
Journal: Phys Chem Chem Phys Date: 2018-04-04 Impact factor: 3.676