Thermally stable and labile proteases are found in microorganisms. Protease mediates the cleavage of polyproteins in the virus replication and transcription process. 6 µs MD simulations were performed for monomer/dimer SARS CoV-2 main protease system in both SPC/E and mTIP3P water model to analyse the temperature-dependent behaviour of the protein. It is found that maximum conformational changes are observed at 348 K which is near the melting temperature. Network distribution of evolved conformations shows an increase in the number of communities with the rise in the temperature. The global conformation of the protein was found to be intact whereas a local conformational space evolved due to thermal fluctuations. The global conformational change in the free energy ΔΔG value for the monomer and the dimer between 278 K and 383 K is found to be 2.51 and 2.10 kJ/mol respectively. A detailed analysis was carried out on the effect of water on the temperature-dependent structural modifications of four binding pockets of SARS CoV-2 main protease namely, catalytic dyad, substrate-binding site, dimerization site and allosteric site. It is found that the water structure around the binding sites is altered with temperature. The water around the dimer sites is more ordered than the monomer sites regardless of the rise in temperature due to structural rigidity. The energy expense of binding the small molecules at substrate binding is less compared to the allosteric site. The water-water hydrogen bond lifetime is found to be more near the cavity of His41. Also, it is observed that mTIP3P water molecules have a similar effect to that of SPC/E water molecules on the main protease.
Thermally stable and labile proteases are found in microorganisms. Protease mediates the cleavage of polyproteins in the virus replication and transcription process. 6 µs MD simulations were performed for monomer/dimer SARS CoV-2 main protease system in both SPC/E and mTIP3P water model to analyse the temperature-dependent behaviour of the protein. It is found that maximum conformational changes are observed at 348 K which is near the melting temperature. Network distribution of evolved conformations shows an increase in the number of communities with the rise in the temperature. The global conformation of the protein was found to be intact whereas a local conformational space evolved due to thermal fluctuations. The global conformational change in the free energy ΔΔG value for the monomer and the dimer between 278 K and 383 K is found to be 2.51 and 2.10 kJ/mol respectively. A detailed analysis was carried out on the effect of water on the temperature-dependent structural modifications of four binding pockets of SARS CoV-2 main protease namely, catalytic dyad, substrate-binding site, dimerization site and allosteric site. It is found that the water structure around the binding sites is altered with temperature. The water around the dimer sites is more ordered than the monomer sites regardless of the rise in temperature due to structural rigidity. The energy expense of binding the small molecules at substrate binding is less compared to the allosteric site. The water-water hydrogen bond lifetime is found to be more near the cavity of His41. Also, it is observed that mTIP3P water molecules have a similar effect to that of SPC/E water molecules on the main protease.
Statistics based on environmental and epidemiological studies signify the temperature dependence of COVID-19 transmission across the globe. Experimental reports suggest the inactivation of the SARS-CoV-2 virus at higher temperatures while elevated infectivity rate at lower temperatures [1]. The temperature-dependent activity of corona virus at the molecular level is still unclear and can be studied effectively using computational tools. Efforts have been done by the research communities to develop efficient drugs and vaccines against SARS CoV-2 virus using in silico methods [2], [3], [4]. Generally in viruses, the protease enzyme is responsible for the viral protein maturation after the translation process in the host cell. Coronavirus main protease (MPro) mediates the cleavage of polyproteins in virus replication and transcription process [5]. The inhibition of protease activity reduces the mature virus particles in the host cell; therefore, the protease enzymes are considered important drug targets. The crystal structure reveals that the structure of protease consists of two β-barrel domains (domain I and domain II) and an α-helical bundle (domain III) (Fig. 1
). The substrate-binding site and the Cys-His catalytic dyad are found in the cleft between domain I and domain II. Domain III regulates the dimerization of protease essential for MPro activity [6], [7]. Dimerization involves the interaction of N-terminal residues of two protomers with the Glu166 of another protomer to hold the pocket of substrate binding site [8].
Fig. 1
Representation of SARS CoV-2 main protease (PDB ID: 6Y2E) showing Domain I (blue), Domain II (red) and Domain III (green). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Representation of SARS CoV-2 main protease (PDB ID: 6Y2E) showing Domain I (blue), Domain II (red) and Domain III (green). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)Proteins are sensitive and behave differently in lower and higher temperature conditions. The temperature effect can either alter or break the non-covalent interactions thereby causing changes in the three-dimensional folding of the protein chain. Both, thermally stable, as well as labile proteases, are found in microorganisms, plants and animals. The human coronaviruses including SARS-CoV-2 showed a higher average mean lifetime in 10–60 °C temperature range and decontamination above 70 °C [9]. Studies showed that the differences in an external environment like temperature disables the outermost structural proteins, spike protein, in the coronavirus to bind ACE2 (Angiotensin Converting Enzyme 2) [10]. It is reported that the SARS-CoV-2 RBD (Receptor Binding Domain) affinity decreases at a higher temperature above 40 °C from both experimental and simulation studies [10], [11], [12]. As for the main protease enzyme, crystallographic studies have been carried out in cryogenic conditions, which bias enzyme conformations. Experimental evidence suggests that the protease enzyme exists in a monomer-dimer equilibrium, in which the dimer form is active whereas the monomer form is inactive [13], [14], [15]. This equilibrium can be affected by temperature and small molecule inhibitors [16]. Theoretically, Chen et. al proposed that only a protomer is active in SARS 3C-like protease dimer using MD simulations [17]. Since SARS CoV-2 belongs to similar family and genus as SARS CoV, it is essential to understand the mechanism of protease action to develop efficient small molecule inhibitors. Many researches on drug discovery focussed more on SARS CoV-2 main protease as an excellent target [2], [18], [19], [20], [21]. So it will be interesting to study the dynamic behaviour of both monomer and dimer forms as well as the temperature/ water induced changes in the system. Recently, Kordzadeh et. al showed the temperature and pH-dependent conformational fluctuations on whole SARS CoV-2 main protease using molecular dynamics simulations [22]. This study ignores the structural and dynamic changes of protein and water structure at various binding sites of protease. Even though high-resolution crystal structure of SARS-CoV-2 MPro was recorded [23], the molecular level mechanism and the solvent effect on MPro folding, stability and biological activity have not been reported for the wide temperature range.In this work, we study the temperature effect on the structural modifications of SARS CoV-2 main protease, both, monomer and dimer using atomistic molecular dynamic simulations at four different temperatures. This allows us to understand the impact of temperature on the active dimer and inactive monomer main protease. Specifically, the role of amino acid residues in substrate binding pocket, catalytic dyad, dimerization site and allosteric sites in monomer and dimerized protease were determined using various geometric criteria of non-bonded interactions. The dynamic nature of water molecules at the interface of ligand binding sites is crucial for maintaining the activity of the protein. The interfacial water molecules may favour or disfavour the energetics of binding sites which affect the binding of chemical moieties [24], [25], [26]. Generally, this fact is overlooked while studying the degradation process of protein. Here, the effect of water molecules was studied using radial distribution functions, orientational tetrahedral order parameter, free energy ΔG(r) and hydrogen bond dynamics between key residues at the above-mentioned sites and water molecules. Dihedral analysis at these sites determines the transition between the secondary structure elements in the protein [27]. Principal Component Analysis (PCA) characterizes the essential dynamics which govern the amino acid residue fluctuations at the pocket due to temperature change. Further, the structural evolution and the stability of the whole monomer and dimer during simulation are interpreted using network analysis based on the root mean square deviation for the Cα atom in the protein. The global free energy of folding was determined based on the changes in the percentage of α-helices and β-sheets for each temperature. Additionally, the results of simulations were compared using two water models: SPC/E model and modified TIP3P model.
Materials and methods
The model of SARS COV-2 main protease (33.83 kDa) was retrieved from Protein Data Bank (PDB ID: 6Y2E, resolution: 1.75 Å) [6]. The enzyme model is devoid of small molecules and consists of 306 amino acid residues. The protein structure was solvated in a cubic box with distance between box boundary and protein structure as 1.0 nm in all directions using SPC/E water model [28] and the system is neutralized with 4 Na+ ions. The energy minimization of the system was done by the steepest descent algorithm with a maximum of 50,000 steps until a convergence tolerance of 1000 kJmol−1nm−1. All the molecular dynamics simulations were performed by GROMACS 2018.4 [29] using CHARMM27 force field for proteins [30], [31]. A 10 ns NVT equilibration was performed using velocity-rescale temperature coupling method [32] with a time step of 2 fs. Next, a 10 ns NPT equilibration was carried out with a time step of 2 fs at 1 atm using Nose-Hoover thermostat [33] and Parrinello-Rahman pressure coupling scheme [34] respectively. The long-range electrostatic interactions were calculated by particle mesh Ewald method [35] with a cut-off 1.2 nm and Fourier spacing of 0.16 nm. The short-range van der Waals cut-off was fixed to 1.2 nm. LINCS constraints [36] were used to restrain the bond involving hydrogen atoms. The structural and dynamic behaviour of the protein were analyzed for a wide temperature range, from 278 K to 383 K. Finally, a production run of 200 ns was performed until the convergence of RMSD at lower temperature. The trajectories were saved at every 10 ps and GROMACS tools were used for analysis. The simulations were replicated twice (a total of 8 monomer simulations) to check the reproducibility of the results and error bars are given wherever necessary. The model retrieved from PDB is the monomer form of protease. However, the global stoichiometry of the protease is homodimer [6], [37]. Therefore, for the dimer simulations, protein-protein docking was carried out using HDOCK server [38]. The structure retrieved from PDB (ID: 6Y2E) is considered as the target and the average structure obtained from the MD simulation of monomer at each temperature is considered as the substrate for protein-protein docking. The docking energy score obtained between the ligand and the target protease is −555.86 kcal/mol. The geometry of dimer obtained from docking is chosen as the initial configuration for the dimer simulations. We have done a total of 8 dimer simulations. Further, a set of 200 ns simulations (4 simulations each for monomer and dimer) were performed using modified TIP3P water model (m-TIP3P) to determine the effect of water model on the temperature dependent behaviour of main protease. Additional, five 200 ns simulations each for 288 K, 298 K, 308 K, 318 K, 338 K and 358 K were performed for monomer protease to determine the entropy and enthalpy contributions to the free energy. A total of 6 µs simulations were performed.
Results and discussion
Structural changes in protein conformation
Temperature effect on MPro enzyme
The temperature-dependent stability of monomer and dimer were studied based on structural parameters and solvent effect. 200 ns molecular dynamics simulations were performed using SPC/E and m-TIP3P water model at 4 different temperatures, 278 K (cold denaturation), 310 K (physiological), 348 K (higher temperature reported for enzyme activity [39]) and 383 K (extreme thermophilic conditions [40]), to obtain the insights into the conformational evolution and stability of SARS CoV-2 main protease enzyme. The dimer is composed of two monomer proteases, which are denoted as chain A and chain B. Both the chains are showing similar results (SI, Section: Dimer results). Therefore, we selected one chain for comparing the structure and dynamics of the main protease dimer form with the monomer. For the dimer, the monomer-monomer interaction energy is calculated between the two chains of the dimer. At 310 K, the interaction energy is found to be stable than other temperatures.The global structure of SARS CoV-2 main protease protomer and dimer is found to be similar despite the increase in temperature and difference in the water model as shown in Figure S1. Per residue secondary structure analysis of protease using do_dssp tool [41] at different temperatures shows no considerable changes in the average occupancy of total secondary structure components in the protein (Table S1). The percentage of secondary structure element obtained by do_dssp tool can be correlated with the calculated percentage from X-Ray diffraction technique (α-helix: 0.31%, β-sheet: 0.27%, unstructured region: 0.42%). However, it can be observed that β-bridges and α-helices are formed with the increase in temperature whereas native α-helices are destroyed in the structure. The secondary structure element plots reveal the degradation of the enzyme with the rise in temperature which confirms the change of the MPro conformational space (Figure S2). The free energy of folding for monomer and dimer-chain A were determined by the changes in α-helix and β-sheet elements.where FN is the fraction of protein folding [42]. The fraction of folding (FN) is determined by the fraction of residues with α-helix, β-sheet elements and turns present in the protein as a function of time. It is found that the monomer and the dimer-chain A conformations at 310 K (human body temperature) are more stable than other temperatures with ΔGfold values of −3.20 ± 0.25 and −3.43 ± 0.13 kJ/mol (Figure S3). The probability distribution of Cα-atom RMSD for both monomer and chain A of the dimer (dimer-chain A) in SPC/E water model is shown in Fig. 2
(a) and the corresponding time profile of Cα RMSD is given in Figure S4. It is found that at 278 K and 310 K, the protein has a single probable structure for both dimer-chain A and monomer. At 348 K, the conformation of protease gradually evolves during simulation due to thermal fluctuations. The system of monomer has well equilibrated within 25 ns of the simulation time with a value of 0.187 ± 0.02 and 0.200 ± 0.02 nm at 278 K and 310 K respectively. At temperatures 348 K and 383 K, marginally higher RMSD values are observed (Figure S4). The dimer-chain A has achieved the rigidity after dimerization is evident from the fewer RMSD fluctuations with a value of 0.131 ± 0.01 and 0.137 ± 0.01 nm at 278 K and 310 K respectively. It can be observed that the rigidity is lost with the rise in temperature as seen in 348 K and 383 K, which is evident from the tail region. The compactness of the protease monomer decreases with rising in temperature due to the thermal fluctuations at the α-helical region of the protein [43]. At 383 K, due to random motion of the protein molecules, frequent contacts are made and broken at different stages which are not stable resulting to broader distribution. Similar trend is observed for dimer-chain A (Fig. 2
(b)). However it is observed that the monomer is more compact than the dimer at 383 K. Further, this observation is supported by the correlation of inter-residue distance (Figure S5). The correlation map shows the contact formation (orange region) and contact rupture (blue region) between amino acid residues. At 348 K, the non-bonded interactions were broken compared to the lower temperature. At 383 K, contact formation is more than the contact rupture [44]. It is evident from the RMSD and the rGyr values that the dimer-chain A has fewer fluctuations than the single-chain protease at lower temperatures. At 348 K and 383 K, both the systems behave differently compared to lower temperature. SASA provides the surface area of protein exposed to the solvent molecule. A change in SASA value indicates a change in the tertiary conformations of the protein (Fig. 2
(c)). The main protease has the lowest SASA value at 278 K and 310 K indicates that the residues are less exposed to the environment. This also suggests that the protein undergo considerable thermal degradations at 348 K and 383 K in comparison to lower temperature. The SASA value for monomer is higher at 348 K suggesting higher solvation around the protein due to the degradation of protein. This anomaly is observed because 348 K is the melting temperature for the main protease. The dimer-Chain A showed a similar SASA value as the monomer at 278 K, 310 K and 383 K which suggests that the interaction of two monomers does not affect the solvent accessibility of monomer units. At 348 K, the single peak SASA value for dimer-chain A is observed which is quite different from the monomeric form. From secondary structure analysis, it is found that at 348 K, monomer undergoes thermal fluctuations leading to the degradation of α-helix and β-sheet at domain I and domain II respectively (Figure S6). However in case of dimer, the interaction of two protease chains protects the secondary structure at domain II and domain III. Therefore, at 348 K, the solvent exposure is less in dimer as compared to monomer unit. Among the dimer, there is an increase in the SASA value with rise in temperature. Hydrogen bond interactions between protein-protein and protein-water are considered an important factor for the stabilization and activity of protease enzyme. The number of hydrogen bonds was calculated using gmx_hbond tool. Fig. 2
(d) shows the average number of hydrogen bonds formed between protein/protein and protein/water. It is found that there is no significant decrease in number of intra-protein hydrogen bonds when the temperature rises from 288 K to 310 K while there is a significant reduction in the average number of hydrogen bond when the temperature rises to 348 K and 383 K. Both monomer and dimer-chain A show a decrease in the number of hydrogen bonds with temperature change. Similarly, protein-water hydrogen bonding showed a gradual decrease with the temperature rise. The number of water molecules decreases with the temperature inside the binding cavity which suggests the expansion of protein structure in both monomer and dimer-chain A. However, at higher temperatures, a tail region is observed as the water molecules can easily move in and out due to the cavity expansion. The number of water molecules in the substrate-binding site is shown in Fig. 2
(e). The isotropic temperature factor (B-factor) was calculated from the root mean square fluctuation (RMSF) values of the protease backbone to measure the mobility of each residue from the initial position during the simulation. The B-factor is directly related to RMSF by the following formula [45].
Fig. 2
Probability distribution of (a) Cα-RMSD (b) radius of gyration of protein (c) SASA (d) Number of MPro-MPro/ MPro-water hydrogen bonds of protein (e) number of water molecules in the substrate binding site during 200 ns simulation and (f) Calculated B-factor of Cα atoms with amino acid residues for MPro. The solid and dotted lines represent monomer and dimer-Chain A respectively in SPC/E water model.
Probability distribution of (a) Cα-RMSD (b) radius of gyration of protein (c) SASA (d) Number of MPro-MPro/ MPro-water hydrogen bonds of protein (e) number of water molecules in the substrate binding site during 200 ns simulation and (f) Calculated B-factor of Cα atoms with amino acid residues for MPro. The solid and dotted lines represent monomer and dimer-Chain A respectively in SPC/E water model.Apart from the two terminal ends of the protein, the highest fluctuations are observed for amino acid residues in the range Arg40- Glu55 and Asn180- Ile200 at higher temperatures (Fig. 2
(f)). It is observed that from 348 K to 383 K, the amino acid residues from Lys100- Phe150 show higher fluctuations compared to the lower temperature. It is believed that residues in the regions Arg40- Glu55 and Lys100- Phe150 contribute to the catalytic activity of the protease [46]. At higher temperatures, the fluctuations are found to be higher in the above-mentioned regions for single chain and dimerized main protease.
Temperature effect on various binding sites
Even though the structural analyses such as RMSD, rGyr, RMSF and SASA give information about the global changes in the protein structure (Figure S4, S7-S9), it may ignore the temperature-induced conformational changes at the ligand-binding sites and dimerization site. The structural analyses were performed selectively for important amino acid residues as given in Table 1
. The catalytic dyad and the substrate-binding site are located between the cleft of domain I and domain II while the dimerization site and allosteric site is regulated mostly by domain III [7]. Domain I consists of 10–99 residues which can be regarded as (α + β) regions, domain II consists of 100–182 residues which are β-sheet regions and domain III consists of 198–303 residues which are α-helix regions.
Table 1
Important amino acid residues taken for analysis and their biological function.
Important amino acid residues taken for analysis and their biological function.These structural properties for the amino acid residues present at substrate-binding site, dimerization site and allosteric site were determined for different temperatures rather than the whole protein for a better understanding of conformational changes occurring at the binding pockets. The basic structural analyses of binding site amino acid residues are shown in Fig. 3
. The Cα RMSD profile of amino acid residues present at various binding sites with respect to the domain (I, II, III) movements in monomer suggests different dynamics of protease binding sites at lower and higher temperatures. The higher RMSD values at 348 K and 383 K indicate the flexible nature of all the three domains in the protein as well as the binding site residues. Domain I and domain III motions are more flexible compared to domain II due to the α-helix regions (Figure S10). The highly flexible nature of domain III at higher temperature may have implications on the dimerization of protease which is crucial for its biological function. Similar behaviour is observed in dimer-chain A for lower temperatures. Unlike the monomer, the single chain in the dimer showed a stable movement at 348 K. At 383 K, the chain became unstable as expected. The rGyr values indicate that with temperature rise the compactness of binding site residues decreases. In case of monomer form, the compactness of binding site residue decreases with rise in temperature except 310 K. The rGyr value at 310 K is due to the stable global structure of protease as evident from ΔGfold value. In dimer form, the average value of rGyr at each temperature is almost similar, while the value of tail region is increasing which suggests the loss of rigidity with rise in temperature. This is because most of the residues lie in the α-helical region which is more flexible than the β-sheet element. An increasing trend is observed for the solvent-exposed surface area of the protease single chain and dimer. Exception at 348 K for SASA value is due to the presence of more coiled structure.
Fig. 3
Probability distribution of (a) Cα-RMSD of binding site residues with respect to D-I, D-II, D-III of protein (b) radius of gyration of binding site residues (c) SASA of binding site residues. The binding site residues are considered as per Table 1. The solid and dotted lines represent monomer and dimer-Chain A respectively in SPC/E water model.
Probability distribution of (a) Cα-RMSD of binding site residues with respect to D-I, D-II, D-III of protein (b) radius of gyration of binding site residues (c) SASA of binding site residues. The binding site residues are considered as per Table 1. The solid and dotted lines represent monomer and dimer-Chain A respectively in SPC/E water model.
Conformational dynamics of the protein
Further, we evaluated the change in the local structures in the conformational space due to the thermal fluctuations by network analysis using RMSD as the criterion. Around 60–80 prominent structures which cover the major conformations of protein were extracted from 200 ns trajectory for each temperature using gromos clustering method [50]. Next, to find out the similarity/connections between two conformations, we calculated the RMSD between these conformations. Two conformations were considered to be connected if the RMSD value between them is less than 2.0 Å [51]. Since we compared the change in the conformation for all the temperatures, we preferred RMSD criteria 2 Å where we get a single prominent conformation for 278 K. Conformations obtained from other temperatures are compared with respect to 278 K. The details of the network distribution can be found in our previous work [52]. The network distribution for monomer and dimer-A is given in Fig. 4
with communities represented in different colours. It can be seen that only a single community is present at 278 K and are densely connected to each other. At 310 K, two communities are mainly found where the violet community corresponds to evolved conformations and is densely connected. At higher temperatures, many communities with fewer connections are observed which explains the conformational evolution due to thermal fluctuations. This trend is further confirmed by RMSD based PMF (potential of mean force) calculation [53] which is given by the formula
Fig. 4
Network distributions of most prominent conformations of main protease from 200 ns trajectory at different temperatures in SPC/E water model. PMF profile for SARS CoV-2 main protease as a function of RMSD (Right) (a) monomer (b) dimer-Chain A.
Network distributions of most prominent conformations of main protease from 200 ns trajectory at different temperatures in SPC/E water model. PMF profile for SARS CoV-2 main protease as a function of RMSD (Right) (a) monomer (b) dimer-Chain A.where kB is the Boltzmann constant (8.3145 × 10−3 kJ/mol) and P(rmsd) is the probability of conformations with particular RMSD. This examines the energy change of the evolved conformations as a function of RMSD. PMF profiles of the monomer and dimer-Chain A provides the information about 1-D free energy landscape which depicts the most stable conformation at a particular RMSD value (Fig. 4
). It can be seen that at 278 K and 310 K, the monomer has similar free energy minima around −7.5- −7.7 kJ/mol with different RMSD range at 0.187 and 0.205 nm respectively. For dimer-chain A, a single minimum in the range of −8.4- −9.0 kJ/mol with RMSD 0.13 and 0.14 nm is observed at 278 K and 310 K respectively. With the increase in the temperature, the free energy curve becomes shallow having multiple global minima. This explains the presence of structures with different energy values. These results correlate well with the network distribution results. The above results suggest that main protease possesses unique structural and dynamic properties at lower and higher temperatures. The energy levels for the monomer are found to be much shallower than the dimer at higher temperatures. The unique structural and dynamic properties observed at different temperatures are further analyzed by PCA analysis and deviation in dihedrals angles considering the various binding sites (Fig. 5
).
Fig. 5
Plot of eigenvalue vs first 25 eigenvector index derived from PCA over 200 ns MD trajectory for (a) MPro monomer and (b) MPro dimer-Chain A. 2-D projection of first two principal motions (PC1 and PC2) of main protease (Right).
Plot of eigenvalue vs first 25 eigenvector index derived from PCA over 200 ns MD trajectory for (a) MPro monomer and (b) MPro dimer-Chain A. 2-D projection of first two principal motions (PC1 and PC2) of main protease (Right).
Principal component analysis
PCA is a statistical method to describe the dynamics, which reduces the degrees of freedom of a molecule through decomposition process that spans from largest to smallest observed motions. PCA is based on the correlated fluctuations inside a protein and provides orthogonal eigenvectors and corresponding eigenvalues. The eigenvectors indicate the directions of concerted motions whereas the eigenvalues represent the magnitude along the direction of motion. PCA was performed for backbone Cα atoms of key amino acid residues in both the monomer and dimer-Chain A to characterize the essential dynamics which govern the conformational changes throughout 200 ns simulation (Fig. 5). The first eigenvectors correspond to all the possible motions induced by the temperature. The eigenvalue plot indicates the highly flexible nature of the binding site residues. For better representation, the first two eigenvectors are plotted in 2D projection plot for both monomer and dimer enzyme binding pocket. In the case of the monomer, the scatter plot spread over the phase space more than the dimer. Also, it is observed that at 278 K and 310 K, the conformational changes are not very different but at 348 K and 383 K, the plots deviated to a great extent indicating the temperature-induced flexibility in the protein. The binding site of dimer-Chain A is less flexible due to the structural stability achieved during the dimerization. The PCA analysis shows the flexibility of amino acid residues at the binding pocket of the monomer, agreeing with the structural analysis. Even though the dimer-Chain A is less flexible compared to the monomer, it is found that the flexibility of the binding site increased with temperature rise. The structural flexibility observed at different temperatures is characterized by deviation in dihedrals angles considering the various binding sites (Fig. 6
).
Fig. 6
Dihedral angle (φ, ψ) transitions of amino acid residues at various sites of monomer during the course of simulations (a) Catalytic dyad (b) Substrate binding site (c) Dimerization site (d) Allosteric site. The colour scale represents the simulation time in ns.
Dihedral angle (φ, ψ) transitions of amino acid residues at various sites of monomer during the course of simulations (a) Catalytic dyad (b) Substrate binding site (c) Dimerization site (d) Allosteric site. The colour scale represents the simulation time in ns.
Scatter plot of dihedral angles at various binding sites
The changes in the backbone of protein residues can be characterized with the help of dihedral angles (φ, ψ). The deviations in φ, ψ angles provide insight into the changes in the secondary structure elements due to the temperature-dependent fluctuations at these binding sites. The time evolution of dihedral angle deviations for the key residues at four different binding sites is presented in Fig. 6. Each dot in the scatter plots shows the combination of φ, ψ angles of residues and the colour indicates the time scale at which the current combination is present. It can be seen that broader distribution of φ, ψ angles are observed at 348 K and 383 K except for amino acid residues at the catalytic dyad. At lower temperatures, the catalytic dyad is found to be intact with fewer fluctuations providing better stability and biological activity to the protein. Even though deviations are observed for 348 K and 383 K, the residues showed a tendency to revert to the initial conformation during simulation. At 278 K, the residues at the substrate-binding site, dimerization site and allosteric site tend to remain in the initial structure during the simulation. With temperature rise, it can be seen that there are more deviations in φ, ψ angles which might be due to the changes in the local and non-local interactions. This includes the making and breaking of non-covalent interactions such as hydrogen bonds, salt bridges, van der Waals interactions and aromatic interactions. This also suggests the local structural changes at the secondary structure elements of protein binding sites which might get neglected in global conformational analysis. The results for dimer-chain A show a similar trend as monomer (Figure S11). Therefore, it became more important to see the water role at the binding site, as there are many deviations at the protein binding sites with rise in temperature.
Role of water at binding sites
The structure and dynamics properties of water molecules were calculated to understand the role of water in the previously mentioned binding sites as given in (Table 1
). The amino acid residues, His41, Cys117, Glu166 and Arg298 were selected from the catalytic dyad, dimerization site, substrate-binding site and allosteric site respectively to analyze the role of water. These residues are selected based on their interaction with the organic moieties/inhibitors at four different binding sites.
Radial distribution function
The radial distribution functions (RDF) between the pair of atoms are used to analyse the structural properties of the system. For this, we have plotted the RDF between oxygen atoms and hydrogen atom of water molecules gow-ow(r), gHw-Ow(r) (Fig. 7
) and gCα-Ow pair correlation functions of His41, Cys117, Glu166 and Arg298 amino acids (Fig. 8
) at four different temperatures. It can be seen that there are three distinct peaks in Fig. 7
(a). The first peak is the highest and sharp, which is found at the separation of about 0.345 nm from centred atom’s position. The second and third peaks are relatively shorter and wider, which are located approximately at positions 0.57 nm and 0.68 nm, respectively. It can be seen that the peak height of the RDFs decreases with the increase in temperature and higher solvation shell peaks are not well defined. Therefore, it can be concluded that there is a change in the hydration shell of the water molecules with the rise in temperature which becomes less compact.
Fig. 7
RDF of (a) Oxygen–Oxygen and (b) Hydrogen-Oxygen of water molecules at different temperatures.
Fig. 8
Radial distribution function of Cα-Ow of key amino acid residues at various binding site of protease at different temperatures in SPC/E water model. Solid line and dash line represent monomer and dimer-Chain A respectively. (a) His41 (b) Cys117 (c) Glu166 (d) Arg298.
RDF of (a) Oxygen–Oxygen and (b) Hydrogen-Oxygen of water molecules at different temperatures.Radial distribution function of Cα-Ow of key amino acid residues at various binding site of protease at different temperatures in SPC/E water model. Solid line and dash line represent monomer and dimer-Chain A respectively. (a) His41 (b) Cys117 (c) Glu166 (d) Arg298.Further, to study the detailed overview of the arrangement of water molecules around Cα atom of the amino acids mentioned at the binding sites (Table 1), we plotted the Cα-Ow RDF around His41, Cys117, Glu166 and Arg298. Therefore, we calculated the Cα-Ow RDF of four selected amino acid residues at different temperatures in Fig. 8. Here also, it is found that the solvation of the water molecules becomes broader with the increase in the temperature. In the case of the monomer, the water molecules are more altered around Cα atom of His41 at lower temperatures and showing the sharp peak but at higher temperatures the peak is broad. It is observed that Cys117 have fewer neighbouring water molecules while Glu166 showed a narrow and sharp peak (Fig. 8
(c)) which suggests that the water molecules present here are more ordered around the binding site at lower temperature. In the monomer, the Cα-Ow RDF of Arg298 indicates the low number of water molecules is present in the first solvation shell, thereby less interaction with the water. In the case of the dimer, the Cα-Ow RDF of all selected amino acids has a narrow and sharp peak which indicates the homogeneity of the water molecule compared to the monomer. Further, it would be interesting to check the tetrahedral arrangement of water molecules around the Cα atom of key amino acid residues in the solvation shell by orientational order parameter which is discussed in a later section.
Number of hydrogen-bonded water molecules
Further, the distorted structure of water molecules near the interface of His41, Cys117, Glu166 and Arg298 residues was confirmed by calculating the fraction (fn) of oxygen atoms of water molecules that contains n number of water-water hydrogen bonds. The hydrogen bond distance criteria between oxygen-oxygen are considered as 3.25 Å, to incorporate the flexible hydrogen bonds [26]. A cut-off value of 6.0 Å was selected to define the interfacial water molecules present around Cα of selected amino acid residues of protease based on the first solvation shell in the RDF. The distance criteria will select the neighbouring water molecules with in this cut-off value obtained from RDF.In all the cases, it has been observed that the probability of lower coordinated (mono-coordinated) water molecules increases at higher temperatures (Fig. 9
). At the lower temperature, the fraction of higher coordinated water molecules is higher as compared to 348 K and 383 K. At the higher temperature the probability of higher coordinated water molecules is less. Further, we have calculated the tetrahedral order parameter to see the orientation of the water molecules around the binding site residues (Fig. 10
).
Fig. 9
Fraction of water molecules having n number of hydrogen bonds within a distance of 6.0 Å from Cα of amino acid residues of (a) MPro monomer and (b) dimer-Chain A for SPC/E water model at different temperatures.
Fig. 10
Orientational tetrahedral order parameter of water molecules within 6.0 Å from Cα of key amino acid residues at various binding site of monomer (solid line) and dimer-Chain A (dash line) in SPC/E water model at different temperatures.
Fraction of water molecules having n number of hydrogen bonds within a distance of 6.0 Å from Cα of amino acid residues of (a) MPro monomer and (b) dimer-Chain A for SPC/E water model at different temperatures.Orientational tetrahedral order parameter of water molecules within 6.0 Å from Cα of key amino acid residues at various binding site of monomer (solid line) and dimer-Chain A (dash line) in SPC/E water model at different temperatures.
Orientational tetrahedral order parameter
The structure of water molecules near the protein surface can be characterized based on the tetrahedral order parameter. The orientation order parameter is defined as the normalized sum of the squares of the differences between the cosines of the inter-bond angles and the cosine of the angles that would have been made if the bonds are arranged tetrahedrally [54]. Since, all the angles between bonds are the same in a tetrahedral arrangement,where θj, k is the angle between the central oxygen atom and the j and k bonds. The factor 3/32 normalizes the S values to the range 0 < S < 1. The squaring ensures the contribution from each inter-bond angle is always greater than or equal to zero. If all the angles are arranged in perfect tetrahedral order, the value of S will be zero and 1 for extreme non-tetrahedral cases. If the arrangement is random, then S has a value 0.25.The probability distribution of Sg is plotted for the water molecules which are present at a cut-off of 6.0 Å from Cα atoms of His41, Cys117, Glu166 and Arg298 residues selected from various ligand binding sites (Fig. 10). It can be seen that the peaks corresponding to the monomer are broader compared to the dimer, which suggests more heterogeneity of the water molecules in the monomer compared to dimer-chain A. The sharp and narrow distribution of Sg for dimer-Chain A indicates less heterogeneity of the water molecules. This homogeneous distribution of water molecules in dimer systems are observed irrespective of the temperature. In the case of monomer, a tail region towards the higher Sg value is observed indicating that the water molecules have a more distorted tetrahedral arrangement. Hence, the water structure in the first solvation shell around the binding site residues of the monomer has a more disrupted and distorted tetrahedral structure with a lower density as compared to dimer-Chain A. Further, with the increment of the temperature this tail region decreases both for the dimer and the monomer which suggests that the water molecules are randomly and homogenously distributed at the interface of the protein at higher temperature as compared to the lower temperature. This can be explained based on the interstitial water molecules present in the cavity formed from the binding site residues. The interstitial water molecules can lead to non-tetrahedral arrangement of water molecules called high density water [55]. With the rise in temperature, the interstitial water molecules moves out of the cavity due to thermal expansion, which may leads to more tetrahedral like water at higher temperatures. Next, we focused on the thermodynamic properties in terms of free energy, entropy and enthalpy of the water molecule with the binding site amino acid residues.
Free energy vs. temperature
The hydration shell formed by the water molecules around the Cα of binding site amino acids residues at 288 K, 308 K and 348 K is shown in Figure S12
and Table S2. The hydrogen-bonding network between the water molecules is perturbed due to the interaction of water molecules with these amino acids residues with the rise in temperatures which in turn changes the free energy. The free energy, ΔG(r), between these residues and water was determined with the pair correlation function g(r) given by equationwhere r is the inter-atomic distance, kB is the Boltzmann constant and T is the temperature. Additional 5 MD simulations were performed to calculate the entropy-enthalpy contributions as shown in Table S3. The thermodynamic contributions to the free energy have been calculated from the finite difference temperature derivative of the PMF at each temperature from the g(r). The RDFs for Cα-Ow pairs for additional temperatures (298 K, 318 K, 338 K and 358 K) are given in Figure S13. The entropy contributions, -ΔS(r) in free energy profile for 288 K, 308 K and 348 K is calculated as followsThe enthalpy contribution, ΔH(r) is computed using equation (7)
[56], [57].The entropy and enthalpy contributions to the free energy for each binding site residue are given in Figure S14. The free energy of Cα-Ow pairs calculated as a function of pair distance for monomer residues at different temperatures are presented in Fig. 11
. With the rise in temperature, the free energy of the binding pocket increases and become more positive as compared to the lower temperature. This can be explained based on the number of hydrogen bonded water molecules present inside the cavity formed by the binding sites. With the rise in temperature, there is presence of more fractions of broken hydrogen bonds in water molecules (Fig. 9) and the water-water hydrogen bond strength also decreases. Therefore, the protein-solvent interaction becomes weaker resulting in a positive free energy. It is found that the free energy of the allosteric site and substrate binding site is lower at 288 K and higher at 348 K. The ΔΔG value for the water interaction is 0.3 and 0.55 kJ/mol for the substrate-binding site and allosteric site, respectively, between 288 K and 348 K. The low ΔΔG value (0.3 kJ/mol) indicates that the water can be easily replaced at the substrate-binding site by small organic molecules. The higher value of ΔΔG (0.55 kJ/mol) for the allosteric site indicates the more stability of the amino acid residues due to the water molecules. It may be concluded that the energy expense of binding the small molecules at substrate binding is less compared to the allosteric site. Even though the dimerization site residues show an increase in free energy at higher temperatures, ΔG values at 288 K and 308 K show a similar trend. At 348 K higher value for free energy can be attributed due to the thermal fluctuations of protein and the free energy differences between the 288 K and 348 K are found to be ΔΔG = 0.51 kJ/mol. In the catalytic dyad, the free energy is almost similar at all temperatures. It is observed that there is the presence of stable conformation of the protein at lower temperatures which are unstable at the higher temperature. It can be found that the interaction of free energy is governed by the arrangement of water molecules and the fraction of hydrogen-bonded water molecules around the Cα atom of key amino acid residues in the solvation shell.
Fig. 11
Free energy function of Cα-Ow of amino acid residues at various binding site of protease at different temperatures for monomer protease in SPC/E water model. (a) Catalytic dyad (b) Substrate binding site (c) Dimerization site (d) Allosteric site.
Free energy function of Cα-Ow of amino acid residues at various binding site of protease at different temperatures for monomer protease in SPC/E water model. (a) Catalytic dyad (b) Substrate binding site (c) Dimerization site (d) Allosteric site.
Hydrogen bond dynamics
The hydrogen bond population variables h(t) and H(t) were defined to calculate the hydrogen bond dynamics. h(t) = 1, when a particular pair of water molecules are hydrogen bonded at time t according to the definition of hydrogen bond. If two molecules remain continuously hydrogen bonded from t = 0 to t = t, then the function H(t) = 1 or it is 0 otherwise. The two molecules are considered to be hydrogen-bonded if the interatomic distance between hydrogen of donor and acceptor atom (H---O) is < 2.5 Å. Also, the HO-O distance is < 3.5 Å [58] and the OH---O angle criterion is considered as 130° [59]. The continuous hydrogen bond autocorrelation functions SHB(t) are defined as [60], [61],where 〈…〉 describes an average over all the pairs of a given type. The function describes the probability of hydrogen bond pair at t = 0 continuously bonded up to time t. The hydrogen bond lifetime of water molecule around His41 and Glu166 residues is comparatively higher as compared to Cys117 and Arg298 residues for MPro monomer at 278 K and 310 K. The lifetime is even more at cold denaturation temperature, 278 K than 310 K. The His41 and Glu166 in catalytic dyad and substrate binding site are buried between the cleft of domain I and domain II of the protease. This forms a cavity that holds the water molecules firmly leading to strong hydrogen bond interaction between water molecules. At the high temperatures 348 K and 383 K the hydrogen bond lifetime decreases around the residues of the binding site. The higher hydrogen bond lifetime of water molecules indicates that the binding sites in the protease are stable. This is supported by the observation that the number of water molecules decreases with a rise in temperature (Fig. 2
(e)) due to the expansion of binding sites. The hydrogen bond lifetime in ps of water molecules around the amino acids has been shown in Table 2
. In the case of the dimer, a similar trend for water-water hydrogen bond lifetime is observed with a lower value due to its protein structural rigidity. The two monomers interact in such a way that the binding sites are buried inside the homodimer. The number of water molecules residing in the binding sites of dimer is lesser than the monomer. Therefore, the water-water hydrogen bond lifetime is lower in case of dimer as compared to the monomer, even though the trend is similar. Thus, it can be concluded that the hydrogen-bonded water molecules stabilize the binding sites of the protein.
Table 2
Hydrogen bond lifetime (ps) of water molecules around the key amino acid residues at the various sites of main protease in SPC/E water model.
Temperature
Monomer
Dimer
His41
Cys117
Glu166
Arg298
His41
Cys117
Glu166
Arg298
278 K
4.67
2.21
3.07
2.33
0.92
0.78
0.81
0.98
310 K
1.29
1.24
1.83
1.22
0.60
0.56
0.59
0.54
348 K
0.85
0.72
0.74
0.76
0.40
0.39
0.42
0.39
383 K
0.57
0.51
0.52
0.60
0.32
0.30
0.31
0.30
Hydrogen bond lifetime (ps) of water molecules around the key amino acid residues at the various sites of main protease in SPC/E water model.
Structure and dynamics of protease in mTIP3P water model
The temperature-dependent stability of monomer and dimer were studied based on structural parameters and solvent effect using mTIP3P water model. It is observed that the global conformation of the main protease monomer and dimer remains similar despite of the rise in temperature (Figure S1). The local structural evolutions are found due to certain changes in the secondary structure elements in the protease. Unlike SPC/E model, structural analyses like Cα-RMSD, rGyr and SASA suggest that the protease monomer undergoes thermal fluctuations starting from 310 K. The dimer-Chain A is found to be more rigid with fewer conformational evolutions. Like SPC/E water model, the solvent exposure is higher in the monomer than the dimer with the rise in temperature as expected. In the case of monomer and dimer, the water molecules present in the first solvation shell of Arg298 is less as compared to other binding site residues in the mTIP3P water model which shows a similar trend to SPC/E water model. Like SPC/E water model, it has been observed that the probability of the lower coordinated water molecules is found at higher temperatures around all residues in both monomer and dimer. The probability distribution of order parameter around the binding site residues shows that the TIP3P water molecules are more heterogeneous in monomer compared to dimer chain-A which shows a similar trend to SPC/E water model. All the results are given in Supplementary Material, Section: mTIP3P results (Figure S1
8- S21).
Conclusion
Temperature dependence on the conformational evolution of SARS CoV-2 MPro using classical molecular dynamics simulations reveals that the global structure of the enzyme is intact with minimum changes while significant changes are observed at the various binding sites of the protease. The various binding sites of the main protease can be divided into catalytic dyad, substrate-binding site, dimerization site and allosteric site. The basic structural changes evaluated by RMSD, rGyr and RMSF describe the thermal fluctuations in the protein with the rise in the temperature. Even though the dimerization of protease helps in the structural stability and function, the dimer turns to be unstable with rise in the temperature. We studied two different water models, namely the CHARMM-SPC/E and the CHARMM-TIP3P water model system. Network analysis shows the conformational evolution of the protease at higher temperatures. The structural analyses performed selectively for important amino acid residues at the various binding site showed the flexible nature of domain III at higher temperatures may affect the dimerization of the main protease which is crucial for its biological function. The dihedral analysis and PCA analysis around the binding sites concluded the flexible behaviour of both monomer and dimer-Chain A with an increase in temperature. The water structural properties were analysed in terms of radial distribution functions, tetrahedral order parameter (Sg) values and fraction of the number of hydrogen bonds around various binding sites of the main protease. The water structure in the solvation shell is found to be more heterogeneous at lower temperatures around the binding site amino acid residues. The higher hydrogen bond lifetime of the water molecules shows that the water molecules have a considerable effect on the stability of the binding sites. It is found that the ΔΔG value of interaction between water and the binding site amino acids is around 0.5 kJ/mol for both dimerization and allosteric binding site and 0.3 kJ/mol for substrate binding site. This explains the exposure of amino acid residues to the water molecules. The results of the two different water models showed a similar trend; however, the SPC/E water model showed a more pronounced effect compared to the TIP3P water model.
CRediT authorship contribution statement
Pushyaraga P. Venugopal: Software, Methodology, Validation, Formal analysis, Visualization, Investigation, Data curation, Writing – original draft. Omkar Singh: Software, Methodology, Validation, Formal analysis, Visualization, Data curation, Writing – original draft. Debashree Chakraborty: Conceptualization, Methodology, Formal analysis, Resources, Investigation, Writing – review & editing, Supervision, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Authors: Noam Morningstar-Kywi; Kaichen Wang; Thomas R Asbell; Zhaohui Wang; Jason B Giles; Jiawei Lai; Dab Brill; Brian T Sutch; Ian S Haworth Journal: J Chem Inf Model Date: 2022-03-09 Impact factor: 6.162
Authors: A D MacKerell; D Bashford; M Bellott; R L Dunbrack; J D Evanseck; M J Field; S Fischer; J Gao; H Guo; S Ha; D Joseph-McCarthy; L Kuchnir; K Kuczera; F T Lau; C Mattos; S Michnick; T Ngo; D T Nguyen; B Prodhom; W E Reiher; B Roux; M Schlenkrich; J C Smith; R Stote; J Straub; M Watanabe; J Wiórkiewicz-Kuczera; D Yin; M Karplus Journal: J Phys Chem B Date: 1998-04-30 Impact factor: 2.991