Computed optical spectra of SARS-CoV-2 proteins.

Zhuo Li, Jonathan D Hirst.

Abstract

Treatment for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes Covid-19, may well be predicated on knowledge of the structures of the proteins of this virus. However, these often cannot be determined easily or quickly. Herein, we provide calculated circular dichroism (CD) spectra in the far- and near-UV, and infra-red (IR) spectra in the amide I region, for experimental structures and computational models of SARS-CoV-2 proteins. The near-UV CD spectra offer the greatest sensitivity in assessing the accuracy of models.
© 2020 Elsevier B.V. All rights reserved.

Year:  2020        PMID: 33518776      PMCID: PMC7836526          DOI: 10.1016/j.cplett.2020.137935

Source DB:  PubMed          Journal:  Chem Phys Lett        ISSN: 0009-2614            Impact factor:   2.328


Introduction

Since the outbreak of SARS-CoV-2 at the beginning of 2020, scientists have been seeking insights that will underpin solutions to this new pandemic. Structural characterization of the proteins of this virus is crucial for understanding their biological functions, finding inhibitors and designing vaccines. The structure of the SARS-CoV-2 spike glycoprotein has been determined by cryo-electron microscopy and is being used in the design of vaccines targeting entry into cells [1]. The main protease, which is essential in processing the proteins translated from the viral RNA, is another key target. The crystal structures of the main protease with and without inhibitors have been reported [2] and have provided a basis for some theoretical studies. Wang sought to repurpose approved drugs or drugs in clinical trials by performing docking screening against the crystal structure of the main protease [3]. Huynh et al. also screened the ZINC database against the main protease [4], but they used the structure of the protein receptor after equilibration in molecular dynamics (MD) simulations. Docking with the equilibrated structure revealed an interesting binding mechanism, the occupation of the “anchor” site by the ligand. The virus attaches to the host cell via recognition of the angiotensin-converting enzyme 2 (ACE2). Han and Král designed peptide inhibitors by mimicking the helical structure on the binding interface of ACE2 to block the binding of the SARS-CoV-2 receptor binding domain (RBD) [5]. These inhibitors showed stable binding in MD simulations. Combining the experimental structures with MD simulations gives a better understanding of some biological processes of SARS-CoV-2, such as the recognition between the RBD and ACE2 [6] and the catalytic mechanism of the 3C-like protease [7]. Currently, structures of more than half of the SARS-CoV-2 proteins have not yet been determined experimentally. Prediction of protein structures using computational methods is an alternative approach. 
Computational researchers from around the world are sharing the models of SARS-CoV-2 proteins that they obtained with different prediction algorithms, either from a homology template or de novo, in order to facilitate drug design or functional studies [8]. However, the predicted protein structures need to be validated via experimental methods. Optical spectroscopy may provide valuable information in this respect.

Methods

In this study, we have collected SARS-CoV-2 protein coordinates either from experimental determination as reported in the Protein Data Bank (PDB) or from computational modelling [9], [10], [11] in order to calculate their optical spectra (Table S1 in the Supplementary Information). Eight of the proteins have more than one PDB entry. Multiple PDB structures of the same protein were used if they were determined by the same research group under similar experimental conditions. Other PDB entries of the considered proteins were not included, because of the variation of the experimental conditions, bound ligands and complex composition. For the spike glycoprotein we only calculated its near-UV CD spectrum, due to its large size. Protein structures are flexible under physiological conditions. Although informative, a crystal structure or a modelled structure can only represent a fixed stable state. Thus, we also calculated near-UV CD spectra using snapshots from MD trajectories shared by D. E. Shaw Research [12] of five proteins to take into account the conformational flexibility of the proteins in solution. Simulations of the spike protein RBD-ACE2 complex (6M17), spike glycoprotein (6VXX and 6VYB) and RNA polymerase-nsp7-nsp8 (6M71) had been performed for 10 μs. We extracted 425 snapshots sampled uniformly every 24 ns to use in the calculation of the near-UV CD spectra. The simulation of the main protease (6Y84) was over a period of 100 μs and for this protein we extracted 500 snapshots sampled uniformly every 200 ns. The methodology for the MD simulations was described in the report of D. E. Shaw [12]. We summarise just the most salient details; additional information and the associated literature references can be found in the original report [12]. The trajectory of the apo main protease started from the X-ray crystal structure (PDB entry 6Y84). The protein was solvated in a 120 × 120 × 120 Å water box. 
The simulation was conducted at 300 K in the NPT ensemble and was performed on Anton 2 using the DES-Amber force field. Trajectories of the trimeric SARS-CoV-2 spike glycoprotein were initiated from the closed state (PDB entry 6VXX), and from a partially opened state (PDB entry 6VYB). The simulations used the Amber ff99SB-ILDN force field for proteins, the TIP3P model for water, and the generalized Amber force field for glycosylated asparagine. The simulations were conducted at 310 K in the NPT ensemble. The protocol used for the spike glycoprotein was also used for the spike protein RBD-ACE2 complex. For all simulations, the systems were neutralized and salted with NaCl, with a final concentration of 0.15 M.
We considered 23 computational models of 17 proteins or protein fragments from three sources (Table S1). The SWISS-MODEL server has provided the full SARS-CoV-2 proteome, including 25 protein models and five hetero-oligomeric complexes [9]. In this study, we selected the models for which no experimental coordinates are available and whose QMEAN quality estimates [13] indicate high quality/confidence. Thus, we computed spectra of seven SWISS-MODEL SARS-CoV-2 structures (Table S1). For some proteins no homology model is available, owing to the lack of a suitable template. Six proteins have also been studied with the ‘free modelling’ method, AlphaFold, which is a deep learning system to predict the structure of proteins for which there is no similar template on which to base a model [10]. Models of these proteins and another four were refined by Heo and Feig [11] via their pipeline, which includes inter-residue distance prediction with trRosetta [14], lowest energy model selection and MD simulation-based refinement. The secondary structure content of the proteins shown in Table S1 was calculated with the DSSP server [15], [16]. 
We calculate three types of optical spectra of SARS-CoV-2 proteins: far-ultraviolet (UV) electronic circular dichroism (CD), near-UV CD and infrared (IR) spectra in the amide I region. Although these techniques do not provide atomic level structural information, they reflect the conformational flexibility of proteins in solution (and by extension under physiological conditions). All three spectroscopic methods can monitor conformational changes perhaps induced by changes in experimental conditions or by ligand/inhibitor binding. Thus, comparing the experimental spectra with the calculated spectra may provide information about the quality of the structural model. Far-UV CD spectroscopy is sensitive to chiral features of the protein backbone which are reflected by the excitation bands arising from the peptide bond. Each type of protein secondary structure has its unique far-UV CD features [17] and the far-UV CD spectrum can be deconvolved as a combination of each secondary structure component [18]. It has been used in determining the secondary structures of the papain-like protease of MERS [19] and the envelope protein of SARS [20]. The far-UV CD has also been used to measure the interaction between the SARS-CoV-2 spike protein receptor binding domain (RBD) and heparin [21]. Calcium binding by the SARS 3a protein has been monitored using CD [22]. The far-UV CD spectrum showed the increase of α-helicity when mixing peptides from two heptad repeat regions of the SARS spike protein which suggested the formation of a complex [23], [24]. This fusogenic mechanism is also evident in SARS-CoV-2 and has been used to design fusion inhibitors [25]. In this study, we employed the PDB2CD server [26], an empirical method, to predict the far-UV CD spectra of the SARS-CoV-2 proteins. 
A reference dataset, SMP180 (soluble + membrane), is used to search structural similarity of the query protein and proteins in the dataset and the CD spectra of the reference proteins are used to create the CD spectrum. The amide I IR spectrum of a protein also provides information about the backbone conformation. The signal arises mainly from the CO stretch mode and the N-H bond wagging and bending. Different secondary structures have different peak positions between 1620 and 1690 cm−1 [27]. The amide I band was calculated with Coupled Oscillator Model Spectrum Simulator (COSMOSS) [28]. All settings were used at their default values. COSMOSS constructs a vibrational exciton Hamiltonian. The coupling between the local amide I modes was modelled using transition dipole coupling with the nearest neighbour coupling corrected by Jansen’s (ϕ, ψ) angle map [29]. The local mode frequencies of the amide I vibrations are also calculated using a nearest-neighbour frequency shift [29]. The calculation of the IR amide I band was based on a single structure, but the influence of conformational dynamics on the inhomogeneous broadening of the bands is approximated by adding some random disorder to the elements of the exciton Hamiltonian: 20 cm−1 to the diagonal terms and 5 cm−1 to the off-diagonal elements, corresponding to the magnitude of fluctuations that might be anticipated in solution under ambient conditions. Homogeneous broadening is approximated by convolving the computed line spectra with a 10 cm−1 linewidth. Near-UV CD spectra of proteins, on the other hand, contain information regarding the orientations of the aromatic side chains as well as their interactions with the surrounding environment [18], [30], [31]. Near-UV CD spectra were calculated with the DichroCalc server [32], which uses a matrix formulation to represent the exciton coupling of the chromophores in the protein. 
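The disorder-averaged amide I calculation described above can be illustrated with a minimal sketch. It is not the COSMOSS implementation: the function name `disordered_ir_spectrum`, the two-site Hamiltonian, the use of Gaussian-distributed disorder, the reading of 20 and 5 cm−1 as standard deviations, and the Lorentzian line shape with a 10 cm−1 full width are all assumptions made for illustration.

```python
import numpy as np

def disordered_ir_spectrum(h0, dipoles, grid, n_samples=200,
                           sigma_diag=20.0, sigma_off=5.0, fwhm=10.0, seed=0):
    """Average a broadened exciton stick spectrum over random disorder.

    h0      : (n, n) symmetric exciton Hamiltonian in cm^-1 (hypothetical input)
    dipoles : (n, 3) local-mode transition dipoles (arbitrary units)
    grid    : 1-D array of wavenumbers in cm^-1
    """
    rng = np.random.default_rng(seed)
    n = h0.shape[0]
    gamma = fwhm / 2.0  # Lorentzian half width at half maximum (assumed shape)
    spectrum = np.zeros_like(grid, dtype=float)
    for _ in range(n_samples):
        # Symmetric random perturbation: assumed Gaussian disorder
        noise = np.triu(rng.normal(0.0, sigma_off, size=(n, n)), k=1)
        noise = noise + noise.T
        noise[np.diag_indices(n)] = rng.normal(0.0, sigma_diag, size=n)
        energies, vecs = np.linalg.eigh(h0 + noise)
        # Transition dipole of each exciton state, then its squared magnitude
        strengths = np.sum((vecs.T @ dipoles) ** 2, axis=1)
        for e, s in zip(energies, strengths):
            spectrum += s * (gamma / np.pi) / ((grid - e) ** 2 + gamma ** 2)
    return spectrum / n_samples

# Toy two-oscillator example with made-up parameters
h0 = np.array([[1650.0, -8.0], [-8.0, 1650.0]])
dipoles = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
grid = np.arange(1550.0, 1750.0, 0.5)
band = disordered_ir_spectrum(h0, dipoles, grid)
```

Averaging over many disorder realisations stands in for the inhomogeneous broadening, while the Lorentzian width plays the role of the homogeneous broadening.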
In the exciton framework an effective Hamiltonian is constructed from a basis of local excitations [33]. The exciton methodology for protein CD calculations has been recently reviewed [34]. The electronic transitions of the aromatic side chain chromophores, phenylalanine (Phe), tyrosine (Tyr) and tryptophan (Trp), were described via ab initio calculated parameters [35] extended to incorporate the vibrational structure under the electronic bands [36]. Backbone chromophores were modelled with an ab initio parameter set [37] and two transitions were employed for the peptide bond. The calculated near-UV CD line spectra were convolved with a Gaussian function with a 4 nm bandwidth. Calculated intensities from each of the MD snapshots were averaged with equal weighting to give the final spectrum.
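The final two steps of the near-UV CD procedure, Gaussian broadening of the line spectra and equal-weight averaging over MD snapshots, can be sketched as follows. This is an illustration rather than the DichroCalc code: the function names are hypothetical, the 4 nm bandwidth is assumed here to be a full width at half maximum, and intensities are left in arbitrary units without the physical prefactor.

```python
import numpy as np

def gaussian_broaden(wavelengths_nm, strengths, grid_nm, fwhm_nm=4.0):
    """Convolve a stick spectrum with a Gaussian (bandwidth assumed to be FWHM)."""
    sigma = fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    diff = np.asarray(grid_nm)[:, None] - np.asarray(wavelengths_nm)[None, :]
    return (np.exp(-0.5 * (diff / sigma) ** 2) * np.asarray(strengths)).sum(axis=1)

def ensemble_average(snapshots, grid_nm, fwhm_nm=4.0):
    """Equal-weight mean of the broadened spectra of all snapshots.

    snapshots : iterable of (wavelengths_nm, strengths) pairs, one per MD frame
    """
    spectra = [gaussian_broaden(w, r, grid_nm, fwhm_nm) for w, r in snapshots]
    return np.mean(spectra, axis=0)

# Made-up line spectra for three frames (wavelengths in nm, signed intensities)
grid = np.arange(240.0, 321.0, 1.0)
frames = [
    ([262.0, 278.0, 291.0], [-0.4, 1.0, 0.3]),
    ([261.0, 279.0, 290.0], [-0.5, 0.8, -0.2]),
    ([263.0, 277.0, 292.0], [-0.3, 1.1, 0.1]),
]
mean_spectrum = ensemble_average(frames, grid)
```

Because CD bands are signed, frames with oppositely signed contributions at the same wavelength partly cancel in the mean, which is one way an ensemble spectrum can differ from that of any single structure.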

Results

Fig. 1 compares the spectroscopic features of the four structures of the main protease in its apo form. All four structures have almost identical far-UV CD spectra with one moderate positive peak at 190 nm, one moderate negative band at 208 nm and an unresolved shoulder at 220 nm.
Fig. 1

Calculated spectra with four experimentally determined structures of the apo-form SARS-CoV-2 main protease. Left: far-UV CD; middle: near-UV CD; right: IR. PDB ID and colour code: 6M03 (black), 6Y2E (blue), 6Y84 (red), 5R8T (green). The spectrum depicted with dashed lines in the middle panel is calculated with MD simulation snapshots. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

This feature matches the mixed α-helical and β-strand content of the secondary structure of this protein (Table S1). The calculated amide I IR bands of these four structures are very similar as well. The near-UV CD spectra, however, show obvious differences. Calculation with the 6Y2E structure leads to less intense signals in the tyrosine and tryptophan region (270–290 nm), while the other three structures show similar bands in terms of band shape and intensity. 6Y2E and 6M03 have stronger negative bands in the phenylalanine region (250–268 nm) than the other two 3C-like proteinase structures. This example illustrates that crystal structures from different experiments show similar secondary structure content, but the conformations of the aromatic side chains may differ. The near-UV CD spectrum is sensitive to the orientation of the aromatic side chains. Calculated spectra of the methyltransferase-nsp10 complex (Figure S1), ADP ribose phosphatase (Figure S2) and RNA polymerase-nsp7-nsp8 complex (Figure S6) also illustrate that the near-UV CD is sensitive to the local conformational differences of the aromatic residues. The other three proteins, nsp9 (Figure S4), nucleocapsid phosphoprotein (Figure S5) and the spike protein RBD and ACE2 complex (Figure S7), have similar band shapes for all three types of calculated spectra. We have calculated the near-UV CD spectra with the snapshots extracted from the trajectories provided by D. E. Shaw Research [12] (Fig. 1, middle panel), since theoretical prediction of the spectrum can sometimes be enhanced by using a set of conformations sampled from an equilibrium ensemble [38]. The spectra calculated with snapshots show similar band shapes for the tyrosine and phenylalanine region, with varying magnitude. However, the tryptophan peak is oppositely signed.
Small-molecule inhibitors are one possible way to block the SARS-CoV-2 main protease from processing the proteins translated from viral RNA. Fig. 2 compares the calculated spectra of the main protease with and without inhibitor binding. Compared to the apo form (6Y2E), the inhibitor-bound structures (6Y2F and 6Y2G) have slightly lower helical content and a slightly higher amount of turn (Table S1). 6Y2G has the same β-strand content as the apo form, whereas 6Y2F has less β-strand in its structure. These secondary structure changes are reflected in the far-UV CD spectra (Fig. 2 left panel). In the near-UV CD spectra (Fig. 2 middle panel), binding to the two inhibitors induced opposite spectroscopic changes. 6Y2F shows a weak band with opposite sign in the tyrosine region (270–286 nm) compared to the apo structure (6Y2E), while 6Y2G gives a much stronger positive spectral signal from the tyrosine and tryptophan residues. The influence of inhibitor binding on the near-UV CD arises from overall conformational changes of the protein rather than direct electronic coupling with the aromatic side chains, since only one phenylalanine is within 6 Å of the inhibitor (Figure S13).
Fig. 2

Calculated spectra with crystal structures of SARS-CoV-2 main protease with (red and blue) or without (black) alpha-ketoamide inhibitors binding. Left: far-UV CD; middle: near-UV CD; right: IR. PDB ID and colour code: 6Y2E (black), 6Y2F (blue), 6Y2G (red). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 3 shows examples of calculated spectra with different models of the same proteins, nsp6, M-protein and nsp4. The calculated far-UV CD features (Fig. 3) consist of an intense positive peak at 190 nm and negative peaks at 208 and 220 nm, which correspond well to the secondary structure composition of these proteins, namely high helix content (Table S1). The intensities of the peaks decrease with the reduction of the helical content of the protein. The two models of nsp6 show a similar band shape in the near-UV CD spectrum below 290 nm, but the Feig model gives weaker calculated signals. The tryptophan in the Feig model leads to a negative signal, while no such band was observed for the AlphaFold structure. For M-protein and nsp4, models from the two groups have oppositely signed peaks in their calculated near-UV CD spectra. We are still investigating the origin of this. However, it is hard to dissect the near-UV CD signals, especially for proteins with a large number of aromatic residues.
Fig. 3

Calculated spectra of SARS-CoV-2 proteins with models from the Feig group (solid line) or AlphaFold (dashed line). Left: far-UV CD; middle: near-UV CD; right: IR. Colour code: nsp6 (black), M-protein (blue) and nsp4 (red). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

ORF10 and Protein 7a have no helical structure. They contain 26.3% and 60.3% β-strand content, respectively. As shown in Figure S10, they have very different far-UV CD spectra. This is mainly due to the nature of the β structures. Regular β sheets have a positive band at around 195 nm and a negative band of comparable magnitude near 216 nm. A spectrum with a positive band near 190 nm and a minimum at about 200 nm indicates irregular β structures in the protein [39]. Thus, Protein 7a shows regular β-sheet features, whereas ORF10 has more β-bulges and irregular strands in its structure. The numerical data for all the calculated spectra have been deposited on Mendeley [40].

Conclusions

The experimentally determined protein structures share similar secondary structures among different PDB entries. These generated similar calculated far-UV CD and amide I IR spectra. The near-UV CD spectrum is sensitive to the local conformational changes of aromatic residues. Thus, it can distinguish subtly varying tertiary structure features in different protein models. Near-UV CD spectra are difficult to interpret in the absence of complementary experimental or computational data. However, when combined with an ensemble of MD snapshots, calculated spectra may correspond directly to the actual conformational distribution of the protein in solution. Furthermore, the calculated spectra are sensitive to ligand binding. They provide a direct connection between models and experimental observables and they should be useful in assessing the accuracy of the computational models when compared with experimentally measured spectra of the proteins. The above remarks, of course, apply generally to proteins. In the context of the intense current interest in the SARS-CoV-2 proteins, we suggest that measurement of the optical spectra, particularly of the near-UV CD spectra, would be a valuable complement to the ongoing associated structural, simulation and modelling studies.

CRediT authorship contribution statement

Zhuo Li: Investigation, Writing - original draft. Jonathan D. Hirst: Conceptualization, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  30 in total

1.  DichroCalc--circular and linear dichroism online.

Authors:  Benjamin M Bulheller; Jonathan D Hirst
Journal:  Bioinformatics       Date:  2009-01-07       Impact factor: 6.937

Review 2.  Circular dichroism.

Authors:  R W Woody
Journal:  Methods Enzymol       Date:  1995       Impact factor: 1.600

3.  Improved calculation of the n-pi rotational strength in polypeptides.

Authors:  R W Woody
Journal:  J Chem Phys       Date:  1968-12-01       Impact factor: 3.488

4.  Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.

Authors:  W Kabsch; C Sander
Journal:  Biopolymers       Date:  1983-12       Impact factor: 2.505

5.  Fast Identification of Possible Drug Treatment of Coronavirus Disease-19 (COVID-19) through Computational Drug Repurposing Study.

Authors:  Junmei Wang
Journal:  J Chem Inf Model       Date:  2020-05-04       Impact factor: 4.956

6.  In Silico Exploration of the Molecular Mechanism of Clinically Oriented Drugs for Possibly Inhibiting SARS-CoV-2's Main Protease.

Authors:  Tien Huynh; Haoran Wang; Binquan Luan
Journal:  J Phys Chem Lett       Date:  2020-05-21       Impact factor: 6.475

7.  Structure of a conserved Golgi complex-targeting signal in coronavirus envelope proteins.

Authors:  Yan Li; Wahyu Surya; Stephanie Claudine; Jaume Torres
Journal:  J Biol Chem       Date:  2014-03-25       Impact factor: 5.157

8.  Structural and functional characterization of MERS coronavirus papain-like protease.

Authors:  Min-Han Lin; Shang-Ju Chuang; Chiao-Che Chen; Shu-Chun Cheng; Kai-Wen Cheng; Chao-Hsiung Lin; Chiao-Yin Sun; Chi-Yuan Chou
Journal:  J Biomed Sci       Date:  2014-06-04       Impact factor: 8.410

9.  Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein.

Authors:  Alexandra C Walls; Young-Jun Park; M Alejandra Tortorici; Abigail Wall; Andrew T McGuire; David Veesler
Journal:  Cell       Date:  2020-03-09       Impact factor: 41.582

10.  Is the Rigidity of SARS-CoV-2 Spike Receptor-Binding Motif the Hallmark for Its Enhanced Infectivity? Insights from All-Atom Simulations.

Authors:  Angelo Spinello; Andrea Saltalamacchia; Alessandra Magistrato
Journal:  J Phys Chem Lett       Date:  2020-06-05       Impact factor: 6.475
