
AI-based spectroscopic monitoring of real-time interactions between SARS-CoV-2 and human ACE2.

Sheng Ye, Guozhen Zhang, Jun Jiang.

Abstract

The novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), invades a human cell via human angiotensin-converting enzyme 2 (hACE2) as the entry, causing the severe coronavirus disease (COVID-19). The interactions between hACE2 and the spike glycoprotein (S protein) of SARS-CoV-2 hold the key to understanding the molecular mechanism to develop treatment and vaccines, yet the dynamic nature of these interactions in fluctuating surroundings is very challenging to probe by those structure determination techniques requiring the structures of samples to be fixed. Here we demonstrate, by a proof-of-concept simulation of infrared (IR) spectra of S protein and hACE2, that time-resolved spectroscopy may monitor the real-time structural information of the protein-protein complexes of interest, with the help of machine learning. Our machine learning protocol is able to identify fine changes in IR spectra associated with variation of the secondary structures of S protein of the coronavirus. Further, it is three to four orders of magnitude faster than conventional quantum chemistry calculations. We expect our machine learning protocol would accelerate the development of real-time spectroscopy study of protein dynamics.
Copyright © 2021 the Author(s). Published by PNAS.

Keywords:  IR spectroscopy; SARS-CoV-2; neural networks; protein dynamics

Year:  2021        PMID: 34185681      PMCID: PMC8256048          DOI: 10.1073/pnas.2025879118

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


The ongoing pandemic of COVID-19, a highly infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has posed a tremendous threat to human health and well-being, having affected tens of millions of people and killed more than 1 million since December 2019 (1). It has spurred enormous efforts in biological and biomedical research to find a solution to this fatal disease, which have rapidly advanced our knowledge of it, including the identity of the pathogen (i.e., SARS-CoV-2), the genome sequence of the virus, and the structural basis for coronavirus recognition and infection (2–5). SARS-CoV-2 recognizes human angiotensin-converting enzyme 2 (hACE2) as the entry receptor to host cells using its surface spike glycoprotein (S protein) (1). The interactions of S protein with hACE2 have been subjected to intensive investigation by several groups (6–10), which has laid the foundation for a comprehensive understanding of the invasion of SARS-CoV-2 into the human body at the atomic scale (11), helps the search for intermediate hosts of the coronavirus (12), and will guide the design of therapeutics and vaccines (11, 13). Since the physiological environment in which S protein and hACE2 interact is always fluctuating due to the dynamic nature of water, a dynamic picture of the interactions between them is needed for a precise mechanistic understanding that will inspire modulation and application (14). Unfortunately, such information relies on real-time tracking of protein conformations, which cannot be achieved by powerful structure characterization techniques with atomic precision like X-ray diffraction and cryoelectron microscopy, because they require fixed structures in samples. This motivates us to develop alternative approaches to resolve the issue.
Recently, time-resolved infrared (IR) spectroscopy techniques have successfully monitored changes of secondary structure with time (15), signaling the feasibility of real-time observation of protein dynamics in ambient conditions using spectroscopy. However, monitoring specific peptide fragments in a secondary structure typically requires isotope labeling (e.g., C=O in the amide of the protein backbone is replaced with 13C=O or C=18O) in the preparation of samples, which is, unfortunately, tedious and expensive for systematic investigation of conformation changes in protein dynamics. Therefore, it is desirable to develop isotope labeling-free spectroscopy to accelerate the structural study of proteins for biological and biomedical sciences. To achieve this goal, one needs to employ quantum chemistry calculations to complete spectral signal assignment and structure determination. In practice, this relies on computer simulations of the various possible conformers, which is, unfortunately, very expensive for macromolecules like proteins. One of the biggest bottlenecks in spectroscopic measurement of proteins is the lack of a rapid theoretical interpretation that can translate spectral signals into structural information in a timely manner. As a result, it is nearly impossible for an experimental spectroscopic study to monitor continuous structural changes associated with protein functions. Developing a cost-effective spectra simulation protocol is a pressing task to advance the real-time spectroscopic study of protein structures. Machine learning (ML), a collection of statistics-based methods which gain predictive power from learning on big data, has emerged as a powerful toolkit to reduce the barrier to revealing the structure−property relationship (16). It has become increasingly popular in the study of molecules and materials, such as predicting chemical reaction routes (17) and accelerating the discovery of materials (18).
In particular, neural networks (NNs), a subclass of ML algorithms, are well recognized for handling complex nonlinear problems. An NN establishes a predictive model for desired properties by iterative optimization of a complex high-dimensional function in a virtually infinite space of parameters. This feature makes it a transferable tool for predicting protein spectra (19). In this article, we developed and applied a cost-effective ML protocol to predict the IR spectra along the kinetic process of a COVID-19 virus (SARS-CoV-2) protein binding to hACE2. The efficient simulation of IR signals of different states of the coronavirus associated with the changes in its secondary structure is very encouraging for studying dynamic interactions between the S protein of SARS-CoV-2 and human ACE2 with the help of ML techniques. This will enable real-time spectroscopic monitoring of protein structure evolution for this deadly virus, providing valuable information for understanding its molecular mechanism, as well as developing cures and vaccines. ML should provide a cost-effective tool for simulating optical properties of SARS-CoV-2.

Results and Discussion

The technical details of this ML protocol have been elaborated elsewhere (20). Here we just sketch the basic idea of the framework (Fig. 1). We adopt a divide-and-conquer strategy to treat the amide I vibrations of the whole protein. The vibration of a protein is represented as a set of n oscillators, one for each peptide bond in its backbone. The Frenkel exciton model is employed to construct a vibrational model Hamiltonian (21), in which the diagonal elements are the frequencies (ω) of the ith amide I oscillator, and the off-diagonal elements are the coupling coefficients (J) between two oscillators i and j (Fig. 1). To obtain these matrix elements, a protein is split into individual peptide bonds and dipeptides. The values of ω and the transition dipole μ are predicted from an NN model of a peptide, that is, N-methylacetamide (22, 23). For off-diagonal elements, there are two scenarios: Those coupling coefficients between two neighboring oscillators are computed using an NN model of a dipeptide, that is, N-acetyl-glycine-N′-methylamide (GLDP) (24, 25); those between a pair of nonneighboring oscillators are calculated with the dipole approximation (26), assuming that the distances between oscillators are greater than the length of the peptide bond:
$$J_{ij} = \frac{1}{4\pi\varepsilon_0}\left[\frac{\boldsymbol{\mu}_i\cdot\boldsymbol{\mu}_j}{r_{ij}^{3}} - \frac{3\,(\boldsymbol{\mu}_i\cdot\mathbf{r}_{ij})(\boldsymbol{\mu}_j\cdot\mathbf{r}_{ij})}{r_{ij}^{5}}\right],$$
where ε0 is the dielectric constant, μ_i (μ_j) is the transition dipole of peptide bond i (j), and r_ij is the vector connecting dipoles i and j (with length r_ij). After all matrix elements of the model Hamiltonian are obtained, IR spectra are simulated using the SPECTRON program developed by Mukamel and coworkers (27). We also make this ML protocol available online to facilitate rapid prediction of protein IR spectra in support of experimental spectroscopy (28).
Fig. 1.

ML protocol for the IR spectra of proteins.
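As a rough illustration of the model Hamiltonian outlined above, the following Python sketch assembles a small amide I exciton matrix and diagonalizes it to obtain a stick spectrum. It is only a sketch under stated assumptions: the function names, the site frequencies, dipoles, positions, and neighbor couplings are hypothetical placeholders (in the protocol these quantities come from the NN models), the coupling helper uses the standard transition dipole form with the prefactor set to 1, and the final spectrum step stands in for what the text attributes to SPECTRON.

```python
import numpy as np

def dipole_coupling(mu_i, mu_j, r_ij, prefactor=1.0):
    """Transition dipole coupling J_ij between two amide I oscillators.

    `prefactor` stands in for 1/(4*pi*eps0); 1.0 is a placeholder that
    leaves the result in arbitrary units.
    """
    r = np.linalg.norm(r_ij)
    return prefactor * (np.dot(mu_i, mu_j) / r**3
                        - 3.0 * np.dot(mu_i, r_ij) * np.dot(mu_j, r_ij) / r**5)

def build_hamiltonian(freqs, dipoles, positions, neighbor_couplings):
    """Assemble the n x n exciton Hamiltonian.

    Diagonal: site frequencies. Off-diagonal: a supplied value for
    neighboring sites (i, i+1), dipole coupling for all other pairs.
    """
    dipoles = np.asarray(dipoles, dtype=float)
    positions = np.asarray(positions, dtype=float)
    n = len(freqs)
    H = np.diag(np.asarray(freqs, dtype=float))
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1:
                J = neighbor_couplings[i]
            else:
                J = dipole_coupling(dipoles[i], dipoles[j],
                                    positions[j] - positions[i])
            H[i, j] = H[j, i] = J
    return H

def stick_spectrum(H, dipoles):
    """Diagonalize H; return mode frequencies and dipole-strength intensities."""
    energies, vecs = np.linalg.eigh(H)
    # Each eigenmode's transition dipole is the eigenvector-weighted sum of site dipoles.
    mode_dipoles = vecs.T @ np.asarray(dipoles, dtype=float)
    return energies, np.sum(mode_dipoles**2, axis=1)

# Placeholder inputs for four oscillators (illustrative, not from the paper).
freqs = [1650.0, 1655.0, 1660.0, 1645.0]
dipoles = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
positions = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]]
neighbor_couplings = [5.0, -3.0, 4.0]

H = build_hamiltonian(freqs, dipoles, positions, neighbor_couplings)
energies, intensities = stick_spectrum(H, dipoles)
```

Broadening each stick with a line shape and averaging over MD snapshots would then give a spectrum comparable to those shown in the figures; that step is omitted here.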

We first simulated the amide I IR spectra of SARS-CoV-1 and SARS-CoV-2 using the ML protocol described in Fig. 1, averaging over 1,000 and 2,000 snapshots, respectively (which would be prohibitively expensive via direct quantum mechanics computations). The simulation environment was water, which serves as the solvent of the protein solution. A real protein solution contains more than protein and water molecules, but the specific aim of this work is to investigate how our ML protocol accelerates the simulation of protein IR spectra to facilitate the atomic-scale understanding of structure changes associated with S protein of SARS-CoV-2 binding to hACE2, not to understand the impact of specific environmental factors in solution on the protein−protein complex. Therefore, we made a necessary simplification of the protein solution model and considered water as the only component other than the protein of interest. For this reason, we used molecular dynamics (MD) simulation trajectories of protein solutions which involve only water as the environment. The structures and trajectories of SARS-CoV-1 and SARS-CoV-2 were obtained from MD simulations by ourselves and by Komatsu et al. (29), respectively. The good agreement for SARS-CoV-1 between our ML predictions (average of 1,000 snapshots) and experimental spectra (30) is evident from the high Spearman rank correlation coefficient (ρ = 0.93) (31) (Fig. 2), a measure widely used to quantify the agreement between predicted and experimental spectra. From the 10-microsecond (μs) MD simulation data obtained from Komatsu et al. (10 trajectories with 1,000 snapshots each), we predicted the amide I IR spectrum of SARS-CoV-2 with this ML protocol by averaging the 2,000 snapshots in the first 2 μs for comparison, since the results have converged for this number of snapshots (for the results of the remaining 8,000 snapshots, please see the supplementary material). As shown in Fig. 2, the dominant peak of SARS-CoV-2 has a 5 cm−1 blue shift compared with SARS-CoV-1 (SARS-CoV-1: 1,658.72 cm−1; SARS-CoV-2: 1,663.62 cm−1). This may be accounted for by SARS-CoV-2 having a larger β-turn content than SARS-CoV-1 (Table 1), and by β-turns possessing an amide I IR signal of higher frequency (32–34). Importantly, our ML protocol identified the fine difference in amide I IR spectra associated with the difference between their secondary structures, and it is four orders of magnitude faster than conventional quantum chemistry calculations (Table 1).
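To make the agreement metric concrete, here is a minimal Python sketch of a Spearman rank correlation between a predicted and a reference spectrum sampled on the same wavenumber grid. The two Gaussian bands are synthetic placeholders, not data from this work, and the helper names are hypothetical; the ranking routine assigns average ranks to ties, which is the usual convention.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied entries their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        # Extend j over the run of entries equal to the one at sorted position i.
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# Synthetic single-band spectra on a shared grid (placeholder shapes).
grid = np.linspace(1600.0, 1700.0, 201)
reference = np.exp(-((grid - 1650.0) / 10.0) ** 2)
predicted = np.exp(-((grid - 1652.0) / 11.0) ** 2)
rho = spearman_rho(predicted, reference)
```

Because the coefficient depends only on rank order, it rewards a prediction that reproduces where the spectrum rises and falls, even if absolute intensities differ.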
Fig. 2.

ML-predicted IR spectra of SARS-CoV-1, SARS-CoV-2, SARS-CoV-1-hACE2, and SARS-CoV-2-hACE2. (A) Comparison of experimental (30) (black line) and ML-predicted (red line: single crystal structure [PDB ID code 2AMQ]; blue line: average of 1,000 configurations) spectra of SARS-CoV-1. (B) ML-predicted IR spectra of SARS-CoV-2 based on a single crystal structure (red lines, PDB ID code 6LU7) and 2,000 MD configurations (blue lines). (C) ML-predicted IR spectra of SARS-CoV-1-hACE2 (PDB ID code 2AJF) during a 10-μs MD simulation (nine trajectories; 1,000 snapshots each for trajectories 1 to 8 and 334 snapshots for trajectory 9). (D) Same as C but for SARS-CoV-2-hACE2 (PDB ID code 6M17). Intensity is scaled to have the same maximum intensity for each panel.

Table 1.

Average secondary structure content (computed by Stride program) of various coronaviruses and comparison of the time required for computing IR spectra of a single structure by Density Functional Theory (DFT) and our ML model in the framework of vibrational exciton model

Structure | β-Strands (%) | β-Turns (%) | α-Helix (%) | 3₁₀-Helices (%) | Coil (%) | Bridge (%) | DFT (s) | ML (s)
SARS-CoV-1 | 30.1 | 19.9 | 23.9 | 2.5 | 21.0 | 2.5 | 1,165,320 | 70.69
SARS-CoV-2 | 28.3 | 25.5 | 20.3 | 2.6 | 20.4 | 2.9 | 1,173,000 | 72.68
SARS-CoV-1-hACE2 | 7.6 | 23.2 | 45.2 | 3.9 | 18.0 | 2.2 | 1,482,120 | 100.80
SARS-CoV-2-hACE2 | 7.0 | 21.2 | 45.6 | 3.2 | 21.8 | 1.2 | 1,474,440 | 98.68
Trimeric SARS-CoV-2 S protein (closed state) | 30.7 | 25.6 | 18.0 | 1.9 | 21.9 | 1.7 | 6,068,100 | 5,295.60
Trimeric SARS-CoV-2 S protein (open state) | 30.6 | 25.1 | 18.7 | 1.9 | 22.1 | 1.6 | 6,068,100 | 4,613.40
RBD/hACE2 binding (S1 state) | 32.3 | 22.1 | 9.4 | 7.8 | 27.7 | 0.8 | 370,440 | 20.64
RBD/hACE2 binding (S2 state) | 31.8 | 21.5 | 12.1 | 6.2 | 27.3 | 1.2 | 370,440 | 20.64
RBD/hACE2 binding (S3 state) | 33.5 | 25.5 | 12.1 | 6.2 | 21.5 | 1.2 | 370,440 | 20.64
RBD/hACE2 binding (S4 state) | 33.0 | 21.4 | 9.4 | 7.8 | 27.3 | 1.2 | 370,440 | 20.64
RBD/hACE2 binding (S5 state) | 33.0 | 21.9 | 11.6 | 4.7 | 27.6 | 1.2 | 370,440 | 20.64

All reported times refer to calculations on an eight-core Intel(R) Xeon(R) CPU (E5-2683v4 at 2.1 GHz). DFT, Density Functional Theory.
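The speedup implied by the DFT and ML timing columns of Table 1 can be checked with a few lines of Python. The timings below are copied from the table; the short dictionary keys and the helper name are ours, introduced only for illustration.

```python
import math

# (DFT seconds, ML seconds) for a single structure, copied from Table 1.
timings = {
    "SARS-CoV-1": (1_165_320, 70.69),
    "SARS-CoV-2": (1_173_000, 72.68),
    "SARS-CoV-1-hACE2": (1_482_120, 100.80),
    "SARS-CoV-2-hACE2": (1_474_440, 98.68),
    "Trimeric S (closed)": (6_068_100, 5_295.60),
    "Trimeric S (open)": (6_068_100, 4_613.40),
    "RBD/hACE2 (S1-S5)": (370_440, 20.64),
}

def speedup_order(dft_seconds, ml_seconds):
    """Return (speedup factor, whole orders of magnitude) for one structure."""
    factor = dft_seconds / ml_seconds
    return factor, math.floor(math.log10(factor))

orders = {}
for name, (dft_s, ml_s) in timings.items():
    factor, order = speedup_order(dft_s, ml_s)
    orders[name] = order
    print(f"{name}: about {factor:,.0f}x faster (10^{order})")
```

The smaller systems come out at roughly 10^4 and the trimeric S protein at roughly 10^3, consistent with the "three to four orders of magnitude" stated in the abstract.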

Then we simulated the amide I IR spectra of SARS-CoV-1-hACE2 (hACE2 in complex with the receptor binding domain of the spike protein from SARS-CoV-1; RBD1-hACE2 hereafter) and SARS-CoV-2-hACE2 (hACE2 in complex with the receptor binding domain of the spike protein from SARS-CoV-2; RBD2-hACE2 hereafter) by averaging over 8,334 snapshots with our ML protocol (Fig. 2). These MD simulation data were retrieved from the website of D. E. Shaw Research (35). Each MD simulation is 10 μs long and contains nine trajectories (1,000 snapshots each for trajectories 1 to 8 and 334 snapshots for trajectory 9). We also chose the averaged IR spectra of the first trajectory (the first 1,200 ns, which contains 1,000 snapshots) for comparison. From the average secondary structure content analysis (averaged over the 1,000 snapshots of trajectory 1) by the Stride program (36), the random coil content of RBD2-hACE2 was higher than that of RBD1-hACE2, and the β-turn content was lower, which led to a 6 cm−1 red shift of the dominant peak (32–34, 37) (RBD1-hACE2: 1,649.33 cm−1; RBD2-hACE2: 1,643.41 cm−1) (Table 1). Again, the difference in secondary structures between RBD1-hACE2 and RBD2-hACE2 is clearly characterized by our ML-based IR spectra simulation.
The trimeric SARS-CoV-2 S protein has two distinctive states: a closed state and an open state (6). Intriguingly, they have substantially different secondary structures. From the 10 μs MD simulation trajectories (nine trajectories; 1,000 snapshots each for trajectories 1 to 8 and 334 snapshots for trajectory 9) obtained from the website of D. E. Shaw Research, we simulated the amide I IR spectra of the trimeric SARS-CoV-2 S protein in the closed and open states, using 800 snapshots from the first trajectory for comparison. (For the results of the remaining trajectories, please see the supplementary material.) The dominant peak of the trimeric SARS-CoV-2 S protein in the open state has a 3 cm−1 red shift compared with the closed state, which coincides with the difference in secondary structure content (the β-turn content of the open state is lower, but its coil content is higher, than that of the closed state) (33, 37, 38) (Fig. 3 and Table 1).
Fig. 3.

ML-predicted IR spectra of Trimeric SARS-CoV-2 S protein. (A) Closed state (PDB ID code 6VXX). (B) Open state (PDB ID code 6VYB).
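The blue and red shifts discussed in the text are differences between dominant peak positions. The Python sketch below shows one way to extract a dominant peak from a sampled spectrum and to label a shift, using the peak positions quoted in the text as inputs. The function names are hypothetical, and taking the global maximum as the "dominant peak" is an assumption of this sketch rather than a documented step of the protocol.

```python
import numpy as np

def dominant_peak(wavenumbers, intensities):
    """Wavenumber (cm^-1) at which the sampled spectrum is most intense."""
    return float(np.asarray(wavenumbers, dtype=float)[int(np.argmax(intensities))])

def classify_shift(peak_from, peak_to):
    """Return (shift in cm^-1, label); higher wavenumber means blue shift."""
    delta = peak_to - peak_from
    if delta > 0:
        label = "blue shift"
    elif delta < 0:
        label = "red shift"
    else:
        label = "no shift"
    return delta, label

# Peak positions quoted in the text (cm^-1).
cov1_to_cov2 = classify_shift(1658.72, 1663.62)   # SARS-CoV-1 -> SARS-CoV-2
rbd1_to_rbd2 = classify_shift(1649.33, 1643.41)   # RBD1-hACE2 -> RBD2-hACE2

print(f"{cov1_to_cov2[0]:+.2f} cm^-1, {cov1_to_cov2[1]}")
print(f"{rbd1_to_rbd2[0]:+.2f} cm^-1, {rbd1_to_rbd2[1]}")
```

The two differences, +4.90 and −5.92 cm−1, are what the text rounds to a 5 cm−1 blue shift and a 6 cm−1 red shift.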

Finally, we investigated the dynamics of the S protein of SARS-CoV-2 interacting with hACE2, using our ML protocol. Five representative structures were selected from D. E. Shaw Research (35). We predicted the IR spectra of S protein in different states during the binding process by ML and calculated the average secondary structure components in each state (Fig. 4 and Table 1). The five identified states are of chemical interest for understanding the process of dynamic interaction between the S protein of SARS-CoV-2 and hACE2; they are five successive states used for describing such a process. We identified the S1 to S5 states based on the trajectory of accelerated weighted ensemble MD simulations (source: D. E. Shaw Research) of 9,072 ps duration. Specifically, S1 denotes t = 0 ps in the MD simulation; S2: t = 1,008 ps; S3: t = 3,931.2 ps; S4: t = 4,838.4 ps; and S5: t = 7,056 ps. From the S1 to the S2 state, the IR spectrum has a 2.57 cm−1 blue shift. The analysis of the average secondary structure content showed that the main change from S1 to S2 was the increased content of α-helix, which led to a blue shift (33, 37, 38). From S2 to S3, the IR spectrum also has a blue shift, of 6 cm−1, corresponding to the change in averaged secondary structure content (33, 37, 38) (S2 to S3: β-turns increased while coil decreased). From S3 to S4, the IR spectrum has a 5 cm−1 red shift, which is caused by the β-turn and α-helix content decreasing while the coil content increased (32, 34, 37, 38). From S4 to S5, the IR spectrum has a 4 cm−1 blue shift, which is caused by the β-turn and α-helix content increasing (33, 34). The changes in the IR spectra of the S protein in different states, associated with the changes in the secondary structure, are correctly captured by our ML protocol.
We have further investigated the amide I signals of different SARS-CoV-2 spikes (S proteins), as reported in the supplementary material: from Sa to Sb, the dominant peak of the spectrum has a blue shift, which corresponds to the increase of β-turns and α-helix and the simultaneous decrease of coil. From Sb to Sc, the dominant peak of the spectrum has a red shift, which corresponds to the decrease of β-turns and α-helix and the simultaneous increase of coil. The structural change is clearly captured by the change of spectra. This supplementary result suggests that our ML protocol can help spectroscopy experiments track structural changes of proteins; we think our method provides a promising route for studying the real-time dynamics of the interactions between SARS-CoV-2 and human ACE2.
Fig. 4.

Five representative states of the receptor-binding domain of the SARS-CoV-2 spike (S protein) and the human ACE2 (hACE2) receptor were selected from the combination trajectory.


Conclusions

In conclusion, we have proposed a cost-effective ML protocol for predicting amide I IR spectra of the SARS-CoV-2 spike protein. Changes in the secondary structure of the coronavirus proteins are clearly captured by our ML protocol, indicating its potential for monitoring real-time interactions between SARS-CoV-2 and human ACE2. The ML technique significantly accelerates the simulation of IR spectra of protein complexes, which is crucial for developing time-resolved IR spectroscopy techniques for studying dynamic protein−protein interactions.

Methods

MD simulations for SARS-CoV-1 (PDB ID code 2AMQ) were performed with the GROMACS package (39) and the OPLS-AA force field (40). Electrostatic interactions were treated by the particle mesh Ewald method, and Coulomb interactions were truncated at 12.0 Å. Energy minimization was performed for 50,000 cycles for each protein. Thereafter, an equilibration process in the isothermal-isobaric (NPT) ensemble with an integration time step of 2 fs was run for 0.5 ns (40). Production dynamics were performed for a period of 2 ns in the NPT ensemble at 300 K while maintaining pressure at 1 atm. One thousand configurations were extracted at a 2-ps interval for calculating the IR spectra.
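The step and snapshot counts implied by these settings follow from simple arithmetic, sketched below in Python with all times expressed in femtoseconds. One assumption is flagged in the comments: the text states the 2 fs time step for the equilibration run only, and the sketch assumes the same step for production.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000

time_step_fs = 2                         # stated for the equilibration run
equilibration_fs = FS_PER_NS // 2        # 0.5 ns
production_fs = 2 * FS_PER_NS            # 2 ns
sampling_interval_fs = 2 * FS_PER_PS     # one configuration every 2 ps

equilibration_steps = equilibration_fs // time_step_fs
# Assumption: production uses the same 2 fs step as equilibration.
production_steps = production_fs // time_step_fs
steps_between_snapshots = sampling_interval_fs // time_step_fs
n_snapshots = production_fs // sampling_interval_fs

print(equilibration_steps, production_steps, steps_between_snapshots, n_snapshots)
```

The snapshot count of 1,000 matches the "one thousand configurations" stated in the text and does not depend on the time step assumption.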
  33 in total

1.  An infrared spectroscopy approach to follow β-sheet formation in peptide amyloid assemblies.

Authors:  Jongcheol Seo; Waldemar Hoffmann; Stephan Warnke; Xing Huang; Sandy Gewinner; Wieland Schöllkopf; Michael T Bowers; Gert von Helden; Kevin Pagel
Journal:  Nat Chem       Date:  2016-09-26       Impact factor: 24.427

2.  Intermolecular interaction effects in the amide I vibrations of polypeptides.

Authors:  S Krimm; Y Abe
Journal:  Proc Natl Acad Sci U S A       Date:  1972-10       Impact factor: 11.205

3.  Vibrational-exciton couplings for the amide I, II, III, and A modes of peptides.

Authors:  Tomoyuki Hayashi; Shaul Mukamel
Journal:  J Phys Chem B       Date:  2007-08-29       Impact factor: 2.991

4.  Computing infrared spectra of proteins using the exciton model.

Authors:  Fouad S Husseini; David Robinson; Neil T Hunt; Anthony W Parker; Jonathan D Hirst
Journal:  J Comput Chem       Date:  2016-11-21       Impact factor: 3.376

5.  Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor: molecular mechanisms and potential therapeutic target.

Authors:  Haibo Zhang; Josef M Penninger; Yimin Li; Nanshan Zhong; Arthur S Slutsky
Journal:  Intensive Care Med       Date:  2020-03-03       Impact factor: 17.440

6.  Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding.

Authors:  Roujian Lu; Xiang Zhao; Juan Li; Peihua Niu; Bo Yang; Honglong Wu; Wenling Wang; Hao Song; Baoying Huang; Na Zhu; Yuhai Bi; Xuejun Ma; Faxian Zhan; Liang Wang; Tao Hu; Hong Zhou; Zhenhong Hu; Weimin Zhou; Li Zhao; Jing Chen; Yao Meng; Ji Wang; Yang Lin; Jianying Yuan; Zhihao Xie; Jinmin Ma; William J Liu; Dayan Wang; Wenbo Xu; Edward C Holmes; George F Gao; Guizhen Wu; Weijun Chen; Weifeng Shi; Wenjie Tan
Journal:  Lancet       Date:  2020-01-30       Impact factor: 79.321

7.  Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2.

Authors:  Yuanyuan Zhang; Yaning Li; Renhong Yan; Lu Xia; Yingying Guo; Qiang Zhou
Journal:  Science       Date:  2020-03-04       Impact factor: 47.728

8.  Novel antibody epitopes dominate the antigenicity of spike glycoprotein in SARS-CoV-2 compared to SARS-CoV.

Authors:  Ming Zheng; Lun Song
Journal:  Cell Mol Immunol       Date:  2020-03-04       Impact factor: 11.530

9.  Structural basis of receptor recognition by SARS-CoV-2.

Authors:  Jian Shang; Gang Ye; Ke Shi; Yushun Wan; Chuming Luo; Hideki Aihara; Qibin Geng; Ashley Auerbach; Fang Li
Journal:  Nature       Date:  2020-03-30       Impact factor: 49.962

10.  Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein.

Authors:  Alexandra C Walls; Young-Jun Park; M Alejandra Tortorici; Abigail Wall; Andrew T McGuire; David Veesler
Journal:  Cell       Date:  2020-03-09       Impact factor: 41.582
