
Structural genetics of circulating variants affecting the SARS-CoV-2 spike/human ACE2 complex.

Francesco Ortuso1,2, Daniele Mercatelli3, Pietro Hiram Guzzi3, Federico Manuel Giorgi4.   

Abstract

SARS-CoV-2 entry into human cells is mediated by the interaction between the viral Spike protein and the human ACE2 receptor. This mechanism evolved from the ancestral bat coronavirus and is currently one of the main targets of antiviral strategies. However, several Spike protein variants now exist in the SARS-CoV-2 population as a result of mutations, and it is unclear whether these variants exert a specific effect on the affinity for ACE2, which, in turn, is also characterized by multiple alleles in the human population. In the current study, the GBPM analysis, originally developed for highlighting host–guest interaction features, has been applied to define the key amino acids responsible for Spike/ACE2 molecular recognition, using four different crystallographic structures. We then intersected these structural results with the current mutational status of the SARS-CoV-2 population, based on more than 295,000 sequenced cases. We identified several mutations at Spike residues interacting with ACE2, each found in at least 20 distinct patients: S477N, N439K, N501Y, Y453F, E484K, K417N, S477I and G476S. Among these, N501Y in particular is one of the events characterizing SARS-CoV-2 lineage B.1.1.7, which has recently risen in frequency in Europe. We also identified five rare ACE2 variants that may affect interaction with Spike and susceptibility to infection: S19P, E37K, M82I, E329G and G352V. Communicated by Ramaswamy H. Sarma.


Keywords:  ACE2; COVID-19; SARS-CoV-2; mutations; spike



Year:  2021        PMID: 33583326      PMCID: PMC7885719          DOI: 10.1080/07391102.2021.1886175

Source DB:  PubMed          Journal:  J Biomol Struct Dyn        ISSN: 0739-1102            Impact factor:   5.235


Introduction

The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) emerged in late 2019 (Zhu et al., 2020) as the etiological agent of a pandemic of severe proportions, dubbed Coronavirus Disease 2019 (COVID-19). The disease has reached virtually every country in the world (Hilton & Keeling, 2020), with more than 40,000,000 confirmed cases and more than 1,100,000 deaths (source: World Health Organization). SARS-CoV-2 has a 29,903-nucleotide single-stranded RNA genome, densely packed with 11 Open Reading Frames (ORFs); ORF1 encodes a polyprotein that is further cleaved into 16 proteins, for a total of 26 proteins (Mercatelli & Giorgi, 2020). The second ORF encodes the Spike (S) protein, the key protagonist of viral entry into host cells through its interaction with human epithelial cell factors: Angiotensin Converting Enzyme 2 (ACE2) (Tai et al., 2020), Transmembrane Serine Protease 2 (TMPRSS2) (Hoffmann et al., 2020), Furin (Xia et al., 2020) and CD147 (Ulrich & Pillat, 2020). Investigators have focused on the Spike/ACE2 interaction, trying to disrupt it as a potential anti-COVID-19 therapy using small-molecule drugs (Hanson, 2020) or Spike fragments (Peter & Schug, 2020). X-ray crystallography has produced several models of the Spike/ACE2 complex (Lan et al., 2020; Shang et al., 2020; Wang, Zhang et al., 2020), providing a structural basis for analyzing this key interaction. These models showed that the Receptor Binding Domain (RBD) of Spike, which directly interacts with ACE2, is a compact structure of ∼200 amino acids (AAs) out of the 1273 AAs of full-length Spike. The SARS-CoV-2 Spike protein arose through successive mutations from that of a wild bat beta-coronavirus (Ou, 2020), adapting to the conformation of the N-terminal peptidase domain of ACE2. 
As a result, SARS-CoV-2 Spike can establish a strong interaction with the human cell surface, allowing the virus to fuse its membrane with that of the host cell, release its proteins and genetic material and start its replication cycle (Hoffmann et al., 2020). Although SARS-CoV-2 shows low mutability (Ceraolo & Giorgi, 2020), with fewer than 25 predicted events per year (Hadfield et al., 2018), the virus is continuously evolving away from the original Wuhan reference sequence (NC_045512.2) (Tang et al., 2020), and at least six major variants are currently circulating in the population (Mercatelli, Triboli et al., 2020; Mercatelli & Giorgi, 2020). Some of these strains are characterized by a Spike mutation at AA 614, where an Aspartic Acid (D) is replaced by a Glycine (G) (Sashittal et al., 2020). The Spike D614G mutation gives its name to the most frequent viral clade (G), which was first detected in Europe at the end of January 2020 and is now present on all continents, with increasing frequency over time (Mercatelli & Giorgi, 2020). D614G does not fall within the putative RBD (AA ∼330–530), but some studies suggest it may have a clinically relevant role: D614G is positively correlated with an increased case fatality rate (Becerra‐Flores & Cardozo, 2020), and it shows increased transmissibility and infectivity compared to the reference genome (Korber, 2020). In vitro, viruses carrying the D614G Spike mutation show increased viral load and cytopathic effect in cultured Vero cells (Tang et al., 2020). Despite these preliminary observations, the molecular effects of the D614G variant remain uncertain (Grubaugh et al., 2020). Other recurring Spike mutations have been observed in the population worldwide, albeit at frequencies of 1% or below (Mercatelli & Giorgi, 2020); some of these fall within the RBD and may therefore have a direct role in ACE2 interaction. 
On the other hand, genetic variants of ACE2 in the human population may influence susceptibility or resistance to SARS-CoV-2 infection, possibly contributing to the differences in clinical features observed in COVID-19 patients (Benetti, 2020). The ACE2 gene is located on chromosome Xp22.2 and consists of 18 exons, coding for an 805-AA protein exposed on the cell surface in a variety of human organs, including kidneys, heart, brain, gastrointestinal tract and lungs (Burrell et al., 2013). It is unclear whether tissue-expression patterns of ACE2 are linked to the severity of symptoms or outcomes of SARS-CoV-2 infection; however, ACE2 levels in the lungs were found to be increased in patients with comorbidities associated with severe COVID-19 clinical manifestations (Pinto, 2020), and ACE2 polymorphisms have already been described to play a role in hypertension and cardiovascular diseases (Bosso et al., 2020), particularly in association with type 2 diabetes (Burrell et al., 2013), all conditions predisposing to an increased risk of dying from COVID-19 (Zheng, 2020). Despite early studies, the presence of Spike mutations potentially altering the binding with ACE2 remains largely under-investigated, as does the role of human ACE2 variants in determining patient-specific molecular interactions between these two proteins. In the present study, we aim to detect which Spike and ACE2 AAs are the most important in determining the SARS-CoV-2 entry interaction, and to analyze which of them have already mutated in the population. The task is clinically relevant, as it provides a functional characterization of present and future mutations targeting the ACE2/Spike binding and detected by sequencing SARS-CoV-2 on a patient-specific basis. 
The variability of both proteins must be taken into consideration when developing anti-COVID-19 strategies, such as the Spike-based vaccine currently deployed by the National Institute of Allergy and Infectious Diseases and Moderna (Jackson, 2020).

Results

We set out to analyze the key AAs involved in the Spike/ACE2 interaction, in order to highlight which ones may alter the binding affinity, and therefore the etiological and clinical properties of different SARS-CoV-2 variants in different patients. We then determined which Spike and ACE2 AA variations relevant for this interaction have been observed in the SARS-CoV-2 and human populations, respectively.

Structural analysis of spike/ACE2 interaction

We obtained structural models of the SARS-CoV-2 Spike interacting with human ACE2 from three recent X-ray structures deposited in the Protein Data Bank: 6LZG (Wang, Zhang et al., 2020), 6M0J (Lan et al., 2020) and 6VW1 (Shang et al., 2020). Because 6VW1 contains two Spike/ACE2 complexes, we report results for both separately, as 6VW1-A and 6VW1-B. All models include the core interaction domains, located in the region AA 330–530 of Spike and AA 15–615 of ACE2. The full-length proteins would be 1273 AAs (Spike, only known isoform, from the reference SARS-CoV-2 genome NC_045512.2) and 805 AAs (ACE2 isoform 1, UniProt id Q9BYF1-1). The selected PDB entries are wild type, with identical primary sequences and the same overall fold. Residues 517–519 are missing from 6VW1-B. To investigate conformational variability, the PDB complexes were aligned by backbone and the Root Mean Square deviation (RMSd) was computed on all equivalent non-hydrogen atoms. The RMSd values revealed some conformational flexibility, which supported our decision to consider all PDB structures in the subsequent analysis (Figure 1).
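The text does not state which software performed the backbone alignment. The sketch below shows one standard way to do it, the Kabsch superposition, under stated assumptions: the equivalent non-hydrogen atoms of two complexes have already been matched into same-order NumPy arrays, and a boolean mask (hypothetical here) flags the backbone atoms used for the fit. The function names are illustrative, not part of the authors' pipeline.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Rotation r and translation t that best map `mobile` onto `target` (both N x 3)."""
    mobile_c = mobile.mean(axis=0)
    target_c = target.mean(axis=0)
    # Cross-covariance matrix of the centred coordinate sets
    h = (mobile - mobile_c).T @ (target - target_c)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that r is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_c - r @ mobile_c
    return r, t


def heavy_atom_rmsd(mobile, target, backbone_mask):
    """Superpose on backbone atoms only, then compute RMSd over every atom supplied.

    mobile, target: (N, 3) coordinates of equivalent non-hydrogen atoms, same order.
    backbone_mask:  (N,) boolean array marking the atoms used for the fit (assumed).
    """
    r, t = kabsch_fit(mobile[backbone_mask], target[backbone_mask])
    fitted = mobile @ r.T + t
    return float(np.sqrt(np.mean(np.sum((fitted - target) ** 2, axis=1))))
```

Atoms absent from one structure, such as residues 517–519 in 6VW1-B, would have to be dropped from both arrays before the call. Running the function over every pair of the four complexes would give pairwise values of the kind summarized in Figure 1(B), assuming that panel reports pairwise comparisons.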
Figure 1.

Conformational comparison of Spike-ACE2 PDB complexes: (A) alignment of PDB entries, Spike and ACE2 are respectively surrounded by cyan and orange fog, and (B) bar graph showing RMSd (in Å) computed on protein atoms.

The GBPM method was originally developed for identifying and scoring key pharmacophore and protein–protein interaction features by combining GRID molecular interaction fields (MIFs) according to the GRAB tool algorithm (Ortuso et al., 2006). In the present study, GBPM was applied to all selected complex models, considering Spike and ACE2 alternately as host and guest. The DRY, N1 and O GRID probes were used to describe hydrophobic, hydrogen bond donor and hydrogen bond acceptor interactions, respectively. For each probe, the cutoff used to select the most relevant MIF points was set 30% above the corresponding global minimum interaction energy value. Whereas GBPM pharmacophore features are usually employed for virtual screening, here they guided the identification of the AAs that stabilize the complex: Spike or ACE2 residues within 3 Å of GBPM points were marked as relevant for host–guest recognition and qualitatively scored by assigning them the corresponding GBPM energy. If a residue was flagged by more than one GBPM point, its score was computed as the sum of the energies of those points (Figure 2).
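The residue-scoring rule just described (residues within 3 Å of a GBPM point receive that point's energy, summed over all points) can be sketched as follows. This is a minimal illustration, assuming that the GBPM points have already been filtered by the probe-specific energy cutoff, and that "within 3 Å" means any atom of the residue lies within 3 Å of the point; the function and argument names are hypothetical.

```python
import numpy as np
from collections import defaultdict


def score_residues(points, energies, atom_xyz, atom_residue, cutoff=3.0):
    """Assign GBPM point energies (kcal/mol) to residues with any atom within `cutoff` Å.

    points:       (P, 3) coordinates of the selected GBPM features
    energies:     (P,) GBPM energies, negative = favourable
    atom_xyz:     (A, 3) coordinates of the protein atoms
    atom_residue: length-A residue labels, e.g. "TYR449"
    """
    labels = np.asarray(atom_residue)
    xyz = np.asarray(atom_xyz, dtype=float)
    scores = defaultdict(float)
    for point, energy in zip(np.asarray(points, dtype=float), energies):
        dist = np.linalg.norm(xyz - point, axis=1)
        # A residue is credited once per point, however many of its atoms are close
        for residue in set(labels[dist <= cutoff].tolist()):
            scores[residue] += float(energy)
    return dict(scores)
```

Crediting each residue at most once per point keeps large side chains from being over-counted; whether the original implementation does the same is not stated in the text.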
Figure 2.

Summary of the pipeline adopted by GBPM to identify key residues contributing to the SARS-CoV-2 Spike/Human ACE2 interface. Spike is depicted in cyan, and ACE2 in orange, based on the 6LZG PDB model (Wang, Zhang et al., 2020). Residues highlighted by GBPM are then tested for mutation frequency in the worldwide SARS-CoV-2 population.

Finally, for each selected residue, the score averaged over the four models was used to estimate its role in complex stabilization. Based on their average scores, Spike and ACE2 AAs were divided into quartiles to facilitate interpretation of the results: quartile 1 (Q1) includes the strongest contributors to complex stabilization; quartile 2 (Q2) contains residues less important than those in Q1 but more relevant than those in quartile 3 (Q3); quartile 4 (Q4) contains the weakest predicted interacting AAs. This extension of the original approach allowed us to highlight known relevant interaction residues of both Spike (Table 1) and ACE2 (Table 2).
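The averages in Tables 1 and 2 are consistent with a residue that is not flagged in a model contributing 0.00 for that model; for example, LYS417 gives (−43.58 − 12.12 + 0.00 + 0.00)/4 ≈ −13.93. A minimal sketch of that averaging and ranking step follows. It deliberately stops before quartile assignment: the published groups are unequal in size (12, 7, 6 and 1 Spike residues), so the exact binning rule cannot be inferred from the tables. The names used are illustrative.

```python
MODELS = ("6LZG", "6M0J", "6VW1-A", "6VW1-B")


def average_scores(per_model):
    """per_model maps model -> {residue: GBPM score}; residues missing from a model count as 0.00."""
    residues = {res for scores in per_model.values() for res in scores}
    averages = {
        res: sum(per_model.get(m, {}).get(res, 0.0) for m in MODELS) / len(MODELS)
        for res in residues
    }
    # Most negative (strongest predicted contribution) first
    return sorted(averages.items(), key=lambda item: item[1])


# Two rows of Table 1; zeros in the table are simply left out of the input
ranked = average_scores({
    "6LZG": {"LYS417": -43.58},
    "6M0J": {"LYS417": -12.12},
    "6VW1-A": {"ASN439": -12.30},
    "6VW1-B": {"ASN439": -34.94},
})
for residue, score in ranked:
    print(f"{residue} {score:.3f}")  # LYS417 -13.925, then ASN439 -11.810
```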
Table 1.

GBPM scores, average values and quartile distribution of Spike relevant AAs in the four complex models obtained from three PDB entries.

Residue# | 6LZG | 6M0J | 6VW1-A | 6VW1-B | GBPM average score | Quartile
LYS417 | −43.58 | −12.12 | 0.00 | 0.00 | −13.93 | Q2
ASN439 | 0.00 | 0.00 | −12.30 | −34.94 | −11.81 | Q2
GLY446 | −22.52 | −5.75 | 0.00 | −10.32 | −9.65 | Q3
GLY447 | −5.63 | 0.00 | 0.00 | 0.00 | −1.41 | Q3
TYR449 | −25.72 | −6.38 | −20.37 | −24.76 | −19.31 | Q1
TYR453 | 0.00 | 0.00 | −1.77 | −1.76 | −0.88 | Q4
LEU455 | −11.59 | −16.82 | −21.78 | −7.04 | −14.31 | Q2
PHE456 | −34.20 | −30.16 | −39.72 | −20.76 | −31.21 | Q1
ALA475 | −52.35 | −49.72 | −38.73 | −77.00 | −54.45 | Q1
GLY476 | −21.72 | 0.00 | −17.16 | −34.59 | −18.37 | Q2
SER477 | −22.32 | 0.00 | −11.44 | −40.68 | −18.61 | Q2
GLU484 | −8.52 | −13.23 | 0.00 | 0.00 | −5.44 | Q3
PHE486 | −28.99 | −53.63 | −32.56 | −53.43 | −42.15 | Q1
ASN487 | −31.67 | −59.57 | −33.98 | −52.21 | −44.36 | Q1
TYR489 | −62.10 | −27.67 | −45.92 | −69.38 | −51.27 | Q1
PHE490 | −4.58 | −4.48 | −22.90 | −40.32 | −18.07 | Q2
GLN493 | −37.20 | −56.08 | −79.60 | −70.51 | −60.85 | Q1
GLY496 | −15.54 | −8.74 | −18.72 | −16.80 | −14.95 | Q2
PHE497 | −8.86 | 0.00 | −4.68 | −29.10 | −10.66 | Q3
GLN498 | −77.24 | −80.38 | −42.34 | 0.00 | −49.99 | Q1
PRO499 | 0.00 | 0.00 | 0.00 | −11.64 | −2.91 | Q3
THR500 | 0.00 | −66.00 | −92.90 | −122.50 | −70.35 | Q1
ASN501 | −60.14 | −61.04 | −61.82 | −70.59 | −63.40 | Q1
GLY502 | −24.84 | −35.42 | −39.45 | −40.92 | −35.16 | Q1
VAL503 | 0.00 | −5.37 | −5.45 | −5.54 | −4.09 | Q3
TYR505 | −30.60 | −23.22 | −20.90 | −40.62 | −28.84 | Q1

GBPM scores and average values are reported in kcal/mol.

Table 2.

GBPM scores, average values and quartile distribution of ACE2 relevant AAs in the four complex models obtained from three PDB entries. GBPM scores and average values are reported in kcal/mol.

Residue# | 6LZG | 6M0J | 6VW1-A | 6VW1-B | GBPM average score | Quartile
SER19 | −31.45 | −26.08 | −53.61 | −79.33 | −47.62 | Q1
GLN24 | −31.15 | −23.62 | −34.15 | −85.23 | −43.54 | Q1
THR27 | −16.93 | −32.58 | −38.70 | −16.65 | −26.22 | Q2
PHE28 | −20.68 | −25.02 | −14.10 | −27.48 | −21.82 | Q2
ASP30 | 0.00 | −17.01 | 0.00 | 0.00 | −4.25 | Q3
LYS31 | −84.06 | −43.67 | −32.98 | −46.60 | −51.83 | Q1
HIS34 | 0.00 | −30.42 | −27.78 | −67.56 | −31.44 | Q2
GLU35 | −11.73 | 0.00 | 0.00 | −19.40 | −7.78 | Q2
GLU37 | −11.58 | −20.36 | −11.83 | −20.52 | −16.07 | Q2
ASP38 | −41.09 | −40.52 | −25.75 | −34.16 | −35.38 | Q2
TYR41 | −52.50 | −75.07 | −62.35 | −76.07 | −66.50 | Q1
GLN42 | −36.78 | −37.15 | −28.53 | −63.49 | −41.49 | Q2
LEU45 | −12.80 | −16.43 | 0.00 | −16.20 | −11.36 | Q2
LEU79 | 0.00 | 0.00 | 0.00 | −5.99 | −1.50 | Q3
MET82 | 0.00 | 0.00 | −6.36 | −6.00 | −3.09 | Q3
TYR83 | −40.50 | −66.29 | −57.86 | −60.81 | −56.37 | Q1
GLU329 | 0.00 | 0.00 | 0.00 | −17.25 | −4.31 | Q3
ASN330 | −11.84 | −5.92 | −11.82 | −6.04 | −8.91 | Q2
GLY352 | −1.97 | −8.36 | −8.86 | −14.66 | −8.46 | Q2
LYS353 | −79.38 | −70.11 | −120.73 | −46.03 | −79.06 | Q1
GLY354 | −21.87 | −31.15 | −12.74 | −15.25 | −20.25 | Q2
ASP355 | −68.95 | −81.24 | −57.99 | −89.12 | −74.33 | Q1
ARG357 | 0.00 | −4.99 | 0.00 | 0.00 | −1.25 | Q3
ALA386 | 0.00 | 0.00 | −4.85 | 0.00 | −1.21 | Q4
ARG393 | 0.00 | 0.00 | −4.85 | 0.00 | −1.21 | Q4
GBPM highlighted a similar number of AAs for Spike (26 AAs) and ACE2 (25 AAs), with average scores in a similar range. Spike had more Q1 residues than ACE2 (12 and 7 AAs, respectively), whereas the opposite held for Q2, which contained 7 Spike and 11 ACE2 residues. No notable difference emerged between Spike and ACE2 in Q3 and Q4. We reasoned that mutations and variants affecting Q1 residues would have the greatest impact on complex stability. Across all GBPM models, Spike–ACE2 molecular recognition appears to be sustained largely by polar interactions, such as hydrogen bonds, with very few putative hydrophobic contributions (Table 3).
Table 3.

Composition of the GBPM models designed.

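GBPM feature | 6LZG # | 6LZG AIE | 6M0J # | 6M0J AIE | 6VW1-A # | 6VW1-A AIE | 6VW1-B # | 6VW1-B AIE | Host/Guest
Hydrophobic | 4 | −2.07 | 4 | −1.82 | 5 | −2.05 | 3 | −2.12 | Spike/ACE2
HBD | 18 | −6.48 | 15 | −6.47 | 17 | −6.22 | 19 | −6.31 | Spike/ACE2
HBA | 4 | −6.61 | 13 | −5.25 | 12 | −5.47 | 14 | −5.48 | Spike/ACE2
Hydrophobic | 1 | −1.49 | 3 | −1.16 | 2 | −1.49 | 1 | −1.76 | ACE2/Spike
HBD | 18 | −6.26 | 18 | −6.32 | 24 | −5.63 | 28 | −5.94 | ACE2/Spike
HBA | 7 | −4.84 | 10 | −4.53 | 9 | −4.98 | 12 | −4.60 | ACE2/Spike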

HBD = Hydrogen Bond Donor; HBA = Hydrogen Bond Acceptor; # = number of features; AIE = Average Interaction Energy (in kcal/mol).
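The claim that recognition is dominated by polar contributions can be checked by tallying the feature counts in Table 3. The sketch below pools both host/guest orientations and all four models, which is our own simplification; the dictionary is a transcription of the table, not an output of GBPM.

```python
# Feature counts transcribed from Table 3, per model: 6LZG, 6M0J, 6VW1-A, 6VW1-B
FEATURE_COUNTS = {
    "Spike/ACE2": {"Hydrophobic": [4, 4, 5, 3], "HBD": [18, 15, 17, 19], "HBA": [4, 13, 12, 14]},
    "ACE2/Spike": {"Hydrophobic": [1, 3, 2, 1], "HBD": [18, 18, 24, 28], "HBA": [7, 10, 9, 12]},
}


def polar_fraction(counts):
    """Share of GBPM features that are hydrogen-bond donors or acceptors (HBD + HBA)."""
    polar = total = 0
    for features in counts.values():
        for feature, per_model in features.items():
            n = sum(per_model)
            total += n
            if feature in ("HBD", "HBA"):
                polar += n
    return polar / total


print(f"{polar_fraction(FEATURE_COUNTS):.1%}")  # 238 of 261 features: 91.2%
```

Only 23 of the 261 pooled features are hydrophobic, which quantifies the "very few putative hydrophobic contributions" noted above.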


Mutational analysis of SARS-CoV-2 spike

We analyzed 295,507 publicly available full-length SARS-CoV-2 genome sequences collected worldwide and deposited in the GISAID database as of December 30, 2020 (Shu & McCauley, 2017). Of these, 257,434 samples contained at least one AA-changing mutation in the Spike protein. A total of 3314 different AA-changing mutations were detected in the 1273-AA Spike sequence. However, many of these are unique events (or possibly even sequencing errors): only 2023 mutations were found in more than one sample, 788 in more than 10 samples and 196 in more than 100 samples (Supplementary File 1). We then focused on mutations located in the Spike RBD (AA 330–530) at positions with a predicted interaction contribution, as assessed by our GBPM method. Most mutations in this region are found in only a handful of samples (Table 4 and Figure 4(A)), with a few notable exceptions. S477N and N439K are the most frequent in the current population, identified in 16,547 (5.60%) and 5587 (1.89%) patients, respectively. These two variants are also among the 20 most frequent mutations in the population and involve two positions that contribute productively to the Spike/ACE2 interaction according to GBPM (see Table 1 and Figure 3 for positions 439 and 477).
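The frequencies in Table 4 equal the abundance divided by the 295,507 genomes analyzed (e.g. 16,547/295,507 ≈ 0.055995 for S477N). A minimal sketch of the selection behind Table 4 follows, assuming mutation counts have already been tallied per AA change and that each position's average GBPM score is available as a dictionary; the function name and input layout are hypothetical.

```python
import re

TOTAL_SEQUENCES = 295_507  # GISAID genomes analysed in this study


def rbd_interface_mutations(counts, gbpm_avg, rbd=(330, 530), min_cases=2):
    """Keep RBD mutations seen in >= min_cases genomes at positions with a non-zero GBPM score.

    counts:   {mutation: number of genomes}, e.g. {"N501Y": 4921}
    gbpm_avg: {Spike position: average GBPM score}, e.g. {501: -63.3975}
    """
    rows = []
    for mutation, n in counts.items():
        # First run of digits is the position, also for labels such as "YQ505WK" or "Q493*"
        position = int(re.search(r"\d+", mutation).group())
        score = gbpm_avg.get(position, 0.0)
        if rbd[0] <= position <= rbd[1] and n >= min_cases and score != 0.0:
            rows.append((mutation, position, n, n / TOTAL_SEQUENCES, score))
    # Most frequent first, as in Table 4
    return sorted(rows, key=lambda row: row[2], reverse=True)


rows = rbd_interface_mutations({"S477N": 16547, "N501Y": 4921}, {477: -18.61, 501: -63.3975})
for mutation, position, n, freq, score in rows:
    print(f"{mutation}\t{position}\t{n}\t{freq:.6f}\t{score}")  # frequencies 0.055995 and 0.016653
```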
Table 4.

Spike mutations located within the RBD (AA 330–530) with at least two cases in the population and non-zero GBPM average score in the ACE2/Spike interaction models.

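Mutation | Position | Abundance | Frequency | GBPM average score | Quartile
S477N | 477 | 16,547 | 0.055995 | −18.61 | Q2
N439K | 439 | 5587 | 0.018906 | −11.81 | Q2
N501Y | 501 | 4921 | 0.016653 | −63.3975 | Q1
Y453F | 453 | 917 | 0.003103 | −0.8825 | Q4
E484K | 484 | 352 | 0.001191 | −5.4375 | Q3
K417N | 417 | 260 | 0.00088 | −13.925 | Q2
S477I | 477 | 157 | 0.000531 | −18.61 | Q2
G446V | 446 | 58 | 0.000196 | −9.6475 | Q3
F490S | 490 | 53 | 0.000179 | −18.07 | Q2
S477R | 477 | 49 | 0.000166 | −18.61 | Q2
N501T | 501 | 47 | 0.000159 | −63.3975 | Q1
L455F | 455 | 44 | 0.000149 | −14.3075 | Q2
G476S | 476 | 43 | 0.000146 | −18.3675 | Q2
E484Q | 484 | 43 | 0.000146 | −5.4375 | Q3
A475V | 475 | 35 | 0.000118 | −54.45 | Q1
F486L | 486 | 34 | 0.000115 | −42.1525 | Q1
F490L | 490 | 18 | 6.09E-05 | −18.07 | Q2
YQ505WK | 505 | 14 | 4.74E-05 | −28.835 | Q1
Q493L | 493 | 12 | 4.06E-05 | −60.8475 | Q1
V503F | 503 | 9 | 3.05E-05 | −4.09 | Q3
E484A | 484 | 8 | 2.71E-05 | −5.4375 | Q3
G446S | 446 | 7 | 2.37E-05 | −9.6475 | Q3
E484D | 484 | 4 | 1.35E-05 | −5.4375 | Q3
Q493* | 493 | 4 | 1.35E-05 | −60.8475 | Q1
Y505W | 505 | 4 | 1.35E-05 | −28.835 | Q1
G476A | 476 | 3 | 1.02E-05 | −18.3675 | Q2
S477G | 477 | 3 | 1.02E-05 | −18.61 | Q2
F456L | 456 | 2 | 6.77E-06 | −31.21 | Q1
V503I | 503 | 2 | 6.77E-06 | −4.09 | Q3
Y449F | 449 | 2 | 6.77E-06 | −19.3075 | Q1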

The asterisk (*) indicates a stop codon. A lower GBPM score indicates a stronger effect in the ACE2/Spike interaction.

Figure 4.

(A) Occurrence of AA-changing variants on SARS-CoV-2 Spike protein. X-axis indicates the position of the affected AA. Y-axis indicates the log10 of the number of occurrences of the variant in the SARS-CoV-2 dataset. Labels indicate variants affecting ACE2/Spike binding and detected in at least five SARS-CoV-2 sequences. Vertical dashed lines indicate crystalized region analyzed (aa 330 – 530). The D614G variant, located outside the RBD, is also indicated. (B) Scatter plot indicating the occurrence of the variant in the population (x-axis) and the GBPM score of the reference AA in the model (y-axis). Mutations with non-zero GBPM score are indicated. CC indicates the Pearson correlation coefficient and p indicates the p-value of the CC.

Figure 3.

3D ribbon representation of the interaction domains of SARS-CoV-2 Spike (left, orange) and human ACE2 (right, green), based on the crystal structure 6LZG deposited in the Protein Data Bank by Wang, Zhang et al. (2020). The positions of the three most frequent Spike mutations in the interacting region (AA 350-550) with a non-zero GBPM score are indicated: N439K, N501Y and S477N.

Graphical inspection of the PDB structures revealed that Spike Asparagine (N) 439, ranked in GBPM Q2, is mainly involved in intra-protein interactions. Through its backbone sp2 oxygen atom, N439 accepts a hydrogen bond from the side chain of Spike Serine 443 and, through its side-chain amide group, donates a hydrogen bond to the backbone of Spike Proline 499. All these AAs are located in a random-coil loop of Spike, so N439K might be expected to modify Spike/ACE2 recognition only minimally. However, after in silico mutation of Asparagine 439 to Lysine, a productive electrostatic interaction can be predicted between the new positively charged residue and ACE2 Glutamate 329. Such a long-range interaction could improve the stabilization of the complex with respect to wild-type Spike (Supporting information Figure S1). A similar effect could be attributed to the mutation at position 477. Serine (S) 477 is a weak contributor to the complex interaction: in all the PDB entries we selected, it is located in a solvent-exposed random-coil loop, and no interaction with ACE2 or Spike residues can be observed. GBPM placed this residue in Q2. 
Conversely, its mutation to Asparagine (S477N) in our in silico model allowed a hydrogen bond with ACE2 Serine 19, which could stabilize the complex (Supporting information Figure S2). Position 477 is also affected by three other, less frequent events: S477I, S477R and S477G, with 157, 49 and 3 observations, respectively (Table 4). Among these, S477R may be the most interesting: a positively charged residue such as Arginine (R) can establish a weak electrostatic interaction with ACE2 Glutamate 87, as suggested by a theoretical model we built. S477I and S477G would at most modify the conformation of a random-coil segment and therefore do not appear very relevant. Conversely, S477N and S477R could productively contribute to the stabilization of the Spike/ACE2 complex. Deeper theoretical and experimental investigations are needed to confirm this hypothesis. Unfortunately, full-scale simulations cannot be rigorously performed today, because the available 3D structural models report only fragments of the Spike/ACE2 complex. The third most common mutation, N501Y (Figure 3), targets an AA predicted to play a strong role in the interaction in all four models, sitting in GBPM Q1. N501Y was detected in 4921 patients (1.67% of the dataset), the majority of them in the United Kingdom (Shu & McCauley, 2017). From a structural point of view, we predict that substituting Asparagine (N) 501 with a Tyrosine (Y) may have an effect: their Total Polar Surface Areas (TPSA), 101.29 and 78.43 Å² respectively, differ, although both side chains can donate and accept a hydrogen bond. Their contributions to complex stabilization may therefore be slightly different, also depending on the chemical environment. 
In fact, wild-type Asparagine 501 donates a hydrogen bond to ACE2 Tyrosine 41: this interaction could also be possible for the N501Y mutant or, as we observed in our theoretical model, it could be replaced by pi–pi stacking (Supporting information Figure S3). A rapid increase in the frequency of N501Y has recently been observed in the United Kingdom and other countries, as it is one of the variants characterizing lineage B.1.1.7 (Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations - SARS-CoV-2 coronavirus/nCoV-2019 Genomic Epidemiology - Virological, 2021). The Asparagine/Tyrosine substitution at Spike position 501 could contribute to an evolutionary advantage for this lineage, based on differential affinity for the human receptor ACE2 (Fratev, 2020; Leung et al., 2020). A less frequent mutation among those predicted to contribute to the ACE2/Spike interaction is G476S, detected in 43 samples (0.015%) and supported by three out of four structural models (Table 1, Figure 4(B)). GBPM placed Glycine (G) 476 in Q2: its contribution to complex stabilization is weak. Unlike the other mutations described here, however, the replacement of Glycine 476 with a Serine (S) could have more evident effects on Spike/ACE2 molecular recognition. In all PDB entries, the alpha carbon of this Glycine is very close (about 4 Å) to the side-chain amide group of ACE2 Glutamine 24. No productive interaction can be established between these two AAs, but substituting the Spike Glycine with a Serine could allow an inter-protein hydrogen bond with ACE2 Glutamine 24. Moreover, G476S could establish the same interaction with Spike Glutamine 478, which could stabilize the conformation of a random-coil segment of the viral protein, resulting in better pre-organization for ACE2 recognition (Supporting information Figure S4). 
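The hydrogen-bond arguments above (N439, N501/N501Y, G476S) rest on donor–acceptor geometry in the crystal structures and in the in silico mutants. The text does not give a numeric criterion; a common first-pass screen, assumed here, is a heavy-atom donor–acceptor distance of at most about 3.5 Å. The sketch below applies only that distance test, so its hits are candidates rather than confirmed hydrogen bonds; the labels and function name are hypothetical.

```python
import math


def candidate_hbonds(donors, acceptors, max_dist=3.5):
    """Distance-only screen for donor-acceptor heavy-atom pairs, closest first.

    donors, acceptors: lists of (label, (x, y, z)) in Å, e.g. ("Spike SER476 OG", (...)).
    Hydrogen positions and angular criteria are ignored, so hits are candidates only.
    """
    hits = []
    for d_label, d_xyz in donors:
        for a_label, a_xyz in acceptors:
            dist = math.dist(d_xyz, a_xyz)
            if dist <= max_dist:
                hits.append((d_label, a_label, round(dist, 2)))
    return sorted(hits, key=lambda hit: hit[2])
```

A fuller analysis would also check the donor–hydrogen–acceptor angle, which requires explicit hydrogens that X-ray models usually lack.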
Another Spike residue predicted by our analysis to play a relevant role in ACE2 recognition is Glutamine 493 (Table 1). The GISAID data revealed that this AA is rarely replaced by a Leucine (Q493L) or an Arginine (Q493R). These two mutations could affect ACE2 recognition in opposite ways. Spike Glutamine 493 forms a hydrogen bond with ACE2 Glutamate 35. Q493L cannot make such a productive contribution and could only interact hydrophobically with Spike Leucine 455. Conversely, Q493R could place its positively charged side chain in an ACE2 pocket delimited by Aspartate 30, Histidine 34 and Glutamate 35, which could produce a remarkable electrostatic stabilization of the complex (Supporting information Figure S5). 
In general, AAs with the strongest evidence for contributing to the Spike/ACE2 interface tend not to diverge from the reference (Figure 4(B)), which may indicate a solid evolutionary constraint to keep interface residues unchanged. For example, Glutamine (Q) 493, one of the most relevant Q1 AAs in the ACE2/Spike interaction, is rarely mutated, with 12 cases of Q493L, 4 of Q493* (substitution of Q493 with a stop codon), three of Q493K, and one each of Q493R and Q493H. A possible exception is the aforementioned Spike mutation N501Y, located at one of the strongest Q1 GBPM-predicted AAs for ACE2 binding, which was found in a considerable number of different patients (4921).
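The trend in Figure 4(B) is summarized by a Pearson correlation coefficient between variant occurrence and the GBPM score of the reference AA. A minimal sketch is shown below. Whether the published coefficient was computed on raw or log10-transformed counts, and on exactly which set of mutations, is not stated, so both are assumptions here (the log transform is exposed as a parameter). For the p-value, `scipy.stats.pearsonr(x, y)` returns both the coefficient and its two-sided p-value.

```python
import numpy as np


def occurrence_vs_score(abundance, gbpm_score, log_counts=True):
    """Pearson correlation between variant occurrence and GBPM score of the reference AA."""
    x = np.asarray(abundance, dtype=float)
    if log_counts:
        x = np.log10(x)  # counts span several orders of magnitude (Table 4)
    y = np.asarray(gbpm_score, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# A few rows of Table 4 (abundance, average GBPM score), for illustration only
r = occurrence_vs_score([16547, 5587, 4921, 917, 352], [-18.61, -11.81, -63.3975, -0.8825, -5.4375])
print(f"Pearson r = {r:.2f}")
```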

Mutational analysis of human ACE2

We also investigated variants of human ACE2, since these could underlie patient-specific COVID-19 susceptibility and severity. The ACE2 protein sequence is highly conserved across vertebrates (Guzzi et al., 2020) and within the human species (Cao et al., 2020), with the most frequent missense mutation (rs41303171, N720D) present in 1.5% of the world population (Supplementary File 2). Our analysis shows that only five ACE2 variants detected in the human population are located in the ACE2/Spike direct binding interface (Table 5 and Figure 5). Of these, rs73635825 (causing an S19P AA variant) is both the most frequent in the population (0.06%) and the most relevant for the interaction with the viral protein, with a GBPM score of −47.6175 (Q1) and support from all four models (Table 2). The frequency of the rs73635825 SNP is higher in populations of African descent (0.2%). The second SNP, rs143936283 (E329G, Table 5), is a very rare allele (0.0066%) in the European (non-Finnish) Asian population. The rs766996587 (M82I) SNP is also a very rare allele (0.0066%), found in the African population. E37K (rs146676783) is more frequent in the Finnish population (0.03%), and G352V (rs370610075) in the European non-Finnish population (0.007%). None of these five SNPs has a reported clinical significance, according to dbSNP and a literature search (Sherry et al., 2001).
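The intersection step described above, keeping only human ACE2 variants that fall on residues with a non-zero GBPM score, can be sketched as follows. The variant records and score dictionary below are transcribed from Tables 2 and 5 and the text; the `Ace2Variant` class and function name are hypothetical, and the 0.015 value for N720D is only an approximation taken from the 1.5% figure quoted above.

```python
import re
from dataclasses import dataclass


@dataclass
class Ace2Variant:
    change: str         # protein change in isoform 1 numbering, e.g. "S19P"
    rsid: str
    allele_freq: float  # global allele frequency (e.g. from gnomAD v3)

    @property
    def position(self) -> int:
        return int(re.search(r"\d+", self.change).group())


def interface_variants(variants, gbpm_avg):
    """Return (variant, GBPM score) pairs at ACE2 positions with a non-zero score, most frequent first."""
    hits = [(v, gbpm_avg[v.position]) for v in variants if gbpm_avg.get(v.position, 0.0) != 0.0]
    return sorted(hits, key=lambda hit: hit[0].allele_freq, reverse=True)


variants = [
    Ace2Variant("N720D", "rs41303171", 0.015),  # approximate; outside the interface
    Ace2Variant("S19P", "rs73635825", 0.000655),
    Ace2Variant("E37K", "rs146676783", 5.68e-05),
]
for v, score in interface_variants(variants, {19: -47.62, 37: -16.07}):
    print(v.change, v.rsid, f"{v.allele_freq:.4%}", score)
```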
Table 5.

ACE2 variants with non-zero GBPM score in the Spike interaction model.

Variant | rsID | Allele frequency | GBPM average score | Quartile
S19P | rs73635825 | 0.000655 | −47.62 | Q1
E329G | rs143936283 | 6.63E-05 | −4.31 | Q3
M82I | rs766996587 | 6.62E-05 | −3.09 | Q3
E37K | rs146676783 | 5.68E-05 | −16.07 | Q2
G352V | rs370610075 | 3.8E-05 | −8.46 | Q2
Figure 5.

Frequency of mutations on ACE2. X-axis indicates the AA position in isoform 1 (UniProt id Q9BYF1-1). Y-axis indicates the allele frequency in the global population according to the GNOMAD v3 database. Labels indicate AA changes observed in the human population with non-zero GBPM average score in the ACE2/Spike interaction models. Vertical dashed lines indicate the crystalized region analyzed in this study (aa 15 – 615).

Notably, M82I, together with S19P, has been predicted to adversely affect ACE2 stability (Hussain, 2020). M82I, together with E329G, has been simulated to increase binding affinity for Spike compared with wild-type ACE2, suggesting greater susceptibility to SARS-CoV-2 in patients carrying these variants (Wang, Xu et al., 2020). Conversely, E37K (Wang, Xu et al., 2020) and G352V (Darbani, 2020) were predicted to have a lower affinity for Spike, suggesting lower susceptibility to infection. However, while offering potential explanations for a possible genetic predisposition to infection, all these studies remain inconclusive in linking allele variants to COVID-19 susceptibility. Structurally, the S19P variant may differ greatly from the reference sequence in its interaction with Spike: Serine (S) is a polar residue able to both accept and donate a hydrogen bond through its side-chain hydroxyl group, whereas the side chain of Proline (P) cannot take part in hydrogen bonding and should therefore establish a weaker interaction with Spike. Indeed, the side chain of ACE2 Serine 19 donates a hydrogen bond to the backbone of Spike Alanine 475 (Supporting information Figure S6) and could potentially establish the same interaction with Spike Glycine (G) 476, which can itself be mutated (Table 4). Both Methionine (M) 82 and Glutamate (E) 329 are in Q3, contributing minimally to Spike/ACE2 recognition (Supporting information Figures S7 and S8). They are located within two alpha helices, so their mutation could modify the secondary structure of ACE2, resulting in a different affinity for Spike. This possibility should be more evident for E329G, because the side chain of Glutamate 329 forms a hydrogen bond with ACE2 Glutamine 325.

Discussion

SARS-CoV-2 Spike evolved through a series of adaptive mutations that increased its affinity for the human ACE2 receptor (Ortega et al., 2020). There is no reason to believe that the evolution and adaptation of the virus will stop, which makes continuous sequencing and mutation-tracking studies of paramount importance for strategically containing COVID-19 (Meredith et al., 2020). In our study, we highlighted the specific locations of Spike that can influence ACE2 molecular recognition, which is required for viral entry into the host cell (Hoffmann et al., 2020). We further showed that some mutations already present in the SARS-CoV-2 population may weakly affect the interaction with the human receptor, specifically Spike N439K, S477N and N501Y. These mutations are rising in frequency in the viral population (>1%), and N501Y in particular is one of the key mutations characterizing lineage B.1.1.7 (Leung et al., 2020), which has recently seen a dramatic increase in frequency in the United Kingdom (Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations - SARS-CoV-2 coronavirus/nCoV-2019 Genomic Epidemiology - Virological, XXXX). The identification of this mutation supports the usefulness of combining targeted mutation frequency analysis with GBPM as a pipeline to monitor events in the key region used by SARS-CoV-2 to recognize and enter human bronchial cells. The same approach can be used to monitor whether any of these events increases in frequency in the future, which would suggest an adaptation to the human host through a higher affinity for ACE2. On the other hand, we studied ACE2 variants in the human population, identifying five loci that may affect binding to SARS-CoV-2 Spike. They are all rare variants, the most frequent being S19P, present in 0.06% of the population, and none has a known clinical significance.
However, other in silico studies have predicted a role for these variants in decreasing ACE2 stability (S19P and M82I) (Hussain, 2020) and in altering the affinity for Spike (increasing it: M82I and E329G (Wang, Xu et al., 2020); decreasing it: E37K (Wang, Xu et al., 2020) and G352V (Darbani, 2020)). The most common ACE2 variant, rs41303171 (N720D), is not located in the binding region; so far, its predicted effects on the etiopathology of COVID-19 remain largely conjectural and have been associated with neurological complications via mechanisms that are probably independent of direct interaction with Spike (Strafella et al., 2020). It remains to be seen whether, in the future, combinations of Spike and ACE2 sequences will produce novel and unexpected COVID-19 features that require granular efforts in developing wider-spectrum anti-SARS-CoV-2 strategies, such as vaccines or antiviral drugs. So far, our analysis has identified a location on the Spike/ACE2 complex where both proteins vary in the viral and human populations, specifically ACE2 S19 and Spike A475/G476. Although, as described in our Results, these Spike mutations are unlikely to strongly affect the interaction surface, future combinations of ACE2/Spike variants may have peculiar effects that will require constant mutation monitoring. Identifying the single or multiple AAs involved in this viral entry interaction will allow for personalized diagnosis and clinical prediction based on the specific combination of SARS-CoV-2 strain and ACE2 variant. Personalized COVID-19 treatment would therefore require targeted sequencing of both the patient's ACE2 and the viral Spike, to identify the combination underlying each specific case. This technical obstacle may be further complicated by the intra-host genetic variability of SARS-CoV-2, which has recently been reported in RNA-sequencing studies (Shen et al., 2020).
Structural investigations will benefit, in the near future, from the availability of experimental structural models covering the complete sequence of both Spike and ACE2, or at least of Spike. This will allow more rigorous computational analyses (e.g. molecular dynamics simulations, free energy perturbation) of the effect of mutations on Spike/ACE2 recognition. Beyond the complex investigated in this manuscript, our approach can be extended to any other partners in the SARS-CoV-2/human interactome, for example the recently discovered interaction between the viral protease NSP5 (Gordon et al., 2020) and the human histone deacetylase HDAC2 (Milazzo et al., 2020), which is indirectly responsible for the transcriptional activation of pro-inflammatory genes. Our approach can also be extended to other viruses that exploit human receptors as an entry mechanism, such as CD4 for the Human Immunodeficiency Virus (HIV) or TIM-1 for Ebola virus (Grove & Marsh, 2011).

Materials and methods

Structural analysis

The PDB (Berman et al., 2000) was searched for high-resolution Spike/ACE2 complexes. PDB entries 6LZG (Wang, Zhang et al., 2020), 6M0J (Lan et al., 2020) and 6VW1 (Shang et al., 2020), which report the Spike RBD bound to ACE2, were retrieved and used for our GBPM analysis (Ortuso et al., 2006). This computational approach compares GRID (Goodford, 1985) molecular interaction fields (MIFs) computed on a generic complex (A) and, separately, on its host (B) and guest (C) components. MIFs describe the interaction between a given probe and a given target. If the target is a complex, then, depending on the region considered, the MIF energies can refer to the interaction between the probe and one of the complex subunits or, at the host/guest interface, with both of them. The GBPM analysis specifically highlights the latter. Five steps are required: (1) complex A is disassembled into its subunits B and C; (2) MIFs are computed on A, B and C using the most appropriate GRID probes; hydrogen bond acceptor and donor probes and a generic hydrophobic probe can describe the basic interactions. Because GRID MIFs are stored as a 3D matrix of interaction energy points (IEPs), the same box dimensions are adopted in all calculations; (3) each IEP of B is compared with the equivalent point of A, generating a new MIF named D. The following algorithm, available in the GRAB tool, is applied: if IEP(A) > 0 and IEP(B) > 0 then IEP(D) = 0; if IEP(A) > 0 and IEP(B) < 0 then IEP(D) = IEP(B); if IEP(A) < 0 and IEP(B) > 0 then IEP(D) = -IEP(A); if IEP(A) < 0 and IEP(B) < 0 then IEP(D) = IEP(A) - IEP(B).
The resulting MIF D reports, as negative energy values, the productive interactions between the GRID probe and B and at the A/B interface; (4) in order to mask the interaction between the probe and B, MIFs D and C are compared using the GRAB approach, producing a new MIF E; (5) finally, the most relevant interaction points of MIF E (GBPM features) are selected using an energy cutoff of 15% above the global minimum. Supplementary figures focusing on the most relevant mutations are available in Supplementary File 3. Before starting the GBPM analysis, co-crystallized water molecules were removed from the PDB structures. In 6VW1, which contains two Spike/ACE2 complexes (chains A-E and B-F), both complexes were investigated and are reported as model A and model B, respectively. All selected complexes were compared conformationally with each other by structural alignment, computing the RMSD on the Cartesian coordinates of equivalent non-hydrogen atoms. The original GRID probes DRY, N1 and O were used to highlight hydrophobic, hydrogen bond donor and hydrogen bond acceptor areas. In order to identify the most relevant residues of both Spike and ACE2, we conceptually and technically extended the GBPM algorithm, originally designed for drug/target interactions (Ortuso et al., 2006). In the GBPM analysis presented here, each of the two interacting proteins was considered both as host and as guest unit, and relevant AAs were selected if their distance from GBPM features was lower than or equal to 3 Å. For each PDB model, the selected residues were scored as the sum of the interaction energies of the corresponding GBPM features. In order to prevent unrealistic distortions of the Spike/ACE2 complex, due to the use of structures not covering the full length of the interacting proteins, the effect of mutations was estimated qualitatively by means of the mutagenesis tool implemented in the PyMOL software (PyMOL, 2017).
Wild-type residues were replaced by the mutated ones, and the new side-chain conformations were optimized taking into account the neighboring AAs. The graphical analysis was carried out on the predicted most populated rotamers. Owing to its better X-ray resolution, the 6M0J structure was selected for this investigation.
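The GRAB comparison rule, the feature selection of steps (3) to (5) and the 3 Å residue scoring described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GRAB tool or the authors' implementation: MIFs are represented as flat lists of energies sampled on one shared grid; points with an energy of exactly zero (not covered by the four rules in the text) are treated as non-negative; step (4) is read as E = GRAB(D, C), with D in the role of the complex; the "15% above the global minimum" cutoff is read as keeping points with energy ≤ 0.85 × min(E); and a feature contributes to every residue that has at least one atom within 3 Å. All function names are hypothetical.

```python
"""Sketch of GBPM steps (3)-(5) and residue scoring on flattened MIFs.

Hypothetical helpers. Assumptions (not from the paper): favourable energies are
negative, zero-energy points count as non-negative, E = GRAB(D, C), and features
are the points with energy <= (1 - 0.15) * min(E).
"""
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Feature = Tuple[Point, float]


def grab_point(a: float, b: float) -> float:
    """Apply the four-case GRAB rule to a complex point (a) and a subunit point (b)."""
    if a >= 0 and b >= 0:
        return 0.0
    if a >= 0:  # here b < 0
        return b
    if b >= 0:  # here a < 0
        return -a
    return a - b  # both negative


def grab(mif_a: Sequence[float], mif_b: Sequence[float]) -> List[float]:
    """Compare two MIFs point by point; both must come from the same grid box."""
    if len(mif_a) != len(mif_b):
        raise ValueError("MIFs must share the same box dimensions")
    return [grab_point(a, b) for a, b in zip(mif_a, mif_b)]


def gbpm_features(mif_a: Sequence[float], mif_b: Sequence[float],
                  mif_c: Sequence[float], grid: Sequence[Point],
                  cutoff: float = 0.15) -> List[Feature]:
    """Steps (3)-(5): D = GRAB(A, B), E = GRAB(D, C), keep points close to min(E)."""
    mif_d = grab(mif_a, mif_b)
    mif_e = grab(mif_d, mif_c)
    e_min = min(mif_e)
    if e_min >= 0:  # no favourable interface point at all
        return []
    threshold = e_min * (1.0 - cutoff)
    return [(p, e) for p, e in zip(grid, mif_e) if e <= threshold]


def score_residues(features: Sequence[Feature],
                   residue_atoms: Dict[str, Sequence[Point]],
                   max_dist: float = 3.0) -> Dict[str, float]:
    """Sum feature energies for every residue with an atom within max_dist (Å)."""
    scores: Dict[str, float] = {}
    for point, energy in features:
        for residue, atoms in residue_atoms.items():
            if any(dist(point, atom) <= max_dist for atom in atoms):
                scores[residue] = scores.get(residue, 0.0) + energy
    return scores
```

Flat lists keep the sketch dependency-free; with real GRID output, the same four cases would more naturally be applied as vectorized masks over the 3D energy box.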

Genetic analysis

SARS-CoV-2 genome sequences from human hosts, accounting for a total of 145,201 submissions, were obtained from the GISAID database on 15 October 2020 (Shu & McCauley, 2017). Low-quality sequences (more than 5% uncharacterized nucleotides) and incomplete sequences (<29,000 nucleotides, given a total reference length of 29,903) were removed. The resulting 135,591 genome sequences were aligned to the reference SARS-CoV-2 Wuhan genome (NCBI entry NC_045512.2) using the NUCMER algorithm (Marçais et al., 2018). Position-specific nucleotide differences were merged for neighboring events and converted into protein mutations using the coronapp annotator (Mercatelli, Triboli et al., 2020). The results were further filtered to retain AA-changing mutations in the Spike protein. ACE2 variants in the human population were extracted from the gnomAD database (v3, 18 July 2020) (Karczewski et al., 2020). We considered only missense variants affecting specific AAs in the protein sequence, for a total of 155 entries (Supplementary File 2). Graphs were generated with the R statistical software and the corto package v1.1.2 (Mercatelli, Lopez-Garcia et al., 2020).
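The genome quality filters above, and the intersection of Spike mutations with GBPM-selected residues, can be illustrated with the following Python sketch. It does not reproduce the NUCMER or coronapp steps: it assumes that "uncharacterized nucleotides" means the character N, that per-genome protein mutations are already available as strings in a hypothetical "Spike:N501Y" format, that each genome corresponds to one patient, and that the set of GBPM-selected Spike positions is supplied by the user. The default minimum of 20 genomes mirrors the "at least 20 distinct patients" criterion used to report Spike mutations; the function names are hypothetical.

```python
"""Sketch: genome QC filters and Spike mutation counts at GBPM-selected positions.

Hypothetical helpers; the "Protein:RefPosAlt" mutation format is an assumption,
and N is assumed to be the only uncharacterized-base symbol.
"""
import re
from collections import Counter
from typing import Dict, Iterable, Mapping, Set, Tuple

MIN_LENGTH = 29_000    # genomes shorter than 29,000 nt are discarded as incomplete
MAX_N_FRACTION = 0.05  # genomes with more than 5% N bases are discarded
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. N501Y; skips deletions and stops


def passes_qc(genome: str) -> bool:
    """Return True if the genome is long enough and has at most 5% N bases."""
    if len(genome) < MIN_LENGTH:
        return False
    return genome.upper().count("N") / len(genome) <= MAX_N_FRACTION


def spike_hits(mutations_by_genome: Mapping[str, Iterable[str]],
               gbpm_positions: Set[int],
               min_genomes: int = 20) -> Dict[str, Tuple[int, float]]:
    """Count genomes carrying each Spike substitution at a GBPM-selected position.

    Returns {mutation: (number of genomes, fraction of all genomes)}.
    """
    counts: Counter = Counter()
    for mutations in mutations_by_genome.values():
        # A set, so each genome contributes at most once per mutation.
        spike = {m.split(":", 1)[1] for m in mutations if m.startswith("Spike:")}
        counts.update(spike)
    total = len(mutations_by_genome)
    hits: Dict[str, Tuple[int, float]] = {}
    for mutation, n in counts.items():
        match = SUBSTITUTION.match(mutation)
        if match is None:
            continue
        ref, pos, alt = match.groups()
        # Keep AA-changing substitutions at interface positions seen often enough.
        if ref != alt and int(pos) in gbpm_positions and n >= min_genomes:
            hits[mutation] = (n, n / total)
    return hits
```

Counting a set of mutations per genome, rather than raw occurrences, is what makes the threshold refer to distinct genomes.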