Molecular adaptive evolution of SARS-COV-2 spike protein in Saudi Arabia.

Islam Nour1, Ibrahim O Alenazi2, Atif Hanif1, Saleh Eifan1.   

Abstract

The sequences of SARS-CoV-2 spike (S) from Saudi Arabia, along with those of SARS-CoV and bat SARS-like CoVs, were obtained. Positive selection analysis and secondary structure investigation of the spike sequences were performed. Adaptive molecular evolution was observed in SARS-CoV-2, displayed by positive selection pressure at the N-terminal domain (NTD; codons 41, 163, 174 and 218), the receptor-binding domain (RBD; codons 378 and 404) and the S1/S2 cleavage site (codon 690). Furthermore, the spike protein secondary structure depicted by the homo-trimer structure showed a high similarity between the Saudi SARS-CoV-2 isolate and the parental strain (bat SL-COVZC45). Despite the high similarity in the spike sequence model alignment, a significant difference appeared when each chain was treated separately, owing to 7 motif differences across the three constituent chains. In addition, the SARS-CoV-2 S trimer model uncovered the presence of N-acetyl glucosamine ligands. Finally, a 3C-like proteinase cleavage site was observed in the S2 domain, which could be used as a site for drug discovery. Genetic and molecular evolutionary data are useful for assessing evolution, host adaptation and epidemic patterns, and are ultimately helpful for adapting control strategies.
© 2021 Published by Elsevier B.V. on behalf of King Saud University.

Keywords:  Cleavage site; SARS-CoV-2; Saudi Arabia; Secondary structure; Selection pressure; Spike

Year: 2021; PMID: 33679194; PMCID: PMC7923870; DOI: 10.1016/j.sjbs.2021.02.077

Source DB: PubMed; Journal: Saudi J Biol Sci; ISSN: 2213-7106; Impact factor: 4.219


Introduction

The novel coronavirus, termed Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), originated in Hubei, China, in December 2019 and subsequently spread throughout the world (Wang et al., 2020). The resulting infectious disease is named COVID-19, and human-to-human transmission has been established (Chan et al., 2020). The symptoms of SARS-CoV-2 infection were found to be similar to those caused by SARS coronavirus (SARS-CoV) in 2003 (Peiris et al., 2004). Coronaviruses are enveloped, non-segmented, positive-sense single-stranded RNA viruses with a genome size of 26 kb to 32 kb, responsible for respiratory diseases in various animals as well as in humans. Human coronaviruses such as SARS-CoV, MERS-CoV and SARS-CoV-2 are zoonotic pathogens (Chen et al., 2020). Previous sequence analyses showed a high percentage of similarity among SARS-CoV-2, SARS-CoV and bat coronaviruses (Lu et al., 2020). In addition, coronaviruses are found in a wide range of animal hosts such as bats, camels, cats, mice and dogs (Tennant et al., 1993). Once the COVID-19 causative agent was identified in January 2020, SARS-CoV-2 nucleotide sequence alignments were performed to clarify the origin and evolution of the virus and to predict any probable intermediate host. Coronaviruses contain four main types of structural proteins and nearly 16 types of non-structural proteins. The spike protein is a principal structural protein that mediates recognition of and attachment to the host cell receptor, angiotensin-converting enzyme 2 (Li et al., 2020, Donnelly et al., 2004). SARS-CoV-2 genome analysis showed a similarity index of 79.5% with SARS-CoV and 96% with bat coronavirus (Chen et al., 2020).
Sequence alignments of coronaviruses provide information on the genetic characteristics of different viruses, and sequence-dependent data can be employed for accurate diagnosis of etiological agents and for adapting effective control measures. Super-spreading of COVID-19 has been reported globally, and high incidence rates are recorded in multiple regions of the world (World Health Organization, 2020). The growing number of infections over time may result in the emergence of several different variants due to mutation and recombination. Genome sequence tracking and characterization is therefore important for analyzing different variants. The emergence of SARS-CoV-2 as a global pandemic has affected not only human health but also, drastically, the global economy. Furthermore, analysis of mutations occurring in the S sequence, which represents the most frequently variable region of SARS-CoV-2 sequences, will help us to understand the high inter-human transmissibility and the evolution patterns of coronaviruses. The data generated will be helpful for developing effective strategies to deal with existing and future epidemics.

Methods

Sequences gathering

The GISAID EpiFlu Database comprises a COVID-19 related page (https://www.epicov.org/), where COVID-19 genome sequences are accessible. The current research was intended to compare Saudi Arabia SARS-CoV-2 spike sequences to those of the previously occurring SARS-CoV and bat SARS-like CoVs. Thus, only the three submitted sequences of Saudi Arabia SARS-CoV-2 were used. In addition, seven bat SARS-like CoV sequences collected from 2011 to 2017 and two human SARS-CoV sequences were added from NCBI GenBank. Accession numbers, locations and collection dates are shown in Table 1.
Table 1

List of genome sequences used in phylogenetic analysis.

| Accession Number | Sequence name | Data Source | Strain location | Date of collection |
|---|---|---|---|---|
| EPI_ISL_416432 | hCoV-19/Saudi Arabia/KAIMRC-Alghoribi/2020 | GISAID | Riyadh/Saudi Arabia | 3/7/2020 |
| EPI_ISL_416521 | hCoV-19/Saudi Arabia/SCDC-3321/2020 | GISAID | Riyadh/Saudi Arabia | 3/10/2020 |
| EPI_ISL_416522 | hCoV-19/Saudi Arabia/SCDC-3324/2020 | GISAID | Riyadh/Saudi Arabia | 3/10/2020 |
| AVP78031.1 | bat-SL-CoVZC45 | NCBI | Zhoushan city/Zhejiang province/China | 2/2017 |
| AID16716.1 | bat-SL-CoV_Longquan-140 | NCBI | Guizhou province/China | 2012 |
| ATO98218.1 | bat-SL-CoV_Rs7327 | NCBI | Yunnan Province/China | 10/24/2014 |
| ATO98145.1 | bat-SL-CoV_Rf4092 | NCBI | Yunnan Province/China | 9/18/2012 |
| ATO98108.1 | bat-SL-CoV_As6526 | NCBI | Yunnan Province/China | 5/12/2014 |
| AGZ48806.1 | bat-SL-CoV_RsSHC014 | NCBI | Yunnan Province/China | 4/17/2011 |
| AKZ19087.1 | bat-SL-CoV_YNLF_34C | NCBI | China | 5/23/2013 |
| AY278487.3 | Hu-SARS-CoV_BJ02 | NCBI | China | 6/5/2003 |
| AAP51227.1 | Hu-SARS-CoV_GD01 | NCBI | China | 6/5/2003 |

Single likelihood ancestor counting (SLAC)

To investigate adaptive evolution, the SLAC method was used to assess the selection pressures acting on the SARS-CoV-2 S gene. Global non-synonymous to synonymous (dN/dS) rate ratios were estimated with the SLAC method (Kosakovsky Pond and Frost, 2005) in Datamonkey (http://www.datamonkey.org). Briefly, the aligned sequences were uploaded to Datamonkey, whose fully automated workflow identifies codons and lineages under selection even when recombinant sequences are present. Duplicate sequences were removed prior to the analysis (Spielman et al., 2019).
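The distinction between synonymous and non-synonymous changes that underlies dN/dS can be illustrated with a minimal sketch. This is not the SLAC algorithm as run in Datamonkey (which reconstructs ancestral states on a phylogeny and normalises by the number of synonymous and non-synonymous sites); it only classifies a codon change with the standard genetic code. The function names and the example codon pairs are hypothetical and are not taken from the study's alignment.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ancestral, derived):
    """Return (kind, 'X>Y') for a change from one codon to another."""
    aa_from = CODON_TABLE[ancestral.upper()]
    aa_to = CODON_TABLE[derived.upper()]
    kind = "synonymous" if aa_from == aa_to else "non-synonymous"
    return kind, f"{aa_from}>{aa_to}"


def count_changes(codon_pairs):
    """Count synonymous and non-synonymous changes, skipping identical codons."""
    counts = {"synonymous": 0, "non-synonymous": 0}
    for ancestral, derived in codon_pairs:
        if ancestral.upper() == derived.upper():
            continue
        kind, _ = classify_codon_change(ancestral, derived)
        counts[kind] += 1
    return counts


# Hypothetical codon pairs, chosen only for illustration.
example_pairs = [("TTT", "TGT"), ("CTT", "CTC"), ("AAT", "AAT")]
example_counts = count_changes(example_pairs)
```

In this toy input, TTT to TGT changes phenylalanine to cysteine (non-synonymous) while CTT to CTC leaves leucine unchanged (synonymous); a site with many non-synonymous and no synonymous changes is the kind of pattern a positive-selection test flags.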

Secondary structure investigation of spike protein

To understand the SARS-CoV-2 spike structure, including any possible deviation from a previously emerging related SARS-like CoV, and to detect distinctive residue orientations that could be involved in target binding, the spike glycoprotein homo-trimer structure (target) was modeled via SWISS-MODEL (https://swissmodel.expasy.org/) (Waterhouse et al., 2018) against the structure of bat-SL-COVZC45 (template). Moreover, the models obtained for both were aligned and compared using CLC Main Workbench V20.0 (QIAGEN).

Investigation of 3C-like proteinase cleavage sites

Coronaviruses have 3C or 3C-like proteases (3Cpro or 3CLpro, respectively) that comprise a standard chymotrypsin-like fold and a catalytic triad with a Cys residue as the nucleophile (12). These proteases, which contain conserved principal sites, may be attractive targets for the design of antivirals against multiple viruses in the supercluster. Thus, 3C-like proteinase cleavage sites were investigated using the NetCorona service hosted at https://services.healthtech.dtu.dk/. NetCorona 1.0 predicts coronavirus 3C-like proteinase (or protease) cleavage sites from amino acid sequences using artificial neural networks (Kiemer et al., 2004).
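As a rough illustration of what a cleavage-site scan looks for, the sketch below searches a protein sequence for a simplified consensus, Leu-Gln followed by Ser, Ala or Gly, with cleavage assumed to occur after the Gln. This consensus is an assumption adopted here for illustration; NetCorona itself scores sites with a neural network, so this scan is not a substitute for it and will both miss and over-call sites. The function name is hypothetical, and the only input used is the ten-residue window reported in the Results.

```python
import re

# Simplified consensus (an assumption for this sketch): L-Q | (S/A/G).
# A lookahead is used so that overlapping candidates are all reported.
CONSENSUS = re.compile(r"(?=LQ[SAG])")


def find_candidate_sites(sequence, flank=5):
    """Return (index_after_cut, context) for each consensus match.

    index_after_cut is the 0-based index of the residue that follows the
    scissile bond; context shows `flank` residues on each side with '^'
    marking the cut.
    """
    sequence = sequence.upper()
    hits = []
    for match in CONSENSUS.finditer(sequence):
        cut = match.start() + 2  # bond lies after 'L' and 'Q'
        left = sequence[max(0, cut - flank):cut]
        right = sequence[cut:cut + flank]
        hits.append((cut, f"{left}^{right}"))
    return hits


# The window reported in the Results, written without the cut marker.
reported_window = "TGRLQSLQTY"
candidates = find_candidate_sites(reported_window)
```

On this window the scan returns a single candidate, with the cut between Q and S; the second LQ in the window is followed by T and is therefore not matched by the simplified pattern.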

Results

To investigate this apparent transition further, the single likelihood ancestor counting (SLAC) method was implemented to compare the selection pressures acting on the SARS-CoV-2 spike sequence. Selection pressure was positive (dN/dS = 3.3981) at 7 codons (41, 163, 174, 218, 378, 404 and 690) at a p-value threshold of 0.1, and all sites were parsimony informative, i.e. at least two different nucleotides each occurred at least twice (Table 2). Four of the sites were in the N-terminal domain (NTD) of the S1 subunit and caused non-conservative missense mutations: F41C (p = 0.041), N163Y (p = 0.022), F174I (p = 0.032) and H218P (p = 0.083). These substitutions mostly occurred by transversion (T > G, A > T and A > C at sites 41, 163 and 218, respectively), whereas a single transition occurred at site 174 (T > C). Moreover, two positively selected sites occurred in the receptor-binding domain (RBD) of the S1 subunit, leading to a single non-conservative missense mutation (C378L, p = 0.092) and a silent mutation (L404L, p = 0.054). The last positively selected site occurred at the S1/S2 cleavage site, resulting in a conservative missense mutation (S690C, p = 0.098) owing to an AGC > GCC transition substitution at the first nucleotide (Fig. 1).
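The transition/transversion labels used above follow from base chemistry: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, and a change between the two classes is a transversion. A minimal sketch of that rule, with a hypothetical function name, applied to the four single-nucleotide changes listed for the NTD sites:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Classify a single-nucleotide change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"unexpected base in {ref}>{alt}")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Nucleotide changes as stated in the text for sites 41, 163, 218 and 174.
stated_changes = {41: ("T", "G"), 163: ("A", "T"), 218: ("A", "C"), 174: ("T", "C")}
labels = {site: substitution_type(*change) for site, change in stated_changes.items()}
```

The rule reproduces the labels given in the text for these four changes: transversions at 41, 163 and 218, and a transition at 174.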
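Table 2

Sites showing signatures of positive selection in S as determined by dN/dS analysis.

| Codon | Partition | S | N | dS | dN | Selection detected? |
|---|---|---|---|---|---|---|
| 41 | 1 | 0.000 | 8.000 | 0.000 | 3.970 | Pos. p = 0.041 |
| 163 | 1 | 0.000 | 11.000 | 0.000 | 6.022 | Pos. p = 0.022 |
| 174 | 1 | 0.000 | 9.000 | 0.000 | 4.428 | Pos. p = 0.032 |
| 218 | 1 | 0.000 | 7.000 | 0.000 | 3.355 | Pos. p = 0.083 |
| 378 | 1 | 0.000 | 6.000 | 0.000 | 2.976 | Pos. p = 0.092 |
| 404 | 1 | 0.000 | 11.000 | 0.000 | 5.224 | Pos. p = 0.054 |
| 690 | 1 | 0.000 | 6.000 | 0.000 | 2.948 | Pos. p = 0.098 |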
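Because the reported sites depend on the chosen p-value threshold, it can help to see how the list changes when the threshold is tightened. The sketch below uses the codon, dN and p values from Table 2 and the domain assignments stated in the Results; the function name and the stricter 0.05 threshold are illustrative choices, not part of the study.

```python
# (codon, dN, p-value) as listed in Table 2.
TABLE_2 = [
    (41, 3.970, 0.041),
    (163, 6.022, 0.022),
    (174, 4.428, 0.032),
    (218, 3.355, 0.083),
    (378, 2.976, 0.092),
    (404, 5.224, 0.054),
    (690, 2.948, 0.098),
]

# Domain of each codon as assigned in the Results text.
DOMAIN = {
    41: "NTD", 163: "NTD", 174: "NTD", 218: "NTD",
    378: "RBD", 404: "RBD",
    690: "S1/S2 cleavage site",
}


def sites_by_domain(rows, alpha=0.1):
    """Group codons with p <= alpha by spike domain, in table order."""
    grouped = {}
    for codon, _dn, p_value in rows:
        if p_value <= alpha:
            grouped.setdefault(DOMAIN[codon], []).append(codon)
    return grouped


at_threshold_used = sites_by_domain(TABLE_2, alpha=0.1)
at_stricter_threshold = sites_by_domain(TABLE_2, alpha=0.05)
```

At the 0.1 threshold used in the study all seven codons are retained; at an illustrative 0.05 threshold only the NTD codons 41, 163 and 174 remain, so the RBD and S1/S2 signals rest on the more permissive cut-off.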
Fig. 1

Amino acid sequence alignment of Saudi Arabia SARS-CoV-2 isolates, bat SARS-like CoV isolates and SARS-CoV isolates. The red arrows represent the sites of positive selection. The blue arrows display the S1/S2 cleavage site.

To better understand the structure of the SARS-CoV-2 spike, including its deviation from a previously emerging related SARS-like CoV that RDP analysis indicated to be a probable major parental strain, the spike sequences of hCoV-19/Saudi Arabia/KAIMRC-Alghoribi (used as the Saudi hCoV-19 representative) and Bat-SL-COVZC45 (the major parent) were translated and used to search for the best templates based on target-template alignment features. The templates with the highest quality were then selected for model building to construct the spike glycoprotein homo-trimer structure of hCoV-19 (Fig. 2A) and of Bat-SL-COVZC45 (Fig. 2B). Each model was found to consist of 3 chains (A, B and C). Full model alignment (Fig. 2C) revealed a high similarity; however, when the 3 chains were aligned and compared individually, a significant difference was displayed owing to two motif differences (amino acids 151–277 and 451–536) in chain A (Fig. 2D), two motif differences (amino acids 202–227 and 361–448) in chain B (Fig. 2E) and three motif differences (amino acids 148–170, 243–264 and 441–480) in chain C (Fig. 2F). Moreover, SWISS-MODEL displayed the presence of N-acetyl glucosamine ligands linked to chain A (Fig. 3A) and chain B (Fig. 3B), and interacting with chain B (Fig. 3C) and chain C (Fig. 3D), of hCoV-19/Saudi Arabia/KAIMRC-Alghoribi (Fig. 3).
Fig. 2

Spike glycoprotein homo-trimer structure modeling. SWISS-MODEL was used to obtain the best template and the fitting model (GMQE = 0.75, QMEAN = -2.07, 0.94 coverage and 95% identity for hCoV-19/SA/KAIMRC-Alghoribi; GMQE = 0.78, QMEAN = -3.87, 0.94 coverage and 76.73% identity for bat-SL-COVZC45). The models were then imported into QIAGEN CLC Main Workbench V20.0 to obtain the final homo-trimer of (A) hCoV-19/SA/KAIMRC-Alghoribi and (B) bat-SL-COVZC45, and (C) the alignment of the two models. Three chains were observed for each of hCoV-19 and bat-SL-CoV, so each chain was aligned on the basis of amino acid sequence and 3D structure: (D) chain A, (E) chain B and (F) chain C.

Fig. 3

N-acetyl glucosamine ligands linked to (A) Chain A, (B) Chain B, and interacting with (C) Chain B and (D) Chain C, resolved by using SWISS model.

Moreover, a 3C-like proteinase cleavage site (TGRLQ^SLQTY) was recognized at amino acids 992–1002 in the spike glycoprotein of SARS-CoV-2 (Fig. 4). This site is needed for cell–cell fusion and could consequently be used as a target for blocking viral infection.
Fig. 4

Predicted 3C-like proteinase cleavage site in SARS-CoV-2 Spike glycoprotein.

Discussion

Our knowledge of SARS-CoV-2 is limited regarding its primary and intermediate host species, its evolution and its genetic variation in relation to other coronaviruses such as MERS-CoV and SARS-CoV. Genetic map analysis of these coronaviruses can help us to understand the evolutionary process and may help to control the ongoing SARS-CoV-2 epidemic. The virus is spreading globally, and with the increasing number of infections the viral evolutionary rate should be considered a significant priority. Further studies are needed on more closely related animal and human viruses, especially on the divergence in host tropism. The S protein mediates both receptor binding and membrane fusion (Li, 2016) and is crucial for defining host tropism and transmissibility patterns (Lu et al., 2015). Therefore, the current study focused on S sequences rather than the whole genome. Positively selected sites have been found in the NTD and RBD of SARS-CoV-2 (Tagliamonte et al., 2020). The current study observed positive selection at the NTD, RBD and S1/S2 cleavage site of Saudi SARS-CoV-2 isolates, which corroborates the previous study. These positively selected residues, especially the one in the RBD that our study found to result in a non-conservative missense mutation, could lead to higher receptor affinity and consequently to higher infectivity and a broader transmission pattern, as shown in previous literature (Islam et al., 2020, Kaushal et al., 2020, Wrapp et al., 2020). For instance, the U.K. SARS-CoV-2 variant B.1.1.7 was reported to carry 17 novel mutations, including a notable mutation in the RBD, and to be 70–80% more transmissible (Kupferschmidt, 2021, Leung et al., 2020). Moreover, a substitution mutation was observed in the S1/S2 cleavage site, which was recognized as a conserved region elsewhere (Zhang et al., 2020).
The S1/S2 cleavage site has a key function in defining virus infectivity and host range owing to cleavage by furin and other proteases (Andersen et al., 2020). Follis et al. observed that insertion of a furin cleavage site at the S1/S2 junction boosts cell–cell fusion without influencing virus entry (Follis et al., 2006). Furthermore, efficient cleavage of the MERS-CoV S protein enables bat MERS-like coronaviruses to infect humans (Menachery et al., 2020). Likewise, low-pathogenicity avian influenza viruses are converted into highly pathogenic forms upon acquisition of polybasic cleavage sites in hemagglutinin by insertion or recombination (Alexander and Brown, 2009). In addition, the positively selected sites described above were parsimony informative, which may have affected the resultant protein. Ghosh and Chakraborty detected 28 parsimony-informative sites out of 119 variable regions in 82 SARS-CoV-2 genomes, resulting in 79 variable sites in the encoded proteins (Ghosh and Chakraborty, 2020). Moreover, the spike protein secondary structure depicted by the homo-trimer structure showed a high similarity between the Saudi SARS-CoV-2 isolate and the major parental strain (bat SL-COVZC45). However, at the level of individual chains, a difference was observed owing to 7 motif differences across the 3 chains composing the homo-trimer structure. Notably, the homo-trimer structure of SARS-CoV-2 displays an open conformation of the S1 domains, which allows better interaction with target host proteins, in line with a recent study (Vankadari and Wilce, 2020). Since this homo-trimer glycoprotein is surface-exposed and facilitates viral entry, it is considered a crucial target for neutralizing antibodies (Abs) upon infection.
Therefore, S trimers are densely decorated with N-linked glycans, which are essential for appropriate folding (Walls et al., 2020) and for moderating accessibility to neutralizing Abs and to the host proteases needed for entry and cell–cell fusion (Walls et al., 2019, Xiong et al., 2018, Yang et al., 2015). This is consistent with the present finding of N-acetyl glucosamine ligands in the three chains composing the SARS-CoV-2 S trimer. In addition, the current study identified a 3C-like proteinase cleavage site in the S2 domain of the spike glycoprotein, which agrees with another study (Vankadari and Wilce, 2020). Moreover, this 3C-like proteinase cleavage site can be used as a potential drug discovery site (Grum-Tokars et al., 2008), in the same manner as presently being suggested for SARS-CoV-2 (Rothan and Teoh, 2021). In this regard, boceprevir and calpain inhibitor XII were recently reported as selective 3CLpro inhibitors (Liu et al., 2021). In conclusion, analysis of three Saudi hCoV-19, seven representative bat SARS-like CoV and two human SARS-CoV sequences provides information on the virus origin, selection pressures and receptor-binding properties, as well as a possible target for drug discovery. Nonetheless, efficient strategies based on continuous screening of SARS-CoV-2 mutations and analysis of their structural influence could assist in earlier re-evaluation of vaccine efficacy and in vaccine development if needed (Mehmood et al., 2021, Plante et al., 2020). Genetic and molecular evolutionary data are useful tools for understanding the epidemic pattern and transmission dynamics, which are ultimately useful for adopting preventive measures.

CRediT authorship contribution statement

Islam Nour: Conceptualization, Formal analysis, Investigation, Methodology, Validation, Writing - original draft, Writing - review & editing. Ibrahim O. Alanazi: Conceptualization, Formal analysis, Investigation, Methodology, Validation, Writing - review & editing. Atif Hanif: Conceptualization, Funding acquisition, Writing - original draft, Writing - review & editing. Saleh Eifan: Conceptualization, Data curation, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

| Isolate ID | Country | Collection date | Isolate name | Originating Laboratory | Submitting Laboratory | Authors |
|---|---|---|---|---|---|---|
| EPI_ISL_416521 | Saudi Arabia | 2020-03-10 | hCoV-19/Saudi Arabia/SCDC-3321/2020 | Public Health Laboratory | Public Health Laboratory, Saudi CDC | Albarrag, A. |
| EPI_ISL_416522 | Saudi Arabia | 2020-03-10 | hCoV-19/Saudi Arabia/SCDC-3324/2020 | Public Health Laboratory Saudi CDC | Public Health Laboratory Saudi CDC | Albarrag, A. |
| EPI_ISL_416432 | Saudi Arabia | 2020-03-07 | hCoV-19/Saudi Arabia/KAIMRC-Alghoribi/2020 | Clinical Microbiology Lab | Infectious Disease Research Department, King Abdullah International Medical Research Center (KAIMRC) | Majed Alghoribi, Sadeem Alhayli, Abdulrahman Alswaji, Liliane Okdah, Sameera Al Johani, Michel Doumith |

References (10 of 34 shown)

1. D J Alexander; I H Brown. History of highly pathogenic avian influenza. Rev Sci Tech, 2009-04.
2. Stephanie J Spielman; Steven Weaver; Stephen D Shank; Brittany Rife Magalis; Michael Li; Sergei L Kosakovsky Pond. Evolution of Viral Genomes: Interplay Between Selection, Recombination, and Other Forces. Methods Mol Biol, 2019.
3. Kai Kupferschmidt. Fast-spreading U.K. virus variant raises alarms. Science, 2021-01-01.
4. Xiaoli Xiong; M Alejandra Tortorici; Joost Snijder; Craig Yoshioka; Alexandra C Walls; Wentao Li; Andrew T McGuire; Félix A Rey; Berend-Jan Bosch; David Veesler. Glycan Shield and Fusion Activation of a Deltacoronavirus Spike Glycoprotein Fine-Tuned for Enteric Infections. J Virol, 2018-01-30.
5. Valerie Grum-Tokars; Kiira Ratia; Adrian Begaye; Susan C Baker; Andrew D Mesecar. Evaluating the 3C-like protease activity of SARS-Coronavirus: recommendations for standardized assays for drug discovery. Virus Res, 2007-03-29.
6. Andrew Waterhouse; Martino Bertoni; Stefan Bienert; Gabriel Studer; Gerardo Tauriello; Rafal Gumienny; Florian T Heer; Tjaart A P de Beer; Christine Rempfer; Lorenza Bordoli; Rosalba Lepore; Torsten Schwede. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res, 2018-07-02.
7. Alexandra C Walls; Xiaoli Xiong; Young-Jun Park; M Alejandra Tortorici; Joost Snijder; Joel Quispe; Elisabetta Cameroni; Robin Gopal; Mian Dai; Antonio Lanzavecchia; Maria Zambon; Félix A Rey; Davide Corti; David Veesler. Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion. Cell, 2019-01-31.
8. Iqra Mehmood; Munazza Ijaz; Sajjad Ahmad; Temoor Ahmed; Amna Bari; Asma Abro; Khaled S Allemailem; Ahmad Almatroudi; Muhammad Tahir Ul Qamar. SARS-CoV-2: An Update on Genomics, Risk Assessment, Potential Therapeutics and Vaccine Development (Review). Int J Environ Res Public Health, 2021-02-08.
9. Tao Zhang; Qunfu Wu; Zhigang Zhang. Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 Outbreak. Curr Biol, 2020-03-19.
10. Rui Li; Songlin Qiao; Gaiping Zhang. Analysis of angiotensin-converting enzyme 2 (ACE2) from different species sheds some light on cross-species receptor usage of a novel coronavirus 2019-nCoV. J Infect, 2020-02-21.