Mandeep Kaur1, Akanksha Sharma2, Santosh Kumar3, Gurpal Singh4, Ravi P Barnwal5. 1. Department of Biophysics, Panjab University, Chandigarh 160014, India. 2. Department of Biophysics, Panjab University, Chandigarh 160014, India; UIPS, Panjab University, Chandigarh 160014, India. 3. Department of Biotechnology, Panjab University, Chandigarh 160014, India. 4. UIPS, Panjab University, Chandigarh 160014, India. 5. Department of Biophysics, Panjab University, Chandigarh 160014, India. Electronic address: barnwal@pu.ac.in.
Abstract
Globally, SARS-CoV-2 has emerged as a threat to life and the economy. Researchers are trying to find a cure against this pathogen, but so far without much success. Several attempts have been made in the past few months to understand the atomic-level details of SARS-CoV-2. However, a single review covering the structural details relevant to drug and vaccine development has been missing. Hence, this review aims to summarize the key functional roles played by various domains of the SARS-CoV-2 genome during entry into the host, replication, repression of the host immune response and the overall viral life cycle. Additionally, various SARS-CoV-2 proteins that could be targeted in the search for a potent inhibitor are highlighted. To mitigate this deadly virus, an understanding of atomic-level information, pathogenicity mechanisms and the functions of different proteins in causing the infection is imperative. These structural details should ultimately pave the way for the development of a potential drug or vaccine against the disease caused by SARS-CoV-2.
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) causes a highly infectious viral disease referred to as coronavirus disease 2019 (COVID-19). Its first outbreak, in Wuhan city in the Hubei province of China in 2019, has been responsible for an unprecedented global pandemic [1]. After spreading rapidly in China, this deadly virus has now gripped the whole world. Despite the best efforts of the scientific community to discover a cure for this disease, no specific vaccine or drug is available to date, either to control the spread of COVID-19 or to treat it.
Coronaviruses (CoVs) are single-stranded, positive-sense RNA viruses belonging to the order Nidovirales, family Coronaviridae, sub-family Coronavirinae. They are further classified into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus [2]. Members of the family Coronaviridae are the largest of the RNA viruses and are known to target the respiratory tract, usually leading to mild infections in humans. They cause illness ranging from the common cold to more severe diseases such as Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS). Both SARS-CoV and MERS-CoV are classified under the genus Betacoronavirus. Although these viruses usually cause mild respiratory disease, the past episodes of SARS-CoV and MERS-CoV have shown that members of this family can be fatal and can cause epidemics [3].
For the last two decades, coronaviruses have been a constant threat to humans. In November 2002, the first case of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) infection in humans was reported in Foshan, China; the virus later spread to more than 30 countries worldwide and caused 774 deaths out of a total of 8096 reported cases. Palm civets and raccoon dogs were identified as hosts for SARS-CoV [4].
In 2012, a severe respiratory infection caused by Middle East Respiratory Syndrome Coronavirus (MERS-CoV) was first reported in a patient from Jeddah, Saudi Arabia. The WHO reported a total of 2279 cases globally across 27 countries, of which 806 were fatal [5]. RNA sequencing confirmed that MERS-CoV is transmitted from dromedary camels to humans. These viruses are zoonotic in origin and share genome similarity with bat coronaviruses and SARS-CoV-2.
SARS-CoV-2 has been categorized under Betacoronavirus along with SARS-CoV and MERS-CoV. It has a genome size of about 30 kb. The ongoing SARS-CoV-2 outbreak is assumed to have originated from the Huanan seafood market in Wuhan, China, which sells seafood, marmots, birds, bats and other wild animals [2,6]. The virus is considered to be of zoonotic origin, and preliminary evidence suggests transmission from bats to humans, although Malayan pangolins (Manis javanica) have also been proposed as a reservoir of the virus [7].
Human-to-human transmission occurs from an infected person to a healthy person via droplets released by coughing or sneezing in the open and by talking without covering the mouth, or simply through close contact [8]. Several modes of transmission of this virus in humans have been reported, as shown in Fig. 1 [9,10]. Droplet inhalation is one of the most common routes by which the virus enters the body; once it reaches the respiratory tract, it attacks the epithelial cells and initiates infection. After entry, the virus replicates, leading to local inflammatory changes that damage the lung tissue as well as essential immune cells such as T cells and macrophages [11]. The reported clinical symptoms are fever, breathing difficulty (dyspnoea), headache, muscular soreness, dry cough and pneumonia. Atypical symptoms include diarrhea and vomiting.
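The outbreak counts quoted above for SARS-CoV (774 deaths among 8096 reported cases [4]) and MERS-CoV (806 deaths among 2279 reported cases [5]) can be turned into crude case-fatality percentages. The following is only an illustrative sketch: the function name and dictionary layout are our own, and a crude ratio of this kind ignores under-reporting and reporting delays.

```python
# Case counts exactly as quoted in the text (SARS-CoV [4], MERS-CoV [5]).
outbreaks = {
    "SARS-CoV": {"cases": 8096, "deaths": 774},
    "MERS-CoV": {"cases": 2279, "deaths": 806},
}


def case_fatality_percent(deaths: int, cases: int) -> float:
    """Return deaths as a percentage of reported cases (crude ratio)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


# Compute the crude ratio for each outbreak.
cfr = {
    name: case_fatality_percent(counts["deaths"], counts["cases"])
    for name, counts in outbreaks.items()
}

for name, value in cfr.items():
    print(f"{name}: {value:.1f}% of reported cases were fatal")
```

With these inputs the crude ratio is roughly 9.6% for SARS-CoV and 35.4% for MERS-CoV, which illustrates why both outbreaks are described as potentially fatal despite the usually mild course of coronavirus infections.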
Fig. 1
Modes of transmission of SARS-CoV-2 in humans through various routes.
In addition, COVID-19 is aggravated by the release of pro-inflammatory cytokines [such as interferon (IFN)-γ, interleukin (IL)-1B and monocyte chemoattractant protein 1 (MCP1)] in an event called a ‘cytokine storm’, which indicates a hyperactive response and excessive inflammatory reaction by the host immune system. Several recent studies suggest that the cytokine storm is directly responsible for lung damage and multi-organ failure in COVID-19 patients [12].
The pandemic is growing rapidly throughout the world, leading to high mortality. According to WHO reports, the five countries with the most cases are the United States of America, Brazil, India, the Russian Federation and Peru. In addition, new COVID-19 cases are being confirmed every day in more than a hundred countries, areas and territories. A world map highlighting the 20 countries with the highest number of COVID-19 cases reported by WHO is shown in Fig. 2.
Fig. 2
The map depicts the number of cases reported to date in the 20 countries most severely affected by COVID-19. The race to develop a cure intensifies as cases continue to soar each day. The data, as of Sep 1, 2020, were obtained from the WHO website (https://covid19.who.int/table) and the map was created using Mapchart.net. The five most affected countries, the United States of America, Brazil, India, the Russian Federation and Peru, are highlighted using diagonal stripes, patch, texture, vertical stripes and crosshatch patterns, respectively.
A recent study reports that the full-length genomic sequence of human SARS-CoV-2 shares 79.6% identity with SARS-CoV and 96% identity with the whole-genome sequence of a bat coronavirus [13]. Another report, based on phylogenetic analysis, suggests 89.1% nucleotide identity between SARS-CoV-2 and SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) [14]. The relatedness of SARS-CoV-2 to SARS-CoV in full-length genome phylogeny, together with the putatively similar cell entry mechanism and human cell receptor usage, could be helpful in designing drugs and pre-clinical vaccines to treat this disease [15,16]. The structural and functional relatedness of SARS-CoV-2 and SARS-CoV is now well established [17]. Moreover, the novel SARS-CoV-2 has been found to use angiotensin-converting enzyme 2 (ACE2) to target the host, just like SARS-CoV [13].
In this direction, we performed a phylogenetic analysis using NCBI BLAST pairwise alignment to confirm the relatedness of SARS-CoV-2 to other coronaviruses, as shown in Fig. 3. BLASTn was used to compute pairwise alignments between the full genome sequence of SARS-CoV-2 isolated from a human patient in Wuhan, China (NCBI Reference Sequence: NC_045512.2), and 15 sequences from different CoVs, including SARS coronavirus, bat SARS-like coronavirus, pangolin coronavirus, duck coronavirus and SARS coronavirus civet. Five of these full genome sequences were isolated from human SARS-CoV-2 patients in some of the countries most affected by the virus. The tree was built with the fast minimum evolution method and a maximum sequence difference of 0.75.
Fig. 3
The tree was constructed using 16 sequences from different coronaviruses to infer the evolutionary history of SARS-CoV-2 by means of fast minimum evolution method having maximum sequence difference of 0.75. NCBI BLAST pairwise sequence alignment was used to generate the tree. Human isolates from different countries have nearly complete identity to the genome isolated from Wuhan in China (in yellow highlight), followed by the Pangolin coronavirus isolate MP789 (MT121216.1) with 90.11% identity. All the 15 sequences were downloaded from NCBI and were used to generate a phylogenetic tree to study the evolutionary relationship.
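To make the identity values and the 0.75 sequence-difference cutoff more concrete, the sketch below computes percent identity for a pair of already-aligned sequences and converts it into a fractional difference. It is a simplified illustration rather than the procedure used by NCBI BLAST: gapped columns are simply skipped, the two toy fragments are hypothetical, and the function names are our own.

```python
MAX_SEQ_DIFFERENCE = 0.75  # cutoff quoted in the text for tree building


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # simplification: ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def sequence_difference(identity_percent: float) -> float:
    """Fraction of compared positions that differ (1 - identity)."""
    return 1.0 - identity_percent / 100.0


# Hypothetical toy fragments, pre-aligned by hand for illustration only.
reference = "ATGTTTGTTTTTCTTGTT"
relative = "ATGTTCGTTTT-CTTGTA"

identity = percent_identity(reference, relative)
difference = sequence_difference(identity)
print(f"identity = {identity:.2f}%, difference = {difference:.3f}")
print("within cutoff:", difference <= MAX_SEQ_DIFFERENCE)
```

Under this simplified definition, the genome reported here as 90.11% identical to SARS-CoV-2 corresponds to a difference of about 0.10, well inside the 0.75 cutoff.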
The SARS-CoV-2 genomes from the USA (MT893659.1), India (MT434758.1), Russia (MT890462.1), South Africa (MT324062.1) and Brazil (MT126808.1) showed high sequence identity, between 99.5 and 100%, to the genomic sequence from the Wuhan patient. Of these, the sequence isolated from Brazil exhibits the highest genome identity (99.99%), followed by South Africa (99.98%), Russia (99.97%), the USA (99.96%) and India (99.95%). Among the non-human isolates analyzed, the phylogenetic tree places a pangolin coronavirus closest to SARS-CoV-2: pangolin coronavirus isolate MP789 (MT121216.1) shares 90.11% identity, whereas isolate PCoV_GX-P1E (MT040334.1) shares around 85.95% identity with SARS-CoV-2. SARS coronavirus HSR 1 (AY323977.2), SARS coronavirus Tor2 (NC_004718.3), SARS coronavirus TWK (AP006559.1) and SARS coronavirus BJ162 (AY864805.1) each showed ~82.30% identity to SARS-CoV-2. Other related CoVs in our analysis are SARS coronavirus civet007 (AY572034.1, 82.27%) and bat SARS-like coronaviruses (MG772933.1, 89.12%; MG772934.1, 88.65%). The duck coronavirus (NC_048214.1), used as an outgroup, is just 73.71% identical to SARS-CoV-2.
This review aims to facilitate the search for a potential vaccine against COVID-19 by providing information about the domain organization and overall structure of SARS-CoV-2, along with its various proteins and their essential roles in causing viral infection. Further, we outline the protein domains that could prove to be good targets for designing drugs or vaccines to save the thousands of lives at risk globally.
The domain organization of SARS-CoV-2
With a genome size of about 30 kb, the SARS-CoV-2 genome consists, from the 5′-end to the 3′-end, of various domains that include the following open reading frames (ORFs): ORF1a/b, ORF2 (referred to as Spike protein), ORF3a, ORF3b, ORF4 (also known as Envelope protein), ORF5 (known as Membrane protein), ORF6, ORF7a, ORF7b, ORF8, ORF9 (referred to as Nucleocapsid protein) and ORF10 [18,19]. The complete genome organization from the 5′-end to the 3′-end is depicted in Fig. 4 [19]. The encoded proteins are categorized into structural proteins, non-structural proteins (nsps) and accessory proteins, as shown in Fig. 4. ORF1a/b is translated into polyproteins that are cleaved into 16 nsps by the viral proteases: the chymotrypsin-like protease (3CLpro), also called the main protease (Mpro), and the papain-like protease (PLpro) [17]. After fusion of the virus with the host cell, the viral genetic material enters the cytoplasm, where it is translated into two polyproteins, pp1a and pp1ab. These polyproteins give rise to the non-structural proteins [20]. The structural proteins are the Spike (S), Envelope (E), Membrane (M) and Nucleocapsid (N) proteins, whereas the accessory proteins are ORF3a, 3b, 6, 7a, 7b, 8a, 8b, 9b and 10. The functions of these SARS-CoV-2 proteins and their importance for the viral life cycle are described in Fig. 5.
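The 5′-to-3′ ORF order and the three protein categories described above can be captured in a small lookup structure. The sketch below is only an illustration of that organization; the class and function names are our own, and ORF8 is listed as a single accessory entry following the 5′-to-3′ list in the text (the accessory list in the text additionally names 8a, 8b and 9b).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    category: str  # "non-structural", "structural" or "accessory"
    product: str = ""


# ORFs in 5'-to-3' order, as listed in the text.
GENOME_5_TO_3 = [
    Orf("ORF1a/b", "non-structural", "pp1a/pp1ab, cleaved into 16 nsps"),
    Orf("ORF2", "structural", "Spike (S)"),
    Orf("ORF3a", "accessory"),
    Orf("ORF3b", "accessory"),
    Orf("ORF4", "structural", "Envelope (E)"),
    Orf("ORF5", "structural", "Membrane (M)"),
    Orf("ORF6", "accessory"),
    Orf("ORF7a", "accessory"),
    Orf("ORF7b", "accessory"),
    Orf("ORF8", "accessory"),
    Orf("ORF9", "structural", "Nucleocapsid (N)"),
    Orf("ORF10", "accessory"),
]


def by_category(orfs):
    """Group ORF names by category, preserving genome order within each group."""
    grouped = {}
    for orf in orfs:
        grouped.setdefault(orf.category, []).append(orf.name)
    return grouped


groups = by_category(GENOME_5_TO_3)
for category, names in groups.items():
    print(f"{category}: {', '.join(names)}")
```

Keeping the list in genome order makes it easy to see that the structural genes (S, E, M, N) are interleaved with accessory ORFs downstream of ORF1a/b.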
Fig. 4
Complete domain organization of SARS-CoV-2 genome. This represents the genomic arrangement of various proteins of the SARS-CoV-2 from 5′-end to 3′-end.
Fig. 5
Categorization of SARS-CoV-2 proteins into three types: Open Reading Frames (ORFs), structural proteins and accessory proteins. The different protein factors under these three categories are described along with their respective functions in the viral genome.
ORF1a/b
ORF1a/b codes for the non-structural proteins (nsps). This first open reading frame occupies two-thirds of the viral genome and encodes two translation products, polyproteins 1a and 1ab (pp1a and pp1ab). The main protease (Mpro), also called 3CLpro, and the papain-like protease (PLpro) are cysteine proteases that process pp1a and pp1ab into mature non-structural proteins [13,21]. The 16 nsps play a key role in viral RNA synthesis by forming the replication-transcription complexes, as described in the following section (Supplementary Fig. 1) [22].
The putative nsp1 of SARS-CoV-2, known as the leader protein, suppresses host gene expression and promotes degradation of host mRNA [23]. Moreover, nsp1 facilitates viral replication and helps the virus evade the host immune response [23]. It is among the first proteins to be expressed after viral infection. The cryo-EM structure of SARS-CoV-2 nsp1 in complex with the rabbit 40S ribosome (PDB code 7JQB) confirmed its role in the effective translation of viral proteins and provided insight into the mechanism by which it blocks translation of host mRNA [23]. SARS-CoV nsp1 was likewise shown to inhibit host gene expression by targeting the 40S ribosome, which is crucial for host protein synthesis, and to modify host mRNAs to promote their degradation [24].
SARS-CoV-2 nsp2 has been reported to bind two host proteins, prohibitin 1 and prohibitin 2. These host proteins are involved in cell migration, cell cycle progression, apoptosis, cellular differentiation and mitochondrial biogenesis. The direct contact of nsp2 with prohibitin 1 and prohibitin 2 is anticipated to play some role in disturbing the host cell environment [25]. SARS-CoV nsp2 is not involved in viral replication [25], and further examination has shown it to be dispensable for replication [26].
Nsp3, a viral cysteine protease also referred to as papain-like protease (PLpro), has a molecular mass of approximately 200 kDa. It possesses several conserved domains, including the protease (papain-like protease), transmembrane and ssRNA-binding domains. A tetrapeptide motif, LXGG, is present at the junctions nsp1-2, nsp2-3 and nsp3-4, where X is N, L and L, respectively. PLpro recognizes this motif and hydrolyses the peptide bond on the carboxyl side of the second glycine, releasing nsp1, nsp2, nsp3 and nsp4 (Supplementary Fig. 1) [27]. Furthermore, these four nsps have been reported to be necessary for viral replication [28,29]. Notably, high-resolution structural analysis of the macro1 domain (mac1) of SARS-CoV-2 nsp3 (PDB code 6WEY) revealed its binding to ADP-ribose [30]. The mac1 domain removes ADP-ribose from different target proteins; this activity is expected to be associated with the cytokine storm syndrome observed in severe cases of COVID-19 [30]. Interestingly, SARS-CoV nsp3 has been predicted to hinder chemokine and cytokine production, which helps the virus evade the host immune response. This makes nsp3 an excellent target for designing antiviral therapeutics against COVID-19 [31]. Recently, two potent and highly selective inhibitors, VIR250 and VIR251, have been developed and shown to inhibit PLpro of SARS-CoV-2.
The structures of PLpro in complex with VIR250 (PDB code 6WUU) and VIR251 (PDB code 6WX4) revealed the inhibitory mechanism, which should facilitate the future development of antiviral peptides against nsp3 [31].
Nsp5, also known as 3C-like proteinase, chymotrypsin-like protease or main protease (Mpro), cleaves the nsp polyprotein at 11 sites, leading to the formation of intermediates or mature non-structural proteins [18,32]. It has been characterized as the best target for drug discovery because of its essential role in processing the polypeptides translated from the CoV RNA [33]. No human protease with a cleavage specificity similar to that of the SARS-CoV-2 main protease is known. Inhibition of Mpro enzymatic activity blocks virus replication [33]. Additionally, SARS-CoV-2 Mpro in its apo form (PDB code 7JPY) has been visualized by X-ray crystallography [34]. Crystallographic analysis of SARS-CoV-2 Mpro bound to seven inhibitors highlighted two potent inhibitors, MP15 (PDB code 7JQ2) and MP18 (PDB code 7JQ5), which effectively prevented the cytopathogenic effect of SARS-CoV-2 in Vero E6 cells [34].
Nsp6 is a multi-pass transmembrane protein that plays a role in generating autophagosomes and forming double-membrane vesicles. In addition, nsp6 forms a complex with nsp3 and nsp4 [35]. Yeast two-hybrid analysis showed that nsp6 interacts with nsp3 through the N-terminal region [36]. Nsp7 has been reported to form a heterodimer with nsp8, which in turn forms a complex with nsp12 and activates its RNA polymerase activity [37,38].
The non-structural protein 9 (nsp9) of SARS-CoV-2 is involved in viral replication and reproduction of the viral genome. It has a counterpart in SARS-CoV. The crystal structure of apo-nsp9 from SARS-CoV-2 (PDB code 6WXD) has been described in an effort to understand its function during the viral life cycle [39]. SARS-CoV-2 nsp9 shares 97% sequence similarity with nsp9 of SARS-CoV, which has previously been reported to be essential for SARS-CoV virulence. Given this high sequence similarity, the two proteins are anticipated to be functionally conserved [39].
Nsp10 is found exclusively in viruses; in SARS-CoV-2 it contains 140 amino acids. Analysis of SARS-CoV nsp10 showed that this protein is required to stimulate other proteins such as nsp14 and nsp16, thereby acting as a scaffolding protein [40]. Its role in stimulating the 2'-O-MTase activity of nsp16 and the exoribonuclease activity of nsp14 is well known [40]. Moreover, nsp10 plays a pivotal role in RNA methylation and in coronavirus replication. A recent study reported the 1.6 Å crystal structure of SARS-CoV-2 nsp10 (PDB code 6ZPE) and compared it with that of SARS-CoV (PDB code 2FYG) and other pathogenic CoV strains [40]. The structural comparison of nsp10 from SARS-CoV-2 and SARS-CoV showed high similarity, with an rmsd of ~0.5 Å over all atoms in the structure. In addition, sequence comparison showed SARS-CoV-2 nsp10 to be a close variant of SARS-CoV nsp10, differing in only two amino acids: Pro at position 23 is mutated to Ala and Arg at position 113 to Lys [40].
Nsp12 has RNA-dependent RNA polymerase (RdRp) activity and can carry out the polymerase reaction on its own, although with low efficiency. In the presence of two other nsps, nsp7 and nsp8, its polymerase activity is stimulated. Earlier investigations revealed that an nsp12-nsp7-nsp8 complex mediates the synthesis of coronavirus RNA [37,38]. Nsp12 also acts as a key component directing SARS-CoV-2 replication and transcription.
The cryo-EM structure of the SARS-CoV-2 nsp12-nsp8-nsp7 complex (PDB code 6M71) shows it to be similar to that of SARS-CoV (PDB code 2AHM), with an rmsd of ~0.82 Å over 1078 Cα atoms. These structural details should be helpful for designing antiviral drugs that target RdRp [41].
Intriguingly, SARS-CoV-2 takes maximum advantage of the cellular environment for its replication and to ensure the stability of viral RNA during its life cycle. Genomic RNA stability is crucial for effective translation and for the survival of SARS-CoV-2 inside the cell [42]. A type 1 cap (Supplementary Fig. 2) with N-methylated guanosine triphosphate and C2'-O-methyl-ribosyladenine is added at the 5′-end of the viral RNA to ensure stability and prevent degradation. Coronaviruses possess their own enzymes to cap the transcribed viral mRNA [42]. In SARS-CoV-2, cap installation involves nsp10, nsp13, nsp14 and nsp16. During replication, nsp13 aids in unwinding the viral RNA, thus acting as a helicase [42]. In addition, nsp13 uses its triphosphatase activity to cleave a monophosphate from the 5′-end of the nascent RNA, generating a diphosphate [42]. Nsp13 has also been found to be involved in RNA synthesis through its RNA helicase activity; however, how this process takes place is still not clear [43].
The bimodular protein nsp14 exhibits N7-methyltransferase and 3′-to-5′ exoribonuclease activities. Binding to nsp10 has been found to be important for stimulating the exoribonuclease activity of nsp14 [40]. Nsp14 is predicted to contact nsp12 directly to form an elongated complex and direct precise RNA synthesis. Notably, nsp16 has been predicted to form a stable holoenzyme and, together with nsp10 and nsp14, to assemble the capping components in transcription [44,45]. Additionally, nsp15, which has endonuclease activity, is important for coronavirus biology because it helps hide the viral RNA from the host defense system by degrading it [46]. Two crystal structures of SARS-CoV-2 nsp15, in its apo and citrate-bound forms, are available under PDB codes 6VWW and 6W01, respectively [46]. Comparison of the citrate-bound SARS-CoV-2 nsp15 structure with nsp15 of SARS-CoV (PDB code 2H85) [47] revealed high similarity, with an rmsd of 0.52 Å. The sequence similarity between nsp15 of SARS-CoV and SARS-CoV-2 was 88%, suggesting that SARS-CoV-2 nsp15 is a close homolog of SARS-CoV nsp15 and that “inhibitors for SARS-CoV may be effective for SARS-CoV-2 as well” [46].
Viral replication and transcription are mediated by a multi-subunit polymerase complex assembled from these non-structural proteins. Moreover, complete transcription and replication of the viral genome require other nsps, such as nsp10, nsp13, nsp14 and nsp16, whose exact functions during viral RNA synthesis are yet to be determined [18,43]. The key functions of these nsps are summarized in Fig. 5, and the roles of the individual nsps during the SARS-CoV-2 life cycle are delineated in Table 1.
Table 1
Essential functions of all the non-structural proteins.
Non-structural protein | Function
Nsp1 | Suppresses host gene expression and promotes mRNA degradation
Nsp2 | Binds with prohibitin 1 and prohibitin 2
Nsp3 | Mediates the release of nsp1, 2 and 3 (PLpro motif)
Nsp4 | Involved in complex formation with nsp3 and nsp6
Nsp5 | 3C-like protease (3CLpro) cleaves nsp polyprotein at 11 sites
Nsp6 | Generates autophagosomes and double membrane vesicles
Nsp7 | Complex formation with nsp8
Nsp8 | Complex formation with nsp7
Nsp9 | Binds DNA/RNA
Nsp10 | Stimulates nsp16 methyltransferase activity
Nsp11 | Short peptide at ORF1a domain end; function unknown
Nsp12 | RNA-dependent RNA polymerase activity
Nsp13 | Involvement in RNA synthesis due to RNA helicase activity
Nsp14 | Complexes with nsp12 and directs RNA synthesis
Nsp15 | Interferes with host defense system by degradation of viral RNA via endonuclease activity
Nsp16 | 2′-O-ribose methyltransferase
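The PLpro cleavage rule described in the ORF1a/b section (recognition of an LXGG motif followed by hydrolysis of the peptide bond immediately after the second glycine) can be sketched as a simple motif scan. This is a toy illustration only: the polyprotein string below is hypothetical filler around the three motif variants named in the text, the function names are our own, and real substrate recognition depends on far more than a four-residue pattern.

```python
import re

# LXGG motif: leucine, any residue, then two glycines.
LXGG = re.compile(r"L[A-Z]GG")


def cleavage_positions(sequence: str) -> list:
    """Return the indices just after each LXGG motif, where the cut would fall."""
    return [match.end() for match in LXGG.finditer(sequence.upper())]


def plpro_cleave(sequence: str) -> list:
    """Split a sequence after every LXGG motif and return the fragments."""
    seq = sequence.upper()
    cuts = cleavage_positions(seq)
    starts = [0] + cuts
    ends = cuts + [len(seq)]
    # Drop empty fragments (for example, when the sequence ends with a motif).
    return [seq[s:e] for s, e in zip(starts, ends) if e > s]


# Hypothetical toy polyprotein: filler residues around LNGG, LLGG and LLGG.
toy_polyprotein = "AAAALNGGCCCCLLGGDDDDLLGGEEEE"

fragments = plpro_cleave(toy_polyprotein)
print(cleavage_positions(toy_polyprotein))
print(fragments)
```

Three motif occurrences give three cuts and therefore four fragments, which mirrors how cleavage at the nsp1-2, nsp2-3 and nsp3-4 junctions separates the N-terminal nsps from the rest of the polyprotein.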
Spike protein
The spike protein, encoded by the S gene, is a large glycoprotein with a molecular mass of 180 kDa. It binds the angiotensin-converting enzyme 2 (ACE2) receptor and does not show much homology to the spike proteins of other coronaviruses, but it shares 93.1% identity with that of the bat coronavirus RaTG13. ACE2, which SARS-CoV-2, the causative agent of COVID-19, uses for cell entry, also served as the receptor for SARS-CoV [48]. The S protein consists of two subunits, S1 and S2. The S1 subunit possesses an N-terminal and a C-terminal subdomain, which bind sialic acid and a proteinaceous receptor, respectively, whereas the S2 subunit comprises the fusion peptide region (FP), heptad-repeat 1 region (HR1), heptad-repeat 2 region (HR2), the transmembrane region (TM) and the C-terminal endodomain (E), as shown in Fig. 6 [6].
Fig. 6
The domain organization and cryo-EM structure of the Spike protein of SARS-CoV-2. The two subunits, S1 (dark green) and S2 (light blue), are depicted along with their subdomains and 22 N-glycosites. The NTD and the RBD/RBM of S1 are shown in purple and light green/yellow, respectively, whereas the FP, HR1, HR2, TM and E regions are displayed in punch, light purple, red, orange and magenta/maroon, respectively [6,49]. The cryo-EM structure (PDB code 6VSB) is colored according to the colors assigned to the domains; the 16 observed N-glycosites are displayed as hotpink spheres [49].
The domain organization and cryo-EM structure of Spike protein of SARS-CoV-2. Two subunits S1 (dark green) and S2 (light blue) along with the different subdomains are depicted wherein 22 N-glycosites are depicted. The NTD domain, RBD/RBM of S1 are shown in purple and light green/yellow color, respectively whereas FP, HR1, HR2, TM and E region are displayed in punch, light purple, red, orange, and magenta/maroon color, respectively [6,49]. The cryo-EM structure colors are presented according to the color assigned to the domains (PDB code 6VSB); here the observed N-glycosites (16 nos) are displayed with hotpink spheres [49].The transmembrane spike protein is known to exist as homotrimer on the viral surface. The S1 domain is assumed to facilitate the receptor binding whereas the S2 subunit mediates fusion of the virus to the cell membrane of the host by means of heptad repeats HR1 and HR2 [50]. The cryo-electron microscopy structural analysis of the SARS-CoV-2 Spike ectodomain trimer revealed that it exists in perfusion conformation which is observed to undergo extensive reorganization of its structure for fusion of the virus membrane with the host cell membrane [49]. The S1 subunit binding to the host cell receptor is anticipated to trigger this process of fusion. This receptor binding results in destabilization of the perfusion trimer conformation that leads to shedding of S1 subunit. Further, S2 subunit undergoes a transition to form a stable conformation post fusion. Interestingly, it is noticed for all CoVs that boundary between the two subunits of spike protein- S1 and S2 is the point where cleavage occurs. The distal S1 subunit contains RBD which engages the host cell receptor by undergoing movements adopting hinge-like conformations. There are two states proposed which are referred to as up and down conformation. 
In the up conformational state the receptor site is accessible, whereas down conformation state represents the state in which the receptor site is inaccessible [49]. In addition, the spike protein is reported to be cleaved via host proteases at site S2' which is positioned upstream of the fusion peptide in most CoVs. This cleavage activates the spike protein for membrane fusion by promoting various irreversible changes in the conformation of the structure [51]. Overall the viral entry into the host cell is a complex process that requires concerted actions such as receptor binding and proteolytic processing of the spike protein to happen either concurrently or independently. The aforementioned information about essential functions of spike protein makes it indispensable. All these details of spike protein of SARS-CoV-2 prove that it is a potential target for antibody-mediated neutralization and its atomic level information can be helpful in designing vaccine and/or for drug development [49,51].The receptor binding domain (RBD) of the S protein of the virus is crucial for interaction with the ACE2 receptor of the host. The structural features of this hot spot including relatively flat surface, glycosylation free site and the presence of salt bridge surrounded by hydrophobic tunnel wall altogether favor the virus easy accessibility and binding [11].Mass spectroscopy characterization of intact glycopeptides has revealed 22 N-glycosites on the surface of SARS-CoV-2 S protein, which are also found to be preserved in 753 genome sequences of SARS-CoV-2 (Fig. 6). The glycans of spike protein have been highlighted to be extremely significant because of their pivotal role in mechanism by which the human ACE2 receptor attaches with virus. The S protein N-glycosylation plays essential role in directing the proper folding of glycoprotein and priming of protein by host proteases. 
The identification of site-specific N-glycosylation information might be useful for immunogen design, for developing therapeutic antibodies or drugs and for delineating the precise viral invasion mechanism [52].
Studies have provided evidence that the RBD is capable of folding independently of the rest of the S protein and possesses all the information needed for binding to the host receptor [53]. ACE2 is abundant on the epithelial cells of lung alveoli and the enterocytes of the small intestine. Further, the presence of ACE2 has also been confirmed in endothelial cells and smooth muscle cells [54]. This underlines the abundance and importance of this protein in humans, and the implications of its binding to SARS-CoV-2 can be far-reaching.
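As an illustration of how candidate N-glycosites can be located in a protein sequence, the following minimal Python sketch scans for the canonical N-linked glycosylation sequon N-X-S/T, where X is any residue except proline. The function name and the toy sequence are hypothetical and are not taken from the cited studies. A sequon match only marks a potential site; whether it is actually glycosylated has to be established experimentally, as in the glycopeptide analysis cited above [52].

```python
import re
from typing import List

# Canonical N-linked sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glycosites(sequence: str) -> List[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy fragment for illustration; NOT the real spike sequence.
    toy_fragment = "MNITANPSGNGTQ"
    print(find_n_glycosites(toy_fragment))  # [2, 10]; N6 is skipped (N-P-S)
```

In practice the input would be the full S protein sequence from a sequence database, and the predicted positions would be compared against the experimentally observed glycosites.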
ORF3
ORF3a
ORF3a, located between the genes of the structural proteins S and E, encodes an accessory transmembrane protein. It is a hydrophobic protein and contains several conserved regions, including a cysteine rich domain which provides it ion-channel activity, a tyrosine based sorting domain Yxxϕ and a diacidic EXD domain that has roles in intracellular trafficking. These domains also regulate the localization of SARS-CoV ORF3a in the cell. Although the Yxxϕ and cysteine rich domains are conserved in SARS-CoV-2, the EXD domain is not reported; in its place, an SGD domain is present [55,56]. It has been shown to form homodimers, homotetramers and oligomers of 31 kDa in SARS-CoV. Its ion channel activity has been reported in Xenopus oocytes [57]. Further, several studies carried out in SARS-CoV have confirmed that ORF3a induces cells to undergo programmed cell death or apoptosis to control the viral spread. Furthermore, research carried out in different cell lines proposes that caspase-3 levels, a marker of caspase-promoted apoptosis, are elevated in the presence of ORF3a [56,58]. The cryo-EM structure highlighted that ORF3a of SARS-CoV-2 forms a dimer which is ~70 Å in length (Supplementary Fig. 3). It possesses a 40 Å long transmembrane region (TR) and a cytosolic region (CR) that extends up to 30 Å; the two are connected through a turn-helix-turn motif. Each protomer in the TR consists of three α-helices, α1, α2 and α3. Short intracellular linkers connect α1 to α2 whereas short extracellular linkers connect α2 to α3. The monomer shown in brown has an inner core of β3, β5, β8, β4 and β7, which in turn interacts with the other monomer, shown in yellow; several residues on the inner side, such as V168, V225, F230 and I233, form a hydrophobic core. Each protomer interacts with the other monomer through its β sandwich, thus stabilizing the dimer, as shown in Fig. 7. The structural characterization of ORF3a from SARS-CoV-2 reveals that it is a class III viroporin [59]. Viroporins are a group of proteins important for viral pathogenesis and formation of viral particles. They are categorized into different classes based on the number of transmembrane domains. Structural studies provide evidence that ORF3a possesses three transmembrane domains [60]. Further, the structure provides a better understanding of channel activity in proteoliposomes, which may facilitate development of a vaccine using this domain [59].
Fig. 7
The top panel represents the cryo-EM structure of ORF3a from SARS-CoV-2 (PDB code 6XDC). The interaction of two monomers (one in yellow and other in brown) results in formation of a stable structure. The three α-helices of the transmembrane region are labelled as α1, α2 and α3. The arrangement of β sandwiches containing 8 β sheets (β1- β8) are also shown [59]. The ORF3a* represents the residues labelled as V168, V225, F230 and I233 forming a hydrophobic core in ORF3a. With the help of β sandwiches each protomer interacts with the other monomer, thus forming a dimer. In the bottom panel, crystal structure of ORF 7a of SARS-CoV-2 with seven β-strands core (PDB code 6W37) is shown in bottom left side [64] whereas the bottom right side depicts the X-ray structure of ORF9b from SARS-CoV-2 (PDB code 6Z4U) (manuscript unpublished) [65].
ORF3b
SARS-CoV ORF3b, along with ORF6 and the nucleocapsid protein, is capable of inhibiting the production of type I IFN. The ORF3b of SARS-CoV-2 is shorter than that of SARS-CoV. In fact, the length of its C-terminal region is linked with the effectiveness of the IFN inhibition. It has been observed that inhibition of the type I IFN response is considerably higher for SARS-CoV-2 than for SARS-CoV [61]. SARS-CoV-2 ORF3b has been reported to be located in the plasma membrane [56] while SARS-CoV ORF3b, which contains 154 amino acids, is localized in the nucleolus and mitochondria [62]. It promotes necrosis and apoptosis, as observed in many cell lines, and hinders cell division by inhibiting the transition of the cell from G0 or G1 phase to S phase [63].
Envelope protein
The Envelope (E) protein is the smallest structural protein and is amply expressed in infected cells. This protein is mostly confined to the ER, Golgi complex and ER-Golgi intermediate compartment (ERGIC), where it is involved in assembly and budding of virus particles [66]. Though it is not a major component of the virion, its roles in various stages of replication and infectivity have been reported, along with its importance in maturation of the virus [67].
It is a hydrophobic protein and has been shown to undergo various post translational modifications, specifically palmitoylation and ubiquitination, for the functions it performs in the ER and ERGIC. Addition of a palmitoyl group supports membrane anchoring, and such proteins are abundant among enveloped viruses [67,68]. The SARS-CoV E protein is known to be ubiquitinated and interacts with nsp3 via the ubiquitin-like domain 1 of nsp3 [69]. The E protein may act as an ion channel, specifically a viroporin, which oligomerizes and enhances membrane permeability [70]. It is crucial for apoptosis and possesses an N-terminal domain, a transmembrane domain and a C-terminal domain. Further, the transmembrane region of the protein is responsible for its homotypic interactions, as analyzed by molecular dynamics (MD) simulations. It was observed that the dimeric, trimeric and pentameric models were conserved through evolution [71]. The experimental structure of the SARS-CoV-2 E protein has not been elucidated yet, but a homology model of the protein, which assembles into a viroporin-like pentamer, has been obtained by Bianchi and co-workers [72].
The viral envelope consists mostly of Spike (S) and Membrane (M) proteins while only a small proportion of the E protein is incorporated in the envelope, as observed in coronaviruses like mouse hepatitis virus (MHV), infectious bronchitis virus (IBV) and transmissible gastroenteritis virus (TGEV) [73,74,75]. 
The role of E protein in virulence is clear from an observation whereby its deletion led to weakening of the SARS-CoV in both in vivo as well as in vitro conditions. Such viruses can prove to be good vaccine candidates against SARS-CoV-2 since BALB/C mice immunized with “SARS-CoV sans E protein (rSARS CoV-ΔE)” are protected against MA15 virus-mediated clinical diseases including respiratory and lung damage and show reduced replication of virus [76].
Membrane protein
Membrane (M) protein determines the shape of the viral envelope. It possesses three transmembrane domains [77], is the most abundant protein in the virion [78] and can bind all other structural proteins [79]. Binding to M protein helps not only in stabilizing the nucleocapsid, but also the N protein-RNA complex, thus completing viral assembly [80]. It is a glycosylated protein and plays a role in activation of the interferon α response [81]. The M protein has been found to be conserved among the β coronaviruses.
Extensive literature suggests that M protein supports assembly of virus particles by interacting with other proteins like ribonucleoprotein and the S protein as well as through M-M protein interactions [82]. The interaction between S and M protein is vital for the retention of S protein in the ERGIC and ultimately for its integration into virions [83].
The interaction between E and M protein is one of the most studied interactions among the different proteins of CoVs. This interaction takes place at the cytoplasmic side of the ERGIC by means of the C-termini of the E and M proteins. Both these domains are vital for formation of virus-like particles (VLPs), since their deletion reduces the VLP count [84,85]. The actual structure of the M protein has not been determined yet. However, a low resolution approximate structure has been predicted using ab initio techniques [72].
ORF6
The accessory protein ORF6 of SARS-CoV is localized within the Golgi apparatus and ER. This protein is among the minor components of SARS-CoV responsible for virulence, but is not necessary for replication of the virus. The domain organization of ORF6 contains an N-terminal amphipathic motif spanning amino acids (aa) 2 to 37, which has a role in inducing the rearrangement of intracellular membranes as well as in the formation of double membrane vesicles [86,87,88]. The ORF6 protein has been reported to activate the synthesis of DNA and to inhibit the expression of co-transfected plasmids. Previous examination of ORF6 has also revealed its crucial function in viral pathogenesis [89].
Moreover, ORF6 localizes along with nsp3 (papain like proteinase) and acts as a marker for the replication complex [88,89]. Interestingly, ORF6 physically interacts with nsp8, as affirmed using co-immunoprecipitation and yeast two hybrid analysis [90]. Additionally, the interaction of ORF6 with the 9b accessory protein has also been reported [91]; however, how these findings are related to replication of the virus and the pathogenicity induced by SARS-CoV is yet to be understood.
ORF6 from SARS-CoV-2 inhibits the signaling pathway of type I IFN (specifically IFN-β), which is a crucial component of the host defense system. This ORF is also known to target the interferon-stimulated response element (ISRE). The inhibition of interferon activity will ultimately promote replication of the virus [92].
ORF7
ORF7a
Sequence analysis of ORF7a from SARS-CoV has revealed that it codes for a 122 amino acid long type I transmembrane protein with an N-terminal signal peptide and a C-terminal domain. Further, it also possesses a distinct luminal domain. This domain folds into a seven-stranded sandwich of β strands, an arrangement similar to that seen in members of the Ig superfamily. Previous investigations of its structure have revealed that it has also been isolated from other coronaviruses [93,94]. There is little clarity about whether ORF7a localizes with the ERGIC or just the Golgi complex inside the cell. Though its specific function is not reported, it is known to be incorporated into mature virion particles and may have some role to play in viral assembly [94]. Its overexpression increases apoptosis and activates nuclear factor kappa B (NF-κB) as well as c-Jun N-terminal kinase (JNK) mediated signaling pathways. Both of these promote the production of pro-inflammatory cytokines like IL-8 [95], thus highlighting the role of ORF7a in virus-host interactions. The structure of SARS-CoV-2 ORF7a has been recently determined, as shown in Fig. 7, but detailed information is yet to be published.
In a Dali server search [96], the SARS-CoV ORF7a encoded X4 protein (PDB code 1YO4) showed maximum similarity of 89% to SARS-CoV-2 ORF7a (PDB code 6W37) with an r.m.s.d. value of 1.0 Å and z score of 10.3, followed by SARS-CoV ORF7a (PDB code 1XAK) with similarity of 88%, r.m.s.d. of 0.4 Å and z score of 14.1. The crystal structure of its N-terminal ectodomain from SARS-CoV has been determined [93]. The transmembrane protein contains a 15 residue long N-terminal signal peptide, a luminal domain, a transmembrane segment and a cytoplasmic tail. The ectodomain contains four cysteine residues. The luminal domain is composed of a seven-stranded β sandwich consisting of two β sheets, organized in an immunoglobulin type domain. 
The identity to members of Ig superfamily as found by Dali search was in the range of 2–16%. This fold has been observed to be significant for many proteins. Due to the disulphide bond between cysteine residues, both the β sheets lead to the formation of a hydrophobic pocket near the middle region of the β1 strand. The cytoplasmic tail contains lysine, arginine and lysine at positions 103, 104 and 105, respectively. The lysine at positions 103 and 105 are needed for exit of this ORF from ER. The structural details do not clearly define the functional significance of ORF7a for this virus [93].
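The r.m.s.d. values quoted for these structural comparisons measure the average distance between equivalent atoms of two structures. The following minimal Python sketch computes the r.m.s.d. between two equally sized sets of coordinates that are assumed to be already superimposed and residue-matched; it does not perform the structural alignment that a server such as Dali carries out beforehand. The function name and the coordinates are hypothetical and serve only to show the arithmetic.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation (same unit as the input, e.g. angstroms).

    Assumes both coordinate lists are already superimposed and that the
    i-th atom of one list corresponds to the i-th atom of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Hypothetical C-alpha positions; every atom is displaced by 1 A along z.
    model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(model_a, model_b))  # 1.0
```

A lower value indicates closer agreement between the two backbones, which is why an r.m.s.d. of 0.4 Å reflects a tighter structural match than one of 1.0 Å.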
ORF7b
The ORF7b codes for a 44 amino acid long protein which is hydrophobic in nature. The ORF7b start codon overlaps with the ORF7a stop codon [94]. The Golgi apparatus has been said to be its site of localization. It codes for a transmembrane protein with an N terminal as well as a cytoplasmic C- terminal. The transmembrane domain of this protein is crucial for localization in the Golgi apparatus. It has been studied that this ORF is not required for viral replication in vivo as well as in vitro [97].
ORF8
Among all the proteins of SARS-CoV-2, the one encoded by ORF8 shows only <20% sequence similarity to its SARS-CoV counterpart [98]. In a recent study, it has been suggested that overexpression of ORF8 from SARS-CoV-2 contributes to a significant reduction in MHC-I molecules, which serve as hallmarks of the immune system. ORF8 was shown to localize along with MHC-I, and immunoprecipitation further confirmed that ORF8 binds to exogenous or endogenous MHC-I and promotes its degradation via lysosomes. This decline in MHC-I levels seems to be specific to this novel coronavirus [99]. Furthermore, this ORF has not been found to possess any functional domain, and predictions of its secondary structure suggest that it gives rise to a protein with an alpha helix followed by a beta sheet with six strands [17]. Although this ORF is not important for replication of the pathogen's genome [100], the exact reason for frequent mutations in this region is a matter of speculation. Just like the ORF6 protein, its overexpression promotes synthesis of DNA [98,101].
Nucleocapsid protein
The Nucleocapsid (N) protein binds the viral nucleic acid (RNA), the most crucial component of SARS-CoV, and is found in the endoplasmic reticulum and the Golgi complex inside the cell. The N protein is important for replication of the virus and for the host response to viral infection. It undergoes various post translational modifications, chiefly phosphorylation, which is responsible for structural changes that ultimately increase the protein's affinity for genomic RNA [102]. Further, the SARS-CoV-2 N protein consists of three domains: an N-terminal domain (NTD), a central S/R rich disordered linker region and a C-terminal domain (CTD) (Fig. 8) [103].
Fig. 8
The domain organization and structure of SARS-CoV-2 N protein. Top panel represents the three domains of N protein: NTD (N-terminal RNA binding domain), S/R-rich linker and CTD (C-terminal domain), shown in magenta, light yellow and purple, respectively [103]. The crystal structure of the N-terminal RNA binding domain of the N protein of SARS-CoV-2 (PDB code 6M3M) [103], the conserved residues S52, R89, Y110, Y112 and R150 of the NTD ribonucleotide binding site and the cartoon representation of the three-dimensional crystal structure of the C-terminal domain of the N protein (PDB code 6WZO) [104] are shown in boxes. In the NTD, three O-glycosylation sites, T148, T165 and T166, are represented as red sticks whereas in the CTD an N-glycosylation site, N269, is shown as a blue stick.
These three domains play crucial roles as outlined by previous research [22,102,103,104]. The N-terminal domain serves a role in RNA binding whereas the C-terminal domain is responsible for oligomerization. Accumulated knowledge from previously solved crystal structures of the N-terminal domain of the N protein of various CoVs, such as SARS-CoV, HCoV-OC43 and infectious bronchitis virus, suggested some essential functions of this domain. This domain associates with the 3′-end of the RNA genome of the virus via electrostatic interactions. Also, R76 and Y94 in the NTD of the SARS-CoV N protein are involved in RNA binding as well as in viral infection. Substitution of these amino acids by alanine significantly reduced RNA binding. The middle linker region rich in Ser/Arg (shown in Fig. 8) is necessary for primary phosphorylation [103,105].
The N protein of the virus serves multiple functions during its life cycle that include RNA binding, chaperone activities, direct metabolism of the infected cell and CoV RNA transcription as well as replication. 
The initial role of this protein is binding to the RNA genome of the virus and forming the helical ribonucleoprotein complex (RNP complex) during packing of the genome [106]. In vitro and in vivo analysis suggests that SARS-CoV N protein has affinity for the leader RNA; this ensures maintenance of an appropriate RNA conformation which, in turn, facilitates transcription and replication of the viral RNA genome [103,107]. The expression of N protein is observed to be high during infection. It reacts with sera obtained from patients infected with SARS-CoV. Notably, antibodies to N protein are also observed to form earlier than those to S protein [108]. It also possesses the ability to induce a protective immune response in the human body against SARS-CoV and SARS-CoV-2 [103,109].
Recently, LC-MS studies of the N protein revealed the presence of N- and O-glycans. N-glycans of the high mannose, hybrid and complex types are reported in this study. The NTD of the N protein has one N-glycosylation site, located at N47, and three O-glycosylation sites, T148, T165 and T166, which regulate the RNA-binding activity. The middle SR-rich linker region of the N protein has glycosylation sites at N192 and N196. The CTD of the N protein has two O-glycosylation sites (T245 and T247) and an N-glycosylation site at N269 (Fig. 8). The roles of the various domains of the N protein emphasize the importance of future investigation of the role played by glycosylation in viral pathogenesis as well as its use in vaccine or drug design [110].
Recently, the three-dimensional structure of the N-terminal domain of the SARS-CoV-2 nucleocapsid protein was elucidated using X-ray crystallography [103]. The structural analysis highlighted that one asymmetric unit of this domain consists of four N-NTD monomers packed in an orthorhombic crystal form. The right-handed arrangement of loops, a core of five antiparallel β-strands and loops in one of the four monomers is depicted in Fig. 8. 
Here, the β-sheet core of the NTD has a single 3₁₀ helix just before the β2 strand. In this structure, the β-hairpin extending between β2 and β5 has been suggested to be essential for RNA binding. The NTD, with its aromatic and basic residues, folds into a shape resembling a right hand with a basic palm and an acidic wrist; here the β-hairpins are like fingers. In addition, the interfacial interaction in the NTD of the N protein mostly occurs via interaction of its β-hairpin fingers with the palm region [103,105]. Additionally, superimposition of this SARS-CoV-2 N-NTD (N-terminal domain of N protein) structure with the N-NTDs of SARS-CoV, MERS-CoV and HCoV-OC43 showed considerable structural similarities. However, several differences were seen: the β-hairpin region moves backward, to the opposite side, in SARS-CoV and forward, toward the nucleotide binding site, in MERS-CoV, whereas it is less extended in HCoV-OC43. These differences change the surface charge distribution of the protein, which brings about flexibility in the RNA binding cleft for the RNA genome. Moreover, multiple sequence alignment of these three CoVs with SARS-CoV-2 N-NTD revealed conserved residues such as S51, R89, Y110, Y112 and R149, as shown in Fig. 8 [103].
The CTD of SARS-CoV-2 N protein is closely related to the homologous domains of various coronaviruses such as SARS-CoV, MERS-CoV and HCoV-NL63, which affirms that the structure of this domain is a homodimer [104]. The overall structure has ten α helices and four β sheet strands surrounded by 3₁₀ helices, as shown in Fig. 8. The structure closely resembles the SARS-CoV CTD, which has a compact intertwined dimer structure possessing four centrally located, antiparallel β sheets that form the dimer interface. At the interface, each protomer has two β strands and a short α helix extended towards the opposite protomer, facilitating packing against the hydrophobic core. 
The sequence identity for CTD of SARS-CoV-2 with SARS-CoV has been observed to be 96% except for notable changes in five residues; Q268, D291, H335, Q346 and N350 of SARS-CoV to A267, E290, T334, Q345 and N348 in case of SARS-CoV-2, respectively. In order to elucidate the structural similarities, one group overlaid the structure of the SARS-CoV-2 N-protein CTD domain determined by them with CTD structures available with PDB code 7C22 [111] (unpublished manuscript). The outcome revealed high structural similarity with r.m.s.d value of Cα in the range 0.15–0.31 Å. Additionally, a positively charged surface area present in the CTD domain has been suggested to facilitate RNA binding. However, the binding affinity of RNA is assumed to vary according to variation in the viral isoform from one another [104].
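Percentage sequence identity figures such as the one quoted above are derived from a pairwise alignment of the two sequences. The following minimal Python sketch computes percent identity from two sequences that are assumed to be already aligned (equal length, gaps written as "-"). The function name and the toy sequences are hypothetical, and the choice of denominator (aligned, gap-free columns) is only one of several conventions in use, so values may differ slightly from those reported by alignment tools.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Assumes the inputs are pre-aligned, equal-length strings in which
    gaps are written as '-'. Columns with a gap in either sequence are
    excluded from both numerator and denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment; NOT the real N protein CTD sequences.
    seq_1 = "ACDEFGHIKL"
    seq_2 = "ACDQFGHIKL"  # one substitution in ten aligned positions
    print(percent_identity(seq_1, seq_2))  # 90.0
```

For real sequences, the alignment itself would first be produced by a dedicated alignment program, and the aligned strings would then be passed to a function of this kind.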
ORF9b
This ORF is located within the Nucleocapsid gene and codes for 98 amino acids. Structural studies carried out on SARS-CoV ORF9b indicate that it specifically binds lipid molecules by means of a hydrophobic cavity located in the center of the structure. Its expression in mammalian cell culture suggests that it is membrane bound and associated with vesicular structures inside the cell [112].
The GFP tagged ORF9b fusion protein has been reported to localize in the mitochondria. The same fusion protein expressed in different cell lines has been associated with a considerable reduction in Dynamin-related protein 1 (DRP1) levels, which ultimately causes elongation of mitochondria. This may be related to the fact that mitochondria protect against SARS-CoV through a defense mechanism that is linked to interferon signaling and limited by DRP1 [113]. Furthermore, its presence has been reported in VLPs as well as in virions of SARS-CoV. Since the E and M proteins together are sufficient for formation of VLPs, both these proteins promote the incorporation of ORF9b into these particles [114].
The manuscript elucidating the atomic details of the crystal structure (PDB code 6Z4U) of ORF9b (Fig. 7) and other related aspects is yet to be published. However, in a Dali server search [96], the SARS-CoV ORF9b (PDB code 2CME) exhibits 81% identity to the SARS-CoV-2 ORF9b with an r.m.s.d. of 2.4 Å and z score of 9.9. The crystal structure is composed of a symmetrical dimer consisting of two folds made from two adjacent β sheets [112]. Every sheet is made up of β strands which, in turn, are contributed by both monomers, resulting in the formation of an intertwined structure. The dimer thus formed is composed of anti-parallel β-sheets in which the monomers bind each other firmly.
The surface properties analysis of ORF9b showed that one region of the molecule is positively charged while the other is negatively charged. 
It has a dimeric, tent-like β structure and a hydrophobic cavity in the centre which is surrounded by hydrophobic side chains. This cavity binds lipid molecules and has a unique fold. It is responsible for membrane attachment during viral assembly [112].
ORF10
SARS-CoV-2 ORF10 is present at the 3′-end of the genome. It has been anticipated to possess 38 amino acids and is predicted to code for a transmembrane domain in SARS-CoV-2 [115]. This observation further suggests that SARS-CoV-2 ORF10 might code for a functional protein. However, this hypothesis needs experimental and genomic analysis for clarity. This ORF is particularly peculiar because of its dissimilarity to other proteins in the NCBI database [116]. Its analysis in the SARS-CoV lineage suggests that a stop codon disrupts its reading frame [115].
Potential targets for vaccine design
The novel coronavirus is highly contagious and has a high mortality rate, which calls for the discovery of vaccines to curb its worldwide spread. To achieve this, researchers across the globe are making efforts to discover inhibitors of this viral pathogenesis in humans. SARS-CoV-2 has various proteins, as mentioned before, and any natural product, antiviral compound, nanoparticle derived drug, peptide etc. which can target these viral proteins can serve as a promising drug candidate. This section aims to summarize all the domains of SARS-CoV-2 which are potential targets for vaccine design, as suggested by extensive research carried out till date. Presently, the efforts to synthesize antiviral drugs against the lethal COVID-19 infection mostly target the spike protein region, the 3C-like protease and the papain like protease. Apart from these proteins, the RNA polymerase, E protein and helicase of SARS-CoV-2 are also considered to be potential targets for antiviral vaccine or drug development [117,118]. Besides, repurposing of drugs against SARS-CoV-2 is needed, but the interaction of such drugs with the host, specifically of drugs used against other human coronaviruses (HCoV), requires in-depth research. Several immunosuppressive drugs like sirolimus (rapamycin) and agents preventing inflammation like mesalazine are being proposed as treatment options for SARS-CoV-2 [119].
Past experiences with the influenza vaccine stress the need for development of (a) whole inactivated virus vaccines and (b) live attenuated viral vaccines for SARS-CoV-2 [120]. Agents like formaldehyde, UV irradiation etc. have been successfully used for inactivation of the pathogen in vaccine preparations. Since all of the structural proteins S, E, M and N stimulate humoral as well as cell mediated immune responses, involving production of antibodies as well as CD4+ and CD8+ T cells, vaccine development may be facilitated [121,122]. 
Specifically, targeting the S protein as well as the RBD of the virus may lead to generation of neutralizing antibodies, thus providing immunity, as studied for SARS-CoV [123]. Moreover, these structural proteins could be used as ideal targets for anti-viral compounds [123].
DNA, RNA and protein based vaccines are being tested against COVID-19 all across the globe. In a recent study, an S protein based mRNA vaccine (mRNA-1273) encoding the spike pre-fusion complex has shown promising results in Phase 1 clinical trials. It is a lipid nanoparticle based vaccine; immune responses against the pathogen were detected in all the participants and no ill effects were noted. As mentioned above, in view of the importance of the spike protein in leading viral entry into the host, many of the vaccines are being developed to target this protein [124]. Most of the nucleic acid based vaccines as well as many recombinant protein vaccines aim to target this protein and are in clinical trials [125].
As is widely known, the virus uses its S protein, which attaches to ACE2, for entry into the host cell. Some of the vaccines in clinical trials include INO-4800, which is a DNA vaccine, and ChAdOx1 nCoV-19, a non-replicating adenovirus based vaccine which showed good results in Phase 1 and 2 clinical trials [125].
Based on these findings, recombinant soluble human ACE2 was used to inhibit SARS-CoV-2 in Vero cell lines. This particular ACE2 has earlier been in phase 1 as well as phase 2 testing and is now being considered as a treatment for the current outbreak [126]. The transmembrane protease serine 2 (TMPRSS2) is known to prime the S protein and to interact with ACE2. Camostat mesylate is a protease inhibitor that is clinically approved to impede TMPRSS2 activity. 
As shown recently, camostat mesylate blocked SARS-CoV-2 entry into cells and hence can be developed as a vaccine candidate or anti-viral compound [127].
The E protein has been implicated in the virulence of this pathogen and its deletion severely cripples CoV pathogenicity. By appropriately mutating this protein, the feasibility of developing a live attenuated vaccine for SARS-CoV and MERS-CoV has been explored [128]. A similar approach can be used for SARS-CoV-2. In a study by Pang et al., it was found that the antiserum raised against the M protein of SARS-CoV had a high neutralization titre. This suggests its possible development as a vaccine for SARS-CoV-2, since the M protein has been found to be evolutionarily conserved [129].
Additionally, finding inhibitors of the protease activity of nsp3 can prove to be helpful in vaccine development. For instance, a natural product class called tanshinones has been recently reported to inhibit the protease activity of nsp3 [18,29].
Nsp5, also referred to as Mpro, is responsible for cleaving the viral polyproteins at 11 sites, releasing various intermediate as well as mature non-structural proteins which have crucial roles in virus replication. Therefore, it can also be used as a potent target for antiviral drug development [130]. Previously, inhibitors have been developed to target viral proteases, some of which include peptide aldehydes and their derivatives. Inhibitors like GC373 and GC376 have been used to treat feline coronavirus (FCoV) as well as mink and ferret coronaviruses. GC376 covalently binds to cysteine residues of the MERS-CoV Mpro. A recent study reports that both compounds bind and reversibly inhibit the Mpro of SARS-CoV-2 and can effectively be used for antiviral therapy [131]. 
Moreover, Ebselen (in clinical trial), N3 (pre-clinical but animal testing not done), 13b (pre-clinical; animal testing yet to be done) and 11b (pre-clinical) are examples of already reported inhibitors of SARS-CoV-2 Mpro [130,132,133].
RNA-dependent RNA polymerase (Nsp12) [41] of SARS-CoV-2 possesses a groove type structural region which has been suggested to be an active center for RNA synthesis as well as replication of the virus. Evidence shows that SARS-CoV-2 RdRp has a sequence similarity of 96% with that of SARS-CoV, which suggests that inhibitors of the SARS-CoV RdRp could show similar inhibitory activity against the RdRp of SARS-CoV-2 [19,134]. Notably, the Nsp12 viral polymerase appears to be a good target for remdesivir [41]. This drug has also been shown to be a lead inhibitor in in vitro studies and in an animal study in rhesus monkeys [135], even though its use against SARS-CoV-2 is still viewed with skepticism.
Humoral as well as cellular immune responses are heightened during infection. The role of T cells is especially coming to the fore because of the cytokine storm triggered by SARS-CoV-2 in the human host. Several studies are already underway to pin-point how these cells can be used for development of an effective vaccine. Both CD4 as well as CD8 cells isolated from convalescent patients have been reported to respond to the structural as well as non-structural regions of SARS-CoV-2. Further, memory T cells for SARS-CoV have been detected even several years after infection and also exhibited cross-reactivity to SARS-CoV-2 [136], which gives hope for the on-going vaccine development efforts. Furthermore, trials on artificial Antigen Presenting Cells (aAPCs) with and without cytotoxic (TC) cells are in progress [137].
As mentioned above, many ORFs are involved in the suppression of the host defense response; for instance, ORF3b is involved in suppressing the vital type I IFN response. Similarly, ORF8 is associated with reduced MHC-I levels. 
Such immune molecules are crucial to the host in fight against this pathogen. If one could target these ORFs and inhibit its activity, it would prove to be helpful for disease management.
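The 96% sequence similarity quoted in this section for the RdRp of SARS-CoV-2 and SARS-CoV is a pairwise comparison figure. The review does not state how it was computed, so the following is only a minimal sketch of one common, simpler measure, percent identity over an existing pairwise alignment (identity counts exact matches only, whereas similarity also credits conservative substitutions). The function name `percent_identity` and the two toy fragments are hypothetical and are not real RdRp sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs are assumed to be rows of the same pairwise alignment,
    with '-' marking gaps (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns without a gap in either sequence
    matches = 0   # columns where both residues are identical
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy fragments for illustration only (NOT real RdRp sequences).
toy_a = "MKTLLVAG-SDE"
toy_b = "MKTLIVAGHSDE"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

In practice, the alignment itself would come from a dedicated alignment tool, and the reported value depends on how gaps and alignment length are treated, so figures from different tools are not always directly comparable.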
Future perspectives
Designing an effective vaccine to curb the deadly SARS-CoV-2 is the need of the hour. Many vaccine manufacturers are racing against time to deliver protection against this disease, which has claimed many lives across the globe. A plethora of in vivo and in vitro studies using various tools and techniques are being carried out simultaneously to uncover atomic-level information as well as structural and biological aspects of SARS-CoV-2, including its epidemiology and prognosis. Knowledge of the correct architecture, together with the biochemical, genetic and structural properties of SARS-CoV-2, would be helpful in understanding various virus-host interactions. It will also reveal mechanistic details of the interactions among different SARS-CoV-2 proteins. Therefore, future studies should focus on determining the structures of those domains that are still unknown. This will further pave the way for designing potential drugs to effectively treat the disease and reduce mortality.
There is also a need to develop proper strategies for managing asymptomatic patients, as most current strategies are limited to symptomatic patients. Repurposing of drugs used against a variety of other diseases is being proposed as a good strategy to contain the spread of the disease. For instance, the effectiveness of the Bacillus Calmette–Guérin (BCG) vaccine, used against tuberculosis (TB), is being tested. The study of SARS-CoV-2 at the atomic level is bound to improve our insights into the various virus-host interactions involved in disease progression, help limit harmful consequences and raise awareness about future risks associated with such fatal disease outbreaks. If immune cells can be boosted to respond effectively against the disease caused by this RNA pathogen, it would prove to be a great boon for the suffering masses.
Developing a vaccine against this kind of pathogen would be a challenge due to the high mutability of RNA viruses. Nevertheless, keeping track of SARS-CoV-2 through extensive research would go a long way toward eradicating this virus. Further, diagnostic tests with enhanced sensitivity are the need of the hour and would be key for effective treatment.
CRediT authorship contribution statement
M.K. and A.S. wrote the initial draft. S.K. and G.S. provided feedback. R.P.B. reviewed, provided feedback and edited.
Authors: Boyd Yount; Rhonda S Roberts; Amy C Sims; Damon Deming; Matthew B Frieman; Jennifer Sparks; Mark R Denison; Nancy Davis; Ralph S Baric Journal: J Virol Date: 2005-12 Impact factor: 5.103
Authors: Jason Netland; Marta L DeDiego; Jincun Zhao; Craig Fett; Enrique Álvarez; José L Nieto-Torres; Luis Enjuanes; Stanley Perlman Journal: Virology Date: 2010-01-27 Impact factor: 3.616
Authors: Pui Ying Peggy Law; Yuet-Man Liu; Hua Geng; Ka Ho Kwan; Mary Miu-Yee Waye; Yuan-Yuan Ho Journal: FEBS Lett Date: 2006-06-02 Impact factor: 4.124
Authors: Youngchang Kim; Robert Jedrzejczak; Natalia I Maltseva; Mateusz Wilamowski; Michael Endres; Adam Godzik; Karolina Michalska; Andrzej Joachimiak Journal: Protein Sci Date: 2020-05-02 Impact factor: 6.993
Authors: Markus Hoffmann; Hannah Kleine-Weber; Simon Schroeder; Nadine Krüger; Tanja Herrler; Sandra Erichsen; Tobias S Schiergens; Georg Herrler; Nai-Huei Wu; Andreas Nitsche; Marcel A Müller; Christian Drosten; Stefan Pöhlmann Journal: Cell Date: 2020-03-05 Impact factor: 41.582