Mohammed Elimam Ahamed Mohammed1,2. 1. Department of Chemistry, Faculty of Science, King Khalid University, Abha, Saudi Arabia. 2. Unit of Bee Research and Honey Production, Faculty of Science, King Khalid University, Abha, Saudi Arabia.
Abstract
There are three types of proteins in coronaviruses: nonstructural, structural, and accessory proteins. Coronavirus proteins are essential for viral replication, for the binding and invasion of hosts, and for the regulation of host cell metabolism and immunity. This study investigated the amino acid sequence similarity and identity percentages of 10 proteins in SARS-CoV-2, SARS-CoV and the Rhinolophus affinis bat coronavirus (BatCoV RaTG13). The investigated proteins were the 1ab polyprotein, spike protein, orf3a, the envelope protein, the membrane protein, orf6, orf7a, orf7b, orf8, and the nucleocapsid protein. The online sequence alignment service of The European Molecular Biology Open Software Suite (EMBOSS) was used to determine the percentages of protein similarity and identity in the three viruses. The results showed that the similarity and identity percentages of the SARS-CoV-2 and BatCoV RaTG13 proteins were both greater than 95%, while the identity and similarity percentages of SARS-CoV-2 and SARS-CoV were both greater than 38%. The proteins of SARS-CoV-2 and BatCoV RaTG13 have higher identity and similarity than those of SARS-CoV-2 and SARS-CoV. GRAPHICAL ABSTRACT: The proteins of SARS-CoV-2 are more identical and similar to those of BatCoV RaTG13 than to the proteins of SARS-CoV. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s42485-021-00060-3.
Coronavirus disease 2019 (COVID-19) originated from a seafood market in Wuhan city (the capital of Hubei Province in central China) and spread rapidly to more than 200 countries. By 2 Jul 2020, the total confirmed cases had reached more than 10.5 million, and 512,000 deaths had been reported. The symptoms of COVID-19 include cough, fever, headache, fatigue, sore throat, and malaise. The disease can lead to complications, such as pneumonia and severe acute respiratory syndrome (WHO 2020; Ahmad et al. 2020; Velavan and Meyer 2020). COVID-19 is transmitted through direct or indirect contact with respiratory droplets and biological samples such as urine, saliva, and stool (Shereen and Khan 2020). However, some studies detected the virus in air samples, and one study stated that the virus in air samples is viable for up to 3 h (Cheng et al. 2019; Ong et al. 2020; Liu et al. 2020; Doremalen et al. 2020).
The virus that causes COVID-19 was named SARS-CoV-2 by the International Committee on Taxonomy of Viruses (ICTV) and is grouped together with SARS-CoV (International Committee on Taxonomy of Viruses (ICTV) 2020). The two viruses belong to the family Coronaviridae, subfamily Orthocoronavirinae, genus Betacoronavirus, and subgenus Sarbecovirus, and the species is severe acute respiratory syndrome-related coronavirus. Bat coronavirus (BatCoV RaTG13) was isolated from bats of the species Rhinolophus affinis. Like SARS-CoV-2 and SARS-CoV, BatCoV RaTG13 belongs to the genus Betacoronavirus, and its genome has 96% sequence identity with the genome of SARS-CoV-2 (Zhou et al. 2020).
SARS-CoV-2, SARS-CoV, and BatCoV RaTG13 have the same virion structure. They are RNA viruses with a nucleocapsid protein and an envelope. The viral envelope contains a lipid bilayer and three proteins: the spike protein, an envelope protein, and a membrane protein (Perlman and Netland 2009).
The genomes of the three viruses contain two major gene regions: orf1ab and orf1a (comprising two-thirds of the genome) and the structural and accessory protein genes (comprising one-third of the genome). The orf1ab and orf1a genes are translated and the products are cleaved to yield 16 nonstructural proteins (nsp1–nsp16), while translation of the second region produces the structural proteins spike (S), envelope (E), membrane (M), and nucleocapsid (N) and the accessory proteins orf3a, orf3b, orf6, orf7a, orf7b, orf8a, orf8b, orf9b and orf10. The number and type of accessory proteins differ according to the virus (Zhou et al. 2020; Yoshimoto 2020; Wang et al. 2020; Khailany et al. 2020; Wong et al. 2019; GenBank 2020).
Regarding NS3, NS6, NS7a, NS7b and NS8 of BatCoV RaTG13, some published articles named them nonstructural proteins, and others named them accessory proteins (GenBank 2020; Fahmi et al. 2020; Tang et al. 2020; Li et al. 2020). These proteins are encoded by genes similar to those of structural and accessory proteins, and because they are comparable to the accessory proteins of SARS-CoV-2, they are considered accessory proteins here.
This article determined the sequence identity and similarity percentages between the proteins of SARS-CoV-2 and the corresponding proteins of SARS-CoV and BatCoV RaTG13.
Materials and methods
Study proteins
The 1ab polyprotein of SARS-CoV-2, SARS-CoV, and BatCoV RaTG13 was studied. Additionally, the structural and accessory proteins found in SARS-CoV-2 and BatCoV RaTG13 were studied, including the spike protein (S), orf3, envelope protein (E), membrane protein (M), orf6, orf7a, orf7b, orf8, and nucleocapsid protein (N) (Table 2). The amino acid sequences were obtained from the National Center for Biotechnology Information (NCBI) site (https://www.ncbi.nlm.nih.gov/protein) (Table 1).
Table 2
The percentages of identity and similarity of the SARS-CoV-2 proteins compared to those of SARS-CoV and RaTG13 (bat coronavirus)

| No. | Protein | Compared viruses | Identity % | Similarity % |
|---|---|---|---|---|
| 1 | 1ab polyprotein | SARS-CoV-2 and SARS-CoV | 86.2 | 92.9 |
| | | SARS-CoV-2 and RaTG13 | 98.5 | 99.1 |
| 2 | Spike protein | SARS-CoV-2 and SARS-CoV | 76 | 86 |
| | | SARS-CoV-2 and RaTG13 | 97.4 | 98.4 |
| 3 | Orf3a (NS3) | SARS-CoV-2 and SARS-CoV | 72.4 | 85.1 |
| | | SARS-CoV-2 and RaTG13 | 97.8 | 98.9 |
| 4 | E protein | SARS-CoV-2 and SARS-CoV | 94.7 | 96.1 |
| | | SARS-CoV-2 and RaTG13 | 100 | 100 |
| 5 | M protein | SARS-CoV-2 and SARS-CoV | 90.5 | 96.4 |
| | | SARS-CoV-2 and RaTG13 | 99.5 | 99.5 |
| 6 | Orf6 (NS6) | SARS-CoV-2 and SARS-CoV | 68.9 | 88.5 |
| | | SARS-CoV-2 and RaTG13 | 100 | 100 |
| 7 | Orf7a (NS7a) | SARS-CoV-2 and SARS-CoV | 85.2 | 90.2 |
| | | SARS-CoV-2 and RaTG13 | 97.5 | 99.2 |
| 8 | Orf7b (NS7b) | SARS-CoV-2 and SARS-CoV | 85.4 | 90.2 |
| | | SARS-CoV-2 and RaTG13 | 97.7 | 97.7 |
| 9 | Orf8 (NS8) | SARS-CoV-2 and SARS-CoV | 38.9 (orf8a), 44.4 (orf8b) | 77.8 (orf8a), 66.7 (orf8b) |
| | | SARS-CoV-2 and RaTG13 | 95 | 95.9 |
| 10 | N protein | SARS-CoV-2 and SARS-CoV | 90.5 | 94.3 |
Table 1
The studied proteins of the three viruses

| No. | Protein | Attribute | SARS-CoV-2 | SARS-CoV | RaTG13 |
|---|---|---|---|---|---|
| 1 | 1ab polyprotein | NCBI Code | YP_009724389.1 | NP_828849.7 | QHR63299.1 |
| | | Gene location | 266..21555 | 265..21485 | 251..21537 |
| | | Amino acid number | 7096 | 7073 | 7095 |
| 2 | S protein | NCBI Code | YP_009724390.1 | YP_009825051.1 | QHR63300.2 |
| | | Gene location | 21492..25259 | 21492..25259 | 21545..25354 |
| | | Amino acid number | 1273 | 1255 | 1269 |
| 3 | Orf3 | NCBI Code | YP_009724391.1 | YP_009825052.1 | QHR63301.1 |
| | | Gene location | 25393..26220 | 25268..26092 | 25363..26190 |
| | | Amino acid number | 275 | 274 | 275 |
| 4 | E protein | NCBI Code | YP_009724392.1 | YP_009825054.1 | QHR63302.1 |
| | | Gene location | 26245..26472 | 26117..26347 | 26215..26442 |
| | | Amino acid number | 75 | 76 | 75 |
| 5 | M protein | NCBI Code | YP_009724393.1 | YP_009825055.1 | QHR63303.1 |
| | | Gene location | 26523..27191 | 26398..27063 | 26493..27158 |
| | | Amino acid number | 222 | 221 | 221 |
| 6 | Orf6 | NCBI Code | YP_009724394.1 | YP_009825056.1 | QHR63304.1 |
| | | Gene location | 27202..27387 | 26913..27265 | 27169..27354 |
| | | Amino acid number | 61 | 63 | 61 |
| 7 | Orf7a | NCBI Code | YP_009724395.1 | YP_009825057.1 | QHR63305.1 |
| | | Gene location | 27394..27759 | 27273..27641 | 27360..27725 |
| | | Amino acid number | 121 | 122 | 121 |
| 8 | Orf7b | NCBI Code | YP_009725318.1 | YP_009825058.1 | QHR63306.1 |
| | | Gene location | 27756..27887 | 27638..27772 | 27722..27853 |
| | | Amino acid number | 43 | 44 | 43 |
| 9 | Orf8 | NCBI Code | YP_009724396.1 | YP_009825059.1, YP_009825060.1 | QHR63307.1 |
| | | Gene location | 27894..28259 | 27779..27898, 27864..28118 | 27860..28225 |
| | | Amino acid number | 121 | 39, 84 | 121 |
| 10 | N protein | NCBI Code | YP_009724397.2 | YP_009825061.1 | QHR63308.1 |
| | | Gene location | 28274..29533 | 28120..29388 | 28240..29499 |
| | | Amino acid number | 419 | 422 | 419 |

The number of amino acids and their sequences for the proteins were obtained from the National Center for Biotechnology Information (NCBI) site accessible at https://www.ncbi.nlm.nih.gov/protein
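The gene locations and amino acid numbers in Table 1 can be cross-checked against each other. The following is a minimal sketch, assuming each gene location is a single contiguous span with 1-based inclusive coordinates that ends in a stop codon; the function name `expected_length` is hypothetical. The rule does not apply to the 1ab polyprotein, whose translation involves a ribosomal frameshift.

```python
def expected_length(start: int, end: int) -> int:
    """Amino acids encoded by a contiguous coding span start..end (1-based, inclusive)."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    # One codon is the stop codon and encodes no amino acid.
    return nucleotides // 3 - 1


# SARS-CoV-2 rows copied from Table 1: (start, end, reported amino acid number)
rows = {
    "Orf3": (25393, 26220, 275),
    "E protein": (26245, 26472, 75),
    "M protein": (26523, 27191, 222),
    "N protein": (28274, 29533, 419),
}

checks = {
    name: expected_length(start, end) == reported
    for name, (start, end, reported) in rows.items()
}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'recheck against the NCBI record'}")
```

An entry that fails this check is not necessarily wrong, but it may be worth comparing with the corresponding NCBI record.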
Sequence alignment
The online sequence alignment service of The European Molecular Biology Open Software Suite (EMBOSS) was used to determine the percentages of protein similarity and identity of SARS-CoV-2, SARS-CoV, and RaTG13. The substitution matrix for the sequence alignment was EBLOSUM62, and the gap open and gap extension penalties were 14 and 4, respectively. The sequence alignment service of EMBOSS can be accessed at https://www.bioinformatics.nl/cgi-bin/emboss/matcher. As a confirmatory test, a coloured alignment display was generated for each protein using the multiple sequence alignment service of the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), available at https://www.ebi.ac.uk/Tools/msa/clustalo/.
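To make the two reported quantities concrete, the sketch below computes identity and similarity percentages from an already aligned pair of sequences. It is an illustration under stated assumptions, not the EMBOSS implementation: it assumes percentages are taken over the full alignment length (gap columns included), and it treats two different residues as similar when they appear in `POSITIVE_PAIRS`, a hand-typed list of pairs assumed to score above zero in BLOSUM62. The function name, the pair list, and the toy sequences are all hypothetical and should be checked against the real matrix before any serious use.

```python
# Residue pairs assumed to have a positive BLOSUM62 score (hand-typed, illustrative only).
POSITIVE_PAIRS = {frozenset(pair) for pair in (
    "AS", "RK", "RQ", "ND", "NH", "NS", "DE", "QE", "QK", "EK", "HY",
    "IL", "IM", "IV", "LM", "LV", "MV", "FW", "FY", "ST", "WY",
)}


def identity_similarity(aligned_a: str, aligned_b: str) -> tuple:
    """Return (identity %, similarity %) for two aligned sequences; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # a gap column counts toward the length but is never a match
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif frozenset((x, y)) in POSITIVE_PAIRS:
            similar += 1
    length = len(aligned_a)
    return 100 * identical / length, 100 * similar / length


# Toy alignment (made-up sequences, not taken from any of the three viruses).
identity, similarity = identity_similarity("MKDLS-AV", "MRELSTAI")
print(f"identity = {identity:.1f}%, similarity = {similarity:.1f}%")
```

Because identical positions are also counted as similar, the similarity percentage can never be lower than the identity percentage, which matches the pattern in Table 2.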
Results and discussion
This study reports differences in the identity and similarity percentages of the proteins of SARS-CoV-2 versus SARS-CoV and of SARS-CoV-2 versus the bat coronavirus RaTG13. The differences are more consistent with a bat origin than with a SARS-CoV origin, and they arise from different types of mutations, including deletions, insertions and substitutions [Annex 1, Annex 2 in ESM, Figs. 1, 2, 3, 4, 5, 6, 7 and 8].
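The three mutation types named above can be read directly from a pairwise alignment. The sketch below is one simple way to count them, assuming the first sequence is taken as the reference, that '-' marks a gap, and that each gap column is counted separately rather than merging runs of gaps into single events. The function name `classify_differences` and the toy sequences are hypothetical.

```python
from collections import Counter


def classify_differences(reference: str, query: str) -> Counter:
    """Count substitutions, insertions and deletions in the query relative to the reference."""
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter(substitution=0, insertion=0, deletion=0)
    for ref, qry in zip(reference, query):
        if ref == "-" and qry == "-":
            continue  # empty column, nothing to compare
        if ref == "-":
            counts["insertion"] += 1  # residue present only in the query
        elif qry == "-":
            counts["deletion"] += 1  # residue present only in the reference
        elif ref != qry:
            counts["substitution"] += 1
    return counts


# Toy alignment (made-up sequences, not taken from any of the three viruses).
result = classify_differences("MK-LSTA", "MRELS-A")
print(dict(result))
```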
Fig. 1
Sequence alignment of the orf3a accessory protein of SARS-CoV-2, SARS-CoV and the bat coronavirus RaTG13
Fig. 2
Sequence alignment of the envelope structural proteins of SARS-CoV-2, SARS-CoV and the bat coronavirus RaTG13
Fig. 3
The membrane proteins of SARS-CoV-2, SARS-CoV and bat coronavirus RaTG13 coloured according to the sequence alignment
Fig. 4
Sequence alignment of the orf6 accessory protein of SARS-CoV-2, SARS-CoV and the bat coronavirus RaTG13
Fig. 5
The orf7a accessory protein of SARS-CoV-2, SARS-CoV and bat coronavirus RaTG13 and its alignment
Fig. 6
Sequence alignment of the orf7b accessory protein of SARS-CoV-2, SARS-CoV and the bat coronavirus RaTG13
Fig. 7
The orf8 accessory protein of SARS-CoV-2, SARS-CoV and bat coronavirus RaTG13 and its alignment
Fig. 8
The nucleocapsid structural protein of SARS-CoV-2, SARS-CoV and bat coronavirus RaTG13 and the coloured sequence alignment
The 1ab polyprotein
The 1ab polyprotein of SARS-CoV-2, SARS-CoV and BatCoV RaTG13 is composed of 7096, 7073, and 7095 amino acids, respectively (Table 1). The amino acid sequence identity and similarity of the 1ab polyprotein of SARS-CoV-2 and BatCoV RaTG13 were 98.5% and 99.1%, respectively. The percentages of identity and similarity of the 1ab polyprotein of SARS-CoV-2 and SARS-CoV were 86.2% and 92.9%, respectively (Table 2). These results are consistent with SARS-CoV-2 being more closely related to the Rhinolophus affinis bat coronavirus than to SARS-CoV. Large-scale mutations were reported for the 1ab protein of SARS-CoV-2, SARS-CoV, and the bat coronavirus RaTG13, and the 1ab polyproteins of SARS-CoV-2 and SARS-CoV differed at more positions than those of SARS-CoV-2 and bat coronavirus RaTG13 [Annex 1].
After the production of the 1ab polyprotein, some endopeptidases produce the 1a polyprotein and 16 nonstructural proteins (Snijder et al. 2016). The cleavage products of the 1ab polyprotein carry out a wide range of activities associated with the replication of the virus. These activities include the binding and breakdown of ATP to produce ADP and phosphate; endopeptidase activities that lead to the formation of nonstructural proteins (such as nsp3 and nsp5); exonuclease activity, through which ribose-5-phosphate is produced; methyltransferase, RNA polymerase and helicase functions, which are associated with the synthesis of new nucleotides for viral replication and the prevention of supertwisting; and the regulation of transcription through zinc finger proteins (Snijder et al. 2016).
The spike protein
The spike protein of SARS-CoV-2 contains 1273 amino acids, while the spike protein of SARS-CoV contains 1255 amino acids and that of BatCoV RaTG13 contains 1269 amino acids (Table 1). The spike proteins of SARS-CoV-2 and SARS-CoV have an identity percentage of 76% and a similarity percentage of 86% (Table 2). The identity and similarity percentages of the spike proteins of SARS-CoV-2 and RaTG13 are 97.4% and 98.4%, respectively (Table 2) [Annex 2 in ESM], which are higher than those of the spike proteins of SARS-CoV-2 and SARS-CoV.
The spike protein of coronaviruses consists of three polypeptide chains with two domains: S1 and S2. The S1 domain is critical for binding host cell receptors, and the S2 domain is critical for fusing the virus with the membrane of the host cell. There is a hinge region between S1 and S2 that is a target for host cell proteases (Li 2016; Bosch et al. 2003). The spike protein of SARS-CoV-2 has a furin cleavage site in the hinge region. The furin cleavage site is composed of four amino acids (681–684). The presence of the furin cleavage site may be critical for the high transmission rate of SARS-CoV-2 compared to other coronaviruses (Walls et al. 2020).
Orf3a
The accessory protein orf3a of SARS-CoV-2 contains 275 amino acids, and its gene (25393..26220) is located between the spike and E protein genes. The orf3a protein of SARS-CoV contains 274 amino acids, while the NS3 of BatCoV RaTG13 is composed of 275 amino acids (Table 1). The amino acid sequence alignment of orf3a of SARS-CoV-2 and SARS-CoV showed that the sequence identity was 72.4% and that the sequence similarity was 85.1%. This similarity percentage differs from the 90.2% reported by Yoshimoto (2020), which may be due to the different software programs used in the two studies. Orf3a (SARS-CoV-2) and NS3 (BatCoV RaTG13) were characterized by 97.8% identity and 98.9% similarity (Table 2, Fig. 1).
Orf3a plays different roles in the virus, including (1) viral envelope assembly and (2) host cell binding and infusion by interacting with the structural proteins (M, S, and E) and the accessory protein (7a) of SARS-CoV (Brunn et al. 2007). In host organisms, the highly immunogenic N-terminus of orf3a is known to have a strong protective effect on humoural immunity (Zhong et al. 2006). Orf3a has a cysteine-rich domain that possesses potassium ion channel activity by interacting with the S and E proteins (Brunn et al. 2007; Zeng et al. 2004). The C-terminus of orf3a arrests the host cell cycle by depleting cyclin D3 and facilitates apoptosis of host cells by interacting with the M protein (Yuan et al. 2007; Marra et al. 2003; Law et al. 2005).
Envelope protein (E protein)
The E protein of SARS-CoV-2, SARS-CoV, and BatCoV RaTG13 consists of 75, 76, and 75 amino acids, respectively (Table 1). The percentages of identity and similarity of the E protein in SARS-CoV-2 and SARS-CoV are 94.7% and 96.1%, respectively; these percentages were 94.7% and 97.4% in Yoshimoto (2020) (Table 2, Fig. 2). The E protein of SARS-CoV-2 and the E protein of BatCoV RaTG13 are 100% identical and similar. This result is consistent with a closer relationship of SARS-CoV-2 to the bat coronavirus than to SARS-CoV. The E protein contains three domains, the C-terminus, N-terminus, and transmembrane domain, with different functions in the virus and in host cells (Schoeman and Fielding 2019).
The E protein has several functions in viral replication and in the interaction of the virus with host organisms and cells, such as assembly of the virion envelope, suppression of host cell stress responses, facilitation of viral replication and vitality, and action as an ion channel to induce the release of virions from host cells (Nieto-Torres et al. 2011; Álvarez et al. 2010; Corse and Machamer 2003; Yuan et al. 2006; Ruch and Machamer 2012).
Membrane protein
The membrane protein (M) of SARS-CoV-2, SARS-CoV, and BatCoV RaTG13 is composed of 222, 221, and 221 amino acids, respectively (Table 1). The percentages of identity and similarity of the amino acid sequences of the M protein of SARS-CoV-2 and SARS-CoV are 90.5% and 96.4%, respectively, while those of the M protein of SARS-CoV-2 and BatCoV RaTG13 are both 99.5% (Table 2, Fig. 3). The M protein has three domains, the N-terminus, C-terminus, and transmembrane domain, with different functions (Neuman et al. 2010).
The M protein is important for the assembly, transport, and release of the virus from host cell organelles (Ma et al. 2008; Siu et al. 2008). The M protein of SARS-CoV inhibits the transcription of type I interferon, which leads to the inhibition of the innate immunity of host organisms (Siu et al. 2009).
Orf6
The orf6 protein of SARS-CoV-2 and BatCoV RaTG13 contains 61 amino acids, while it contains 63 amino acids in SARS-CoV (Table 1). The orf6 proteins of SARS-CoV-2 and SARS-CoV are characterized by an identity percentage of 68.9% and a similarity percentage of 88.5%. The percentages of orf6 protein identity and similarity in SARS-CoV-2 and BatCoV RaTG13 are both 100% (Table 2, Fig. 4).
The functions of orf6 include (1) participation in the formation of replication/transcription complexes to facilitate viral replication, (2) induction of an increase in the number of virions during infection, (3) contribution to virus evasion of the host immune system and (4) involvement in the formation of double-membrane vesicles (DMVs) in host cells to ensure virus assembly (Kumar et al. 2007; Narayanan et al. 2008; Gunalan et al. 2011).
Orf7a
Orf7a of SARS-CoV-2 and BatCoV RaTG13 contains 121 amino acids, while orf7a of SARS-CoV contains 122 amino acids (Table 1). The percentages of orf7a identity and similarity in SARS-CoV-2 and SARS-CoV are 85.2% and 90.2%, respectively. The orf7a protein of SARS-CoV-2 and the orf7a protein of BatCoV RaTG13 share an identity percentage of 97.5% and a similarity percentage of 99.2% (Table 2, Fig. 5).
Orf7a of SARS-CoV is a transmembrane protein divided into four regions from the N-terminus: (1) the first 15 amino acids are broken down by the infected host cells; (2) amino acids 16–96 form the intracellular domain; (3) amino acids 97–117, with a collective hydrophobic nature, form the transmembrane domain; and (4) the C-terminus consists of the last five amino acids (Liu et al. 2014).
Orf7a plays a role in virus binding to and invasion of host cells by interacting with the S, M, E, and orf3a proteins (Narayanan et al. 2008; Tan et al. 2006). Orf7a does not contribute to the replication of the virus (Liu et al. 2014; Tan et al. 2006; Yount et al. 2005; Schaecher et al. 2007). Orf7a plays some roles in host cells, such as triggering apoptosis, downregulating protein synthesis, arresting the cell cycle at the G0/G1 phase, and activating cytokine production (Narayanan et al. 2008; Liu et al. 2014; Tan et al. 2006; Schaecher et al. 2007).
Orf7b
The orf7b protein in both SARS-CoV-2 and BatCoV RaTG13 contains 43 amino acids, while orf7b in SARS-CoV contains 44 amino acids (Table 1). The orf7b protein of SARS-CoV-2 and the orf7b protein of SARS-CoV are characterized by an identity percentage of 85.4% and a similarity percentage of 90.2%. On the other hand, the identity and similarity percentages of the orf7b protein of SARS-CoV-2 and that of BatCoV RaTG13 are 97.7% each (Table 2, Fig. 6).
The orf7b protein contains three domains: an N-terminal domain (external), a C-terminal domain (in the cytoplasm), and a transmembrane hydrophobic domain (Liu et al. 2014).
It has been reported that orf7b is not involved in virus replication (Liu et al. 2014; Tan et al. 2006; Yount et al. 2005; Schaecher et al. 2007). The anti-orf7b antibody concentration is increased in SARS-CoV patients, which suggests that orf7b is highly immunogenic and could be used in vaccination trials (Schaecher et al. 2007; Guo et al. 2004).
Orf8
The identity and similarity of orf8 in SARS-CoV-2 and orf8a in SARS-CoV are 38.9% and 77.8%, respectively. The orf8 protein of SARS-CoV-2 is 44.4% identical and 66.7% similar to orf8b of SARS-CoV (Fig. 7). The identity and similarity of SARS-CoV-2 and BatCoV RaTG13 orf8 are 95% and 95.9%, respectively (Table 2, Fig. 7). The orf8 protein of SARS-CoV-2 has 121 amino acids, as does orf8 of BatCoV RaTG13, compared to 39 and 84 amino acids in the orf8a and orf8b proteins of SARS-CoV, respectively (Table 1).
Orf8a and orf8b of SARS-CoV are not needed for viral replication. In host cells, they are localized in vesicle-like structures in the mitochondria, endoplasmic reticulum, cytosol and nucleus. Orf8a and orf8b of SARS-CoV stimulate cellular DNA synthesis and caspase-dependent apoptosis (Keng and Tan 2009).
Nucleocapsid protein (N protein)
The N protein of SARS-CoV-2 and BatCoV RaTG13 consists of 419 amino acids, while that of SARS-CoV consists of 422 amino acids (Table 1). The N protein of SARS-CoV-2 and the N protein of SARS-CoV are 90.5% identical and 94.3% similar, while those of SARS-CoV-2 and BatCoV RaTG13 are 99% identical and similar (Table 2, Fig. 8). The N protein is an RNA-binding protein with three domains: an N-terminal domain that binds RNA, a C-terminal domain critical for dimerization, and a disordered central region rich in serine and arginine (SR) (Kang et al. 2020).
The N protein is essential for the formation of helical viral RNA, induction of the replication and transcription of the virus, and control of host cell metabolism to ensure the viral replication process and to regulate host cell apoptosis and the cell cycle (Kang et al. 2020; Cong et al. 2020; Surjit et al. 2006). Moreover, the N protein is very immunogenic and induces the host immune system to respond against SARS-CoV (Lin et al. 2003).
Conclusion
The proteins of SARS-CoV-2 and BatCoV RaTG13 share higher identity and similarity than the proteins of SARS-CoV-2 and SARS-CoV. The findings of this study support the usefulness of protein identity and similarity percentages in investigating the origin of viruses.
Authors: Neeltje van Doremalen; Trenton Bushmaker; Dylan H Morris; Myndi G Holbrook; Amandine Gamble; Brandi N Williamson; Azaibi Tamin; Jennifer L Harcourt; Natalie J Thornburg; Susan I Gerber; James O Lloyd-Smith; Emmie de Wit; Vincent J Munster Journal: N Engl J Med Date: 2020-03-17 Impact factor: 91.245