Sk Sarif Hassan1, Pabitra Pal Choudhury2, Guy W Dayhoff3, Alaa A A Aljabali4, Bruce D Uhal5, Kenneth Lundstrom6, Nima Rezaei7, Damiano Pizzol8, Parise Adadi9, Amos Lal10, Antonio Soares11, Tarek Mohamed Abd El-Aziz12, Adam M Brufsky13, Gajendra Kumar Azad14, Samendra P Sherchan15, Wagner Baetas-da-Cruz16, Kazuo Takayama17, Ángel Serrano-Aroca18, Gaurav Chauhan19, Giorgio Palu20, Yogendra Kumar Mishra21, Debmalya Barh22, Raner José Santana Silva23, Bruno Silva Andrade24, Vasco Azevedo25, Aristóteles Góes-Neto26, Nicolas G Bazan27, Elrashdy M Redwan28, Murtaza Tambuwala29, Vladimir N Uversky30.
1. Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, 721140, India. Electronic address: sarimif@gmail.com.
2. Applied Statistics Unit, Indian Statistical Institute, Kolkata, 700108, West Bengal, India.
3. Department of Chemistry, College of Art and Sciences, University of South Florida, Tampa, FL, 33620, USA.
4. Department of Pharmaceutics and Pharmaceutical Technology, Yarmouk University-Faculty of Pharmacy, Irbid, 566, Jordan.
5. Department of Physiology, Michigan State University, East Lansing, MI, 48824, USA.
6. PanTherapeutics, Rte de Lavaux 49, CH1095, Lutry, Switzerland. Electronic address: lundstromkenneth@gmail.com.
7. Research Center for Immunodeficiencies, Pediatrics Center of Excellence, Children's Medical Center, Tehran University of Medical Sciences, Tehran, Iran; Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Stockholm, Sweden.
8. Italian Agency for Development Cooperation - Khartoum, Sudan Street 33, Al Amarat, Sudan.
9. Department of Food Science, University of Otago, Dunedin, 9054, New Zealand.
10. Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN, USA.
11. Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229-3900, USA.
12. Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229-3900, USA; Zoology Department, Faculty of Science, Minia University, El-Minia, 61519, Egypt.
13. University of Pittsburgh School of Medicine, Department of Medicine, Division of Hematology/Oncology, UPMC Hillman Cancer Center, Pittsburgh, PA, USA.
14. Department of Zoology, Patna University, Patna, 800005, Bihar, India.
15. Department of Environmental Health Sciences, Tulane University, New Orleans, LA, 70112, USA.
16. Translational Laboratory in Molecular Physiology, Centre for Experimental Surgery, College of Medicine, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil.
17. Center for iPS Cell Research and Application, Kyoto University, Japan.
18. Biomaterial and Bioengineering Lab, Translational Research Centre San Alberto Magno, Catholic University of Valencia San Vicente Mártir, c/Guillem de Castro 94, 46001, Valencia, Spain.
19. School of Engineering and Sciences, Tecnologico de Monterrey, Av. Eugenio Garza Sada 2501 Sur, 64849, Monterrey, Nuevo León, Mexico.
20. Department of Molecular Medicine, University of Padova, Via Gabelli 63, 35121, Padova, Italy.
21. University of Southern Denmark, Mads Clausen Institute, NanoSYD, Alsion 2, 6400, Sønderborg, Denmark.
22. Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, WB, India; Departamento de Genética, Ecologia e Evolucao, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
23. Departamento de Ciencias Biologicas (DCB), Programa de Pos-Graduacao em Genetica e Biologia Molecular (PPGGBM), Universidade Estadual de Santa Cruz (UESC), Rodovia Ilheus-Itabuna, km 16, 45662-900, Ilheus, BA, Brazil.
24. Laboratório de Bioinformática e Química Computacional, Departamento de Ciências Biológicas, Universidade Estadual do Sudoeste da Bahia (UESB), Jequié, 45206-190, Brazil.
25. Departamento de Genética, Ecologia e Evolucao, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
26. Laboratório de Biologia Molecular e Computacional de Fungos, Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, Brazil.
27. Neuroscience Center of Excellence, School of Medicine, LSU Health New Orleans, New Orleans, LA, 70112, USA.
28. King Abdulaziz University, Faculty of Science, Department of Biological Science, Saudi Arabia.
29. School of Pharmacy and Pharmaceutical Science, Ulster University, Coleraine, BT52 1SA, Northern Ireland, UK.
30. Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA; Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Moscow region, Russia. Electronic address: vuversky@usf.edu.
Abstract
The coronavirus disease 2019 (COVID-19) is caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) and has an estimated fatality rate of less than 1%. The SARS-CoV-2 accessory proteins ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 have putative functions in manipulating host immune mechanisms. These functions involve interferons, which appear as a consensus function, the immune signaling receptor NLRP3 (NLR family pyrin domain-containing 3) inflammasome, and inflammatory cytokines such as interleukin 1β (IL-1β), all of which are critical in COVID-19 pathology. Widespread variation in each of the six accessory proteins was observed across six continents among all complete SARS-CoV-2 proteomes reported before November 2020. Across all continents, the percentage of unique variations in the accessory proteins decreased in the order ORF3a > ORF8 > ORF7a > ORF6 > ORF10 > ORF7b. The highest and lowest percentages of unique ORF3a variations were observed in South America and Oceania, respectively. These findings suggest that the wide variation in accessory proteins may affect the pathogenicity of SARS-CoV-2.
SARS-CoV-2 accessory proteins ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 have putative functions to manipulate the host immune system. Inflammatory cytokines, such as interleukin 1β (IL-1β), IL-6, and TNF, are critical in COVID-19 pathology. Extensive heterogeneity was found across six continents for each of the six accessory proteins among all sequenced SARS-CoV-2 proteomes.
Introduction
SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus-2) is the causative agent of the coronavirus disease 2019 (COVID-19) pandemic, with an estimated fatality rate of less than 1% [1]. However, Dr Michael Ryan, Executive Director of the Health Emergencies Program at the World Health Organization (WHO), indicated in October 2020 that 760 million people might have been infected by SARS-CoV-2, which gives a hypothetical fatality rate of 0.14%, with approximately one million lives lost. SARS-CoV-2 is a member of the Betacoronavirus genus (lineage B). The Sarbecovirus subgenus has been suggested to have diverged from the lineage of Bat Coronavirus (BatCoV) RaTG13 in 1969, with a 95% highest posterior density interval of 1930–2000 [2]. Among previously identified human coronaviruses (HCoVs), Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV), which caused the SARS epidemic in 2002–2004, is the closest relative of SARS-CoV-2 [2,3]. SARS-CoV possesses eight open reading frames (ORFs), ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8a, ORF8b, and ORF9b, which were suggested to have more intrinsic and secondary roles than the primary roles described for cellular entry in the viral life cycle [4,5]. For instance, the ORFs are transcribed throughout the second phase of replication as positive-strand subgenomic mRNAs from a negative-sense viral RNA template [6]. Thus, due to their intrinsic nature, accessory proteins are not targets for positive selection in the way that the extrinsic, primary-function Spike (S) protein, which contains the receptor-binding domain (RBD) and protease cleavage sites, is [7]. High-frequency non-synonymous mutations in the S protein, such as D614G, detected in clinical SARS-CoV-2 isolates have increased host cell entry via the angiotensin-converting enzyme 2 (ACE2) receptor and transmembrane protease serine 2 (TMPRSS2) [8].
Therefore, owing to their intrinsic nature and secondary order in viral transcription, accessory proteins are expected to be under less selective pressure to accumulate mutations. Thus, despite the estimated 19–89 years of genomic divergence between RaTG13 and SARS-CoV-2, the sequence identity between their accessory proteins is very high: 98.5% for ORF3a, 100% for ORF6, 97.5% for ORF7a, 97.6% for ORF7b, 95% for ORF8, and 97.3% for ORF10. This indicates that the direct ancestor of SARS-CoV-2 was somehow exposed to almost no selection pressure to manipulate the immunity of its intermediate host for many years, until the primary human infection occurred in Wuhan in 2019 (Fig. 1, Fig. 2, Fig. 3, Fig. 4, Fig. 5, Fig. 6) [2]. The accessory proteins of SARS-CoV-2 and SARS-CoV differ; for example, the putative ORF10 protein is missing from SARS-CoV, and the ORF3b and ORF9b proteins are absent from SARS-CoV-2 [9,10]. Very little is known about the functions of the accessory proteins of SARS-CoV-2, although crystal or cryo-EM structures have been solved for some of them. Examples are the cryo-EM structure of the SARS-CoV-2 ORF3a ion channel in lipid nanodiscs (PDB ID: 7KJR) (Kern, 2021), the X-ray crystal structure of the SARS-CoV-2 ORF7a ectodomain (PDB ID: 7CI3) (Zhou, 2021), and the crystal structure of the dimeric form of the SARS-CoV-2 ORF8 accessory protein (PDB ID: 7JTL) (Flower, 2021).
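The identity percentages quoted above can be reproduced from any pairwise alignment. The following is a minimal sketch, assuming identity is counted over alignment columns in which neither sequence has a gap; tools differ in their choice of denominator, so the function name `percent_identity` and this convention are illustrative choices rather than the procedure used in the study. The two aligned strings are toy sequences, not actual ORF data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical): one mismatch in ten columns.
toy_a = "MKIILFLALI"
toy_b = "MKIILFLTLI"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

For short proteins a single substitution moves the percentage noticeably, which is worth keeping in mind when comparing the values reported for the shortest accessory proteins.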
Fig. 1
ClustalW alignment of SARS-CoV-2 and RaTG13 ORF3 proteins shows 98.5% sequence identity.
Fig. 2
ClustalW alignment of SARS-CoV-2 (NCBI GenBank ID BCA87365.1) and RaTG13 (NCBI GenBank ID MN996532.2, translated 5′3′ frame 1) ORF6 proteins shows 100% sequence identity, despite up to 89 years of genetic divergence.
Fig. 3
ClustalW alignment of SARS-CoV-2 (NCBI GenBank ID BCA87366.1) and RaTG13 (NCBI GenBank ID MN996532.2, translated 5′3′ frame 2) ORF7a proteins shows 97.5% sequence identity, despite up to 89 years of genetic divergence.
Fig. 4
ClustalW alignment of SARS-CoV-2 (NCBI GenBank ID BCB15096.1) and RaTG13 (NCBI GenBank ID MN996532.2, translated 5′3′ frame 2) ORF7b proteins shows 97.6% sequence identity, despite up to 89 years of genetic divergence.
Fig. 5
ClustalW alignment of SARS-CoV-2 (NCBI GenBank ID BCA87366.1) and RaTG13 (NCBI GenBank ID MN996532.2, translated 5′3′ frame 2) ORF8 proteins shows 95% sequence identity, despite up to 89 years of genetic divergence.
Fig. 6
ClustalW alignment of SARS-CoV-2 (NCBI GenBank ID BCA87369.1) and RaTG13 (NCBI GenBank ID MN996532.2, translated 5′3′ frame 2) ORF10 proteins shows 97.3% sequence identity, despite up to 89 years of genetic divergence.
The objectives of the present study were to depict the unique variability of all accessory proteins and their possible contributions to virus pathogenicity.
Materials and methods
Data acquisition
Sequences for accessory proteins ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 were downloaded from the complete SARS-CoV-2 proteomes on the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/) (Table 1).
Table 1
Total number of six accessory proteins of complete SARS-CoV-2 proteomes.
Proteins | Africa | Asia | Europe | North America | Oceania | South America
ORF3a | 280 | 1175 | 442 | 12734 | 4106 | 122
ORF6 | 280 | 1181 | 441 | 12732 | 4106 | 122
ORF10 | 280 | 1174 | 442 | 12733 | 4106 | 122
ORF7a | 280 | 1179 | 440 | 12723 | 4106 | 122
ORF7b | 280 | 1138 | 436 | 12568 | 4106 | 121
ORF8 | 280 | 1172 | 442 | 12726 | 4106 | 122
Note that all partial accessory proteins and sequences with ambiguous amino acids were excluded from the present study.
Furthermore, the unique accessory protein sequences were extracted for each continent. The unique protein accessions were renamed for each accessory protein as S1, S2, etc., as shown in the Supplementary Tables (S1–S6). There were 510, 72, 158, 37, 190, and 44 unique accessory protein sequences available for ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10, respectively. For each continent, the ranges and names of the sequences are presented in Table 2.
Table 2
Ranges and naming of unique sequences (continent-wise) for each accessory protein of SARS-CoV-2.
Continent | ORF3a | ORF6 | ORF7a | ORF7b | ORF8 | ORF10
Africa | S1 to S7 | S1 to S3 | S1 to S6 | S1 to S2 | S1 to S5 | S1
Asia | S8 to S85 | S4 to S13 | S7 to S25 | S3 to S9 | S6 to S31 | S2 to S8
Europe | S86 to S115 | S14 to S19 | S26 | S10 to S11 | S32 to S41 | S9 to S12
North America | S116 to S442 | S20 to S58 | S27 to S126 | S12 to S30 | S42 to S165 | S13 to S36
Oceania | S443 to S495 | S59 to S69 | S127 to S153 | S31 to S36 | S166 to S186 | S37 to S42
South America | S496 to S510 | S70 to S72 | S154 to S158 | S37 | S187 to S190 | S43 to S44
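The extraction and renaming of unique sequences described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: "partial" is approximated by a hypothetical `min_length` cutoff, the set of ambiguous residue codes (`AMBIGUOUS`) is a guess, and the accessions and sequences are toy values.

```python
from typing import Dict, Iterable, Tuple

# Assumption: residue codes treated as ambiguous; the study only mentions "ambiguous amino acids".
AMBIGUOUS = set("XBZJ")


def name_unique_sequences(
    records: Iterable[Tuple[str, str]], min_length: int
) -> Dict[str, Tuple[str, str]]:
    """Drop short or ambiguous sequences, keep the first copy of each distinct
    sequence, and label the survivors S1, S2, ... in order of appearance."""
    seen = set()
    named: Dict[str, Tuple[str, str]] = {}
    for accession, seq in records:
        seq = seq.upper()
        if len(seq) < min_length or AMBIGUOUS & set(seq):
            continue  # treated as partial or ambiguous (hypothetical criteria)
        if seq in seen:
            continue  # identical to a sequence already kept
        seen.add(seq)
        named[f"S{len(named) + 1}"] = (accession, seq)
    return named


# Toy records (hypothetical accessions and sequences).
toy_records = [
    ("acc1", "MKVLA"),
    ("acc2", "MKVLA"),  # exact duplicate of acc1
    ("acc3", "MKXLA"),  # contains an ambiguous residue
    ("acc4", "MKV"),    # shorter than min_length, treated as partial
    ("acc5", "MKVLT"),
]
named = name_unique_sequences(toy_records, min_length=5)
print(named)
```

Processing the continents one after another in a fixed order would yield contiguous name ranges per continent of the kind shown in Table 2.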
Evaluation of the per-residue predisposition of SARS-CoV-2 accessory proteins and their natural variants for intrinsic disorder
Per-residue disorder distribution within the amino acid sequences of SARS-CoV-2 accessory proteins ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 and their natural variants was evaluated by PONDR® VSL2, which is one of the more accurate standalone disorder predictors [11–14]. The per-residue disorder predisposition scores are on a scale from 0.0 to 1.0, where 0.0 indicates fully ordered residues and 1.0 indicates fully disordered residues. Residues with scores above the threshold of 0.5 are considered disordered, residues with scores between 0.25 and 0.5 are considered highly flexible, and residues with scores between 0.1 and 0.25 are considered moderately flexible.
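The score thresholds above translate directly into a per-residue classification. The sketch below assumes that each lower bound is inclusive and labels scores below 0.1 as "ordered"; neither detail is specified in the text, and the scores used are invented for illustration rather than taken from a PONDR® VSL2 run.

```python
from collections import Counter
from typing import Iterable


def classify_residue(score: float) -> str:
    """Map a per-residue disorder score in [0, 1] to a flexibility class."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("disorder scores must lie between 0.0 and 1.0")
    if score > 0.5:
        return "disordered"
    if score >= 0.25:  # assumption: lower bound inclusive
        return "highly flexible"
    if score >= 0.1:   # assumption: lower bound inclusive
        return "moderately flexible"
    return "ordered"   # assumption: label for scores below 0.1


def summarize(scores: Iterable[float]) -> Counter:
    """Count residues per class for one protein."""
    return Counter(classify_residue(s) for s in scores)


# Invented scores for a six-residue toy peptide.
toy_scores = [0.05, 0.12, 0.30, 0.50, 0.51, 0.90]
summary = summarize(toy_scores)
print(summary)
```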
Phylogenetic analysis
In a first step, the SARS-CoV-2 amino acid sequences of each ORF were filtered with the SeqKit program [15], using the tools fx2tab and rmdup, to remove low-quality sequences (those containing one or more unknown amino acids, "X") and redundant sequences (100% identical). Thereafter, the amino acid sequences of each ORF group were aligned in the MegaX program [16] using the MUSCLE algorithm [17]. All phylogenies were estimated with the Neighbor-joining method; each input alignment was submitted to the phyloXML [18] program with the multiple alignment inference option, a maximum allowed gaps ratio of 0.5, a minimum allowed non-gap sequence length of 50, and the Kimura correction as the distance calculator. In a last step, the phylogenetic trees were analyzed and edited using the phyloXML tool [18].
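To make the distance step concrete, the sketch below computes the uncorrected proportion of differing residues (p-distance) for an aligned pair and applies the commonly cited Kimura approximation for protein distances, d = -ln(1 - p - 0.2p²). Whether the tool used in the study applies exactly this form is an assumption, and the sequences are toy examples.

```python
import math


def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def kimura_distance(p: float) -> float:
    """Kimura-corrected protein distance for an observed p-distance."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -math.log(inner)


# Toy aligned pair (hypothetical): one difference in ten columns.
toy_p = p_distance("MKIILFLALI", "MKIILFLTLI")
toy_d = kimura_distance(toy_p)
print(f"p = {toy_p:.3f}, Kimura d = {toy_d:.4f}")
```

The corrected distance is always at least as large as the p-distance, since it accounts for multiple substitutions at the same site; the resulting pairwise matrix is what a Neighbor-joining implementation would take as input.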
Results and discussion
The essential known features of the six accessory proteins from SARS-CoV-2 are summarized below.
ORF3a protein: ORF3a is the largest SARS-CoV-2 accessory protein (275 amino acids long). It has 72.4% sequence identity with the SARS-CoV ORF3a protein and 98.5% sequence identity with the BatCoV RaTG13 ORF3a protein [19,20] (Fig. 1). ORF3a is involved in virulence, infectivity, ion channel activity, morphogenesis, and virus release [21]. In SARS-CoV, ORF3a is a multifunctional protein that co-localizes with the E, M, and S proteins and forms a homo-tetrameric complex that acts as a potassium-ion channel on the host cell membrane during viral assembly [5]. In SARS-CoV-2, the function of the ion-channel proteins (viroporins) ORF3a, ORF8a, and E is critical in tissue inflammation caused by CoVs [6]. Viroporin-mediated lysosomal disruption and ion redistribution activate the innate immune signaling receptor NLRP3 (NLR family pyrin domain-containing 3) inflammasome, which leads to the expression of inflammatory cytokines such as interleukin 1β (IL-1β), IL-6, and tumor necrosis factor (TNF), causing tissue inflammation during respiratory illness [6]. In another pathway, ORF3a interacts through its protein-binding domains with the TNF receptor-associated factor 3 (TRAF3) protein, which leads to ASC ubiquitination, caspase 1 activation, and IL-1β maturation [22]. Additionally, ORF3a and ORF7a, combined with the E, S, and NSP1 proteins and MAPK pathway proteins (MAPK8, MAPK14, and MAP3K7), trigger proinflammatory cytokine signaling transcription factors such as STAT1, STAT2, IRF9, and NFKB1 [6]. The SARS-CoV-2 ORF3a protein also interacts with heme oxygenase-1 (HMOX1), which has a role in heme catabolism and the anti-inflammatory system [6]. ORF3a inhibits cGAS-STING in chicken, mouse, and human in a unique fashion and blocks the nuclear accumulation of p65 to inhibit nuclear factor-κB signaling. By suppressing innate immunity more effectively, it may allow more efficient SARS-CoV-2 replication in vivo.
However, ORF3a was ineffective against the pathways associated with the RIG-I-like receptors (RLRs), a family of cytosolic pattern recognition receptors that are essential for detecting viral RNA and initiating the innate immune response, in contrast to the SARS-CoV-2 N protein, which showed strong inhibition of the RLR pathway [23]. The ion channel activity of the SARS-CoV-2 ORF3a, E, and M proteins interferes with apoptotic pathways [19]. Similarly, ORF3a of SARS-CoV increases the mRNA expression levels of all three subunits of fibrinogen, thus promoting fibrosis, one of the serious pathogenic aspects of SARS [24]. The expression of NFκB, IL8, and JNK, all of which are involved in inflammatory responses, is also enhanced. Both SARS-CoV-2 ORF3a and ORF3b have shown the ability to antagonize type-I interferon activation [25]. Interestingly, potent and durable antibody responses against the IFN antagonists SARS-CoV-2 ORF3a, ORF3b, ORF7a, and ORF8 have been detected in children [26], which may explain why children are more resistant to SARS-CoV-2 infections [27]. However, this also raises the question of whether the mutations or truncations associated with those accessory proteins will influence the resistance seen in children. Similar to ORF8, ORF3b is an immunodominant protein that has been shown to induce high levels of antibody production during SARS-CoV-2 infections [28]. Sequence analysis of ORF3b identified a natural variant with a longer ORF3b reading frame in two patients with severe COVID-19; this variant enhanced interferon suppression and was potentially linked to viral pathogenesis and the severity of COVID-19 [29].
ORF6 protein: SARS-CoV-2 ORF6 is a 61 amino acid long membrane-associated interferon (IFN) antagonist protein. ORF6 interacts with the karyopherin import complex, limiting the transcription factor STAT1 and thereby down-regulating the IFN pathway [5]. ORF6 is internalized from the plasma membrane into endosomal vesicles.
The SARS-CoV-2 ORF6 has 68.9% sequence identity with the SARS-CoV ORF6 protein and 100% sequence identity with the BatCoV RaTG13 ORF6 protein [5] (Fig. 2). SARS-CoV ORF6 and ORF3a, in association with other proteins such as M, NSP1, and NSP3, inhibit IRF3 signaling, repress interferon expression, and stimulate the degradation of IFNAR1 and STAT1 [6]. The SARS-CoV-2 ORF6 interacts with the NSP8 protein, and it can enhance early infection at a low multiplicity of infection, with an increase in RNA polymerase activity [30]. It has been reported that ORF6 and ORF8 can inhibit the type-I IFN signaling pathway [30]. The ORF6 protein, with its lysosomal targeting motif (YSEL) and diacidic motif (DDEE), induces intracellular membrane rearrangements, resulting in a vesicular population and endosomal internalization of viral protein in infected cells, which increases replication [31].
ORF7a and ORF7b proteins: ORF7a, a 121 aa type I transmembrane protein, interacts with the SARS-CoV-2 structural proteins M, E, and S, which are essential for viral assembly. Hence, ORF7a is involved in viral replication, and virion-associated ORF7a protein may function during early infection [32–34]. It has 85.2% sequence identity with the SARS-CoV ORF7a protein and 97.5% sequence identity with the BatCoV RaTG13 ORF7a protein [5] (Fig. 3). ORF7a induces pro-inflammatory cytokines and chemokines, such as IL-8 and RANTES [5].
SARS-CoV-2 ORF7a, in combination with the E protein, activates apoptosis by suppressing anti-apoptotic proteins [6]. ORF7b is a 43 aa protein that is found in association with intracellular viral particles and is also present in purified virions in the Golgi compartment. The SARS-CoV-2 ORF7b has 85.4% sequence identity with the SARS-CoV ORF7b protein and 97.6% sequence identity with the BatCoV RaTG13 ORF7b protein [5] (Fig. 4). To date, there is extraordinarily little experimental evidence to support a role for ORF7a or ORF7b in SARS-CoV-2 replication [32].
ORF8 protein: ORF8 is a unique 121 aa long accessory protein in SARS-CoV-2. It stands out by being poorly conserved among CoVs and accordingly shows structural changes that have been suggested to be related to the ability of the virus to spread [35]. The ORF8 sequences of SARS-CoV-2 and RaTG13 share 95% amino acid identity (Fig. 5). SARS-CoV-2 ORF8 interacts with major histocompatibility complex (MHC) class-I molecules and down-regulates their surface expression on various cell types [36]. It has been reported that inhibition of ORF8 could be a strategy to improve special immune surveillance and accelerate the eradication of SARS-CoV-2 in vivo [37].
ORF10 protein: The 38 aa long ORF10 accessory protein has been reported to be unique to SARS-CoV-2 and to contain eleven cytotoxic T lymphocyte (CTL) epitopes, each nine amino acids in length, across various human leukocyte antigen (HLA) subtypes [38,39]. ORF10 negatively affects the antiviral protein degradation process through its interaction with the Cul2 ubiquitin ligase complex [6]. The ORF10 protein is missing in SARS-CoV, but SARS-CoV-2 ORF10 and RaTG13 ORF10 have 97.3% sequence identity [40] (Fig. 6).
For every continent, the total number of accessory proteins and the total number of unique sequences, with the respective percentages, are presented in Fig. 7. In summary, for all six continents, the total numbers of unique ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 accessory protein sequences are 419, 55, 122, 26, 147, and 32, respectively (Supplementary Figure S1). Furthermore, the percentages of unique sequences on each continent among all available accessory proteins are also enumerated (Fig. 7).
Fig. 7
Number of unique accessory proteins across six continents.
The percentages of each accessory protein across the six continents are presented as bar diagrams in Fig. 8. The following observations were drawn from Fig. 8. Across all continents, the percentage of unique variations in the accessory proteins decreased in the following order: ORF3a > ORF8 > ORF7a > ORF6 > ORF10 > ORF7b. The highest and lowest percentages of unique ORF3a variations were observed in South America and Oceania, respectively. In addition, the highest percentage (statistically significant) of unique variations in each accessory protein was observed in South America. The lowest percentage of unique variations among ORF3a, ORF6, ORF7b, and ORF8 was observed in Oceania. It is worth noting that the smallest numbers of unique variations of ORF7b and ORF7a were seen in North America and Europe, respectively. It is further noted that in Europe, the lowest variation among all accessory proteins was found in ORF7a. The smallest percentage of unique ORF10 variations was found in Oceania. With regard to the total unique variations across all accessory proteins of SARS-CoV-2, the continents rank in decreasing order as South America > Asia > Europe > Africa > North America > Oceania.
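The continental ranking for ORF3a can be checked against the counts in Table 1 and the name ranges in Table 2. The sketch below assumes that the percentage of unique variations on a continent is the number of unique sequences named there (the size of its S-range) divided by the total number of sequences collected there; this reading of the figure is an inference, and the variable names are illustrative.

```python
# Total ORF3a sequences per continent (Table 1).
TOTAL_ORF3A = {
    "Africa": 280, "Asia": 1175, "Europe": 442,
    "North America": 12734, "Oceania": 4106, "South America": 122,
}

# First and last unique-sequence index per continent for ORF3a (Table 2).
RANGES_ORF3A = {
    "Africa": (1, 7), "Asia": (8, 85), "Europe": (86, 115),
    "North America": (116, 442), "Oceania": (443, 495), "South America": (496, 510),
}


def percent_unique(totals: dict, ranges: dict) -> dict:
    """Unique sequences as a percentage of all sequences, per continent.
    Assumption: the size of each S-range equals the continent's unique count."""
    return {
        continent: 100.0 * (last - first + 1) / totals[continent]
        for continent, (first, last) in ranges.items()
    }


pct_orf3a = percent_unique(TOTAL_ORF3A, RANGES_ORF3A)
for continent, pct in sorted(pct_orf3a.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{continent}: {pct:.2f}%")
```

Under this assumption the largest value falls in South America and the smallest in Oceania, in line with the observation reported for ORF3a.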
Fig. 8
Bar representations of percentages of continental variations (A), and the percentage of unique accessory proteins (B).
ORF3a had the highest (significant) number of unique variations across all six continents, while ORF10 showed the lowest variation in Africa, Asia, and Oceania. The lowest unique variation of ORF7b was observed in North America and South America.
The percentage of unique accessory proteins among all unique sequences obtained across the six continents is represented as bar diagrams in Fig. 9.
Fig. 9
Quantitative information of the accessory proteins.
Among all available unique variations of the six accessory proteins of SARS-CoV-2, North America and South America exhibited the highest and lowest percentages of each accessory protein variation, respectively. The smallest numbers of unique variations of ORF3a, ORF6, and ORF10 were noticed in Africa. On the other hand, South America showed the lowest numbers of unique ORF6, ORF7a, ORF7b, and ORF8 variants. Regarding ORF7b, the highest number of unique variations compared to the rest of the accessory proteins was observed in Africa, Asia, and Oceania. Furthermore, the highest (84.35%) and lowest (0.82%) percentages of unique variations (among all accessory proteins) were found for ORF8 in North America and ORF7a in Europe, respectively.
Fig. 10 presents the continent-wise lists of identical sequences for each accessory protein. The following observations were made for each accessory protein based on the data shown (Fig. 10).
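The two extreme percentages quoted above can be reproduced from numbers given elsewhere in the text, assuming that each share is the number of unique sequences named on a continent (Table 2) divided by the total number of unique sequences of that protein across all continents (147 for ORF8 and 122 for ORF7a). This choice of denominator is inferred rather than stated, and `share_percent` is an illustrative helper.

```python
def share_percent(unique_on_continent: int, total_unique: int) -> float:
    """Continent's share of all unique sequences of one protein, in percent."""
    if total_unique <= 0:
        raise ValueError("total_unique must be positive")
    return round(100.0 * unique_on_continent / total_unique, 2)


# ORF8 in North America: S42 to S165 in Table 2, i.e. 124 unique sequences.
orf8_north_america = share_percent(165 - 42 + 1, 147)

# ORF7a in Europe: only S26 in Table 2, i.e. 1 unique sequence.
orf7a_europe = share_percent(1, 122)

print(orf8_north_america, orf7a_europe)
```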
Fig. 10
Identical pairs of accessory protein sequences across all continents.
ORF3a: Note that the mutations described below were determined relative to the Wuhan ORF3a sequence (YP_009724391). Only two ORF3a sequences (marked in red), S2 (Africa, QOI60359) and S5 (Africa, QOI60335), were present on all six continents. S2 (Africa, ORF3a) was identical to the Wuhan ORF3a (YP_009724391). The other sequence, S5, differs from the Wuhan ORF3a by one missense mutation, Q57H, a strain-determining mutation [41]. The ORF3a sequence S54 (Asia: QKK14624) possesses the single T175I mutation and is present on all continents except Africa. The ORF3a sequences S62 (Asia: QMJ01306) and S63 (Asia: QJQ04482) possess a single mutation each, G251V and G196V, respectively, relative to the Wuhan ORF3a (YP_009724391); both were present in Asia, Europe, North America, Oceania, and South America. The ORF3a sequence S4 (Africa: QLQ87565) has the single S171L mutation and was found on four continents, excluding Europe and Oceania. Sequence S34 (Asia), with two mutations, Q57H and D155Y, was present on only three continents: Asia, Europe, and North America. Sequence S53 (Asia), with the G172C mutation, was likewise found in Asia, Europe, and North America.
The deletion of V255 occurred in S59 (Asia), which was found in Asia, Oceania, and South America. S68 (Asia) and S69 (Asia) possessed one mutation each, H93Y and K67N, respectively; these two ORF3a variants were detected on only three continents: Asia, North America, and Oceania.
The ORF3a sequence S103, containing the single T229I mutation, is present on only three continents: Europe, North America, and Oceania. Another sequence, S104, with the P240L mutation, was detected only in Europe, North America, and South America. The V13L mutation was found in sequence S122 (ORF3a, North America), which is present on three continents: Oceania, North America, and South America. Further, 57 unique ORF3a variants were detected on only two continents, as listed in Table 3.
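Mutation labels such as Q57H or DEL(V255) can be derived by comparing a variant with the reference sequence column by column. The sketch below is a minimal illustration that assumes the two sequences are already aligned to equal length with "-" marking gaps; insertions relative to the reference are skipped, and the sequences are toy strings rather than the Wuhan ORF3a.

```python
from typing import List


def call_mutations(aligned_ref: str, aligned_var: str) -> List[str]:
    """List substitutions (e.g. 'Q5H') and deletions (e.g. 'DEL(L3)') of a variant,
    numbered by position in the ungapped reference."""
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    position = 0
    for ref, var in zip(aligned_ref, aligned_var):
        if ref == "-":
            continue  # insertion relative to the reference; not reported in this sketch
        position += 1
        if var == "-":
            mutations.append(f"DEL({ref}{position})")
        elif var != ref:
            mutations.append(f"{ref}{position}{var}")
    return mutations


# Toy reference and variants (hypothetical sequences).
toy_reference = "MDLFQ"
substitution_calls = call_mutations(toy_reference, "MDLFH")
deletion_calls = call_mutations(toy_reference, "MD-FQ")
print(substitution_calls, deletion_calls)
```

Grouping the resulting mutation lists by the continents on which each identical sequence occurs gives a summary of the kind presented in Table 3.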
Table 3
List of ORF3a sequences and their distribution over only two continents.
Sequence | Mutation(s) | Present in the continent(s)
S7 | D2G | Asia and North America
S8 | L15F, Q57H | Asia and North America
S9 | T32I | Asia and Oceania
S12 | S40L, Q57H | Asia and North America
S13 | L41F | Asia and North America
S17 | V48F | Asia and Europe
S23 | Q57H, W131C | Asia and North America
S25 | Q57H, S166L | Asia and North America
S26 | Q57H, S171L | Asia and North America
S27 | Q57H, T175I | Asia and North America
S28 | Q57H, S216P | Asia and Europe
S37 | Q57H, A103S | Asia and North America
S46 | L108F | Asia and North America
S48 | W131C | Asia and North America
S49 | L140F | Asia and North America
S50 | W149L | Asia and North America
S51 | T151I | Asia and North America
S58 | DEL(V255), N257D | Asia and North America
S65 | G172V | Asia and North America
S66 | D155Y | Asia and North America
S67 | A99V | Asia and North America
S70 | K66N | Asia and North America
S71 | A54S, Q57H | Asia and North America
S72 | A54S | Asia and North America
S74 | G49V | Asia and North America
S77 | I35T, Q57H | Asia and North America
S79 | D22Y | Asia and North America
S82 | G18V, Q57H | Asia and North America
S83 | G18V | Asia and North America
S84 | K16N, Q57H | Asia and North America
S89 | V55F | Europe and North America
S92 | Q57H, V237F | Europe and North America
S94 | Q57H, D155Y | Europe and North America
S95 | Q57H, A99V | Europe and North America
S100 | G172C | Europe and North America
S113 | A39S | Europe and North America
S115 | A33S, Q57H | Europe and North America
S137 | S26L | North America and Oceania
S155 | L46F | North America and Oceania
S163 | L53F | North America and Oceania
S167 | V55G | North America and Oceania
S186 | Q57H, L101F | North America and Oceania
S199 | Q57H, L140F | North America and Oceania
S289 | G100C | North America and Oceania
S295 | V112F | North America and Oceania
S312 | L147F | North America and South America
S319 | S166L | North America and Oceania
S321 | S171L | North America and South America
S325 | S177I | North America and Oceania
S334 | T223I | North America and Oceania
S338 | T229I | North America and Oceania
S341 | P240L | North America and South America
S378 | A110S | North America and South America
S385 | H93Y | North America and Oceania
S388 | H78Y | North America and Oceania
S390 | K67N | North America and Oceania
S444 | V13L | Oceania and South America
Fig. 11 represents a phylogenetic tree for the SARS-CoV-2 ORF3a proteins. This tree was built from an alignment of 419 sequences, and the resulting phylogeny shows no well-defined pattern in the grouping of sequences; it possibly reflects random mutation events rather than evolutionary relationships. These results suggest that ORF3a does not seem to be a target of selection pressure and, therefore, that phylogenetic analysis of this protein does not provide noticeable grounds for inferring evolutionary and/or lineage relationships between the strains.
Fig. 11
SARS-CoV-2 ORF3a amino acid phylogeny after group clustering.
ORF6: Note that the mutations described below were determined relative to the Wuhan ORF6 sequence (YP_009724394). The sequence S2 (ORF6, Africa) was identical to the Wuhan ORF6 sequence (YP_009724394), and this sequence was present on all six continents, whereas the ORF6 sequence S10 (ORF6, Asia), with only the D53Y mutation, was found only in Asia, North America, and Oceania. The ORF6 sequences S38 (ORF6, North America) and S50 (ORF6, North America) possess a single mutation each, D2L and I33T, respectively, and were found on three continents: North America, Oceania, and South America. The unique ORF6 variant S7 (ORF6, Asia) possesses the E13D mutation and was found only in Asia and North America. The ORF6 sequence S12 (ORF6, Asia) possesses a deletion of the segment "FKVSIWNLD" (residues 22–30), and it appeared in Asia and North America only. The sequence S17 (ORF6, Europe) had the D61Y mutation, and it was found in Europe and North America. In addition, the single mutation H3Y occurred in S19 (ORF6, Europe), which was present in Europe and North America. The ORF6 sequence S27 (ORF6, North America), containing the W27L mutation, was found in North America and Oceania only. Furthermore, the sequence S36 (ORF6, North America), with the D61H mutation, was present in North America and Oceania only.
Fig. 12 represents a phylogenetic tree for the ORF6 protein. This tree was constructed from an alignment of 55 sequences, and it was possible to identify four very distinct groups. On the other hand, most sequences did not show a clear grouping.
Fig. 12
SARS-CoV-2 ORF6 amino acid phylogeny after group clustering. Phylogenetic analysis identified four well-defined groups.
ORF7a: Mutations are reported relative to the Wuhan ORF7a sequence (YP_009724395). The Wuhan ORF7a sequence YP_009724395 was found on all continents. V104F was found in S2 (ORF7a, Africa), which was present in Africa, Asia, North America, and Oceania. The sequence S1 (ORF7a, Africa) had the P39L mutation and was found in Africa, North America, and South America. S37F was found in the sequence S7 (ORF7a, Asia), which was present in Asia, North America, and Oceania. The sequence S18 (ORF7a, Asia) has the A105V mutation and was found across Asia, North America, and Oceania. G38V was found in S24 (ORF7a, Asia), which was present in Asia, North America, and Oceania. Also, there were 21 unique ORF7a variants present on only two continents. All mutations are listed in Table 4.
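The continent-level bookkeeping used in this section amounts to counting, for each unique variant, the set of continents on which it was observed. A minimal sketch of that grouping is given below, assuming the presence data are available as a mapping from a variant label to a set of continent names; the function name `group_by_continent_count` and the dictionary layout are hypothetical, while the four example entries are the ORF7a variants described in the text and in Table 4.

```python
from collections import defaultdict

# The six continents considered in the text.
ALL_CONTINENTS = frozenset(
    {"Africa", "Asia", "Europe", "North America", "Oceania", "South America"}
)


def group_by_continent_count(presence: dict[str, set[str]]) -> dict[int, list[str]]:
    """Group variant labels by the number of continents on which they occur."""
    groups: dict[int, list[str]] = defaultdict(list)
    for label, continents in presence.items():
        unknown = set(continents) - ALL_CONTINENTS
        if unknown:
            raise ValueError(f"unknown continent(s) for {label}: {sorted(unknown)}")
        groups[len(set(continents))].append(label)
    return {count: sorted(labels) for count, labels in groups.items()}


# ORF7a examples taken from the text above and from Table 4.
orf7a_presence = {
    "S2 (V104F)": {"Africa", "Asia", "North America", "Oceania"},
    "S1 (P39L)": {"Africa", "North America", "South America"},
    "S10 (V71I)": {"Asia", "North America"},
    "S12 (Q94H)": {"Asia", "Oceania"},
}
groups = group_by_continent_count(orf7a_presence)
print(groups[2])  # ['S10 (V71I)', 'S12 (Q94H)']
```

Variants whose set equals `ALL_CONTINENTS` would fall in the group with key 6, which corresponds to the sequences reported on all six continents.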
Table 4
List of ORF7a sequences and their distribution over only two continents.
| Sequence | Mutation(s) | Present in the continent(s) |
|---|---|---|
| S10 | V71I | Asia and North America |
| S12 | Q94H | Asia and Oceania |
| S14 | L116F | Asia and North America |
| S15 | T120I | Asia and North America |
| S21 | C67Y | Asia and North America |
| S25 | A13T | Asia and North America |
| S34 | T28I | North America and Oceania |
| S35 | V29L | North America and South America |
| S41 | T39I | North America and Oceania |
| S47 | Q76H | North America and Oceania |
| S48 | R79C | North America and Oceania |
| S49 | S81L | North America and Oceania |
| S52 | S83L | North America and Oceania |
| S54 | V93F | North America and Oceania |
| S57 | L96F | North America and Oceania |
| S61 | P99L | North America and South America |
| S81 | E95Q | North America and Oceania |
| S90 | H73Y | North America and Oceania |
| S107 | H47Y | North America and Oceania |
| S113 | P34S | North America and Oceania |
| S124 | A8V | North America and South America |
The phylogenetic analysis of the 122 ORF7a amino acid sequences revealed two clear groups, with the first group containing most of the sequences. On the other hand, four non-grouped sequences were found as well (Fig. 13).
Fig. 13
SARS-CoV-2 ORF7a amino acid phylogeny after group clustering. Two well-defined groups can be identified.
ORF7b: Here, all mutations are reported relative to the Wuhan ORF7b sequence (YP_009725318). The sequence S2 (ORF7b, Africa), which is identical to the Wuhan ORF7b (YP_009725318), was found on all six continents. Only the C41F mutation was present in S8 (ORF7b, Asia), which appeared in Asia, North America, and Oceania. The sequence S1 (ORF7b, Africa) had the single mutation S5L and was present in Africa and Asia. The sequence S5 (ORF7b, Asia) had the mutation S31L, and this sequence was found on two continents only, Asia and North America. L32F occurred in the sequence S10 (ORF7b, Europe), present in Europe and North America. Furthermore, the sequence S13 had the mutation L4F, and this sequence was found in North America and Oceania.
For the ORF7b proteins, phylogenetic analysis was performed using 26 amino acid sequences. Fig. 14 shows that the corresponding phylogenetic tree has three well-defined groups, and an evolutionary proximity relationship between the sequences can be verified in this phylogeny.
Fig. 14
SARS-CoV-2 ORF7b amino acid phylogeny after group clustering. Analysis identified three well-defined groups.
ORF8: The mutations described below were determined relative to the Wuhan ORF8 sequence (YP_009724396). The Wuhan ORF8 sequence YP_009724396 was found on every continent. Another sequence, carrying the single mutation L84S, was also present on every continent. The single mutation V62L was observed in the sequence S2 (ORF8, Africa), which was found on all continents except South America, whereas the ORF8 sequence S38 (Europe) possessed the single mutation A65S, and the sequence was found in North America, Oceania, and South America. Further, the V62L and L84S mutations were observed in S12 (ORF8, Asia), which was present in Asia, North America, and Oceania. The sequence S15 (ORF8, Asia) contained the mutation S67F and was found in Asia, North America, and Oceania. The ORF8 sequence S24 (Asia) possessed the single mutation A65V and was found in Asia, North America, and Oceania.
In the ORF8 phylogenetic analysis, we used 147 amino acid sequences. Fig. 15 shows the presence of three well-defined groups. On the other hand, many sequences were not grouped and did not form well-defined branches.
Fig. 15
Phylogenetic analysis of SARS-CoV-2 ORF8 protein identified three well-defined groups.
ORF10: Mutations are reported relative to the Wuhan ORF10 sequence (YP_009725255). The Wuhan ORF10 (YP_009725255) was identical to S1 (ORF10, Africa), and it was found on every continent. The ORF10 sequence S6 (ORF10, Asia) had the mutation L37F, and the sequence was present in North America and Oceania only. The V30L mutation was found only in the ORF10 sequence S10 (Europe), which appeared in Europe, North America, and Oceania. The sequence S9 (ORF10, Europe) had the mutation S23F, and it was found in Europe and North America. The mutation D31Y appeared in the S12 sequence (ORF10, Europe), which was found in Europe and North America only.
The ORF10 phylogenetic analysis included 32 sequences and showed four groups, the first with eight sequences, the second with 16, and the last two groups with four sequences each (Fig. 16).
Fig. 16
SARS-CoV-2 ORF10 amino acid phylogenetic analysis identified four well-defined groups.
Concluding this section, one needs to keep in mind that the phylogeny results are only suggestive. They can be used to identify new possibilities when searching for other genes associated with vaccine and/or drug development, which typically works best with well-defined strain clades (see Table 5).
Table 5
List of ORF8 sequences and their distribution over only two continents.
| Sequence | Mutation(s) | Present in the continent(s) |
|---|---|---|
| S1 | V33F | Africa and North America |
| S7 | T11I | Asia and North America |
| S8 | T12N | Asia and North America |
| S9 | V32L | Asia and North America |
| S14 | G66C | Asia and North America |
| S16 | P93L | Asia and North America |
| S17 | L95F | Asia and North America |
| S25 | D63N | Asia and North America |
| S26 | A51V | Asia and North America |
| S29 | D34G | Asia and North America |
| S39 | A55V | Europe and North America |
| S40 | P38S | Europe and North America |
| S50 | T11K | North America and Oceania |
| S54 | S21N | North America and Oceania |
| S59 | S24L, DEL(DS)66–67, K68E | North America and Oceania |
| S62 | S24L | North America and Oceania |
| S68 | Q27K | North America and Oceania |
| S108 | V114 | North America and Oceania |
| S130 | A65V | North America and Oceania |
| S147 | P36S | North America and Oceania |
| S156 | G8R | North America and Oceania |
Featuring uniqueness of the accessory proteins
Here, basic descriptive statistics (mean, variance, lower bound, upper bound, and range) were used to describe the variability of the percentage of predicted intrinsically disordered residues (PPIDR), molecular weight (MW), and isoelectric point (pI) of all the unique variants of all accessory proteins (Table 6). The zigzag behavior of the plots of PPIDR, MW, and pI depicts the wide variability of the variants of each accessory protein (Supplementary Figures S2–S41).
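The five summary quantities reported in Table 6 can be computed for any per-variant property in a few lines. The sketch below is illustrative only: the function name `describe` and the input values are hypothetical, and the use of the sample variance (`statistics.variance`) is an assumption, since the text does not state whether the sample or the population variance was used.

```python
from statistics import fmean, variance


def describe(values: list[float]) -> dict[str, float]:
    """Mean, variance, lower bound, upper bound, and range of a property.

    Assumption: 'variance' is the sample variance (n - 1 denominator);
    use statistics.pvariance instead if the population variance is meant.
    """
    if len(values) < 2:
        raise ValueError("at least two values are needed for a variance")
    lower, upper = min(values), max(values)
    return {
        "mean": fmean(values),
        "variance": variance(values),
        "lower": lower,
        "upper": upper,
        "range": upper - lower,
    }


# Hypothetical PPIDR values (in percent) for three variants of one protein.
toy_ppidr = [2.0, 4.0, 6.0]
summary = describe(toy_ppidr)
print(summary["mean"], summary["variance"], summary["range"])  # 4.0 4.0 4.0
```

The range is simply the upper bound minus the lower bound, which is the quantity the text refers to as total dispersion.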
Table 6
Descriptive statistics of PPIDR, MW, and pI of unique accessory proteins of SARS-CoV-2.
PPIDR of unique accessory proteins of SARS-CoV-2 based on PONDR® VSL2
| Accessory proteins | Mean | Variance | Lower bound | Upper bound | Range |
|---|---|---|---|---|---|
| ORF3a | 4.756 | 0.2328 | 2.91 | 7.64 | 4.73 |
| ORF6 | 25.74 | 74.69 | 21.31 | 87.5 | 66.19 |
| ORF7a | 3.51 | 0.5716 | 2.48 | 7.29 | 4.81 |
| ORF7b | 44.663 | 10.527 | 37.21 | 51.16 | 13.95 |
| ORF8 | 9.125 | 1.285 | 5.6 | 13.45 | 7.85 |
| ORF10 | 18.67 | 5.0691 | 13.16 | 23.68 | 10.52 |
MW of unique accessory proteins of SARS-CoV-2
| Accessory proteins | Mean | Variance | Lower bound | Upper bound | Range |
|---|---|---|---|---|---|
| ORF3a | 31123 | 17917.58 | 29187 | 31270 | 2083 |
| ORF6 | 7171.03 | 371714.6 | 2881.205 | 7542.84 | 4661.635 |
| ORF7a | 13673.4 | 150719.4 | 10874.515 | 14328.65 | 3454.135 |
| ORF7b | 5173.02 | 2651.26 | 5033.005 | 5224.22 | 191.215 |
| ORF8 | 13841.4 | 21411.43 | 12608.465 | 14431.55 | 1823.085 |
| ORF10 | 4446.53 | 1173.801 | 4389.085 | 4509.285 | 120.2 |
pI of unique accessory proteins of SARS-CoV-2
| Accessory proteins | Mean | Variance | Lower bound | Upper bound | Range |
|---|---|---|---|---|---|
| ORF3a | 5.9127 | 0.0278 | 5.2349 | 6.5881 | 1.3532 |
| ORF6 | 4.4013 | 0.057 | 3.8436 | 5.7589 | 1.9153 |
| ORF7a | 8.0932 | 0.0434 | 6.7486 | 8.5946 | 1.846 |
| ORF7b | 3.9519 | 0.0063 | 3.6379 | 4.1442 | 0.5063 |
| ORF8 | 5.6368 | 0.1223 | 4.7442 | 6.8829 | 2.1387 |
| ORF10 | 8.2415 | 0.6857 | 6.0601 | 9.2043 | 3.1442 |
The following observations were made based on the data shown in Table 6. Based on the range, the total dispersion of PPIDR and MW was highest for the ORF6 variants, whereas the highest total dispersion of pI was observed for ORF10. The smallest total dispersions of PPIDR, MW, and pI were found for ORF3a, ORF10, and ORF7b, respectively. The broad range and variance of the MW values of the unique ORF3a, ORF7a, ORF8, and ORF10 variants imply wide variability within each of these sets, although the range and variance of PPIDR and pI were not widely spread. In the case of the unique variants of ORF6, the range and variance of MW and PPIDR were found to be large, which implies wide quantitative differences among the unique ORF6 variants. Furthermore, the moderately broad range and variance of the PPIDR and MW of the ORF7a variants imply their moderate variability.
In line with previously reported data, Fig. 17, Fig. 18, and Table 6 show that all SARS-CoV-2 accessory proteins contain different levels of intrinsic disorder. In fact, based on their overall disorder predispositions, these proteins can be arranged as follows: ORF8 < ORF3a < ORF7a < ORF10 < ORF6 < ORF7b, where the difference in the overall intrinsic disorder predisposition between these proteins can be as high as 6- to 7-fold (compare data for ORF8 and ORF7b in Fig. 17). Furthermore, the disorder predispositions of these proteins are sensitive to the mutations found in their natural variants. For example, Fig. 17 represents the effect of mutations in the natural variants on the overall disorder predisposition of accessory proteins and shows that the whole-protein disorder-related parameters, PPIDR and mean disorder score (MDS), can be dramatically changed by mutations. The largest variability of mutation-induced change in intrinsic disorder propensity is observed for ORF10 and ORF6 (see Fig. 17).
Fig. 17
Effect of mutations observed in unique natural variants of the SARS-CoV-2 accessory proteins on their overall intrinsic disorder predisposition, evaluated in terms of the percentage of predicted intrinsically disordered residues (PPIDR) and the mean disorder score (MDS). These data were generated using the PONDR® FIT [42] algorithm, which is a meta-predictor that combines the outputs of six predictors of intrinsic disorder: PONDR® VLXT [43], PONDR® VSL2 [44,45], PONDR® VL3 [46], FoldIndex [47], IUPred [48], and TopIDP [49]. PONDR® FIT is moderately more accurate than each of its component predictors [42]. For each mutant, PPIDR and MDS were calculated based on the outputs of this per-residue disorder predictor. Here, the PPIDR of a query protein represents the percentage of residues with disorder scores exceeding 0.5. In this study, protein residues and regions were classified as disordered or flexible if their predicted disorder scores were above 0.5, or ranged between 0.15 and 0.5, respectively.
Fig. 18
Per-residue intrinsic disorder profiles generated for the SARS-CoV-2 accessory proteins and their natural variants by PONDR® VSL2, which systematically shows good performance in various comparative analyses, including the recently conducted Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment, where this tool was ranked #3 of the 43 evaluated methods [31].
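The whole-protein measures defined in the caption of Fig. 17 follow directly from a per-residue disorder profile. The sketch below shows one way to compute them, assuming the per-residue scores are already available as a list of numbers between 0 and 1; it does not call any PONDR® software. The function name `disorder_summary`, the additional `flexible_pct` output, and the treatment of the boundary values (a score of exactly 0.5 is counted as flexible, and exactly 0.15 as flexible) are assumptions made for illustration.

```python
from statistics import fmean


def disorder_summary(
    scores: list[float],
    disorder_cutoff: float = 0.5,
    flexible_cutoff: float = 0.15,
) -> dict[str, float]:
    """Summarize a per-residue disorder profile.

    ppidr: percentage of residues with score > disorder_cutoff.
    mds: mean disorder score over all residues.
    flexible_pct: percentage of residues with
        flexible_cutoff <= score <= disorder_cutoff (boundary handling
        is an assumption of this sketch).
    """
    if not scores:
        raise ValueError("empty disorder profile")
    n = len(scores)
    disordered = sum(1 for s in scores if s > disorder_cutoff)
    flexible = sum(1 for s in scores if flexible_cutoff <= s <= disorder_cutoff)
    return {
        "ppidr": 100.0 * disordered / n,
        "mds": fmean(scores),
        "flexible_pct": 100.0 * flexible / n,
    }


# Hypothetical four-residue profile, used only to illustrate the arithmetic.
toy_profile = [0.0, 0.25, 0.75, 1.0]
result = disorder_summary(toy_profile)
print(result)  # {'ppidr': 50.0, 'mds': 0.5, 'flexible_pct': 25.0}
```

Applying the same function to the profile of a reference protein and to the profile of a variant gives the mutation-induced change in PPIDR and MDS.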
Next, we looked at the effect of natural variants on local intrinsic disorder predisposition. The results of this analysis are shown in Fig. 18, which presents the per-residue disorder profiles generated by PONDR® VSL2 for all the proteins analyzed in this study. Fig. 18 generally supports the observation that intrinsic disorder predispositions can vary significantly between the natural variants of each individual accessory protein. Importantly, the largest mutation-induced variability is observed within the disordered or flexible regions of these proteins (i.e., regions with predicted disorder scores exceeding the 0.5 threshold and regions with disorder scores between 0.15 and 0.5). This is an important observation suggesting that the natural variability of SARS-CoV-2 accessory proteins is shaping their structural flexibility.
SARS-CoV-2 is the first HCoV with pandemic capacity, owing to its highly contagious nature, which derives from structural differences in its S protein, such as a flat sialic acid-binding domain, tight binding to its entry receptor ACE2, and the capacity to be cleaved by furin protease [50]. Based on more than 355 million confirmed cases of COVID-19, and additionally a large number of asymptomatic cases, SARS-CoV-2 is a highly contagious but relatively weak pathogen, considering the ratio of the number of patients with severe infections associated with multiple organ dysfunction to the total number of infected [6], or the relatively low mortality rate (∼2.2%). The host immunity modulated by the SARS-CoV-2 accessory proteins could be responsible for at least some of these pathological features.
Based on the various mutations of accessory proteins, SARS-CoV-2 has had very little selective pressure to tackle host immunity in nature after diverging from BatCoV RaTG13 19–89 years ago [2]. The genomic stability of the relatively large RNA genome (around 30,000 nucleotides) of SARS-CoV-2, as in other CoVs, is protected by proofreading proteins, such as the 3′-5′ exonuclease non-structural protein 14 (NSP14), which assists RNA synthesis with a unique RNA proofreading function [51]. Muller's ratchet describes how the high mutation rates of asexual organisms such as viruses can drive them toward extinction through the irreversible accumulation of deleterious mutations [52]. Therefore, SARS-CoV-2 repairs its mutations to preserve its genomic stability, as mutations can lead to pathological fitness losses or viral extinction [52]. However, there is a balance between genomic repair mechanisms such as NSP14 and the viral requirement for a certain degree of mutation to gain novel traits, such as emergence and transmission into zoonotic hosts [52]. For instance, a 29-nucleotide deletion in the SARS-CoV ORF8 gene was associated with a less pathogenic strain [52]. Similarly, SARS-CoV-2 variants with a 382-nucleotide deletion in ORF8 caused only mild symptoms in COVID-19 patients, who did not require supplemental oxygen [52].
For each of the accessory proteins ORF6, ORF7a, ORF7b, and ORF10, only one variant, identical to the Wuhan sequence (NC_045512), was present on all continents. Most of the ORF3a variants with the prevalent non-synonymous amino acid substitutions (V13L, T14I, L46F, A54S, Q57H, S58N, K75N, A99V, L108F, R126S, G172V, G196V, F207L, T223I, G251I, G252V, N257S, and Y264C) possess a single point mutation [53,54]. Ten of these mutation sites occur within the transmembrane (TM) domain of ORF3a. Four of these variants contain the Q57H mutation paired with another amino acid change (A99V, S58N, Y264C, or G172V). Only two ORF3a variants, which differ by the clade/strain-determining single mutation Q57H, were found on all six continents [41], and V13L, Q57H+A99V, G196V, and G252V were the most frequent mutations [54]. When Q57H and G251V (ORF3a) are combined with S19L and R203K/G204R in the nucleocapsid, these four mutations cause a dramatic change in viral protein structures [55]. In addition to predominating in North America [53,56], some ORF3a variants were found on all six continents. This can be associated with virus evasion of the immune system, leading to the induction of cytokine, chemokine, and interferon-stimulated gene expression in primary human respiratory cells [25,57]. These dominating mutational effects are not limited to the modulation of the efficiency of viral pathogenesis, disease severity, and patient outcomes due to aggravation of the host immunity [21,53,58]; they may also play a role in viral ion-channel formation, viral particle loads, and virus release [21]. The precise roles of the natural variants of various SARS-CoV-2 ORFs in the outcome of COVID-19 patients are rather controversial [59] and need a more in-depth analysis. Also, in ORF8, only two unique variants, which differ by the strain-determining single mutation L84S, appeared on all continents. So, the maximally intersecting family of variations across all accessory proteins turned out to be the smallest. These findings confirmed that all other variants of accessory proteins were due to demographic and environmental constraints.
It was found that most of the unique variants of accessory proteins differed from the corresponding Wuhan accessory proteins by a single mutation, although basic descriptive statistics revealed their wide variability. New variants of each accessory protein have been found recently and will continue to be discovered in the future. The large number of unique variants of each accessory protein, with their wide variability, might contribute significantly to the pathogenicity of SARS-CoV-2.
Therefore, it is our firm conviction that a naturally weakened stability of SARS-CoV-2 (if achievable) is a far-off goal, and that the dangers of the present pandemic scenario need to be addressed. Also, the unique accessory protein variants found on individual continents are all expected to mix once international travel resumes without strict protective measures and restrictions. In this regard, it is the strong recommendation of our group (SACRED, Self-Assembled COVID-19 Research & Education Directive, consisting of international experts in mathematics, physics, computer science, bioinformatics, nanotechnology, structural biology, molecular biology, immunology, and virology) to governmental and non-governmental organizations to take the necessary measures to mitigate the spread of COVID-19.
Future perspective
COVID-19 has caused more illness and death than SARS and MERS, either alone or combined. CoVs can similarly trigger spreads and outbreaks in the coming years, with different waves of variants, as part of increased globalization. Broad-spectrum genomics experiments should be used to identify possible genetic factors involved in COVID-19 development. Although costly and complicated, more genomics studies are required to assess the effect of host genomics and genetics on immune responses to CoVs. Furthermore, understanding the progression and geographical distribution of SARS-CoV-2 genomics and genetics, in the context of the frequency and quantity of emerging viral variants and their association with viral infectivity, transmissibility, and clinical manifestation, is an issue to be addressed in future research and development programs.
Declaration of competing interest
The authors do not have any conflicts of interest to declare.