Genetic Variation and Evolution of the 2019 Novel Coronavirus.

Salvatore Dimonte1, Muhammed Babakir-Mina2, Taib Hama-Soor2, Salar Ali2.   

Abstract

INTRODUCTION: SARS-CoV-2 is a new type of coronavirus causing a pandemic severe acute respiratory syndrome (SARS-2). Coronaviruses are genetically very diverse and mutate frequently. Natural selection acting on viral mutations may shape host selectivity and infectivity.
METHODS: This study aimed to characterize the diversity between human and animal coronaviruses by determining the rate of mutation in each of the spike, nucleocapsid, envelope, and membrane proteins.
RESULTS: Mutations are abundant in all 4 structural proteins. The largest number of statistically significant amino acid mutations was found in the spike receptor-binding domain (RBD), which may be because this domain is responsible for receptor binding across a broad range of hosts and for host selectivity. Of the 17 amino acids previously shown to be important for binding of spike to the angiotensin-converting enzyme 2 (ACE2) receptor, all are conserved among human coronaviruses, and only 3 are significantly mutated in animal coronaviruses. A single amino acid, aspartate-454, whose mutation causes dissociation of the RBD of the spike from ACE2, and F486, which strengthens binding to ACE2, remain intact in all coronaviruses.
DISCUSSION/CONCLUSION: The observations of this study provide evidence of the genetic diversity and rapid evolution of SARS-CoV-2 as well as of other human and animal coronaviruses.
© 2021 S. Karger AG, Basel.

Keywords:  Biodiversity; Clinical genetics; Ecology; Mutation rate; Pandemics; Prevalence

Year:  2021        PMID: 33406522      PMCID: PMC7900485          DOI: 10.1159/000513530

Source DB:  PubMed          Journal:  Public Health Genomics        ISSN: 1662-4246            Impact factor:   2.000


Introduction

Coronavirus disease (COVID-19) is a severe acute respiratory disease in humans caused by a newly emerged coronavirus. The outbreak was first identified in Wuhan, China, in December 2019. Wuhan became the epicenter of the disease, and the outbreak then spread outside China to 213 countries and territories. As of September 1, 2020, a total of 25,658,983 cases and 855,186 deaths had been recorded globally. The World Health Organization declared it a public health emergency of international concern (https://www.worldometers.info/coronavirus/). All 3 emerged coronaviruses have similar mean incubation periods of around 5 days: 5.1 days for SARS-CoV-2 (range 4.5 to 5.8 days) [1], 5 days for SARS (range 2 to 14 days) [2], and 7 days for MERS (range 2 to 14 days) [3]. Coronaviruses are enveloped viruses containing single-stranded RNA, with a diameter of about 80–120 nm. They are divided into 4 genera: α-coronavirus, β-coronavirus, δ-coronavirus, and γ-coronavirus. The common human coronaviruses HKU1, NL63, OC43, and 229E cause only mild respiratory disease, while MERS-CoV and SARS-CoV-1 cause severe acute respiratory infection [4]. All 3 highly pathogenic coronaviruses, 2019-nCoV (SARS-CoV-2), MERS-CoV, and SARS-CoV-1, belong to the same group, β-coronavirus [5]. The SARS-CoV-2 genome sequence is 79% homologous with that of SARS-CoV-1 and is more homologous to that of BatCoV RaTG13 [6, 7, 8, 9], which belongs to the SARS-like bat coronaviruses. No evidence of recombination between them has been found [10].
Both SARS-CoV-2 and SARS-CoV-1 use the same receptor for binding, the angiotensin-converting enzyme 2 (ACE2) receptor, which suggests that the 2 viruses may share structural similarity and a common mode of attachment to the cell receptor [11]. SARS-CoV-2 recognizes and binds the ACE2 receptor through its spike protein to initiate infection. Structural model analysis has shown that the binding affinity of SARS-CoV-2 for ACE2 is 10 times stronger than that of SARS-CoV-1. The surface spike glycoprotein, which projects from the viral surface as homotrimers, mediates viral entry into host cells. The spike protein has 2 main domains: S1, which binds the host cell receptor, and S2, which mediates fusion of the viral and host cell membranes [12]. The most variable part of the genome in SARS-CoV-1 and SARS-CoV-2 is the receptor-binding domain (RBD) of the spike protein [13, 14], and some sites in this protein sequence might be under positive selection [15]. Given the abundant variability among SARS-CoV-2 isolates, it remains to be determined whether these mutations play a role in the pathogenicity of SARS-CoV-2. Answering this question is important for understanding the mechanisms of viral infection and for paving the way to drugs and vaccines that protect people from the next stage of the pandemic.

Materials and Methods

NC_045512.2 COVID-19/Wuhan-Hu-1CHN/2019/First Isolate was used as the reference strain for the definition of mutations. Multiple sequence alignments were performed using the T-Coffee program (https://www.ebi.ac.uk/Tools/msa/tcoffee/) and were manually edited with the BioEdit software. T-Coffee was chosen because it mitigates the pitfalls of progressive alignment methods and is suitable for small alignments. Amino acid insertions and deletions were not considered. All viral amino acid sequences of the spike, nucleocapsid, membrane, and envelope proteins were screened in both animal and human viruses, and the frequency of each mutation was calculated and statistically compared using the χ2 test (based on a 2 × 2 contingency table containing the number of isolates from animal and human viruses and the number of isolates with and without the mutation) [16, 17]. Fisher exact tests were used to determine whether the differences in frequency between the 2 groups of samples (animal vs. human) were statistically significant. The Benjamini-Hochberg method was used to identify results that remained statistically significant under multiple hypothesis testing. A stringent false discovery rate of 0.001 was used to determine statistical significance [18].
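The group comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual pipeline: it computes a two-sided Fisher exact p-value for one 2 × 2 table and applies the Benjamini-Hochberg step-up rule at the 0.001 threshold. The function names and the isolate counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Rows are the groups (animal, human); columns are isolates with and
    without the mutation.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated isolates in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues, fdr):
    """Return one boolean per p-value: True where the null is rejected at `fdr`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank k with p_(k) <= (k / m) * fdr sets the rejection cutoff.
        if pvalues[i] <= rank / m * fdr:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


# Hypothetical counts: (animal with, animal without, human with, human without).
tables = {"X": (40, 11, 0, 55), "Y": (5, 46, 3, 52)}
pvals = [fisher_exact_two_sided(*t) for t in tables.values()]
significant = benjamini_hochberg(pvals, fdr=0.001)
for name, p, sig in zip(tables, pvals, significant):
    print(f"{name}: p = {p:.3g}, significant = {sig}")
```

In this toy example only the first mutation, present in 40 of 51 animal isolates and in no human isolate, survives the correction.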

Results

The COVID-19 pandemic is the third known outbreak of respiratory illness caused by zoonotic coronaviruses in the current century, and the virus is thought to have jumped from bat and/or pangolin hosts to humans [14, 15, 19, 20, 21]. The disease spread very rapidly across most of the world, mainly due to travel. Several mutations have been observed in the virus genome during the rapid dissemination of the disease across continents. Antigenic drift may occur through the accumulation of mutations during seasonal outbreaks, which helps the virus to survive under natural selection, as is common among influenza viruses [14]. Tracking mutations in the viral genome is necessary, especially in the spike protein, which has a direct role in binding and infecting the cell. Vaccine development mostly depends on the spike protein and aims to raise antibodies against spike or other surface proteins that neutralize the virus [22, 23]. Therefore, this study focused on the occurrence of mutations in different animal and human coronaviruses and on the genetic diversity among human coronaviruses and between human and animal coronaviruses. For this purpose, over 50 protein sequences of each of the spike (S), nucleocapsid (N), envelope (E), and membrane (M) proteins were collected from the online database GenBank. The SARS viruses' amino acid changes are summarized in Table 1. The sequences of each protein were processed separately for human and animal viruses as follows.
Table 1

Sum of SARS viruses' amino acid changes

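ENV | Spike (nAb) | Spike (RBD) | Membrane | Membrane (cont.) | Membrane (cont.) | Membrane (cont.) | Nucleocapsid | Nucleocapsid (cont.) | Nucleocapsid (cont.) | Nucleocapsid (cont.) | Nucleocapsid (cont.) | Nucleocapsid (cont.)
F4S | E340S | L387I | A2S | A68D | R131F | Y178V | S2A | I84H | Q163D | Q242S | F307L | Y360V
V5L | E340G | N388A | D3N | V70S | R131I | K180R | D3S | Y87W | G164Q | G243S | A308V | K361G
E7N | F342S | D389G | T7S | Y71F | P132A | L181R | N4G | R88N | T165G | Q244S | S310N | F363R
L12S | A344G | D389S | T9D | R72Q | L133I | G182Q | N4V | R89E | L167I | T245R | A311S | T366D
I13V | R346L | D389T | T9P | I73P | L134E | A183R | G5K | A90Q | K169D | V246T | A311V | E367Q
V14L | F347I | L390G | L13V | I73V | E135G | S184Y | G5N | R93W | F171N | T247P | S312A | P368E
N15T | F347Y | F392K | K14V | W75T | E135S | Q185L | N8D | G96M | F171V | S250K | S312H | K370R
S16A | A348P | F392T | K15E | I76G | S136A | R186G | N8G | G96P | Y172F | A252R | F314C | K373S
L18Y | A348Q | N394D | L17F | T77F | S136V | R186N | Q9K | D98K | E174P | E253K | F314L | K374R
A22G | S349T | N394S | E18R | T77G | E137P | V187A | R10T | K100G | E174V | A254M | F315L | A376S
A22L | A352G | Y396F | Q19N | G78F | L138M | V187D | N11P | K100Q | S176N | S255A | G316F | D377E
V24L | W353F | A397L | W20Y | G79A | V139G | A188Q | F17L | M101R | G179R | K256D | M317G | D377S
V25A | W353Y | V401R | L22F | G79V | G141S | G189S | S21K | L104Q | Q181G | K256H | S318G | E378R
V25F | N354K | D405T | V23F | I80A | A142L | D190G | R41P | L104V | Q181R | K257R | I320V | A381P
F26V | N354Y | T415A | G25N | I80F | A142P | S191K | Q43K | S105P | A182R | P258R | G321T | L382A
L27Y | K356S | G416T | G25T | A83L | V143I | S191T | Q43R | P106D | S183N | R259E | M322P | R385Q
L28I | I358H | K417C | F26A | M84L | V143T | G192R | G44V | P106S | S184T | R259Y | M322T | Q390K
V29I | S359P | I418C | F26I | C86V | I144L | F193W | L45G | R107A | R185A | Q260C | E323K | T391E
V29L | V362P | D467N | L27F | L87F | L145K | A195F | L45R | R107N | S187R | Q260W | E323R | T393D
T30S | A363F | T470Y | F28L | L87I | R146S | A195V | P46K | Y109H | N192A | A264P | V324E | L394V
L31R | D364R | Y473L | L29T | V88T | H148T | S197A | P46S | E118A | N192S | T265P | V324L | L395T
I33V | Y365F | T500C | T30F | G89C | R150L | S197V | N47R | E118H | S193P | K266P | P326G | P396D
L34Q | Y365L | T500G | T30I | G89L | R150Y | R198K | N48G | G120D | S197G | A267E | S327D | A397N
A36F | S366E | G504Y | W31F | L90V | I151C | Y199Q | T49N | L121A | S197R | A267G | T329L | A397Q
L37I | V367T | Y505F | W31L | M91L | I151F | Y199S | T49P | P122N | T198S | N269S | W330E | A398L
L39A | L368I | Q506G | I32L | M91S | I151V | R200K | A50L | P122R | P199R | T271D | W330H | D399E
C40A | L368V | P507E | C33V | W92F | A152E | R200S | W52L | Y123W | G200S | Q272A | L331I | L400F
A41D | Y369N | P507S | L34V | S94G | H154F | I201H | T54A | A125D | S201N | A273C | Y333F | F403E
Y42A | N370S | Y508L | F37Y | S94M | H154Q | I201V | T54Q | N126R | A208G | A273V | G335F | S404P
Y42T | S371G | R509Y | A38G | F96W | H155K | G202D | A55P | K127Q | A211S | R276P | G335Y | Q406V
N45L | A372L | V512D | Y39H | A98N | H155W | N203D | T57K | I130V | G214S | G278T | A336K | L407I
V47W | S373W | V512L | A40Y | F103Y | G157A | N203T | T57R | I131F | G215E | P279K | A336T | Q408N
N48Y | S375N | L513V | N41K | A104K | R158K | Y204G | Q58A | T135A | A217D | T282K | I337M | Q409W
V49S | T376K | S514Y | N41T | A104R | R158T | K205E | Q58V | E136K | A218I | Q283E | I337T | S410G
S50W | K378F | F515V | R42Y | T106C | C159V | K205S | H59K | L139D | A218L | G284K | K338V | S413A
L51I | C379S | E516Q | N43S | M109W | D160E | L206A | H59T | L139K | L219I | Q289A | L339V | A414L
K53I | Y380V | E516T | R44A | S111A | D160Q | N207E | G60D | N140T | L219V | Q289D | D340P | A414V
F56K | G381D | | R44K | T116S | I161P | N207V | G60N | T141K | L221A | E290K | D341K | D415G
Y57G | G381S | | L46I | N117D | I161V | D209N | E62P | K143R | L222A | L291M | K342D | S416E
V58A | V382I | | I48G | I118A | D163H | D209V | D63F | D144S | L223A | I292V | K342S | Q418E
V58T | V382L | | I49L | L119I | D163Q | S212T | L64P | H145N | L223V | R293E | K347D | A419L
Y59A | P384Y | | I49V | L119V | E167D | S213D | L64W | I146L | L224K | Q294E | K347E | A419N
S60F | T385G | | L51M | L120G | E167F | S213E | F66V | I146Q | D225K | T296I | D348N |
L65F | T385L | | I52L | N121S | I168V | S214G | Q70N | T148V | D225Q | T296V | Q349F |
L65N | T385S | | F53I | N121T | T169F | D215E | Q70S | A152D | R226A | D297K | Q349Y |
S68D | K386P | | L56C | V122I | L16H | D215S | G71A | N153K | R226I | Y298A | I351K |
D72K | | | L57F | P123L | A171C | N216K | I74T | N154F | N228K | K299G | L352I |
 | | | V60L | P123S | S173A | N216S | N75G | A155D | E231G | W301F | L353C |
 | | | T61N | L124V | S173P | I217V | T76E | A155P | S232I | W301V | N354D |
 | | | T61V | H125T | R174D | L220H | N77G | A156Q | K233Q | W301Y | K355E |
 | | | L62I | T127Q | T175R | V221L | S79K | I157Y | M234Q | P302T | K355S |
 | | | C64V | T127R | L176R | V221Y | S79N | V158P | K237Q | Q303A | H356C |
 | | | F65G | I128Q | L176T | Q222T | P80K | Q160R | G238K | I304M | H356Q |
 | | | L67F | L129C | S177N | Q222V | D81S | L161F | Q239S | A305L | I357V |
 | | | L67I | T130N | Y178I | | D82Q | P162S | Q241K | Q306N | A359G |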

Summary of the statistically significant mutations (p ≤ 0.001) found for each viral protein analyzed in the text. The table lists the changes in the envelope (ENV), spike (nAb and RBD epitopes, crucial and sufficient to bind the ACE2 receptor), membrane, and nucleocapsid viral proteins. The ENV protein is responsible for protection of the interior parts of the virus and has a role in viral assembly during viral replication. The spike protein helps in viral attachment to its corresponding receptor and mediates fusion of the cell and viral membranes. The membrane protein is the most abundant structural protein and defines the shape of the viral envelope. The nucleocapsid is the protein structure inside the virion: the N structural protein and the CoV RNA genome make up the nucleocapsid.

Envelope

The entire envelope protein sequences, derived from 51 animal viruses and 55 human viruses, were analyzed (Fig. 1). Figure 1 shows the frequencies of SARS-CoV-2 envelope amino acid signatures, using the isolate NC_045512.2 COVID-19/Wuhan as a reference.
Fig. 1

Frequencies of SARS viruses' amino acid changes in envelope protein. Frequencies of envelope signatures in animal viral isolates (dark gray) and human viral isolates (light gray). The analysis was performed in sequences derived from 106 subjects; 51 reported animal viral strains, and 55 reported human viral strains. Statistically significant differences were assessed by χ2 tests of independence. All p values were calculated from 2-sided tests using 0.001 as the significance level (p ≤ 0.001).

Across the envelope sequences (76 residues), 365 mutations were observed, 47 of them statistically significant (p ≤ 0.001). In animal virus sequences, 333 mutations were observed, 47 of them (F4S, V5L, E7N, L12S, I13V, V14L, N15T, S16A, L18Y, A22G, A22L, V24L, V25A, V25F, F26V, L27Y, L28I, V29I, V29L, T30S, L31R, I33V, L34Q, A36F, L37I, L39A, C40A, A41D, Y42A, Y42T, N45L, V47W, N48Y, V49S, S50W, L51I, K53I, F56K, Y57G, V58A, V58T, Y59A, S60F, L65F, L65N, S68D, and D72K) statistically significant (p ≤ 0.001) (Fig. 1a, b). In human virus envelope sequences, 94 mutations were observed and just 1, L28I, was statistically significant (p ≤ 0.001) (3 isolates; 5.4%) (Fig. 1a).

Spike

The entire spike protein sequences, derived from 51 animal viruses and 52 human viruses, were analyzed. This protein (1,273 amino acid residues) has several domains: N-terminal domain, RBD, receptor-binding motif, subdomain-1, subdomain-2, fusion peptide, heptad repeat 1, heptad repeat 2, transmembrane region, and intracellular domain. The SARS-CoV-2 spike RBD binds to the cell receptor ACE2, and this crucial region (the domain located between amino acid residues 336–516) was analyzed here [24]. Figure 2 shows the frequencies of SARS-CoV-2 spike RBD amino acid signatures, using the isolate NC_045512.2 COVID-19/Wuhan as a reference.
Fig. 2

a, b Frequencies of SARS viruses' amino acid changes in neutralizing antibody (m396 and 80R) epitope (336–516 amino acid domain) spike protein. b The SARS-CoV-2 RBD/ACE2 interface corresponds to 387–516 amino acid domain. Frequencies of spike signatures in animal viral isolates (dark gray) and human viral isolates (light gray). The analysis was performed in sequences derived from 103 subjects; 51 reported animal viral strains, and 52 reported human viral strains. Statistically significant differences were assessed by χ2 tests of independence. All p values were calculated from 2-sided tests using 0.001 as the significance level (p ≤ 0.001).
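How a substitution is named against the reference can be illustrated with a short sketch. It assumes two already aligned sequences (reference and isolate) with `-` marking gaps, numbers positions on the ungapped reference, skips columns with insertions or deletions as stated in Materials and Methods, and keeps only calls inside the 336–516 window by default. The function name and the toy sequences are hypothetical, and this is not the authors' code.

```python
def call_substitutions(ref_aln, iso_aln, start=336, end=516):
    """List substitutions (e.g. 'K417C') of an isolate against an aligned reference.

    Positions are numbered on the ungapped reference. Alignment columns with a
    gap in either sequence (insertion or deletion) are skipped.
    """
    if len(ref_aln) != len(iso_aln):
        raise ValueError("aligned sequences must have equal length")
    calls = []
    pos = 0  # residue number on the ungapped reference
    for ref_res, iso_res in zip(ref_aln, iso_aln):
        if ref_res != "-":
            pos += 1
        if ref_res == "-" or iso_res == "-":
            continue  # indel column: not considered
        if ref_res != iso_res and start <= pos <= end:
            calls.append(f"{ref_res}{pos}{iso_res}")
    return calls


# Toy alignment (hypothetical): one insertion and one deletion in the isolate.
reference = "MK-TAY"
isolate = "MRGT-F"
print(call_substitutions(reference, isolate, start=1, end=5))  # ['K2R', 'Y5F']
```

Counting how many isolates in each group carry a given call then yields the per-group frequencies plotted in the figures.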

Across the spike sequences (1,273 residues), 9,782 mutations were observed, 982 of them statistically significant (p ≤ 0.001). In animal sequences, 920 mutations were observed, and specifically in the RBD region, 83 of them (E340S, E340G, F342S, A344G, R346L, F347I, F347Y, A348P, A348Q, S349T, A352G, W353F, W353Y, N354K, N354Y, K356S, I358H, S359P, V362P, A363F, D364R, Y365F, Y365L, S366E, V367T, L368I, L368V, Y369N, N370S, S371G, A372L, S373W, S375N, T376K, K378F, C379S, Y380V, G381D, G381S, V382I, V382L, P384Y, T385G, T385L, T385S, K386P, L387I, N388A, D389G, D389S, D389T, L390G, F392K, F392T, N394D, N394S, Y396F, A397L, V401R, D405T, T415A, G416T, K417C, I418C, D467N, T470Y, Y473L, T500C, T500G, G504Y, Y505F, Q506G, P507E, P507S, Y508L, R509Y, V512D, V512L, L513V, S514Y, F515V, E516Q, and E516T) were statistically significant (p ≤ 0.001) (Fig. 2a, b). In human virus spike sequences, 222 mutations were observed, and specifically in the RBD region, 18 of them (E340S, F347I, W353F, K356S, V362P, Y369N, Y380V, G381D, T385L, K386P, D389T, D467N, T470Y, Y473L, G504Y, P507S, Y508L, and V512D) were statistically significant (p ≤ 0.001) (Fig. 2a, b). To study this domain further, the geographical origin of the sequences was analyzed. The continental sources were homogeneous: only the D405T animal virus mutation (10 Chinese isolates) was specific to one country (and continent, Asia).

Membrane

The entire membrane protein sequences, derived from 51 animal viruses and 55 human viruses, were analyzed (Fig. 3). Figure 3 shows the frequencies of SARS-CoV-2 membrane amino acid signatures, using the isolate NC_045512.2 COVID-19/Wuhan as a reference.
Fig. 3

Frequencies of SARS viruses' amino acid changes in membrane protein. Frequencies of membrane signatures in animal viral isolates (dark gray) and human viral isolates (light gray). The analysis was performed in sequences derived from 106 subjects; 51 reported animal viral strains, and 55 reported human viral strains. Statistically significant differences were assessed by χ2 tests of independence. All p values were calculated from 2-sided tests using 0.001 as the significance level (p ≤ 0.001).

Across the membrane sequences (222 residues), 851 mutations were observed, 219 of them statistically significant (p ≤ 0.001). In animal membrane sequences, 794 mutations were observed, and specifically 219 of them (A2S, D3N, T7S, T9D, T9P, L13V, K14V, K15E, L17F, E18R, Q19N, W20Y, L22F, V23F, G25N, G25T, F26A, F26I, L27F, F28L, L29T, T30F, T30I, W31F, W31L, I32L, C33V, L34V, F37Y, A38G, Y39H, A40Y, N41K, N41T, R42Y, N43S, R44A, R44K, L46I, I48G, I49L, I49V, L51M, I52L, F53I, L56C, L57F, V60L, T61N, T61V, L62I, C64V, F65G, L67F, L67I, A68D, V70S, Y71F, R72Q, I73P, I73V, W75T, I76G, T77F, T77G, G78F, G79A, G79V, I80A, I80F, A83L, M84L, C86V, L87F, L87I, V88T, G89C, G89L, L90V, M91L, M91S, W92F, S94G, S94M, F96W, A98N, F103Y, A104K, A104R, T106C, M109W, S111A, T116S, N117D, I118A, L119I, L119V, L120G, N121S, N121T, V122I, P123L, P123S, L124V, H125T, T127Q, T127R, I128Q, L129C, T130N, R131F, R131I, P132A, L133I, L134E, E135G, E135S, S136A, S136V, E137P, L138M, V139G, G141S, A142L, A142P, V143I, V143T, I144L, L145K, R146S, H148T, R150L, R150Y, I151C, I151F, I151V, A152E, H154F, H154Q, H155K, H155W, G157A, R158K, R158T, C159V, D160E, D160Q, I161P, I161V, D163H, D163Q, E167D, E167F, I168V, T169F, L16H, A171C, S173A, S173P, R174D, T175R, L176R, L176T, S177N, Y178I, Y178V, K180R, L181R, G182Q, A183R, S184Y, Q185L, R186G, R186N, V187A, V187D, A188Q, G189S, D190G, S191K, S191T, G192R, F193W, A195F, A195V, S197A, S197V, R198K, Y199Q, Y199S, R200K, R200S, I201H, I201V, G202D, N203D, N203T, Y204G, K205E, K205S, L206A, N207E, N207V, D209N, D209V, S212T, S213D, S213E, S214G, D215E, D215S, N216K, N216S, I217V, L220H, V221L, V221Y, Q222T, and Q222V) were statistically significant (p ≤ 0.001) (Fig. 3a–d).
In human virus membrane sequences, 262 mutations were observed, and specifically 35 of them (A2S, D3N, L22F, F26I, F28L, T30F, L34V, F37Y, A38G, N43S, L51M, F53I, V60L, L62I, L67F, L87I, A98N, M109W, V122I, L133I, V143T, R150Y, H155K, R158T, I168V, S173P, L181R, A195V, S197V, R198K, Y199S, R200K, I201V, S212T, and S214G) were statistically significant (p ≤ 0.001) (Fig. 3a–d).

Nucleocapsid

The entire nucleocapsid protein sequences, derived from 51 animal viruses and 54 human viruses, were analyzed (Fig. 4). Figure 4 shows the frequencies of SARS-CoV-2 nucleocapsid amino acid signatures, using the isolate NC_045512.2 COVID-19/Wuhan as a reference.
Fig. 4

Frequencies of SARS viruses' amino acid changes in nucleocapsid protein. Frequencies of nucleocapsid signatures in animal viral isolates (dark gray) and human viral isolates (light gray). The analysis was performed in sequences derived from 105 subjects; 51 reported animal viral strains, and 54 reported human viral strains. Statistically significant differences were assessed by χ2 tests of independence. All p values were calculated from 2-sided tests using 0.001 as the significance level (p ≤ 0.001).

Across the nucleocapsid sequences (419 residues), 1,885 mutations were observed, 317 of them statistically significant (p ≤ 0.001). In animal nucleocapsid sequences, 613 mutations were observed, and specifically 38 of them (D82Q, Y87W, A90Q, D98K, E118H, I130V, I131F, V158P, Q160R, L161F, G179R, S183N, T198S, A208G, A264P, A273C, L291M, W301F, F307L, A308V, G316F, M317G, E323R, S327D, W330E, K347E, I351K, K355E, K370R, A381P, and Q418E) were statistically significant (p ≤ 0.001) (Fig. 4a–f). In human nucleocapsid sequences, 1,628 mutations were observed, and specifically 317 of them (S2A, D3S, N4G, N4V, G5K, G5N, N8D, N8G, Q9K, R10T, N11P, F17L, S21K, R41P, Q43K, Q43R, G44V, L45G, L45R, P46K, P46S, N47R, N48G, T49N, T49P, A50L, W52L, T54A, T54Q, A55P, T57K, T57R, Q58A, Q58V, H59K, H59T, G60D, G60N, E62P, D63F, L64P, L64W, F66V, Q70N, Q70S, G71A, I74T, N75G, T76E, N77G, S79K, S79N, P80K, D81S, D82Q, I84H, Y87W, R88N, R89E, A90Q, R93W, G96M, G96P, D98K, K100G, K100Q, M101R, L104Q, L104V, S105P, P106D, P106S, R107A, R107N, Y109H, E118A, E118H, G120D, L121A, P122N, P122R, Y123W, A125D, N126R, K127Q, I130V, I131F, T135A, E136K, L139D, L139K, N140T, T141K, K143R, D144S, H145N, I146L, I146Q, T148V, A152D, N153K, N154F, A155D, A155P, A156Q, I157Y, V158P, Q160R, L161F, P162S, Q163D, G164Q, T165G, L167I, K169D, F171N, F171V, Y172F, E174P, E174V, S176N, G179R, Q181G, Q181R, A182R, S183N, S184T, R185A, S187R, N192A, N192S, S193P, S197G, S197R, T198S, P199R, G200S, S201N, A208G, A211S, G214S, G215E, A217D, A218I, A218L, L219I, L219V, L221A, L222A, L223A, L223V, L224K, D225K, D225Q, R226A, R226I, N228K, E231G, S232I, K233Q, M234Q, K237Q, G238K, Q239S, Q241K, Q242S, G243S, Q244S, T245R, V246T, T247P, S250K, A252R, E253K, A254M, S255A, K256D, K256H, K257R, P258R, R259E, R259Y, Q260C, Q260W, A264P, T265P, K266P, A267E, A267G, N269S, T271D, Q272A, A273C, A273V, R276P, G278T, P279K, T282K, Q283E, G284K, Q289A, Q289D, E290K, L291M, I292V, R293E, Q294E, T296I, T296V, D297K, Y298A, K299G, W301F, W301V, W301Y, P302T, Q303A, I304M, A305L, Q306N, F307L, A308V, S310N, A311S, A311V, S312A, S312H, F314C, F314L, F315L, G316F, M317G, S318G, I320V, G321T, M322P, M322T, E323K, E323R, V324E, V324L, P326G, S327D, T329L, W330E, W330H, L331I, Y333F, G335F, G335Y, A336K, A336T, I337M, I337T, K338V, L339V, D340P, D341K, K342D, K342S, K347D, K347E, D348N, Q349F, Q349Y, I351K, L352I, L353C, N354D, K355E, K355S, H356C, H356Q, I357V, A359G, Y360V, K361G, F363R, T366D, E367Q, P368E, K370R, K373S, K374R, A376S, D377E, D377S, E378R, A381P, L382A, R385Q, Q390K, T391E, T393D, L394V, L395T, P396D, A397N, A397Q, A398L, D399E, L400F, F403E, S404P, Q406V, L407I, Q408N, Q409W, S410G, S413A, A414L, A414V, D415G, S416E, Q418E, A419L, and A419N) were statistically significant (p ≤ 0.001) (Fig. 4a–f).

Discussion/Conclusion

Different variants of coronaviruses have been identified that infect humans and animals. Differences in genome structure and sequence cause coronaviruses to differ in severity of infection and host selectivity. Coronaviruses continuously change their structures via substitution, deletion, and/or insertion mutations. The most variable part of the genome in SARS-CoV and SARS-CoV-2 is the RBD of the S protein [13, 14], and some sites in the S protein sequence might be under positive selection [15]. Given the many changes in the genome sequences of SARS-CoV-2 isolates, it is necessary to locate the mutations and to understand their role in the pathogenicity of SARS-CoV-2. This is important for understanding the mechanisms of viral infection and for paving the way to drugs and vaccines that protect people from the next stage of the pandemic.

Envelope Protein

The envelope protein is the smallest but an abundant structural protein. It contributes to forming the viral envelope, an important part of the virus that protects its interior, and has a role in viral assembly during replication [25, 26]. Because of the envelope's roles in host cell infectivity, it is necessary to investigate mutations in this protein in different animal and human coronaviruses, including SARS-CoV-2. For this purpose, the full amino acid sequences of several envelope proteins were retrieved from GenBank. Envelope proteins from 51 animal viruses and 55 human viruses were analyzed (Fig. 1). Across the envelope sequences (76 residues), 365 mutations were observed, 47 of them statistically significant (p ≤ 0.001). In animal virus sequences, 333 mutations were observed, 47 of them statistically significant (p ≤ 0.001) (Fig. 1). In human virus sequences, 94 mutations were observed and just 1, L28I, was statistically significant (p ≤ 0.001) (3 isolates; 5.4%) (Fig. 1a). These results show higher diversity in animal viruses than in human viruses, because the animal viruses were isolated from different animal species, whereas only 1 host contributed to the diversity of the human viruses. The rate of significant amino acid mutations is higher in animal coronaviruses (14.1%) than in human coronaviruses (0.9%). This means that human coronaviruses preserve their envelope amino acids with very few mutations, and the majority of these amino acids are conserved.

Spike Protein

The surface spike glycoprotein is the most important part of the virus and is responsible for host selection when initiating infection. This trimeric protein helps in viral attachment to its corresponding receptor and mediates fusion of the cell and viral membranes [27, 28, 29]. The S protein has 2 main domains: S1, which binds the host cell receptor, and S2, which mediates fusion of the viral and host cell membranes [12]. Because of the sequence variability of the RBD in SARS-CoV-1 and SARS-CoV-2 [13, 14], it is necessary to investigate mutations in this protein in different animal and human coronaviruses, including SARS-CoV-2. For this purpose, the full amino acid sequences of several spike proteins were retrieved from GenBank. Spike proteins from 51 animal viruses and 52 human viruses were analyzed. Because of the important role of the RBD of the spike, only this domain was analyzed in this study. More precisely, amino acid residues 336–516 of SARS-CoV-2, which are crucial and sufficient to bind the ACE2 receptor [24], were analyzed. Overall, across the spike sequences (1,273 residues), 9,782 mutations were observed, 982 of them statistically significant (p ≤ 0.001). This large number of mutations indicates wide variation between coronaviruses from different hosts. The 982 significant mutations are those found in the spikes of most of the coronaviruses. The same analysis was made separately for animal and human coronaviruses. In animal coronaviruses, 920 mutations were observed, and specifically in the RBD region, 83 of them were statistically significant (p ≤ 0.001) (Fig. 2). The lower rate of significant mutations in human coronaviruses (8.1%) than in animal coronaviruses (9%) indicates that human coronaviruses are more closely related to each other than animal coronaviruses are. Sequence variation is much less among human coronaviruses than among animal coronaviruses.
In human virus spike sequences, 222 mutations were observed, and specifically in the RBD region, 18 of them (E340S, F347I, W353F, K356S, V362P, Y369N, Y380V, G381D, T385L, K386P, D389T, D467N, T470Y, Y473L, G504Y, P507S, Y508L, and V512D) were statistically significant (p ≤ 0.001) (Fig. 2). Only 222 amino acid mutations were found among human coronaviruses. This means that out of 1,273 amino acids, 222 (17.4%) differ among the different types of coronaviruses and 1,051 (82.5%) are conserved in all human coronaviruses. Of these, only 18 are significantly different, which means they are the most variable, mutation-prone amino acids in the spike proteins of human coronaviruses; interestingly, most of them are located in the RBD. The large proportion of significant amino acid mutations in the RBD of the spike protein indicates that this domain is highly variable among human and animal coronaviruses. This may be because the domain is responsible for binding to the cell receptor in different hosts, and it may also reflect the strength of binding. In the RBD of SARS-CoV-2, many amino acids are in direct contact with the ACE2 receptor and have a direct role in binding of the spike to its receptor [24]. These amino acids are K417, G446, Y449, Y453, L455, F456, A475, F486, N487, Y489, Q493, G496, Q498, T500, N501, G502, and Y505. Our study shows that K417, T500, and Y505 are missing in most animal coronaviruses and that the differences are significant. These residues carry the substitutions K417C, T500C, and Y505F in animal coronaviruses. According to [24], these 3 amino acids are among the 17 (17.6%) amino acids that are important for binding of the SARS-CoV-2 spike protein to the human ACE2 receptor. Therefore, the significant substitutions at these positions in animal coronaviruses may play a role in host-specific infectivity and may contribute to the inability of these viruses to infect human cells.
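The overlap between the 17 ACE2-contacting residues and the significant animal-virus substitutions can be checked directly against the lists given in the text. The sketch below is illustrative only: it uses the animal RBD substitutions from position 387 onward as reported in Results, and the helper name `contact_hits` is hypothetical.

```python
import re

# The 17 ACE2-contacting RBD residues listed in the text.
CONTACT = ["K417", "G446", "Y449", "Y453", "L455", "F456", "A475", "F486", "N487",
           "Y489", "Q493", "G496", "Q498", "T500", "N501", "G502", "Y505"]

# Significant animal-virus RBD substitutions (positions 387-516) from Results.
ANIMAL_SIGNIFICANT = """
L387I N388A D389G D389S D389T L390G F392K F392T N394D N394S Y396F A397L V401R
D405T T415A G416T K417C I418C D467N T470Y Y473L T500C T500G G504Y Y505F Q506G
P507E P507S Y508L R509Y V512D V512L L513V S514Y F515V E516Q E516T
""".split()


def contact_hits(contact, mutations):
    """Map each contact residue to the listed substitutions at that site."""
    by_site = {}
    for mutation in mutations:
        wild_type, position, _ = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation).groups()
        by_site.setdefault(wild_type + position, []).append(mutation)
    return {site: by_site[site] for site in contact if site in by_site}


affected = contact_hits(CONTACT, ANIMAL_SIGNIFICANT)
share = 100 * len(affected) / len(CONTACT)
print(affected)
print(f"{len(affected)} of {len(CONTACT)} contact residues ({share:.1f}%)")
```

The three sites returned are K417, T500, and Y505, that is, 3 of 17 (17.6%). T500 appears twice in the reported list (T500C and T500G), and F486 does not appear at all.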
A different study showed that 193 amino acid residues (318–510) of the RBD of the SARS-CoV-1 spike protein are sufficient to bind the ACE2 receptor and that, among them, a single point mutation (aspartate-454) can abolish the binding interaction between the S1 domain of spike and the ACE2 receptor [30]. In our study, this amino acid is not among the significant animal and human coronavirus mutations, and the amino acids identified previously are not required to be in contact with the ACE2 receptor [30]. Therefore, this amino acid may be highly conserved in all coronaviruses and may have a major role in spike receptor binding but not in host selectivity. F486 in the RBD strengthens binding of the SARS-CoV-2 spike protein to the ACE2 receptor [31]. This residue is also absent from the significant amino acid mutations in animal and human coronaviruses; therefore, it may be one of the conserved amino acids. Finally, a recent observation highlighted that the RBD plays a fundamental role as a damping element of the massive viral particle's motion prior to cell recognition, while also facilitating viral attachment, fusion, and entry [32].

Membrane Protein

The membrane protein contributes to forming the outer layer of the virus, an important part of the virus that gives it its shape and protects its interior. The membrane has a role in pathogenesis and virus entry into the cell. After spike binds to the ACE2 receptor, the viral membrane moves closer to the host cell membrane and mediates fusion of the cell and viral membranes [6, 7, 8, 9, 24, 31]. Because of its important role in pathogenesis, it is necessary to investigate mutations in this protein in different animal and human coronaviruses, including SARS-CoV-2. For this purpose, the full amino acid sequences of several membrane proteins were retrieved from GenBank. Membrane proteins from 51 animal viruses and 55 human viruses were analyzed. Figure 3 shows the frequencies of SARS-CoV-2 membrane amino acid signatures, using the isolate NC_045512.2 COVID-19/Wuhan as a reference. Across the membrane sequences (222 residues), 851 mutations were observed, 219 of them statistically significant (p ≤ 0.001). In animal membrane sequences, 794 mutations were observed, and specifically 219 of them were statistically significant (p ≤ 0.001). In animal viruses, the rate of significant amino acid mutations is very high (27.5%), which means that the diversity among them is substantial. This may be related to the diversity of the animal hosts in which the virus lives and propagates, ranging from birds to land and sea animals, because the virus needs to adapt to new hosts and environments to survive. On the other hand, the significant mutation rate is lower in human hosts (13.3%) because human hosts lack the diversity seen among animals. In human virus membrane sequences, 262 mutations were observed, and specifically 35 of them were statistically significant (p ≤ 0.001).

Nucleocapsid Protein

The nucleocapsid is the proteinaceous structure inside the coronavirus particle, and it associates with the viral nucleic acid, RNA. Its function is to package the viral genome into long, flexible, helical ribonucleoprotein complexes, the nucleocapsid. The nucleocapsid protects the viral RNA and ensures its replication and transmission [33, 34]. Given the large diversity between human and animal coronaviruses, it is necessary to investigate the rate of nucleocapsid mutations in both groups. For this purpose, the entire nucleocapsid protein sequences, derived from 51 animal viruses and 54 human viruses, were analyzed (Fig. 4). Across the nucleocapsid sequences, 1,885 mutations were observed among the 419 residues, 317 of them statistically significant (p ≤ 0.001). In animal nucleocapsid sequences, 613 mutations were observed, 38 of them statistically significant (p ≤ 0.001). In human nucleocapsid sequences, 1,628 mutations were observed, 317 of them statistically significant (p ≤ 0.001). The rate of significant amino acid mutations is lower in animal viruses (6.1%) than in human viruses (19.4%). In all the previously mentioned structural proteins (envelope, membrane, and spike), the rate of significant mutations was higher in animal viruses than in human viruses, whereas the opposite holds for the nucleocapsid protein. Linking conserved viral structures to possible drug targets is essential. As recently reported, thousands of compounds, including approved drugs and drugs in clinical trials, are available in the literature, and some anti-COVID-19 candidates based on computer-aided drug design can be followed up [35]. In general, significant amino acid mutations were observed in all structural proteins of human coronaviruses.
The highest rate of significant mutations (19.4%) was observed in the nucleocapsid protein, whereas the lowest was in the envelope protein (0.9%). This means that the envelope protein is structurally very stable and its amino acids are highly conserved, while the opposite is true of the nucleocapsid. The rate of significant amino acid mutations in the spike protein (8.1%) is lower than that in the nucleocapsid (19.4%). Although the spike protein carries fewer significant amino acid mutations, these mutations still have a crucial role in the viruses' host selectivity and infectivity because spike is directly involved in receptor binding and determines the type of cell and host to infect. Therefore, even a minimal mutation in the RBD of spike may have a more catastrophic effect on human hosts than mutations in the other structural proteins of the coronaviruses. The structural proteins (spike, envelope, nucleocapsid, and membrane) are genetically very diverse among human and animal coronaviruses. The highest rate of significant mutations was found in the nucleocapsid rather than in the spike protein, but most of the mutations in spike are located in the RBD of the S1 subunit. The lowest rate of mutation was in the envelope protein, which means that its amino acid residues are more stable and conserved. A single amino acid whose mutation dissociates spike from the ACE2 receptor remains intact in all human and animal coronaviruses. In addition, among the 17 amino acids that are in direct contact with the ACE2 receptor, only 3 are significantly mutated and substituted in animal coronaviruses.
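The ordering of the structural proteins described above can be reproduced directly from the reported percentages. The short Python sketch below ranks the four proteins by their rate of significant mutations in human coronaviruses. The values are the rates quoted in the text, and the variable names are hypothetical.

```python
# Rates of significant amino acid mutations (%) in human coronaviruses,
# as quoted in the text.
human_significant_rate = {
    "spike": 8.1,
    "envelope": 0.9,
    "membrane": 13.3,
    "nucleocapsid": 19.4,
}

# Sort from the most variable protein to the most conserved one.
ranked = sorted(human_significant_rate.items(), key=lambda item: item[1], reverse=True)

for protein, rate in ranked:
    print(f"{protein}: {rate}%")

most_variable = ranked[0][0]
most_conserved = ranked[-1][0]
```

The ranking only orders rates. It says nothing about functional impact, which is why spike mutations in the RBD can matter more than their lower rate suggests.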

Summary

A larger number of significant mutations was recorded in animal coronaviruses than in human coronaviruses. The highest rate of significant amino acid substitution was found in the nucleocapsid protein and the lowest in the envelope protein, which means that the envelope protein is more stable and less diverse. In the spike protein, the highest number of significant amino acid mutations was found in the RBD. Three of the 17 binding amino acids in the RBD are significantly mutated in animal coronaviruses. A single amino acid of the RBD, aspartate-454, which is essential for the binding of the spike protein to the ACE2 cell receptor, remains intact in all human and animal coronaviruses.

Statement of Ethics

No ethical approval was required, as it is not applicable to our research. According to the study design, neither medical treatments nor procedures involving humans or animals were performed.

Conflict of Interest Statement

The authors have no conflicts of interest to declare.

Funding Sources

This work was supported by Mam Humanitarian Foundation.

Author Contributions

S.A. and M.B.-M. conceived the study, and T.H.-S. and S.D. participated in its design. All authors critically discussed and interpreted the results, drafted and critically reviewed this manuscript, and approved the final version.
  34 in total

1.  Natural polymorphisms of HIV-1 subtype-C integrase coding region in a large group of ARV-naïve infected individuals.

Authors:  S Dimonte; M Babakir-Mina; S Aquaro; C-F Perno
Journal:  Infection       Date:  2013-04-26       Impact factor: 3.553

2.  The M, E, and N structural proteins of the severe acute respiratory syndrome coronavirus are required for efficient assembly, trafficking, and release of virus-like particles.

Authors:  Y L Siu; K T Teoh; J Lo; C M Chan; F Kien; N Escriou; S W Tsao; J M Nicholls; R Altmeyer; J S M Peiris; R Bruzzone; B Nal
Journal:  J Virol       Date:  2008-08-27       Impact factor: 5.103

3.  Association between Severity of MERS-CoV Infection and Incubation Period.

Authors:  Victor Virlogeux; Minah Park; Joseph T Wu; Benjamin J Cowling
Journal:  Emerg Infect Dis       Date:  2016-03       Impact factor: 6.883

4.  Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains.

Authors:  Yuan Yuan; Duanfang Cao; Yanfang Zhang; Jun Ma; Jianxun Qi; Qihui Wang; Guangwen Lu; Ying Wu; Jinghua Yan; Yi Shi; Xinzheng Zhang; George F Gao
Journal:  Nat Commun       Date:  2017-04-10       Impact factor: 14.919

5.  A pneumonia outbreak associated with a new coronavirus of probable bat origin.

Authors:  Peng Zhou; Xing-Lou Yang; Xian-Guang Wang; Ben Hu; Lei Zhang; Wei Zhang; Hao-Rui Si; Yan Zhu; Bei Li; Chao-Lin Huang; Hui-Dong Chen; Jing Chen; Yun Luo; Hua Guo; Ren-Di Jiang; Mei-Qin Liu; Ying Chen; Xu-Rui Shen; Xi Wang; Xiao-Shuang Zheng; Kai Zhao; Quan-Jiao Chen; Fei Deng; Lin-Lin Liu; Bing Yan; Fa-Xian Zhan; Yan-Yi Wang; Geng-Fu Xiao; Zheng-Li Shi
Journal:  Nature       Date:  2020-02-03       Impact factor: 69.504

Review 6.  Novel 2019 coronavirus structure, mechanism of action, antiviral drug promises and rule out against its treatment.

Authors:  Subramanian Boopathi; Adolfo B Poma; Ponmalai Kolandaivel
Journal:  J Biomol Struct Dyn       Date:  2020-04-30

7.  Decoding the evolution and transmissions of the novel pneumonia coronavirus (SARS-CoV-2 / HCoV-19) using whole genomic data.

Authors:  Wen-Bin Yu; Guang-Da Tang; Li Zhang; Richard T Corlett
Journal:  Zool Res       Date:  2020-05-18

Review 8.  The coronavirus nucleocapsid is a multifunctional protein.

Authors:  Ruth McBride; Marjorie van Zyl; Burtram C Fielding
Journal:  Viruses       Date:  2014-08-07       Impact factor: 5.048

9.  Structural insights into coronavirus entry.

Authors:  M Alejandra Tortorici; David Veesler
Journal:  Adv Virus Res       Date:  2019-08-22       Impact factor: 9.937

Review 10.  Hosts and Sources of Endemic Human Coronaviruses.

Authors:  Victor M Corman; Doreen Muth; Daniela Niemeyer; Christian Drosten
Journal:  Adv Virus Res       Date:  2018-02-16       Impact factor: 9.937

