
Genomic Study of COVID-19 Corona Virus Excludes Its Origin from Recombination or Characterized Biological Sources and Suggests a Role for HERVS in Its Wide Range Symptoms.

Ahmed M El-Shehawi1,2, Saqer S Alotaibi1, Mona M Elseehy2.   

Abstract

The COVID-19 coronavirus, which emerged in December 2019 in Wuhan, China, with no confirmed biological source, has become a global pandemic. Various countries have reported the genomic sequences of isolates obtained from infected patients, which allowed us to obtain the full genomic sequences of 38 isolates. The nucleotide (nt) sequences were aligned using the Clustal Omega multiple alignment service at the EBI website. Nt sequence alignment and phylogenetic analysis revealed that COVID-19 is a new viral strain whose biological source has not yet been detected. The expected orf pattern differed among isolates from the same country or from different countries, and also differed from that of SARS-CoV and bat CoV isolates, suggesting different possibilities of virus-human interaction during infection and different degrees of severity. All isolates had the five main orfs (1ab, S, M, N, E), whereas they differed in the expected accessory orfs. Because the biological source of COVID-19 remains undetected, the role of human endogenous retroviruses (HERVs) in regulating host cell gene expression, or in encoding products that could modulate COVID-19 infection and the spectrum of its symptoms, is discussed. © Allerton Press, Inc. 2020, ISSN 0095-4527, Cytology and Genetics, 2020, Vol. 54, No. 6, pp. 588–604. © Allerton Press, Inc., 2020.


Keywords:  COVID-19; Human endogenous retroviruses (HERVs); genome; nucleotide sequence alignment

Year:  2021        PMID: 33487779      PMCID: PMC7810191          DOI: 10.3103/S0095452720060031

Source DB:  PubMed          Journal:  Cytol Genet        ISSN: 0095-4527            Impact factor:   0.579


INTRODUCTION

SARS-related coronaviruses belong to the family Coronaviridae, genus Betacoronavirus, subgenus Sarbecovirus. Coronaviridae includes numerous bird and mammalian coronaviruses [1, 2]. A coronavirus transmitted from human to human was detected during an outbreak in Southern China in 2003 [3-5]. Because it caused severe acute respiratory syndrome (SARS), it was named SARS-coronavirus (SARS-CoV) [1, 6]. Its worldwide spread in the 2003 outbreak caused more than 8000 infections and 774 confirmed deaths [1]. The virus was detected in Himalayan palm civets [7]. Genome comparison showed that most human isolates characterized in the 2003 outbreak lacked 29 nucleotides of open reading frame 10 (orf10) that were present in the civet isolate [7]. This led to the suggestion that the loss of these nucleotides was involved in the transmission of the virus from civets to humans [1]. Another variant of the virus (Bat-SARS-CoV) was isolated from horseshoe bats [8]; it carries a 29-nucleotide insertion in orf8 compared with most characterized human isolates. This genomic relationship suggested a common ancestor for the civet, bat, and human SARS-CoV genomes [8]. After the 2003 SARS outbreak, bats were considered the reservoir for future human CoV pandemics [9]. In 2012, Middle East respiratory syndrome coronavirus (MERS-CoV) was detected in Saudi Arabia [10, 11]. It is believed to have been transmitted from dromedary camels to humans [12], but its origin has also been linked to bats [13]. It caused 2521 infections and 919 deaths (35%) [14]. In 2019, a novel coronavirus (COVID-19) appeared in China (Wuhan City, Hubei Province). It is believed that COVID-19 originated from a fresh seafood market [15, 16]. This coronavirus is able to spread from human to human [17, 18]. It has spread to 193 countries, with more than 10 million confirmed infections and more than 500 000 confirmed deaths [19]. 
Analysis of the full COVID-19 genome showed that it is similar to betacoronaviruses, yet different from the earlier SARS-CoV and MERS-CoV [15]. COVID-19 clustered with Bat-SARS-CoV in a separate group within Sarbecovirus [15]. A genome study of COVID-19 and the bat SARS-CoV isolate BatCoV RaTG13 concluded from their genetic similarity that RaTG13 is not the exact variant that caused the outbreak in China, although COVID-19 could have originated from bats. The same study confirmed that COVID-19 did not result from recombination and is not a mosaic [14]. Bioinformatic analysis of the nucleotide sequences of COVID-19 genomes isolated from patients revealed 89% nt identity with the bat coronavirus Bat SARS-like-CoVZXC21 and 82% identity with SARS-CoV. Analysis of the amino acid sequences of the expected COVID-19 orfs showed that it had diverged from bat, civet, and human SARS-CoV. Moreover, unlike other coronaviruses, its orf3b produces a shorter protein and its orf8 encodes a secreted protein, which leaves the source of COVID-19 undetermined [20]. The interaction between the COVID-19 spike (S) protein and its host receptor, angiotensin-converting enzyme 2 (ACE2), was investigated on the basis of similar information from SARS-CoV. The amino acid (aa) sequence of the COVID-19 S protein, including the receptor-binding domain (RBD) that interacts with ACE2, is similar to that of SARS-CoV. This supports the view that COVID-19 uses ACE2 as its receptor and binds the ACE2 of humans and other animals with high affinity, explaining its ability to infect human cells and to spread from human to human [21]. The open questions are therefore where COVID-19 came from and how similar the isolates from different patients and countries are. A second key question is why its symptoms range so widely, from none at all to death. 
These fundamental questions need to be answered to better understand the origin, transmission, and severity of the virus. In this study, we compared the nucleotide sequences of 38 COVID-19 isolates from 6 countries to evaluate the differences among them, at the level of both the nt sequence and the predicted orfs. The role of human endogenous retroviruses (HERVs) in the wide range of COVID-19 symptoms is also discussed.

MATERIALS AND METHODS

Nucleotide and Protein Sequences

Complete genome nucleotide sequences of all COVID-19 and SARS-CoV isolates were obtained from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore). The COVID-19 isolates included 17 from China, 10 from the USA, 5 from Japan, 2 from Hong Kong, 2 from Taiwan, 1 from South Korea, and 1 from Australia (Table 1).
Table 1.  

Nucleotide sequence identity to the first reported case from China, isolate HZ-1 (Accession no. MT039873.1)

No. | Accession | Isolate | Country | Total score | Query cover, % | Identity, %
1 | MT019532.1 | IPBCAMS-WH-04 | China | 55 092 | 100 | 100
2 | MN996528.1 | WIV04 | China | 55 092 | 100 | 100
3 | MN988668.1 | WHU01 | China | 55 092 | 100 | 100
4 | NC_045512.2 | Wuhan-Hu-1 | China | 55 092 | 100 | 100
5 | MT019533.1 | IPBCAMS-WH-05 | China | 55 086 | 100 | 100
6 | MT019531.1 | IPBCAMS-WH-03 | China | 55 086 | 100 | 100
7 | MT066176.1 | NTU02 | Taiwan | 55 081 | 100 | 99.99
8 | MT066175.1 | NTU01 | Taiwan | 55 081 | 100 | 99.99
9 | MT027064.1 | USA-CA5 | USA | 55 081 | 100 | 99.99
10 | MN994468.1 | USA-CA2 | USA | 55 081 | 100 | 99.99
11 | MT027062.1 | USA-CA3 | USA | 55 075 | 100 | 99.99
12 | MT019529.1 | IPBCAMS-WH-01 | China | 55 075 | 100 | 99.99
13 | MN985325.1 | USA-WA1 | USA | 55 075 | 100 | 99.99
14 | MN996530.1 | WIV06 | China | 55 071 | 99 | 100
15 | LC522974.1 | TY/WK-501 | Japan | 55 070 | 100 | 99.99
16 | LC522972.1 | KY/V-029 | Japan | 55 070 | 100 | 99.99
17 | MN997409.1 | USA-AZ1 | USA | 55 070 | 100 | 99.99
18 | MT039888.1 | USA-MA1 | USA | 55 066 | 100 | 99.98
19 | MT039887.1 | USA-WI1 | USA | 55 066 | 100 | 99.99
20 | MT049951.1 | Yunnan-01 | China | 55 064 | 100 | 99.98
21 | LC522975.1 | TY/WK-521 | Japan | 55 064 | 100 | 99.98
22 | LC522973.1 | TY/WK-012 | Japan | 55 064 | 100 | 99.98
23 | MN996529.1 | WIV05 | China | 55 064 | 99 | 99.99
24 | MN975262.1 | HKU-SZ-005b | Hong Kong | 55 064 | 100 | 99.98
25 | MN996531.1 | WIV07 | China | 55 062 | 99 | 99.99
26 | MN988713.1 | USA-IL1 | USA | 55 062 | 100 | 99.97
27 | LR757996.1 | Wuhan, genome assembly | China | 55 060 | 99 | 100
28 | MT019530.1 | IPBCAMS-WH-02 | China | 55 058 | 100 | 99.98
29 | LR757995.1 | Wuhan, genome assembly | China | 55 057 | 99 | 99.99
30 | MT044257.1 | USA-IL2 | USA | 55 053 | 100 | 99.98
31 | MN994467.1 | USA-CA1 | USA | 55 053 | 100 | 99.98
32 | MT039890.1 | SNU01 | Korea | 55 042 | 100 | 99.97
33 | LR757998.1 | Wuhan, genome assembly | China | 55 040 | 99 | 99.99
34 | MN996527.1 | WIV02 | China | 55 025 | 99 | 99.99
35 | MN938384.1 | HKU-SZ-002a | Hong Kong | 55 022 | 99 | 99.99
36 | MT007544.1 | VIC01 | Australia | 55 010 | 100 | 99.96
37 | MT044258.1 | USA-CA6 | USA | 54 937 | 100 | 99.92
38 | LC521925.1 | Japan/AI/I-004 | Japan | 54 926 | 100 | 99.91
39 | MN996532.1 | BatCoV-RaTG13 | China | 48 630 | 99 | 96.11
40 | AY395003.1 | SARS coronavirus ZS-C | China (2004) | 15 213 | 88 | 82.34

BLAST and Multiple Alignment Analysis of COVID-19 Isolates

The sequence of the first reported COVID-19 isolate from China (HZ-1, MT039873.1) was used as a BLAST query to determine its identity with other sequences from China and other countries in the nucleotide database. The nt sequences of the isolates were aligned using the Clustal Omega (ClustalO) multiple alignment service (https://www.ebi.ac.uk/Tools/msa/clustalo/). A phylogenetic tree of the isolate sequences was constructed with the same ClustalO service. Nucleotide SNPs were detected manually in the aligned sequences.
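The manual SNP screen described above can also be scripted. The Python sketch below is a minimal illustration, assuming two rows exported from the same ClustalO alignment (equal length, '-' for gaps) with the reference row given first; `call_changes` and `substitution_type` are hypothetical helpers, not part of ClustalO or any NCBI tool. It reports substitutions in the notation of Table 2 (e.g. C8782T, in ungapped reference coordinates), merges runs of deleted positions into ranges, and classifies each A/C/G/T substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion.

```python
from itertools import groupby

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition, transversion or ambiguous call."""
    if ref not in BASES or alt not in BASES:
        return "ambiguous"  # e.g. IUPAC codes such as Y or W in the query
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def call_changes(ref_row: str, query_row: str):
    """Compare a query row with the reference row of the same alignment.

    Returns (substitutions, deletions). Substitutions are (label, type) pairs
    such as ("C8782T", "transition"); deletions are (first, last) ranges.
    Positions are 1-based, ungapped reference coordinates; insertions in the
    query (gaps in the reference row) are skipped in this sketch.
    """
    if len(ref_row) != len(query_row):
        raise ValueError("rows of one alignment must have equal length")
    substitutions, deleted = [], []
    pos = 0  # position in the ungapped reference
    for r, q in zip(ref_row.upper(), query_row.upper()):
        if r == "-":
            continue
        pos += 1
        if q == "-":
            deleted.append(pos)
        elif q != r:
            substitutions.append((f"{r}{pos}{q}", substitution_type(r, q)))
    # Consecutive positions share the same (position - index) value.
    deletions = []
    for _, run in groupby(enumerate(deleted), key=lambda item: item[1] - item[0]):
        block = [p for _, p in run]
        deletions.append((block[0], block[-1]))
    return substitutions, deletions


subs, dels = call_changes("ACGTACGTAC", "CCATAC--AY")
print(subs)  # [('A1C', 'transversion'), ('G3A', 'transition'), ('C10Y', 'ambiguous')]
print(dels)  # [(7, 8)]
```

Ambiguity codes such as Y or W (listed for USA-IL1 in Table 2) are kept but labeled "ambiguous" rather than forced into either substitution class.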

Expected ORFs of Different COVID-19 Isolates

The expected orfs of each COVID-19 isolate were obtained from the NCBI graphics view of the nucleotide accession at the NCBI nucleotide database website (https://www.ncbi.nlm.nih.gov/nuccore).

RESULTS

Nucleotide Sequence Identity of COVID-19 and Other Coronaviruses

The first COVID-19 sequence reported from China (MT039873.1) was used in a BLAST search. This search revealed high identity to 38 other COVID-19 isolates (Table 1): 16 other sequences from China, 11 from the USA, 5 from Japan, 2 from Hong Kong, 2 from Taiwan, 1 from South Korea, and 1 from Australia. The identity of these isolates to the Chinese isolate ranged from 99.91 to 100% (Table 1), with query coverage of 99–100%. The first reported Chinese case showed 96.11% identity and 99% coverage with the Chinese isolate BatCoV-RaTG13 (MN996532.1), the closest non-COVID-19 sequence in this study. More importantly, its identity to the closest SARS-CoV isolate (AY395003.1) was only 82.34%, with 88% query coverage (Table 1).
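For readers less familiar with the BLAST columns in Table 1, the sketch below shows simplified versions of the two percentages and how a divergence figure follows from an identity value. It is an illustration only, assuming a single gapped pairwise alignment; BLAST computes these values from its own local alignments (HSPs), so real reports can differ in detail. The helper names are hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Identical, non-gap columns divided by alignment length (gap columns count)."""
    if len(row_a) != len(row_b) or not row_a:
        raise ValueError("need two non-empty alignment rows of equal length")
    same = sum(1 for a, b in zip(row_a.upper(), row_b.upper()) if a == b and a != "-")
    return 100.0 * same / len(row_a)


def query_coverage(query_len: int, segments: list[tuple[int, int]]) -> float:
    """Percent of query positions covered by the union of aligned segments.

    Segments are 1-based, inclusive (start, end) ranges on the query.
    """
    covered: set[int] = set()
    for start, end in segments:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / query_len


def divergence(identity_pct: float) -> float:
    """Percent of aligned positions that differ, rounded like Table 1."""
    return round(100.0 - identity_pct, 2)


print(divergence(96.11))  # 3.89  (HZ-1 vs BatCoV-RaTG13, Table 1)
print(divergence(82.34))  # 17.66 (HZ-1 vs SARS-CoV AY395003.1, Table 1)
```

Because identity is measured only over aligned positions, the 17.66% figure for SARS-CoV describes the 88% of the query that aligned, not the whole genome.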

Phylogenetic Relationship among COVID-19 Isolates

Phylogenetic analysis of the 38 COVID-19 isolates from different countries showed that they clustered randomly, with no noticeable grouping by country in any clade of the tree, whether for the Chinese isolates or for those of any other country (Fig. 1). Clade A has 1 Chinese isolate. Clade B has 2 Chinese isolates. Clade C has 14 isolates: 1 from Australia, 3 from the USA, 6 from China, 1 from Taiwan, 2 from Japan, and 1 from Korea. Clade D has 3 isolates: 2 from China and 1 from the USA. Clade E has 18 isolates: 7 from the USA, 6 from China, 2 from Hong Kong, and 3 from Japan (Fig. 1). This random distribution of isolates from the same country, particularly the Chinese isolates, indicates that they belong to the same strain.
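The idea of isolates clustering without regard to country can be made concrete with a distance-based tree. The sketch below is not ClustalO's procedure (ClustalO builds its tree with its own method); it substitutes the simpler UPGMA average-linkage clustering on uncorrected p-distances, assuming equal-length rows from one alignment. Function names and the toy sequences are hypothetical, and branch lengths are omitted for brevity.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either row."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def upgma(rows: dict[str, str]) -> str:
    """Average-linkage (UPGMA) clustering; returns a Newick topology without branch lengths."""
    names = list(rows)
    size = {name: 1 for name in names}  # cluster label -> number of sequences
    dist = {
        frozenset((a, b)): p_distance(rows[a], rows[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    while len(size) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair; ties go to the first found
        size_a, size_b = size.pop(a), size.pop(b)
        merged = f"({a},{b})"
        for c in size:  # size-weighted average distance to every remaining cluster
            dist[frozenset((merged, c))] = (
                size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        size[merged] = size_a + size_b
    return next(iter(size)) + ";"


toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "TTAAAAAAAA", "D": "TTAAAAAAAT"}
print(upgma(toy))  # ((A,B),(C,D));
```

When genomes differ at only a handful of sites out of roughly 29 900 (Table 2), many pairwise distances tie, so tie-breaking can change the exact clade layout; shallow clades among near-identical genomes are therefore best read cautiously.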
Fig. 1.

Phylogenetic relationship among COVID-19 isolates from different countries.


Nucleotide Sequence Alignment of COVID-19 Isolates

In the BLAST search, the first reported Chinese COVID-19 isolate differed by 3.89% from the closest bat coronavirus isolate and by 17.66% from the closest SARS-CoV isolate (Table 1). Aligning COVID-19 and SARS-CoV isolates as one group likewise revealed extensive nt differences spread over the whole genome; therefore, we examined nucleotide SNPs within COVID-19 and within SARS-CoV isolates, comparing the 38 COVID-19 isolates and the 3 SARS-CoV isolates as separate groups. Among the 38 COVID-19 isolates, 108 nucleotide changes (103 SNPs and 5 deletions) were detected (Table 2). Seven Chinese isolates had no SNPs, whereas the other isolates had 1–9 SNPs each (Table 2). The Korean isolate SNU01 had the most (9 SNPs), followed by the USA isolates USA-IL1 (8) and USA-IL2 (7) and the Chinese isolate IPBCAMS-WH-02 (6) (Table 2). All Japanese isolates had 3–5 SNPs. Of the 103 SNPs, 66 were transitions and 37 were transversions. Dividing the number of detected SNPs by the total length of all sequenced genomes gives a base substitution rate for the studied COVID-19 isolates of 103/1 135 284 = 9.07 × 10⁻⁵. A similar alignment of three SARS-CoV isolates (DQ182595.1, China; AY323977.2, Italy; AY310120.1, Germany) showed that the Chinese isolate (DQ182595.1) had 99.97 and 99.95% nucleotide identity with the Italian (AY323977.2) and German (AY310120.1) isolates, respectively. This alignment yielded 12 SNPs and 1 deletion among the three SARS-CoV isolates (Table 2), indicating a base substitution rate of 12/89 197 = 13.45 × 10⁻⁵. This rate appears higher than that of the COVID-19 isolates, probably because only a small number of SARS-CoV isolates was compared.
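The rate arithmetic above can be reproduced directly from Table 2. The sketch below assumes the per-isolate genome lengths and SNP totals exactly as listed there; it sums the lengths and divides the SNP counts by those sums. Note that this quantity is SNPs per sequenced nucleotide pooled across isolates, not a per-generation mutation rate.

```python
# Genome lengths (nt) of the 38 COVID-19 isolates, in Table 2 order.
COVID_LENGTHS = [
    29872, 29868, 29866, 29881, 29825, 29891, 29852, 29854, 29857, 29899,
    29889, 29899, 29890, 29883, 29833, 29903, 29903, 29882, 29883, 29882,
    29882, 29858, 29882, 29879, 29882, 29882, 29882, 29848, 29878, 29878,
    29878, 29878, 29838, 29891, 29870, 29870, 29893, 29903,
]
# Genome lengths (nt) of the three SARS-CoV isolates (FRA, HSR1, ZJ0301).
SARS_LENGTHS = [29740, 29751, 29706]


def snps_per_nt(snps: int, lengths: list[int]) -> float:
    """Pooled SNP count divided by the total number of sequenced nucleotides."""
    return snps / sum(lengths)


covid_rate = snps_per_nt(103, COVID_LENGTHS)
sars_rate = snps_per_nt(12, SARS_LENGTHS)
print(sum(COVID_LENGTHS), f"{covid_rate:.3e}")  # 1135284 9.073e-05
print(sum(SARS_LENGTHS), f"{sars_rate:.3e}")    # 89197 1.345e-04
```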
Table 2.  

Summary of detected nucleotide SNPs among COVID-19 isolates

No. | Accession | Country | Isolate | Length, nt | Nucleotide changes | No. of changes
1 | LR757995.1 | China | Whole genome | 29 872 | T28129C, C8767T | 2
2 | LR757996.1 | China | Whole genome | 29 868 | – | –
3 | LR757998.1 | China | Whole genome | 29 866 | C6943A, T11739A | 2
4 | MN988668.1 | China | WHU01 | 29 881 | – | –
5 | MN996527.1 | China | WIV02 | 29 825 | G21292A, A24292G | 2
6 | MN996528.1 | China | WIV04 | 29 891 | – | –
7 | MN996529.1 | China | WIV05 | 29 852 | G7004A, A21125G | 2
8 | MN996530.1 | China | WIV06 | 29 854 | – | –
9 | MN996531.1 | China | WIV07 | 29 857 | A7988C, C9521T | 2
10 | MT019529.1 | China | IPBCAMS-WH-01 | 29 899 | A3778G, A8388G, T8987A | 3
11 | MT019530.1 | China | IPBCAMS-WH-02 | 29 889 | T104A, T111C, T112G, C119G, T120C, G124A | 6
12 | MT019531.1 | China | IPBCAMS-WH-03 | 29 899 | T6996C | 1
13 | MT019532.1 | China | IPBCAMS-WH-04 | 29 890 | – | –
14 | MT019533.1 | China | IPBCAMS-WH-05 | 29 883 | G7866T | 1
15 | MT039873.1 | China | HZ-1 (1st case) | 29 833 | – | –
16 | NC_045512.2 | China | Wuhan-Hu-1 | 29 903 | – | –
17 | MT049951.1 | China | Yunnan-01 | 29 903 | C75A, C8782T, G11083C, T21644A, T28144C | 5
18 | MN985325.1 | USA | USA-WA1 | 29 882 | C8782T, T28144C | 2
19 | MN994468.1 | USA | USA-CA2 | 29 883 | C17000T, G26144T | 2
20 | MT027062.1 | USA | USA-CA3 | 29 882 | G614A, A5084G, C28854T | 3
21 | MT027064.1 | USA | USA-CA5 | 29 882 | C2091T, C21707T | 2
22 | MT044258.1 | USA | USA-CA6 | 29 858 | del 508–523, del 671–679 | 2
23 | MT039888.1 | USA | USA-MA1 | 29 882 | G3518T, C8782T, A17423G, C24034T, C28854T | 5
24 | MT039887.1 | USA | USA-WI1 | 29 879 | C17373T, del 20298–20300 | 2
25 | MN988713.1 | USA | USA-IL1 | 29 882 | T490W, C3177Y, C8782Y, C24034Y, T26729Y, G28077S, T28144Y, C28854Y | 8
26 | MT044257.1 | USA | USA-IL2 | 29 882 | T490A, C3177T, C8782T, C24034T, T26729C, G28077C, T28144C | 7
27 | MN997409.1 | USA | USA-AZ1 | 29 882 | C8782T, G11083T, T28144C, C29095T | 4
28 | LC521925.1 | Japan | AI-I-004 | 29 848 | del 351–374, C18485T, C18485T | 3
29 | LC522972.1 | Japan | KY-V-029 | 29 878 | G11554T, C15321T, C25807G, C29300T | 4
30 | LC522973.1 | Japan | TY-WK-012 | 29 878 | C2659T, C8779T, C3789T, C29092T, T28141C | 5
31 | LC522974.1 | Japan | TY-WK-501 | 29 878 | C2659T, C8779T, C29092T, T28141C | 4
32 | LC522975.1 | Japan | TY-WK-521 | 29 878 | C2659T, C8779T, C29092T, G29702T, T28141C | 5
33 | MN938384.1 | Hong Kong | HKU-SZ-002a | 29 838 | C8750T, C29063T | 2
34 | MN975262.1 | Hong Kong | HKU-SZ-005b | 29 891 | C8782T, C9561T, T15607C, C29095T, T28144C | 5
35 | MT066175.1 | Taiwan | NTU01 | 29 870 | C8782T, T28144C | 2
36 | MT066176.1 | Taiwan | NTU02 | 29 870 | A9034G, C9491T | 2
37 | MT007544.1 | Australia | Australia-VIC01 | 29 893 | T19065C, T22303G, G26144T, del 29740–29950 | 4
38 | MT039890.1 | Korea | SNU01 | 29 903 | G2969T, C6031T, C12115T, T15597C, C20936G, C22224G, G25775T, G26144T, T26354A | 9
Total | | | | 1 135 284 | | 108
SARS-CoV-1
39 | AY310120.1 | Germany | SARS-CoV-1-FRA | 29 740 | T18965A, C19084T, C24933T, C26660T, C28268T | 5
40 | AY323977.2 | Italy | SARS-CoV-1-HSR1 | 29 751 | G27254R | 1
41 | DQ182595.1 | China | SARS-CoV-1 ZJ0301 | 29 706 | del 1–16, A12965C, T14022A, A14976T, C17478G, T17518A, C22573T | 7
Total | | | | 89 197 | | 13

COVID-19 Open Reading Frames (orfs)

Five main orfs are usually produced by all coronavirus isolates: the orf1ab polyprotein, orfS, orfN, orfM, and orfE. Seven other orfs have been reported in various isolates: the orf1a polyprotein, orf3a, orf6, orf7a, orf7b, orf8, and orf10 (Table 3). Polyproteins 1ab and 1a are usually processed into smaller non-structural proteins (nsps; Table 4). Accessory orfs are not produced by all coronavirus isolates.
Table 3.  

Common coronavirus orfs

Accession | orf | Start, nt | End, nt | Length, aa | Function
YP_009724389.1 | orf1ab | 266 | 21 555 | 7 096 | Polyprotein
YP_009725295.1 | orf1a | 266 | 13 483 | 4 405 | Polyprotein
YP_009724390.1 | orfS | 21 563 | 25 384 | 1 273 | Surface glycoprotein
YP_009724392.1 | orfE | 26 245 | 26 472 | 75 | Envelope protein
YP_009724397.2 | orfN (orf9) | 28 274 | 29 533 | 419 | Nucleocapsid phosphoprotein
YP_009724393.1 | orfM (orf5) | 26 523 | 27 191 | 222 | Membrane glycoprotein
YP_009724391.1 | orf3a | 25 393 | 26 220 | 275 | ORF3a protein
YP_009724394.1 | orf6 | 27 202 | 27 387 | 61 | ORF6 protein
YP_009724395.1 | orf7a | 27 394 | 27 759 | 121 | ORF7a protein
YP_009725296.1 | orf7b | 27 756 | 27 887 | 43 | ORF7b
YP_009724396.1 | orf8 | 27 894 | 28 259 | 121 | ORF8 protein
YP_009725255.1 | orf10 | 29 558 | 29 674 | 38 | ORF10 protein
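The amino-acid lengths in Table 3 follow from the nucleotide coordinates: a coding sequence spanning start to end (1-based, inclusive) contains (end - start + 1)/3 codons, one of which is the stop codon. The sketch below checks this for most orfs in the table. orf1ab needs one extra nucleotide because its translation involves a programmed -1 ribosomal frameshift in which one nucleotide is read twice; the `frameshift` parameter is a simplification introduced here, and `protein_length` is a hypothetical helper.

```python
def protein_length(start: int, end: int, frameshift: int = 0) -> int:
    """Amino acids encoded by a CDS with 1-based, inclusive nt coordinates.

    frameshift: nucleotides read twice because of a programmed -1 ribosomal
    frameshift (1 for coronavirus orf1ab, 0 otherwise). The stop codon is
    not counted.
    """
    nt = end - start + 1 + frameshift
    if nt % 3:
        raise ValueError(f"{nt} nt is not a whole number of codons")
    return nt // 3 - 1


# (start, end, length in aa) as listed in Table 3.
TABLE3 = {
    "orf1a": (266, 13483, 4405),
    "orfS": (21563, 25384, 1273),
    "orf3a": (25393, 26220, 275),
    "orfM": (26523, 27191, 222),
    "orf6": (27202, 27387, 61),
    "orf7a": (27394, 27759, 121),
    "orf7b": (27756, 27887, 43),
    "orf8": (27894, 28259, 121),
    "orfN": (28274, 29533, 419),
    "orf10": (29558, 29674, 38),
}

for name, (start, end, aa) in TABLE3.items():
    assert protein_length(start, end) == aa, name

print(protein_length(266, 21555, frameshift=1))  # 7096 (orf1ab)
```

Without the frameshift correction, the orf1ab span (21 290 nt) is not a whole number of codons and the helper raises an error.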
Table 4.  

Non-structural proteins (nsps) processed from polyproteins orf1ab and orf1a

Accession | Protein | Length, aa | Source polyprotein (1ab or 1a) | Function
YP_009725297.1 | nsp1 | 180 | orf1ab, orf1a | Leader protein
YP_009725298.1 | nsp2 | 638 | orf1ab, orf1a | –
YP_009725299.1 | nsp3 | 1 945 | orf1ab, orf1a | –
YP_009725300.1 | nsp4 | 500 | orf1ab, orf1a | –
YP_009725301.1 | nsp5 | 306 | orf1ab, orf1a | 3C-like proteinase
YP_009725302.1 | nsp6 | 290 | orf1ab, orf1a | –
YP_009725303.1 | nsp7 | 83 | orf1ab, orf1a | –
YP_009725304.1 | nsp8 | 198 | orf1ab, orf1a | –
YP_009725305.1 | nsp9 | 113 | orf1ab, orf1a | –
YP_009725306.1 | nsp10 | 139 | orf1ab, orf1a | –
YP_009725312.1 | nsp11 | 13 | orf1a | –
YP_009725307.1 | nsp12 | 932 | orf1ab | RNA-dependent RNA polymerase
YP_009725308.1 | nsp13 | 601 | orf1ab | Helicase
YP_009725309.1 | nsp14 | 527 | orf1ab | 3'-to-5' exonuclease
YP_009725310.1 | nsp15 | 346 | orf1ab | EndoRNAse
YP_009725311.1 | nsp16 | 298 | orf1ab | 2'-O-ribose methyltransferase
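The nsp lengths can be cross-checked against the polyprotein lengths in Table 3. Under the usual model of coronavirus polyprotein processing (an assumption stated here, not a result of this study), pp1a is cleaved into nsp1–nsp11, while pp1ab, which is translated through the ribosomal frameshift, is cleaved into nsp1–nsp10 plus nsp12–nsp16. Summing the listed lengths under that model reproduces both polyprotein lengths exactly.

```python
# nsp lengths (aa) as listed for the orf1ab/orf1a products.
NSP = {1: 180, 2: 638, 3: 1945, 4: 500, 5: 306, 6: 290, 7: 83, 8: 198,
       9: 113, 10: 139, 11: 13, 12: 932, 13: 601, 14: 527, 15: 346, 16: 298}

shared = sum(NSP[i] for i in range(1, 11))            # nsp1-nsp10, in both polyproteins
pp1a = shared + NSP[11]                               # nsp11 closes pp1a
pp1ab = shared + sum(NSP[i] for i in range(12, 17))   # nsp12-nsp16 follow the frameshift

print(pp1a, pp1ab)  # 4405 7096
```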

Expected orfs from COVID-19 Isolates

We examined the expected orfs of isolates from the same country and from different countries to check whether isolates differ in their expected orf pattern despite their similar genome size and high nucleotide identity (Tables 1, 2). Interestingly, the orf patterns of isolates from the same country or from different countries differed greatly (Table 5, Fig. 2). All COVID-19, SARS-CoV, and BatCoV-RaTG13 isolates have the five main orfs (1ab, S, E, M, N). All of these isolates also have orf3a, except the Chinese isolate WHU01 (MN988668.1), which is expected to produce only the five main orfs, the smallest set detected in this study. Of the 38 COVID-19 isolates, only two Chinese isolates (Wuhan-Hu-1 and Yunnan-01) have orf1a, which is expected in all three SARS-CoV isolates and in the BatCoV-RaTG13 isolate (Table 5). Orf6 and orf7a are expected in all isolates except the Chinese isolate Wuhan-Hu-1. Orf7b is expected only in 7 Chinese isolates, the three SARS-CoV isolates, and the BatCoV-RaTG13 isolate, whereas orf8 is not expected in the three SARS-CoV isolates or in the Chinese isolate Wuhan-Hu-1 (Table 5). Orf10 is not expected in 6 Chinese COVID-19 isolates, the three SARS-CoV isolates, or the BatCoV-RaTG13 isolate. Four extra accessory orfs (3b, 8a, 8b, 9b) are expected only in the three SARS-CoV isolates and the BatCoV-RaTG13 isolate (Table 5). Within countries, the USA isolates and the Japanese isolates showed no differences in their expected orf patterns. In contrast, the Chinese isolates differed in orfs 1a, 3a, 6, 7a, 7b, 8, and 10, with isolate WHU01 (MN988668.1) expected to produce only the five main orfs (Table 5). The orf patterns of 4 selected Chinese COVID-19 isolates, one SARS-CoV isolate, and the BatCoV-RaTG13 isolate are shown in Fig. 2. 
The first reported Chinese isolate (HZ-1, MT039873.1) has 10 expected orfs in its genome: 1ab, N, S, E, M, 3a, 6, 7a, 8, and 10. Orf1a is not expected from the genome of this isolate (Fig. 2). In contrast, another Chinese isolate (Yunnan-01, MT049951.1) is expected to produce orf1a and orf7b in addition to the 10 orfs expected in isolate HZ-1 (Fig. 2). The expected orf pattern of the Chinese isolate WIV02 (MN996527.1) is similar to that of isolate Yunnan-01, except for the absence of orf1a. Interestingly, the bat isolate BatCoV-RaTG13 (MN996532.1) has exactly the same expected orf pattern as the Chinese isolate WIV02. The Chinese isolate WHU01 (MN988668.1) has only 5 expected orfs (1ab, S, M, N, E). The Chinese SARS-CoV isolate SARS-CoV-1-ZJ0301 has 32 expected orfs: the 5 main orfs and 27 accessory orfs (Fig. 2).
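The comparisons in this paragraph amount to set operations on orf lists. The sketch below encodes only the patterns stated in the paragraph (HZ-1, Yunnan-01, WIV02, BatCoV-RaTG13, WHU01) as Python sets; the `orfs` dictionary and the `compare` helper are illustrative and are not derived from the NCBI annotation files.

```python
MAIN = {"1ab", "S", "E", "M", "N"}  # five main orfs present in every isolate

orfs = {"HZ-1": MAIN | {"3a", "6", "7a", "8", "10"}}
orfs["Yunnan-01"] = orfs["HZ-1"] | {"1a", "7b"}
orfs["WIV02"] = orfs["Yunnan-01"] - {"1a"}
orfs["BatCoV-RaTG13"] = set(orfs["WIV02"])  # same pattern as WIV02
orfs["WHU01"] = set(MAIN)


def compare(a: str, b: str) -> tuple[list[str], list[str]]:
    """Return (orfs only in a, orfs only in b), each sorted."""
    return sorted(orfs[a] - orfs[b]), sorted(orfs[b] - orfs[a])


assert all(MAIN <= pattern for pattern in orfs.values())
print(compare("Yunnan-01", "HZ-1"))           # (['1a', '7b'], [])
print(compare("WIV02", "BatCoV-RaTG13"))      # ([], [])
print(len(orfs["HZ-1"]), len(orfs["WHU01"]))  # 10 5
```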
Table 5.  

Summary of predicted ORFs in reported nCoV-2 isolates (+ indicates the presence of orf, – indicates the absence of orf)

NoAC #CountryIsolatebporfExtra orfs
1ab1aS3aEM67a7b8N103b8a8b9b
1LR757995.1*ChinaWhole genome29 872
2LR757996.1*ChinaWhole genome29 868
3LR757998.1*ChinaWhole genome29 866
4MN988668.1ChinaWHU0129 881 + + + + +
5MN996527.1ChinaWIV0229 825 + + + + + + + + + +
6MN996528.1ChinaWIV0429 891 + + + + + + + + + +
7MN996529.1ChinaWIV0529 852 + + + + + + + + + +
8MN996530.1ChinaWIV0629 854 + + + + + + + + + +
9MN996531.1ChinaWIV0729 857 + + + + + + + + + +
10MT019529.1ChinaIPBCAMS-WH-0129 899 + + + + + + + + + +
11MT019530.1ChinaIPBCAMS-WH-0229 889 + + + + + + + + + +
12MT019531.1ChinaIPBCAMS-WH-0329 899 + + + + + + + + + +
13MT019532.1ChinaIPBCAMS-WH-0429 890 + + + + + + + + + +
14MT019533.1ChinaIPBCAMS-WH-0529 883 + + + + + + + + + +
15MT039873.1ChinaHZ-1, 1st case29 833 + + + + + + + + + +
16MT039890.1S. KoreaSNU0129 903 + + + + + + + + + +
17NC_045512.2ChinaWuhan-Hu-129 903++++++++++++
18MT049951.1ChinaYunnan-0129 903++++++++++++
19MN985325.1USAUSA-WA129 882 + + + + + + + + + +
20MN994468.1USAUSA-CA229 883 + + + + + + + + + +
21MT027062.1USAUSA-CA329 882 + + + + + + + + + +
22MT027064.1USAUSA-CA529 882 + + + + + + + + + +
23MT044258.1USAUSA-CA629 858 + + + + + + + + + +
24MT039888.1USAUSA-MA129 882 + + + + + + + + + +
25MT039887.1USAUSA-WI129 879 + + + + + + + + + +
26MN988713.1USAUSA-IL129 882 + + + + + + + + + +
27MT044257.1USAUSA-IL229 882 + + + + + + + + + +
28MN997409.1USAUSA-AZ129 882 + + + + + + + + + +
29LC521925.1JapanAI-I-00429 848 + + + + + + + + + +
30LC522972.1JapanKY-V-02929 878 + + + + + + + + + +
31LC522973.1JapanTY-WK-01229 878 + + + + + + + + + +
32LC522974.1JapanTY-WK-50129 878 + + + + + + + + + +
33LC522975.1JapanTY-WK-52129 878 + + + + + + + + + +
34MN938384.1H KongHKU-SZ-002a29 838 + + + + + + + + +
35MN975262.1H KongHKU-SZ-005b29 891 + + + + + + + + + +
36MT066175.1TaiwanNTU0129 870 + + + + + + + + + +
37MT066176.1TaiwanNTU0229 870 + + + + + + + + + +
38MT007544.1AustraliaAustralia-VIC0129 893 + + + + + + + + + +
Bat CoV-2
39MN996532.1ChinaBatCoV-RaTG1329 855 + + + + + + + + + +
SARS-CoV
40AY310120.1GermanySARS-CoV-1-FRA29 740 + + + + + + + + + + + + + +
41AY323977.2ItalySARS-CoV-1-HSR129 751 + + + + + + + + + + + + + +
42DQ182595.1ChinaSARS-CoV-1ZJ030129 706 + + + + + + + + + + + + + +

*Isolates number 1,2,3 have their nt sequence in the nucleotide database without their expected orfs annotated.

Fig. 2.

Map of the expected orf patterns of 4 selected COVID-19 isolates and 1 SARS-CoV isolate, compared with the bat isolate BatCoV-RaTG13. The accession number and isolate name are shown in each map panel.


DISCUSSION

The high nucleotide identity (99.91 to 100%) among COVID-19 isolates from various countries or from the same country (Table 1) and their random clustering on the phylogenetic tree (Fig. 1) indicate that the reported COVID-19 isolates are highly similar and belong to one COVID-19 strain. Moreover, the differences between COVID-19 and SARS-CoV (17.66%) and between COVID-19 and the bat coronavirus isolate BatCoV-RaTG13 (3.89%) set COVID-19 apart as a novel viral strain with a genome context that had not been identified before. The small nt differences revealed by SNPs among COVID-19 isolates, and their distinction from SARS-CoV and bat coronavirus, support the same conclusion. The collective base substitution rate for the studied isolates was 9.07 × 10⁻⁵. The base substitution (mutation) rate of RNA viruses is the number of base changes per nucleotide per cell infection (generation). This rate is very difficult to determine here because it is not known how many generations (infections) these isolates went through before they were sequenced. Our value is therefore an overestimate of the per-generation SNP rate, since isolates from patients with symptoms have probably gone through a very large number of infections. RNA viruses have mutation rates from 1 × 10⁻⁶ to 1 × 10⁻⁴ [22-24]. Even our overestimated mutation rate for COVID-19 is within the range of RNA virus mutation rates, indicating that COVID-19 is a new viral strain. COVID-19 isolates showed differences in the expected orf pattern despite their highly similar genomes, suggesting a high level of complexity in the interaction between the COVID-19 genome and its host cells. This agrees with previous reports. Production of extra proteins beyond the main orfs has been reported for different retroviruses. For example, human endogenous retrovirus K (HERV-K) produces two variant proteins, Rec and Np9, from its full-length env sequence and from the variant with a 292-bp deletion, respectively [25]. 
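The point that the pooled figure overestimates the per-generation rate can be illustrated with a crude calculation: if the observed per-site divergence accumulated over g infection cycles, the per-generation rate is roughly the observed value divided by g. The generation counts below are hypothetical placeholders, not measured values, and the approximation ignores selection, back-mutation, and the shared ancestry of the isolates; `per_generation_rate` is a hypothetical helper.

```python
OBSERVED = 103 / 1_135_284   # pooled SNPs per sequenced nucleotide (Table 2)
RNA_RANGE = (1e-6, 1e-4)     # RNA virus mutation rates quoted in the text [22-24]


def per_generation_rate(observed: float, generations: int) -> float:
    """Spread pooled per-site divergence evenly over a number of infection cycles.

    Crude approximation: ignores selection, back-mutation and the fact that the
    isolates share ancestry, so it only illustrates the direction of the bias.
    """
    if generations < 1:
        raise ValueError("generations must be at least 1")
    return observed / generations


for g in (1, 10, 100):  # hypothetical generation counts, not measured values
    rate = per_generation_rate(OBSERVED, g)
    within = RNA_RANGE[0] <= rate <= RNA_RANGE[1]
    print(f"g = {g:>3}: {rate:.2e} per nt per generation; within quoted range: {within}")
```

The estimate shrinks in proportion to the assumed number of generations, which is why the pooled value can only serve as an upper bound.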
Our results agree with several other studies indicating that COVID-19 is a novel coronavirus that did not originate from previously existing strains [15]. Similarly, it was reported that COVID-19 is not a mosaic virus and did not originate from recombination events [14]. Along the same line, a third study revealed that COVID-19 had 89% nt identity with the bat coronavirus Bat SARS-like-CoVZXC21 and 82% with SARS-CoV; its orf3b produces a shorter protein and its orf8 encodes a secreted protein, leaving the source of COVID-19 undetermined [20]. Therefore, the most probable scenario is that this strain was transmitted from an unknown organism and developed the ability to infect humans and to spread from human to human [16]. Based on this scenario, future studies are needed to screen a wide range of animals that come into contact with humans in search of the possible source of COVID-19. On the other hand, in the absence of a known biological source, the possibility that the virus is synthetic and became public through a leak from an unknown biological facility cannot be ruled out at this time. This possibility is supported by the detection of a unique isolate reported in 2004. The sequence of a new SARS-CoV strain was reported in 2004 and filed as a patent with the European Patent Office (Patent no. EP1694829B1) by the Centre National de la Recherche Scientifique (CNRS), Institut Pasteur, and Université Paris Diderot. This strain was isolated from a patient from Hanoi, Vietnam. Its sequence was not deposited in the nucleotide database or anywhere else except in the patent itself. When we ran a BLAST search with the nt sequence of this strain against the nucleotide database, the closest sequence was the SARS-CoV Urbani isolate icSARS-MA (Acc. no. MK062180.1), with only 89.65% identity, indicating that it differs from the SARS-CoV isolates reported at that time and, consequently, from any other reported coronavirus or COVID-19 isolate.

COVID-19 Symptoms Implicate Its Unique Interaction with Human Biology

It is well known that COVID-19 causes a wide range of symptoms in humans, from none at all to death. The key question is what makes people differ in their response to COVID-19 infection. Based on the distinctness of the COVID-19 genome from SARS-CoV and bat CoV, the unique characteristics of COVID-19, and the nt-level similarity among COVID-19 isolates, several scenarios can be suggested for the differences among humans in their response to infection. In addition to the age and health of the host, some genomic scenarios based on current studies of human endogenous retroviruses (HERVs) are summarized in the following sections.

4.1.1. Human endogenous retroviruses (HERVs)

HERVs are DNA sequences derived from recurrent integrations of ancient exogenous retroviruses [26, 27]. HERVs are one type of highly conserved transposable element (TE). TEs and HERVs make up 40% and 8% of our genome, respectively [28]. HERVs were first detected in the human genome in the 1970s [29]. Based on their phylogenetic relationships, HERVs are classified into three main groups: I (gammaretrovirus- and epsilonretrovirus-like), II (betaretrovirus-like), and III (spumaretrovirus-like) [30, 31]. Their integration allowed the vertical transmission of retroviral genomes along with the human genome across generations [32]. HERVs were inserted into the genome through reverse transcription of viral RNA by the viral reverse transcriptase, producing double-stranded DNA (provirus) [33], followed by integration of the provirus into the host genome by the viral integrase and other host proteins [34]. Integrated copies can be activated and produce an active infection. After integration, the proviral DNA produces mRNA that either encodes various viral proteins or is reverse transcribed by the viral reverse transcriptase into proviral DNA capable of a new integration cycle. 
HERVs have a structure similar to that of exogenous retroviruses: two long terminal repeats (LTRs) flanking the internal viral genes gag (matrix protein), pro-pol (protease, reverse transcriptase, and integrase), and env (envelope) [32]. Besides these main retroviral proteins, some retroviruses produce extra proteins. For example, the env gene of HERV-K encodes two different protein variants, Rec and Np9, from its full-length sequence and from the variant with a 292-bp deletion, respectively [25].

4.1.2. Impact of HERVs on human cells

HERVs affect their host cells in several ways. RNA and proteins produced from HERV sequences could regulate human genes and modulate host immunity [35, 36]. Although most TEs have been silenced by the accumulation of mutations or by hypermethylation, some have been domesticated and remain active in human biology [37]. For example, syncytins are a group of env proteins produced by different HERVs in mammals [38]. In the human genome, the env genes of HERV-W and HERV-FRD produce syncytin-1 and syncytin-2, respectively [39]. These proteins are involved in placental syncytiotrophoblast development and homeostasis [39, 40] and in maternal immune tolerance to the growing fetus [41].

4.1.3. HERVs and regulation of human gene expression

At the DNA level, a large number of HERVs integrated in the human genome function as transcription factor binding sites, alternative promoters, or splicing signals for cellular genes [37, 42–46], indicating a role in transcriptional regulation and in the development of the human genome. This can lead to upregulation, downregulation, suppression, or tissue-specific splicing of cellular genes [42, 45, 47]. HERVs also provide a plethora of cis-acting regulatory elements that serve as binding sites for host trans-acting factors. The interplay between these two types of elements makes up the gene regulatory network of a cell [48, 49]. 
Along the same line, solitary LTRs, remnants of complete HERVs, can also regulate host gene expression. Recurrent HERV insertions cause insertional mutations in target genes and non-allelic homologous recombination [32]. For example, recombination between homologous HERV-I copies on chromosome Y causes microdeletion of the azoospermia factor region and, consequently, male infertility [50]. In addition, HERVs can produce non-coding RNAs (ncRNAs), including microRNAs and long ncRNAs, which furnish recognition motifs for RNA-binding proteins or modulate the function of transcription factors [32]. Accordingly, HERV ncRNAs with sequence similarity to human miRNAs act as RNA sponges that bind miRNAs involved in the post-transcriptional regulation of gene expression [51]. This is the case in embryonic stem cells, where the HERVH-derived ncRNA HPAT5 interacts with let-7 miRNAs [52]. Furthermore, a HERV-encoded protein could function as a regulator of host gene expression during the viral life cycle and provide cellular functions during that cycle [36]. Interesting examples are the HERV Gag and Rec proteins, which are involved in the stability and translation of host cell mRNAs [36]. For example, HML2 Rec binds about 1600 mRNAs of host embryonic cells and regulates their translation by ribosomes during early development [53]. Similarly, the Gag-like protein Arc, derived from a Ty3/gypsy retrotransposon, was suggested to coordinate communication between neurons, indicating a role in nervous system development [54, 55]. Specifically, Arc has been proposed to form capsids that carry mRNA between neurons via extracellular vesicles for translation in the target neuron [56]. A group of HERVs spread across the human genome can form a coordinated regulatory network that simultaneously regulates the expression of many host genes in the same pathway [35, 47, 57]. 
For example, more than 30% of the p53 binding sites in the human genome were contributed by HERV sequences, which thereby became part of the p53 target network [58], adding to human genome plasticity and cellular networking. An interesting example of this plasticity is the MHC (major histocompatibility complex) locus, which carries heavy HERV integration that contributes to its tremendous plasticity and high genetic variability [59]. For instance, the HERV-K(C4) provirus is integrated in the ninth intron of the human complement C4A gene, contributing to its high variability [60, 61]. A key example is the role of HERVs in the interferon (IFN) antiviral pathway of innate immunity and in the induction of the adaptive immune response [62]. HERV integrations contributed to the development of networks of IFN-inducible transcriptional enhancers in various mammalian genomes, and deletion of a HERV sequence near an IFN-related gene was shown to suppress the linked pathway [35]. Also, HERV LTR sequences function as promoters or enhancers in response to IFN-based activation [63]. HERV-K LTRs carrying two IFN-stimulated response elements (ISREs) were induced by the IFN cascade in response to inflammation [64].

4.1.4. HERVs and human immune modulation.

Products of anciently integrated HERVs lie on the borderline between human self and microbial non-self molecules: they can be tolerated by the human immune system or can induce immune responses that give rise to autoimmune diseases. The innate immune pathways induced by HERV products are the same ones that act against exogenous viral infection [65]. In humans, Toll-like receptors (TLRs) and cytosolic pattern recognition receptors (cytPRRs) can recognize HERV products and induce an immune response, as reported in autoimmune diseases and cancer [66, 67].
Recognition of viral molecules by innate immune receptors induces inflammatory molecules, including IFNs, cytokines, and chemokines, that mount the antiviral response. These molecules activate the adaptive immune response through the activation of T and B cells. Both responses are required to fight an exogenous viral infection, and the activated response is finally switched off after the infection. HERV products, in contrast, are continuously present in host cells and therefore provoke chronic stimulation of the host immune response, resembling the chronic immune stimulation in autoimmune and inflammatory diseases caused by exogenous retroviral molecules [67-70]. The antiviral response activated by HERV products creates a vicious circle in which the resulting inflammatory molecules and epigenetic dysregulation further upregulate HERV expression [65, 71, 72]. HERV-derived peptides have also been implicated in suppressing the immune response. These include env proteins carrying the conserved immunosuppressive domain (ISD) of retroviral env proteins. For example, the ISD of HERVs functions in maternal immune tolerance during pregnancy [38, 41].

4.1.5. HERVs and exogenous viral infection.

HERVs are well documented to contribute either negatively or positively during exogenous viral infection [67]. Infection by some viruses, including HIV, herpesviruses, and influenza virus, changes HERV expression [73-75]. In this regard, exogenous infection could cooperatively upregulate HERV expression and increase the immune response [67]. HERV products could also play a protective role against exogenous viral infection [36]. For example, HERV antisense RNAs can protect against infection by exogenous viruses with complementary RNA [65, 76]. Some studies have reported that HERV products function as pathogen-associated molecular patterns (PAMPs) that activate receptors of the host defense system [49, 65].
In addition, some of their products mimic antigens and stimulate specific B and T cells [77, 78], which helps explain the role of HERVs in autoimmune and inflammatory diseases. On the other hand, HERVs also contribute to suppressing host immunity, as they have been implicated in maternal immune suppression and in protection against excessive immune activation [79, 80].

Possible Role for HERVs in COVID-19 Infection and Symptoms

HERVs could modulate COVID-19 infection and its symptoms in several possible ways. First, HERVs or their products could compromise the immune system and facilitate infection and penetration of the virus into human cells. Also, individuals with high levels of the ACE2 receptor, especially those with high blood pressure or under various types of stress, could be easy targets for the virus. Second, different isolates of the virus may produce different protein sets (orf patterns) in the host cell, which exploit the host cell and compromise the host immune system with different efficiencies. This would result in a spectrum of disease severity, possibly including death. In this study, isolates from the same country (China) or from different countries were expected to produce different orf patterns. Some of the expected orfs encode the enzyme responsible for methylating the 2' carbon of the ribose sugar of viral RNA. This modification makes the viral RNA undetectable by the host immune system and allows the virus to infect human cells effectively [81]. Third, HERVs could produce proteins that complement the viral set of orfs in its entry, infection, replication, packaging, and integration in the human genome. In addition, partial proviral genomes from previous integrations could produce enzymes required for the replication of viral isolates that lack the ability to infect. For example, an animal isolate unable to infect humans could be transmitted to an individual whose genome carries proviral genes that complement the animal strain, making it infectious and able to cause symptoms. Fourth, the coronavirus genome can only produce the orf1ab polyprotein required for viral replication through a -1 ribosomal frameshift at the junction between orf1a and orf1b. HERVs may produce proteins or miRNAs that modulate this ribosomal frameshifting, changing the pattern of COVID-19 orfs in different human hosts.
This could lead to different courses of symptoms and different severities of COVID-19 infection. Long-term studies on COVID-19 and on retroviruses that infect humans are urgently needed to validate these possibilities, for future safety and for better management of future pandemics like COVID-19. Intensive studies are also needed to survey human populations (especially the elderly and the immunocompromised) for their HERV loads and to link these loads to their predisposition to autoimmune diseases and cancer and to their risk of exogenous viral infection.
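The argument that isolates with highly similar nucleotide sequences can still differ in their expected orf pattern can be illustrated with a toy example. The Python sketch below is not the orf-prediction procedure used in this study; it assumes a simple forward-strand scan in three reading frames, ATG as the only start codon, TAA/TAG/TGA as stop codons, and a hypothetical minimum-length parameter (`min_codons`). Both sequences are hypothetical 18 nt fragments, not viral data.

```python
# Minimal sketch: forward-strand ORF scan plus percent identity, showing how
# one substitution can change the predicted orf pattern of a near-identical
# sequence. Assumptions (not from the study): ATG-only starts, standard stop
# codons, three forward frames, hypothetical minimum length `min_codons`.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is
    exclusive and includes the stop codon. ATGs inside an open ORF are
    not reported as separate ORFs.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step codon by codon; skip an incomplete trailing codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Length in codons, counting both start and stop codons.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    ref = "ATGAAACCCGGGTTTTAA"  # hypothetical toy fragment
    mut = "ATGTAACCCGGGTTTTAA"  # same fragment, A->T at index 3 (AAA -> TAA)
    print(round(percent_identity(ref, mut), 1))  # 94.4
    print(find_orfs(ref))  # [(0, 18, 0)]
    print(find_orfs(mut))  # []
```

A single substitution (about 94% identity between the two fragments) turns the second codon into a stop codon, so the ORF predicted for the reference disappears from the variant's orf pattern. Real orf prediction also considers the reverse strand, alternative start codons, and features such as the -1 ribosomal frameshift, all of which this sketch ignores.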

CONCLUSIONS

Our results indicate that COVID-19 did not originate from a known biological source or from other previously characterized strains. The COVID-19 isolates used in this study showed high similarity at the nt sequence level, yet they differed greatly in the orf patterns expected from their similar genomes. The most probable scenario is that this strain was transmitted from an unknown organism and had, or has developed, the ability to infect human cells as well as to transmit from human to human. On the other hand, in the absence of an identified biological source, the possibility that it is synthetic and was released from an unknown biological facility cannot be ruled out at this time.
REFERENCES

1.  Evolution of the mammalian transcription factor binding repertoire via transposable elements.

Authors:  Guillaume Bourque; Bernard Leong; Vinsensius B Vega; Xi Chen; Yen Ling Lee; Kandhadayar G Srinivasan; Joon-Lin Chew; Yijun Ruan; Chia-Lin Wei; Huck Hui Ng; Edison T Liu
Journal:  Genome Res       Date:  2008-08-05       Impact factor: 9.043

2.  Transposable elements and the evolution of regulatory networks.

Authors:  Cédric Feschotte
Journal:  Nat Rev Genet       Date:  2008-05       Impact factor: 53.242

3.  Potential molecular mimicry between the human endogenous retrovirus W family envelope proteins and myelin proteins in multiple sclerosis.

Authors:  Ranjan Ramasamy; Blessy Joseph; Trevor Whittall
Journal:  Immunol Lett       Date:  2017-02-09       Impact factor: 3.685

4.  Expression of HERV-W Env glycoprotein (syncytin) in the extravillous trophoblast of first trimester human placenta.

Authors:  A Malassiné; K Handschuh; V Tsatsaris; P Gerbaud; V Cheynet; G Oriol; F Mallet; D Evain-Brion
Journal:  Placenta       Date:  2005-08       Impact factor: 3.481

5.  Structural basis of arc binding to synaptic proteins: implications for cognitive disease.

Authors:  Wenchi Zhang; Jing Wu; Matthew D Ward; Sunggu Yang; Yang-An Chuang; Meifang Xiao; Ruojing Li; Daniel J Leahy; Paul F Worley
Journal:  Neuron       Date:  2015-04-09       Impact factor: 17.173

6.  A novel coronavirus associated with severe acute respiratory syndrome.

Authors:  Thomas G Ksiazek; Dean Erdman; Cynthia S Goldsmith; Sherif R Zaki; Teresa Peret; Shannon Emery; Suxiang Tong; Carlo Urbani; James A Comer; Wilina Lim; Pierre E Rollin; Scott F Dowell; Ai-Ee Ling; Charles D Humphrey; Wun-Ju Shieh; Jeannette Guarner; Christopher D Paddock; Paul Rota; Barry Fields; Joseph DeRisi; Jyh-Yuan Yang; Nancy Cox; James M Hughes; James W LeDuc; William J Bellini; Larry J Anderson
Journal:  N Engl J Med       Date:  2003-04-10       Impact factor: 91.245

7.  Arc/Arg3.1 is a postsynaptic mediator of activity-dependent synapse elimination in the developing cerebellum.

Authors:  Takayasu Mikuni; Naofumi Uesaka; Hiroyuki Okuno; Hirokazu Hirai; Karl Deisseroth; Haruhiko Bito; Masanobu Kano
Journal:  Neuron       Date:  2013-06-19       Impact factor: 17.173

8.  Type W Human Endogenous Retrovirus (HERV-W) Integrations and Their Mobilization by L1 Machinery: Contribution to the Human Transcriptome and Impact on the Host Physiopathology.

Authors:  Nicole Grandi; Enzo Tramontano
Journal:  Viruses       Date:  2017-06-27       Impact factor: 5.048

9.  Association of endogenous retroviruses and long terminal repeats with human disorders.

Authors:  Iyoko Katoh; Shun-Ichi Kurata
Journal:  Front Oncol       Date:  2013-09-11       Impact factor: 6.244

10.  Middle East respiratory syndrome coronavirus infection in dromedary camels in Saudi Arabia.

Authors:  Abdulaziz N Alagaili; Thomas Briese; Nischay Mishra; Vishal Kapoor; Stephen C Sameroff; Peter D Burbelo; Emmie de Wit; Vincent J Munster; Lisa E Hensley; Iyad S Zalmout; Amit Kapoor; Jonathan H Epstein; William B Karesh; Peter Daszak; Osama B Mohammed; W Ian Lipkin
Journal:  mBio       Date:  2014-02-25       Impact factor: 7.867
