Mohamed M Ba Abduallah1, Maged Gomaa Hemida2,3.
Abstract
The Middle East respiratory syndrome coronavirus (MERS-CoV) emerged in late 2012 in Saudi Arabia. For this study, we conducted a large-scale comparative genome study of MERS-CoV from both humans and dromedary camels from 2012 to 2019 to map any genetic changes that emerged in the past 8 years. We downloaded 1309 submissions, including 308 full-length genome sequences of MERS-CoV available in GenBank from 2012 to 2019. We used bioinformatics tools to describe the genome structure and organization of the virus and to map the most important motifs within various regions/genes throughout the genome over the past 8 years. We also monitored variations/mutations among these sequences since the virus emerged. Our phylogenetic analyses suggest that the cluster within African camels is driven by the S gene. We identified some prominent motifs within ORF1ab, the S gene and ORF5, which may be used for barcoding the African camel lineages of MERS-CoV. Furthermore, we mapped some sequence patterns that support the zoonotic origin of the virus from dromedary camels. Other sequence patterns indicated selection pressures, particularly within the N gene and the 5' UTR. Further studies are required for careful monitoring of the MERS-CoV genome to identify any potential significant mutations in the future.
Keywords: MERS-CoV; bioinformatics; coronaviruses; evolution; genome; organization; phylogenetic analysis
Year: 2020 PMID: 32803835 PMCID: PMC7461035 DOI: 10.1002/rmv.2150
Source DB: PubMed Journal: Rev Med Virol ISSN: 1052-9276 Impact factor: 11.043
Flanking sequences used for retrieving ORFs
| ORF | Upstream sequence | Downstream sequence |
|---|---|---|
| ORF1ab | GGGCACATC | NNNNNNNCCAGATTCT |
| S | | |
| ORF3 | ACGAACTATNN | CGAACTCT |
| ORF4a | AACGAACTCT | GAAACTGCGC |
| ORF4b | CTACATAAGG | CGAACTATGG |
| ORF5 |
| GCAGCTCTG |
| E | AAACGAACT | CGAACTCCT |
| M | CGAACTCCTNNNNN | GCTCTTTAGT |
| N | TTTCATTGTT | NNNNNNTCAAAGTAAC |
| ORF8 |
| GCAGAAACT |
Note: Bold and underlined sequences are part of the intended ORF region. N is any nucleotide.
Because some sequences lacked the wild‐type stop codon at the end of ORF8, 84 downstream nucleotides were included (up to “GGAGCAGTAG”).
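
The flank-based retrieval described by this table can be sketched as a pattern search. This is a minimal illustration, not the authors' pipeline: the function names `flank_to_regex` and `extract_between` are hypothetical, the toy sequences are invented, and the sketch assumes an unambiguous A/C/G/T genome and returns only the region strictly between the two flanks (the table note says parts of some flanks belong to the ORF, which this sketch does not model).

```python
import re


def flank_to_regex(flank: str) -> str:
    """Translate a flank into a regex; 'N' in the table means any nucleotide."""
    return "".join("[ACGT]" if base == "N" else re.escape(base) for base in flank.upper())


def extract_between(genome: str, upstream: str, downstream: str):
    """Return the shortest region between the first upstream flank and the next downstream flank, or None."""
    pattern = re.compile(
        flank_to_regex(upstream) + "([ACGT]+?)" + flank_to_regex(downstream)
    )
    match = pattern.search(genome.upper())
    return match.group(1) if match else None


# Hypothetical toy sequence (not a real MERS-CoV fragment) wrapped in the ORF4a flanks.
toy = "TTTT" + "AACGAACTCT" + "ATGGATTACGTGTAA" + "GAAACTGCGC" + "CCCC"
region = extract_between(toy, "AACGAACTCT", "GAAACTGCGC")
print(region)
```

The non-greedy group keeps the match from running past the first downstream flank, which matters when a short flank occurs more than once in a 30 kb genome.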
FIGURE 1 Genome structure and organization of MERS‐CoV, 2012 to 2019. The average full‐length genome sequence of MERS‐CoV, based on HCoV‐EMC/2012 (JX869059), is 30 107 bp, excluding the poly A tail at the 3′ end. Central panel: a schematic diagram of MERS‐CoV showing the predicted ORFs and their relative sizes and positions. The position and sequence of the (−1) ribosomal frameshifting (RFS) site in the overlapping region of ORF1a/ORF1b are shown. Arrows underneath and above ORF1a and ORF1b, respectively, represent positions of the replicase polyproteins (RdRp) pp1a and pp1ab that are predicted to be cleaved by papain‐like proteinases into the 16 non‐structural proteins (NSP‐1 to NSP‐11) or the 3C‐like cysteine proteinase (NSP‐12 to NSP‐16). Top panel: expanded representation of ORF4b. Red arrows mark the positions of deleted regions (±2 nt) in MERS‐CoV from dromedaries sampled in Burkina Faso, Nigeria, and Morocco. Bottom panel: expanded representation of the S glycoprotein organization showing identified regions in the S protein subunits (S1 and S2) and the cleavage site (S1/S2). Amino acid variants in the African camel samples (V26A, R1020Q, and A1158S/L) are shown. NTD, N‐terminal domain; L, linker region; RBD, receptor‐binding domain; S.D., subdomain; U.H., upstream helix; F.P., fusion peptide; C.R., connecting region; H.R., heptad repeat; C.H., central helix; B.H., β‐hairpin
MERS‐CoV genes, their locations and protein size
| ORF | Product | Position (accession number JX869059) | Protein size (aa) |
|---|---|---|---|
| ORF1ab | Nsp‐1 | 279 to 857 | 193 |
| ORF1ab | Nsp‐2 | 858 to 2837 | 660 |
| ORF1ab | Nsp‐3 | 2838 to 8498 | 1887 |
| ORF1ab | Nsp‐4 | 8499 to 10 019 | 507 |
| ORF1ab | Nsp‐5 | 10 020 to 10 937 | 306 |
| ORF1ab | Nsp‐6 | 10 938 to 11 813 | 292 |
| ORF1ab | Nsp‐7 | 11 814 to 12 062 | 83 |
| ORF1ab | Nsp‐8 | 12 063 to 12 659 | 199 |
| ORF1ab | Nsp‐9 | 12 660 to 12 989 | 110 |
| ORF1ab | Nsp‐10 | 12 990 to 13 409 | 140 |
| ORF1ab | Nsp‐11 | 13 410 to 13 451 | 14 |
| ORF1ab | Nsp‐12 | 13 410 to 16 207 | 933 |
| ORF1ab | Nsp‐13 | 16 208 to 18 001 | 598 |
| ORF1ab | Nsp‐14 | 18 002 to 19 573 | 524 |
| ORF1ab | Nsp‐15 | 19 574 to 20 602 | 343 |
| ORF1ab | Nsp‐16 | 20 603 to 21 511 | 303 |
| S gene | | 21 456 to 25 517 | 1353 |
| ORF3 | | 25 532 to 25 843 | 103 |
| ORF4a | | 25 852 to 26 181 | 109 |
| ORF4b | | 26 093 to 26 833 | 246 |
| ORF5 | | 26 840 to 27 514 | 224 |
| E | | 27 590 to 27 838 | 82 |
| M | | 27 853 to 28 512 | 219 |
| N | | 28 566 to 29 807 | 413 |
| ORF8b | | 28 762 to 29 100 | 112 |
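
The protein sizes in this table follow from the coordinates by simple arithmetic, which the sketch below reproduces. It is a hedged illustration: `protein_length` is a hypothetical helper, and two assumptions are mine rather than the source's, namely that the standalone ORFs include their stop codon in the listed range while the cleaved Nsp products do not, and that the Nsp‐12 span is translated across the −1 frameshift (one nucleotide read twice).

```python
def protein_length(start: int, end: int, has_stop: bool, frameshift: int = 0) -> int:
    """Amino-acid count for 1-based inclusive genome coordinates.

    frameshift=-1 models a -1 ribosomal frameshift: one nucleotide is read
    twice, so the translated length is one nucleotide longer than the span.
    """
    translated_nt = (end - start + 1) - frameshift
    codons, remainder = divmod(translated_nt, 3)
    if remainder:
        raise ValueError(f"{translated_nt} nt is not a whole number of codons")
    return codons - 1 if has_stop else codons


# Coordinates taken from the table (accession JX869059).
print(protein_length(279, 857, has_stop=False))                       # Nsp-1
print(protein_length(21456, 25517, has_stop=True))                    # S gene
print(protein_length(13410, 16207, has_stop=False, frameshift=-1))    # Nsp-12
```

Under these assumptions the three calls give 193, 1353 and 933, matching the table. The Nsp‐12 span alone is 2798 nt, which is not a multiple of three, so its listed size is only consistent with the frameshift being taken into account.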
Observed unique features of the MERS‐CoV genomes 2012–2019
| ORF | Accession number | Observation |
|---|---|---|
| 5′‐UTR | KT026453.1 and KT026455.1 (Human/Saudi/2015) | Have a deletion of 9 nt (121‐129 of JX869059) |
| 5′‐UTR | KT368869.1 and KT368879.1 (Camel/Saudi/2015) | Have a deletion of 11 nt (121‐131 of JX869059) |
| ORF1ab | MK462255 (Human/Saudi/2018) | Has a deletion of 66 nt (3466‐3531 of JX869059), within nsp3 |
| ORF1ab | MF741827.1 and MF741832.1 (Human/Jordan/2015) | Nsp2 starts with V instead of D (D194V) |
| S | KJ477102.1 (Camel/Egypt/2013) and MF679171.1 (Camel/UAE/2015) | Have a deletion of three consecutive nucleotides (no frameshift) |
| S | KU710265.1 (Human/Saudi/2014) | Has a deletion of 530 nt (2316‐2845 of JX869059.2) |
| S | KJ614529, KC776174 (Human/Jordan/2012) and KX108943 (Camel/UAE/2015) | Have the African motifs R1020Q and A1158S/L |
| S | KT806010.1 (Human/Saudi/2015) | Has the D510G mutation found in Korean samples |
| ORF3 | KY688119 (Human/Saudi/2015) | Has a deletion of 41 nt (25 790‐25 830 of JX869059), changing two aa and giving an ORF3 protein 15 aa shorter |
| ORF3 | KT806046 (Human/Saudi/2015) | Has a deletion of 3 aa due to a deletion (25 693‐25 701 of JX869059) |
| ORF3 | KX108943 (UAE/camel/2015) | Has a deletion of 3 aa due to a deletion (25 802‐25 810 of JX869059) |
| ORF3 | MG923472 (Nigeria/camel/2015) | Has a deletion of 4 aa due to a deletion (25 809‐25 820 of JX869059) |
| ORF3 | MG923473 (Burkina Faso/camel/2015) | Has a deletion of 19 aa due to deletions of 17 and 25 bp at two positions (25 772‐25 788 and 25 806‐25 830 of JX869059) |
| ORF3 | MF000458.1, MF000459, MF000460.1, MF741837.1, MF741833.1, MF741834.1, MF741835.1, MF741836.1, KU233362.1 (Human/Jordan/2015) | Have a deletion of 3 aa due to a deletion (25 675‐25 683 of JX869059) |
| ORF4b | MF598715.1, MF598719.1, MF598720.1, MF598721.1, MF598722.1 (UAE/camel/2015) | Have a full‐length ORF4b gene but a truncated ORF4b protein (148 aa shorter) due to a mutation at 26 388 that creates a stop codon |
| ORF4b | MF598690.1 (UAE/camel/2015) | Has a full‐length ORF4b gene but a truncated ORF4b protein (155 aa shorter) due to a mutation at 26 366 that creates a stop codon |
| ORF4b | KF600612.1, KF600620.1 (Human/Saudi/2012) | Have an ORF4b protein 144 aa shorter due to a deletion of 17 nt (26 544‐26 560 of JX869059) that causes a frameshift and a stop codon |
| ORF4b | MK483839.1 (Human/Saudi/2018) | Has a full‐length ORF4b gene but an ORF4b protein 114 aa shorter due to an SNP at 26 489 that creates a stop codon |
| ORF5 | KU851859 (Human/Saudi/2015) | Has a 655 nt ORF5 encoding 147 aa due to a deletion of 20 bp (27 227‐27 246 of JX869059) that causes a frameshift |
| ORF5 | KX108941 (Camel/UAE/2015) | Has a 673 nt ORF5 encoding only a 7 aa protein due to a deletion of 2 nt (26 859‐26 869 of JX869059) that creates a stop codon |
| ORF5 | MG923472, MG923481, MG923480, MG923479, MG923478, MG923477, MG923476, MG923475, MG923474 (Camel/Nigeria/2015‐2016) | Have the A19V variant |
| N | KJ614529 and KC776174 (Human/Jordan/2012) | Have the D14Y variant, which was observed in 10 MERS‐CoV sequences isolated from African camels in 2015 and 2016 |
| N | KJ650295.1, KJ650296.1, and KJ650297.1 (Camel/Saudi/2013), KT156561 (Human/Oman/2013) | Have the L23M variant |
| N | MG757604, MG011358, MG011351, MG011345, MG011342, MG011350, MG011349, MG011348, MG011346, MG011344, KX154693, KX154692, KX154691, KX154690, KX154689, KX154688, KX154687, KX154686, KX154685, MH310909, MK129253 | Isolated from humans in 2016 to 2018; have the L23M variant |
| N | MG923472, MG923481, MG923480, MG923479, MG923478, MG923477, MG923476, MG923475, MG923474 | Isolated from camels in Nigeria in 2015 to 2016; have the G198S variant |
| N | MH822886, MK483839, MK462256, MK462255, MK462254, MK462253, MK462252, MK462250, MK462249, MK462248, MK462247, MN120514, MN120513 | Have the G198S variant. Thirteen sequences isolated from humans in different regions of Saudi Arabia in 2018 and 2019 only (one sample was sequenced in the UK from a traveller from KSA) |
| ORF8 | JX869059 (Human/Saudi/2012), KJ614529, KC776174 (Human/Jordan/2012), MG011340 (Human/Saudi/2016) | Have a C/T SNP at 28 772 of JX869059, which is observed in MERS‐CoV isolated from African camels only |
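
Whether a deletion in a coding region removes whole codons or shifts the reading frame depends only on its length modulo three. The sketch below checks a few rows of this table; `deletion_effect` is a hypothetical helper, and the amino-acid count is approximate because a deletion that is not aligned to codon boundaries can also alter the residue at the junction.

```python
def deletion_effect(first: int, last: int):
    """Classify a deletion given 1-based inclusive reference coordinates.

    Returns (length_nt, causes_frameshift).
    """
    length = last - first + 1
    return length, length % 3 != 0


# Coordinates copied from the table (relative to JX869059).
examples = {
    "ORF3, KT806046": (25693, 25701),
    "ORF3, MG923472": (25809, 25820),
    "ORF4b, KF600612.1": (26544, 26560),
    "ORF5, KU851859": (27227, 27246),
}
for label, (first, last) in examples.items():
    length, shifted = deletion_effect(first, last)
    outcome = "frameshift" if shifted else f"in frame, about {length // 3} aa lost"
    print(f"{label}: {length} nt, {outcome}")
```

The same check applied to the KX108941 row (26 859 to 26 869) gives a span of 11 nt rather than the stated 2 nt, so the coordinates or the length in that row may have been mis-transcribed; the stated ORF5 length of 673 nt (675 minus 2) is consistent with a 2 nt deletion.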
Grouping of MERS‐CoV‐5′‐UTR sequences based on the nucleotide sequences at positions 127 and 132
| Origin | Nucleotide at position 127 (of JX869059) | Nucleotide at position 132 (of JX869059) | % |
|---|---|---|---|
| Dromedary camels | C | T | 81 |
| Dromedary camels | T | T | 19 |
| Humans | C | T | 3.2 |
| Humans | T | T | 46.2 |
| Humans | C | C | 50.5 |
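
A grouping of this kind can be computed by tallying the nucleotide pair at the two positions. The sketch below is illustrative only: `genotype_percentages` and `toy_utr` are hypothetical names, the input sequences are invented, and it assumes every sequence is already aligned to JX869059 coordinates (gap characters, if present, would be counted as their own group).

```python
from collections import Counter


def genotype_percentages(aligned_utrs, pos_a=127, pos_b=132):
    """Percentage of sequences carrying each nucleotide pair at two 1-based positions."""
    pairs = Counter(
        (seq[pos_a - 1].upper(), seq[pos_b - 1].upper())
        for seq in aligned_utrs
        if len(seq) >= max(pos_a, pos_b)
    )
    total = sum(pairs.values())
    if not total:
        return {}
    return {pair: round(100 * count / total, 1) for pair, count in pairs.items()}


def toy_utr(base_127: str, base_132: str, length: int = 140) -> str:
    """Build an invented placeholder sequence with chosen bases at positions 127 and 132."""
    bases = ["A"] * length
    bases[126] = base_127
    bases[131] = base_132
    return "".join(bases)


# Four invented sequences: three C/T and one T/T.
toy_set = [toy_utr("C", "T")] * 3 + [toy_utr("T", "T")]
print(genotype_percentages(toy_set))
```

Running the function separately on camel-derived and human-derived sequences would give one set of percentages per origin, as in the table.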
FIGURE 2 Phylogenetic analysis of MERS‐CoV full‐length genome and Spike glycoprotein sequences, 2012 to 2019. Phylogenetic analysis of 57 (37 human and 20 camel) MERS‐CoV sequences isolated from 2012 to 2019: A, full genomes; B, ORF1ab; C, S gene. The unrooted phylogenetic trees were constructed by the maximum‐likelihood method, with bootstrap values calculated from 100 trees. The scale bar represents the tree distance corresponding to 0.001 nucleotide substitution/kb. Numbers at branch nodes indicate bootstrap values greater than 50. Branch lengths represent degrees of diversity between sequences. Clades are denoted, and African camel samples are highlighted in red. Samples are labeled as follows: Accession number/Country/Human (Hu) or Camelus dromedarius (Cd)/year. Both trees show an almost identical distribution of samples in clades. This supports the notion that the clustering of African camels in clade C is driven by ORF1ab and S gene sequences
FIGURE 3 Evolution timeline of MERS‐CoV based on ORF3 sequence analysis. Timelines show the emergence of two variants of MERS‐CoV ORF3 (L17F and P86L of the JX869059 ORF3 protein sequence) isolated from humans and camels in 2012 to 2019 and in 2013 to 2019, respectively. Blue arrows represent ORF3 protein sequences with L17 and P86 residues. Red arrows represent the emergence and persistence of the two variants, L17F and P86L, in ORF3 protein sequences. The percentages of sequences carrying the two variants, relative to the total sequences isolated in the same period, are shown. The red arrows are not drawn to scale; they are enlarged for illustration purposes only
FIGURE 4 Prediction of all potential ORFs within the MERS‐CoV ORF4b sequences, 2012 to 2019. The illustration shows all putative ORFs of over 300 nucleotides across the wild‐type ORF4b as well as all the potential models for the defective ORF4b gene. Asterisks indicate the identical ORFs generated by the wild‐type ORF4b gene (accession No.: JX869059) and the other observed models for the defective ORF4b gene. This suggests that the first half of the ORF4b gene could be responsible for the functional part of the ORF4b protein
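
A scan for putative ORFs above a length threshold, as summarized in Figure 4, can be sketched as follows. The source does not name the tool or its settings, so everything here is an assumption: `find_orfs` is a hypothetical function that searches only the forward strand, requires an ATG start, uses the standard stop codons, and counts the stop codon in the ORF length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_nt: int = 300):
    """Return (start, end, frame) for ATG..stop ORFs on the forward strand.

    Coordinates are 0-based with an exclusive end; min_nt includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_nt:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG after this stop
    return orfs


# Invented toy sequence: a 36 nt ORF in frame 2, found only with a lowered threshold.
toy = "CC" + "ATG" + "GCT" * 10 + "TAA" + "G"
print(find_orfs(toy, min_nt=30))
print(find_orfs(toy))
```

Running such a scan on the wild‐type and each defective ORF4b sequence, then comparing the resulting ORFs, is one way the shared ORFs marked with asterisks could be identified.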