Variant analysis of SARS-CoV-2 genomes in the Middle East.

Khalid Mubarak Bindayna¹, Shane Crinion².

Abstract

BACKGROUND: Coronavirus disease 2019 (COVID-19) emerged in late 2019 and has now reached over 88 million cases and 1.9 million deaths. The Middle East has a death toll of ~80,000, over 35,000 of which are in Iran, which has over 1.2 million confirmed cases. We hypothesise that Iranian cases caused outbreaks in the neighbouring countries and that variant mapping and phylogenetic analysis can be used to test this. We also aim to analyse the variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) to characterise the common genome variants and provide useful data for the global effort to prevent further spread of COVID-19.
METHODS: We used bioinformatics approaches, including multiple sequence alignment, variant calling and annotation, and phylogenetic analysis, to identify the genomic variants found in the region. The analysis uses 122 samples from 13 Middle Eastern countries, sourced from the Global Initiative on Sharing All Influenza Data (GISAID).
FINDINGS: We identified 2200 distinct genome variants, including 129 downstream gene variants, 298 frameshift variants, 789 missense variants, 1 start lost, 13 start gained, 1 stop lost, 249 synonymous variants and 720 upstream gene variants. The most common high-impact variants were 10818delTinsG, 2772delCinsC, 14159delCinsC and 2789delAinsA. In total, 36 mutations were identified on the spike glycoprotein. Variant alignment and phylogenetic tree generation indicate that samples from Iran likely introduced COVID-19 to the rest of the Middle East.
INTERPRETATION: The phylogenetic and variant analysis provides insight into the types of mutation present in these genomes. The initial introduction of COVID-19 was most likely due to transmission from Iran. Some countries show evidence of novel mutations and unique strains. Longer circulation within small populations is likely to contribute to more unique genomes. This study provides a more in-depth analysis of the variants affecting the region than any other study.
Crown Copyright © 2021. Published by Elsevier Ltd. All rights reserved.

Keywords: COVID-19; Sequencing; Variant

Year:  2021        PMID: 33588026      PMCID: PMC7880837          DOI: 10.1016/j.micpath.2021.104741

Source DB:  PubMed          Journal:  Microb Pathog        ISSN: 0882-4010            Impact factor:   3.848


Introduction

On 9 January 2020, the China Centre for Disease Control reported that 15 of 59 suspected cases of pneumonia were due to a novel human coronavirus (CoV), now known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. The genome of this novel virus was made publicly available on the Global Initiative on Sharing All Influenza Data (GISAID) the next day. SARS-CoV-2 spreads easily, and the outbreak evolved into a global pandemic of at least 88 million cases and 1.9 million deaths [2]. One of the first countries to experience a significant outbreak was Iran, which reported its first confirmed case on 19 February 2020: a merchant in Qom who had travelled from China [3]. Many of the first infections in other Middle Eastern countries, including Lebanon, Kuwait, Bahrain, Iraq, Oman and the UAE, were linked to travellers from Iran. COVID-19 continued to spread to the remaining Middle Eastern countries, with a death toll of over 50,000 people according to health authorities. This number is expected to be an underestimate because several countries, including Libya, Syria and Yemen, are affected by war. The pandemic has had devastating effects on the region, and its true impact is expected to be underreported [4]. Researchers are racing to develop a vaccine that can provide immunity and prevent further deaths. SARS-CoV-2 enters host cells through its spike protein, which binds to the human angiotensin-converting enzyme 2 (ACE2) receptor; the virus is easily transmissible due to mutations in the receptor-binding (S1) and fusion (S2) subunits of the spike protein [5]. Transmission could become even easier if more mutations accumulate. Although mutations are rare, they can create new strains, and there is no guarantee that the current leading vaccine candidates will remain effective as SARS-CoV-2 continues to mutate [6]. By categorising variants, we can identify new strains and how their mutations are likely to affect spread.
As the Middle East is often underreported, it is important to characterise the variants of the strains that commonly circulate there. Analysis of the common variants in the Middle East is essential for developing a vaccine that is effective against the strains in the region. This analysis helps us understand the viral genome landscape and identify the clades present in the region.

Objectives

Our hypothesis is that the variants found in SARS-CoV-2 genomes from Middle Eastern samples will indicate introduction from Iran. We use bioinformatics tools and publicly available samples to explore the composition of strains within each country, and we expect many strains to show evidence of Iranian origin. The aim is to explore the structure of Middle Eastern genome strains using multiple sequence alignment, variant prediction and phylogenetic tree generation. By exploring the structure and common variants of SARS-CoV-2 strains in these populations, we expect to learn more about how the virus spread.

Methods

Sample source: We obtained publicly available data from the Global Initiative on Sharing All Influenza Data (GISAID) [7].

Sample size: 122 Middle Eastern samples, the Wuhan reference sequence NC_045512 and 5 recent Wuhan samples.

Sample selection: Middle Eastern samples were selected using the filters available on the GISAID website, and only complete genomes were used. The countries considered were Afghanistan, Bahrain, Cyprus, Egypt, Iraq, Iran, Israel, Jordan, Kuwait, Lebanon, Libya, Oman, Qatar, Saudi Arabia, Sudan, Syria, Turkey, United Arab Emirates (UAE) and Yemen. Iran had only 7 samples available after filtering out low-coverage genomes, and Cyprus, Kuwait and Lebanon each had 8. No samples were available in the database from Afghanistan, Iraq, Libya, Sudan, Syria or Yemen. Ten samples were taken from each of the remaining countries, filtered for high quality where possible. Ten samples per country was chosen as the optimum number to cover all available countries while keeping the alignment input within the 4 MB file-size limit of the Clustal Omega online tool. In countries with 10 samples, the 5 earliest and 5 most recent samples were taken to avoid sourcing all samples from the same outbreak. All samples were downloaded from GISAID and concatenated into a single multi-sample file in FASTA format.

Multiple sequence alignment: Multiple sequence alignment (MSA) of the collected samples was performed with the Clustal Omega online tool (https://www.ebi.ac.uk/Tools/msa/clustalo/) [8]. The online tool accepts up to 4000 sequences or a maximum file size of 4 MB, so the largest number of samples within this limit was used. The concatenated sample file was uploaded to the tool; in step 2, the output format was set to PEARSON/FASTA, and all other options were left at their defaults.
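The selection rule above (keep every sample for countries with few genomes, otherwise the 5 earliest and 5 most recent) and the upload-size check can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `(accession, country, collection date, sequence)` tuple layout, the FASTA header format and the function names are hypothetical, and reading "4 MB" as 4 × 1024 × 1024 bytes is an assumption.

```python
from datetime import date

# Assumed limits of the Clustal Omega web form: 4000 sequences or 4 MB.
# Interpreting "4 MB" as 4 * 1024 * 1024 bytes is an assumption.
MAX_SEQUENCES = 4000
MAX_UPLOAD_BYTES = 4 * 1024 * 1024


def select_per_country(records, per_country=10):
    """Return the samples kept for alignment.

    records: iterable of (accession_id, country, collection_date, sequence)
    tuples (a hypothetical in-memory layout). Countries with at most
    per_country samples keep all of them; larger countries keep the
    per_country // 2 earliest and per_country // 2 most recent samples.
    """
    by_country = {}
    for rec in records:
        by_country.setdefault(rec[1], []).append(rec)
    selected = []
    for country in sorted(by_country):
        recs = sorted(by_country[country], key=lambda r: r[2])  # oldest first
        if len(recs) <= per_country:
            selected.extend(recs)
        else:
            half = per_country // 2
            selected.extend(recs[:half] + recs[-half:])
    return selected


def to_multifasta(records, width=60):
    """Concatenate records into one multi-sample FASTA string."""
    lines = []
    for accession, country, collected, seq in records:
        lines.append(f">{accession}|{country}|{collected.isoformat()}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def fits_clustal_limits(fasta_text):
    """Check the concatenated file against the assumed upload limits."""
    n_sequences = fasta_text.count(">")
    n_bytes = len(fasta_text.encode("utf-8"))
    return n_sequences <= MAX_SEQUENCES and n_bytes <= MAX_UPLOAD_BYTES


if __name__ == "__main__":
    # Toy data: 12 hypothetical Qatar samples, one per day in March 2020.
    toy = [(f"EPI_ISL_TOY{i}", "Qatar", date(2020, 3, i + 1), "ACGT" * 10)
           for i in range(12)]
    kept = select_per_country(toy)
    print([r[0] for r in kept])
    print(fits_clustal_limits(to_multifasta(kept)))
```

For scale: the reference NC_045512.2 is 29,903 bases long, so 128 genomes amount to roughly 3.8 million bases before headers and line breaks, which shows why the number of samples per country had to be capped to stay under the limit.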
The output is an alignment file, also in FASTA format, containing all sequences with gaps denoted by '-'.

Variant identification: Variant calling was performed on the aligned FASTA file using the SNP extraction tool snp-sites [9] (https://github.com/sanger-pathogens/snp-sites). snp-sites takes a multi-sample FASTA alignment as input, identifies the SNP sites and writes them out in variant call format (VCF). The VCF file provides a clear mapping of SNPs from the aligned sequences, making it easy to identify each SNP's location and each sample's genotype at a given locus: each row corresponds to a variant site, and each sample column gives that sample's genotype at the site.

Variant annotation: SnpEff [10] (https://pcingola.github.io/SnpEff/SnpEff.html) was used to annotate the variants with information such as the variant type and the overlapping gene, and to predict each variant's effect. SnpEff is integrated into the Galaxy web-based platform for bioinformatics analysis (usegalaxy.org) [11], to which the VCF file was uploaded. A custom SnpEff database was created with the "SnpEff build" tool from the Wuhan reference NC_045512.2 downloaded from NCBI, and the variants were annotated against it. The custom parameters were an upstream/downstream length of 5000 bases and a splice-site size of 2 bases (donor and acceptor). The annotation output consists of an annotated VCF and an HTML report. We also analysed the mutations using CoVsurver in the GISAID database [7].

Data visualisation: The annotated VCF was imported into R v3.6.2 [12] to extract the variant annotation information, and the data were manipulated and plotted in R. The dplyr v0.8.4 package was used to summarise and align the data [13].
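To make concrete what extracting SNP sites from a multi-FASTA alignment amounts to, the following Python sketch finds variable alignment columns and encodes them VCF-style. It is not snp-sites' implementation and makes simplifying assumptions: the first sequence is treated as the reference, gaps and ambiguous bases are ignored when deciding whether a column is variable, and positions are alignment columns rather than reference-genome coordinates. The function names are hypothetical.

```python
def snp_sites(alignment):
    """Find variable columns in a multi-sample alignment.

    alignment: dict mapping sample name -> aligned sequence; all sequences
    must have the same length. The first sequence is treated as the
    reference (an assumption of this sketch). A column is a SNP site when
    at least two different A/C/G/T bases occur in it; gaps ('-') and
    ambiguity codes such as N are ignored when making that decision.
    Positions are 1-based alignment columns, not reference coordinates.
    """
    names = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    reference = names[0]
    sites = []
    for col in range(lengths.pop()):
        column = {name: alignment[name][col].upper() for name in names}
        bases = {b for b in column.values() if b in "ACGT"}
        if len(bases) >= 2:
            sites.append((col + 1, column[reference], column))
    return sites


def genotype_row(site):
    """Encode one site VCF-style: '0' = reference base, '1'.. = ALT alleles.

    Characters that are neither the reference base nor an A/C/G/T ALT
    allele (for example gaps or N) are encoded as missing ('.').
    """
    _, ref, column = site
    alts = sorted({b for b in column.values() if b in "ACGT" and b != ref})
    codes = {ref: "0"}
    codes.update({alt: str(i + 1) for i, alt in enumerate(alts)})
    return alts, [codes.get(b, ".") for b in column.values()]


if __name__ == "__main__":
    toy = {"ref": "ACGTAC", "s1": "ACTTA-", "s2": "GCTTAC"}  # invented data
    for site in snp_sites(toy):
        print(site[0], site[1], *genotype_row(site))
```

Each resulting row mirrors the layout described above: one variant site per row, one genotype code per sample.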
The ggplot2 package was used to plot the identified variants and visualise recurring mutation types [14]. Variant position along the SARS-CoV-2 genome is shown on the x-axis; the left y-axis gives the sample name and the right y-axis the country of origin of each sample. This plot is used to compare genomes across populations.

Phylogenetic analysis: Phylogenetic analysis was performed with BEAST (Bayesian Evolutionary Analysis Sampling Trees) v1.10.4, which performs Bayesian analysis of molecular sequences using Markov chain Monte Carlo (MCMC) [15]. The analysis followed the recommended approach for reconstructing the evolutionary dynamics of an epidemic, with the aim of estimating the origin of the epidemic in the region and understanding how it spread through the Middle East. We first opened BEAUti, the graphical application used to generate the BEAST control file. Although BEAUti requests a NEXUS file, a FASTA file can also be used. The data were loaded with the Import Data option and appeared under the Partitions section; BEAUti confirmed that 30851 sites were present. The default options were selected for the site model and clock model. Next, we specified the individual virus dates in the "Tips" panel by selecting the "Use tip dates" option and uploading a tab-delimited file that specified each sample's upload date, extracted from the sample names as downloaded from GISAID. We then set the substitution model in the "Sites" tab, keeping the defaults of the HKY model and estimated base frequencies, and selecting Gamma as the site heterogeneity model. Finally, a strict molecular clock was selected in the "Clock" tab, since the mutation rate is known to be low.
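The tab-delimited date file for BEAUti's "Use tip dates" option can be generated from the sequence names. The Python sketch below is one way to do this, assuming GISAID-style FASTA headers of the form `hCoV-19/<country>/<id>/<year>|EPI_ISL_<n>|YYYY-MM-DD` and a decimal-year convention in which 1 January is `year + 0.0`; both are assumptions, and the names written to the file must match the sequence names in the alignment exactly.

```python
import calendar
from datetime import date


def decimal_year(d):
    """Convert a calendar date to a decimal year (1 January -> year + 0.0)."""
    days_in_year = 366 if calendar.isleap(d.year) else 365
    return d.year + (d.timetuple().tm_yday - 1) / days_in_year


def tip_dates_tsv(headers):
    """Build 'sequence name<TAB>decimal date' lines for BEAUti's tip dates.

    Assumes GISAID-style headers such as
    'hCoV-19/<country>/<id>/<year>|EPI_ISL_<n>|YYYY-MM-DD', where the last
    '|' field is a full ISO date. Headers without a parsable full date are
    skipped rather than guessed.
    """
    rows = []
    for header in headers:
        name = header.lstrip(">").strip()
        try:
            when = date.fromisoformat(name.split("|")[-1])
        except ValueError:
            continue
        rows.append(f"{name}\t{decimal_year(when):.4f}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    demo = [">hCoV-19/Iran/EXAMPLE-1/2020|EPI_ISL_000001|2020-07-02"]  # made-up header
    print(tip_dates_tsv(demo), end="")
```

Decimal years of this kind are also the scale on which the FigTree time-scale offset of 2020.7 is expressed.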
The tree options were selected in the "Tree Prior" tab: "Random starting tree" for the starting tree and "Coalescent: Exponential Growth" for the tree prior, a coalescent model in which the effective population size grows exponentially through time and which therefore provides an estimate of the epidemic growth rate. In the "Priors" tab, the scale of the prior distribution was set to 100 to model the expected growth of a pandemic. The operators required no changes from the defaults. The MCMC chain length was set to 100,000 and the sampling frequency to 100.

Tree visualisation: The trees were summarised with TreeAnnotator, a tool distributed with BEAST, by loading the tree file generated by BEAST and writing out a summary tree file. The output NEXUS file was then imported into FigTree for display. In FigTree, we ordered the nodes by increasing value and switched on Branch Labels. We also switched on Node Bars showing the 95% highest posterior density (HPD) credible intervals for the node heights. Finally, we plotted a time scale by turning on Scale Axis and setting the Time Scale Offset to 2020.7, the latest collection date among our samples.
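For reference, the quantities configured above can be written out as follows. The exponential-growth coalescent is shown in the parameterisation commonly used by BEAST, where $N_0$ is the effective population size at the most recent tip, $r$ is the growth rate and $t$ is time measured backwards from that tip; these symbols are introduced here for illustration. The chain settings fix the number of logged samples, and, assuming the time axis is read backwards from the latest tip, the FigTree offset converts a node height $h$ (in years) to a calendar date.

```latex
\begin{aligned}
N_e(t) &= N_0 \, e^{-r t}, \qquad t \ge 0, \\
\text{logged samples} &= \frac{\text{chain length}}{\text{sampling frequency}}
  = \frac{100\,000}{100} = 1000, \\
\text{date}(\text{node}) &= 2020.7 - h_{\text{node}}.
\end{aligned}
```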

Results

Sequence alignment and variant calling were completed successfully, after which variant annotation was performed. We identified 2200 distinct genome variants, summarised by type in Table 1. The most common high-impact variants were 10818delTinsG, 2772delCinsC, 14159delCinsC and 2789delAinsA. Table 2 lists every locus with more than 50 variant instances, together with its frequency.
Table 1

The frequency of each type of mutation found in the data.

Annotation                  Count
Downstream gene variants    129
Frameshift variants         298
Missense variants           789
Start lost                  1
Start gained                13
Stop lost                   1
Synonymous variants         249
Upstream gene variants      720
Table 2

The frequency of mutations at each locus with >50 hits.

Position  Count
241       74
3037      74
24351     74
11083     73
14408     72
8         64
21        59
22        59
23        57
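The study produced these tallies in R with dplyr. As an illustration only, the Python sketch below derives equivalent counts from an annotated VCF. It assumes SnpEff's standard `ANN` INFO field (the effect is the second `|`-separated sub-field), counts only the first annotation listed for each variant, and treats a "hit" at a locus as a sample whose genotype is neither the reference allele `0` nor missing; those choices and the function names are assumptions.

```python
from collections import Counter


def parse_vcf(lines):
    """Yield (position, INFO dict, genotype list) for each VCF data line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        info = {}
        for item in fields[7].split(";"):
            key, _, value = item.partition("=")
            info[key] = value
        genotypes = [g.split(":")[0] for g in fields[9:]]
        yield int(fields[1]), info, genotypes


def effect_counts(lines):
    """Count variants by the effect of their first SnpEff ANN annotation."""
    counts = Counter()
    for _, info, _ in parse_vcf(lines):
        first = info.get("ANN", "").split(",")[0].split("|")
        if len(first) > 1 and first[1]:
            counts[first[1]] += 1  # combined effects (a&b) are kept as one label
    return counts


def positions_above(lines, min_hits=50):
    """Map position -> number of samples with a non-reference genotype,
    keeping only positions with more than min_hits such samples."""
    hits = {}
    for pos, _, genotypes in parse_vcf(lines):
        n = sum(1 for g in genotypes if g not in ("0", "."))
        if n > min_hits:
            hits[pos] = n
    return dict(sorted(hits.items(), key=lambda kv: -kv[1]))
```

Because both functions iterate over the input, read the VCF into a list first (for example `lines = open(path).readlines()`, with `path` a placeholder) if both tallies are needed.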
CoVsurver [7] analysis indicated an overall figure of 2.828% mutations on the spike glycoprotein for these strains, with a total of 36 spike mutations identified. Strain EPI_ISL_507007, isolated from Iran, has the largest number of unique mutations. These include NSP1_W161L, NSP1_V116A, NSP1_V106G, NSP1_V54A, NSP2_I393T, NSP2_N92S, NSP2_T153L, NSP2_L506S, NSP2_E309A, NSP2_A159D, NSP2_C136S, NSP3_K1838R, NSP3_R1341L, NSP3_K1596R, NSP3_R1345L, NSP3_A886D, NSP3_M1865T, NSP3_I468T, NSP3_K140Q, NSP3_A1279D, NSP3_A1105G, NSP3_G1389D, NSP3_N1778S, NSP3_L781S, NSP3_G1944V, NSP3_L1523H, NSP3_K1715R, NSP3_C296R, NSP3_R558P, NSP3_V1673D, NSP3_E378V, NSP3_S674Y, NSP3_S721N, NSP4_P274L, NSP4_L321P, NSP4_A48D, NSP4_L243P, NSP4_L329H, NSP4_L176Q, NSP4_Q488L, NSP4_P168Q, NSP5_G174V, NSP5_E166V, NSP5_N84S, NSP5_S62Y, NSP5_P9L, NSP5_T111N, NSP5_P52L, NSP6_C221Y, NSP6_A119G, NSP6_W140L, NSP6_S53Y, NSP6_C68Y, NSP6_F70Y, NSP6_F184S, NSP6_H64N, NSP6_G188D, NSP6_V101G, NSP6_G48D, NSP6_F220S, NSP6_P87L, NSP7_L13S, NSP8_D134E, NSP9_P57H, NSP10_V119A, NSP10_E135A, NSP10_R134H, NSP11_Q5P, NSP12_E144D, NSP13_D160E, Spike_G1246A, Spike_R1185H, Spike_L368P, Spike_S974P, Spike_G268D, Spike_R190S, Spike_A411D, Spike_G798A, Spike_A672D, Spike_V1230E, Spike_Q774R, Spike_H146R, Spike_P337R, Spike_Q607L, E_I33T, M_P59T, M_K14E, NS7a_F114I, NS8_V62M, N_Q390L, N_P20H, N_R10Q and N_R149L. EPI_ISL_514753, also isolated from Iran, likewise carries a large number of unique mutations, such as NSP1_W161L, NSP2_E309A, NSP2_C136S, NSP3_R1341L, NSP3_K232T, NSP3_N1778S, NSP3_E229A, NSP3_C296R, NSP6_H64N, Spike_D808G, Spike_H146R and Spike_N1192S. Considering the spike protein alone, however, strain EPI_ISL_427420 from Qatar has the highest number of mutations.
Conversely, EPI_ISL_514306, isolated from Israel, has the highest number of known mutations across all proteins. Table 3 summarises the strains with the largest number of known mutations; these were isolated from Israel (3), the United Arab Emirates (1), Bahrain (1), Iran (1) and Egypt (1). A detailed account of these mutations is reported in Supplementary Information 1.
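The CoVsurver labels listed above follow a `<protein>_<reference residue><position><alternative residue>` pattern, so comparing strains by spike versus non-spike mutations reduces to grouping labels by protein. A minimal Python sketch is shown below; the regular expression is an assumption that covers single-residue substitutions only (not insertions or deletions), it collapses stray spaces such as the one in `NSP3_S721 N`, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed label shape: <protein>_<reference residue><position><alternative residue>,
# e.g. "Spike_D614G" or "NS7a_F114I". Insertions/deletions are not handled.
LABEL = re.compile(r"^([A-Za-z0-9]+)_([A-Z*])(\d+)([A-Z*])$")


def parse_label(label):
    """Split a mutation label into (protein, reference, position, alternative)."""
    match = LABEL.match(label.replace(" ", ""))  # tolerate stray spaces, e.g. "S721 N"
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    protein, ref, pos, alt = match.groups()
    return protein, ref, int(pos), alt


def per_protein_counts(mutation_list):
    """Count the mutations per protein in a comma-separated label list."""
    labels = [item.strip() for item in mutation_list.split(",") if item.strip()]
    return Counter(parse_label(label)[0] for label in labels)
```

Applied to the EPI_ISL_514753 list above, this counts 3 spike mutations and 5 NSP3 mutations.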
Table 3

Catalogue of sample accession ID by country.

No.  Region                Accession ID
1    Bahrain               EPI_ISL_487274
2    Bahrain               EPI_ISL_486889
3    Bahrain               EPI_ISL_483545
4    Bahrain               EPI_ISL_510528
5    Bahrain               EPI_ISL_483542
6    Bahrain               EPI_ISL_483548
7    Bahrain               EPI_ISL_483547
8    Bahrain               EPI_ISL_483543
9    Bahrain               EPI_ISL_485401
10   Bahrain               EPI_ISL_487273
11   Cyprus                EPI_ISL_463742
12   Cyprus                EPI_ISL_463743
13   Cyprus                EPI_ISL_463744
14   Cyprus                EPI_ISL_463748
15   Cyprus                EPI_ISL_463741
16   Cyprus                EPI_ISL_463745
17   Cyprus                EPI_ISL_463746
18   Cyprus                EPI_ISL_463747
19   Egypt                 EPI_ISL_482761
20   Egypt                 EPI_ISL_479735
21   Egypt                 EPI_ISL_479733
22   Egypt                 EPI_ISL_479734
23   Egypt                 EPI_ISL_510532
24   Egypt                 EPI_ISL_430819
25   Egypt                 EPI_ISL_430820
26   Egypt                 EPI_ISL_479732
27   Egypt                 EPI_ISL_482759
28   Egypt                 EPI_ISL_482760
29   Iran                  EPI_ISL_424349
30   Iran                  EPI_ISL_445088
31   Iran                  EPI_ISL_507007
32   Iran                  EPI_ISL_514753
33   Iran                  EPI_ISL_437512
34   Iran                  EPI_ISL_442044
35   Iran                  EPI_ISL_442523
36   Israel                EPI_ISL_435291
37   Israel                EPI_ISL_514303
38   Israel                EPI_ISL_514305
39   Israel                EPI_ISL_514302
40   Israel                EPI_ISL_435286
41   Israel                EPI_ISL_435289
42   Israel                EPI_ISL_419211
43   Israel                EPI_ISL_435284
44   Israel                EPI_ISL_514306
45   Israel                EPI_ISL_514301
46   Jordan                EPI_ISL_430012
47   Jordan                EPI_ISL_430002
48   Jordan                EPI_ISL_430003
49   Jordan                EPI_ISL_430009
50   Jordan                EPI_ISL_429993
51   Jordan                EPI_ISL_450188
52   Jordan                EPI_ISL_434516
53   Jordan                EPI_ISL_429997
54   Jordan                EPI_ISL_450186
55   Jordan                EPI_ISL_450187
56   Kuwait                EPI_ISL_421652
57   Kuwait                EPI_ISL_422427
58   Kuwait                EPI_ISL_416543
59   Kuwait                EPI_ISL_416458
60   Kuwait                EPI_ISL_416541
61   Kuwait                EPI_ISL_416542
62   Kuwait                EPI_ISL_422426
63   Kuwait                EPI_ISL_422424
64   Lebanon               EPI_ISL_498551
65   Lebanon               EPI_ISL_498552
66   Lebanon               EPI_ISL_498554
67   Lebanon               EPI_ISL_450512
68   Lebanon               EPI_ISL_450515
69   Lebanon               EPI_ISL_450511
70   Lebanon               EPI_ISL_450508
71   Lebanon               EPI_ISL_450509
72   Oman                  EPI_ISL_492023
73   Oman                  EPI_ISL_492026
74   Oman                  EPI_ISL_492024
75   Oman                  EPI_ISL_492065
76   Oman                  EPI_ISL_492025
77   Oman                  EPI_ISL_457706
78   Oman                  EPI_ISL_457704
79   Oman                  EPI_ISL_457937
80   Oman                  EPI_ISL_457701
81   Oman                  EPI_ISL_457974
82   Qatar                 EPI_ISL_427404
83   Qatar                 EPI_ISL_427420
84   Qatar                 EPI_ISL_427407
85   Qatar                 EPI_ISL_427406
86   Qatar                 EPI_ISL_427419
87   Qatar                 EPI_ISL_427418
88   Qatar                 EPI_ISL_427405
89   Qatar                 EPI_ISL_427408
90   Qatar                 EPI_ISL_427417
91   Qatar                 EPI_ISL_427416
92   Saudi Arabia          EPI_ISL_512924
93   Saudi Arabia          EPI_ISL_512926
94   Saudi Arabia          EPI_ISL_489996
95   Saudi Arabia          EPI_ISL_489998
96   Saudi Arabia          EPI_ISL_490000
97   Saudi Arabia          EPI_ISL_489999
98   Saudi Arabia          EPI_ISL_489997
99   Saudi Arabia          EPI_ISL_512927
100  Saudi Arabia          EPI_ISL_512922
101  Saudi Arabia          EPI_ISL_512923
102  Turkey                EPI_ISL_495421
103  Turkey                EPI_ISL_495445
104  Turkey                EPI_ISL_495433
105  Turkey                EPI_ISL_429868
106  Turkey                EPI_ISL_429867
107  Turkey                EPI_ISL_429866
108  Turkey                EPI_ISL_428712
109  Turkey                EPI_ISL_495436
110  Turkey                EPI_ISL_495429
111  Turkey                EPI_ISL_424366
112  United Arab Emirates  EPI_ISL_469277
113  United Arab Emirates  EPI_ISL_469279
114  United Arab Emirates  EPI_ISL_435126
115  United Arab Emirates  EPI_ISL_435121
116  United Arab Emirates  EPI_ISL_435131
117  United Arab Emirates  EPI_ISL_463740
118  United Arab Emirates  EPI_ISL_469281
119  United Arab Emirates  EPI_ISL_469280
120  United Arab Emirates  EPI_ISL_435137
121  United Arab Emirates  EPI_ISL_435134
122  Wuhan                 EPI_ISL_402123
123  Wuhan                 EPI_ISL_454949
124  Wuhan                 EPI_ISL_454948
125  Wuhan                 EPI_ISL_406798
126  Wuhan                 EPI_ISL_454951
127  Wuhan                 EPI_ISL_403931
The results of the SNP analysis were then used to generate a dotplot of the variant alignment (Fig. 1). The dotplot reveals patterns in the variants that could not easily be identified from the alignment or annotation files. Samples are grouped into facets by country, and the plot shows patterns of variants that distinguish each country. This is prominent in Qatar, Jordan and Oman, where the pattern makes the country distinct from the variants plotted for other countries. In addition, the branching of the phylogenetic tree is indicative of an Iranian origin (Fig. 2). All variants are relative to the Wuhan reference sequence NC_045512.2. The Wuhan samples are in the top facet and show few mutations relative to the reference, whereas the remaining samples show a greater accumulation of mutations. One Iranian sample has many more variants than the others. Saudi Arabia also appears to have a low mutation rate, which may be due to its early contraction of the virus. The Oman samples also show evidence of a distinctive strain. Many samples feature a missense variant at positions 3037, 14408 and 24351.
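Observations such as which positions recur across countries, and which of them are absent from the Wuhan samples, reduce to set comparisons over the per-sample variant positions. A minimal Python sketch is shown below; the input layout (sample ID mapped to country and variant positions), the `min_countries` threshold and the function name are assumptions for illustration, and the toy data are invented rather than taken from the study.

```python
from collections import defaultdict


def shared_positions_absent_from(samples, min_countries, reference_group="Wuhan"):
    """Positions seen in at least min_countries countries but in no sample of
    reference_group.

    samples: dict mapping sample ID -> (country, iterable of variant positions).
    """
    countries_by_position = defaultdict(set)
    reference_positions = set()
    for country, positions in samples.values():
        if country == reference_group:
            reference_positions.update(positions)
        else:
            for pos in positions:
                countries_by_position[pos].add(country)
    return sorted(
        pos
        for pos, countries in countries_by_position.items()
        if len(countries) >= min_countries and pos not in reference_positions
    )


if __name__ == "__main__":
    toy = {  # invented example, not study data
        "s1": ("Oman", [241, 3037, 14408]),
        "s2": ("Qatar", [241, 3037]),
        "s3": ("Jordan", [241, 24351]),
        "w1": ("Wuhan", [3037]),
    }
    print(shared_positions_absent_from(toy, min_countries=2))  # [241]
```

Counting countries rather than samples keeps a single heavily sampled country from dominating the result.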
Fig. 1

Dotplot of variants per sample by country. Samples are grouped by country of origin. The left y-axis denotes the accession ID and the right y-axis the country of origin; the x-axis is the position along the SARS-CoV-2 genome. Countries are ordered by the date of their first reported COVID-19 case.

Fig. 2

Phylogenetic tree for the SARS-CoV-2 genomes, containing all 128 samples (including the reference NC_045512). The colours represent the greatest height in association. The blue bars represent the 95% HPD interval for each node height. The highest branching points are with the Iranian samples EPI_ISL_507007 and EPI_ISL_514753, with branch lengths of 4.319 and 0.8752 respectively.

Discussion

The aim of this study was to determine whether COVID-19 was introduced to the Middle East from Iran and to explore the genomic composition of the virus in the region. We performed sequence alignment to compare all sequences against the reference genome, then extracted the annotated variants to generate a plot mapping variants with samples grouped by country. This plot (Fig. 1) shows clear, distinctive patterns within countries that are not obvious from the alignment and annotation files. Ordered by first reported case, the countries are the United Arab Emirates (UAE), Egypt, Iran, Israel, Lebanon, Bahrain, Kuwait, Oman, Qatar, Jordan, Saudi Arabia, Turkey and Cyprus. Global travel plays an important role in the spread of SARS-CoV-2 in the Middle East, where Dubai in the UAE acts as a travel hub, as reported recently [23]. The variants at positions 241, 3037, 24351 and 11083 appear in many Middle Eastern countries but do not occur in the Wuhan samples. This variant characterisation may be useful in the fight against COVID-19 and the development of treatments: if mutations affect the delivery or severity of the virus, variants unique to a region may explain why a treatment works for some patients and not others. On the basis of the SNP analysis, 10818delTinsG, 2772delCinsC, 14159delCinsC and 2789delAinsA were identified as high-impact variants. The Iranian samples appear more diverse and, interestingly, do not share the mutation at position 14408, which is present in most samples [18]. Iranian sample EPI_ISL_507007 has a high frequency of variants not seen in the other samples. The GISAID detection system found no faults with this sample and reported a full sequence match, so it was not removed. Notably, the Iranian samples appear to have a lower average SNP frequency than those of other countries.
This may indicate that the virus reached Iran earlier than other countries, as we expected. Israel, Bahrain, Kuwait, Oman, Saudi Arabia, Jordan and Turkey all share similar variant mappings. Qatar shows an unusual mapping with a high frequency of frameshift variants, which may indicate that a diverse new strain is circulating in the country. Cyprus shows little diversity in its variant mapping, which is surprising given the late date of its first reported cases. CoVsurver analysis identified various unique and known amino acid mutations in most of the samples from the Middle East. Iranian strain EPI_ISL_507007 has the largest number of unique mutations, whereas Israeli isolate EPI_ISL_514306 has the largest number of known mutations. Considering only the spike glycoprotein, Qatari strain EPI_ISL_427420 has the most mutations. Another point of interest is that time-varied samples were taken for countries with 10 samples, yet we see no indication of distinct groups within countries. This further indicates that the mutation frequency is low, and that genomic variation between countries is greater than variation between collection times. In smaller populations, genetic drift can lead to a greater accumulation of variants. This may occur given the local lockdowns and travel restrictions enforced worldwide, which require in-depth planning [20]. Strains carrying new mutations could allow a country to develop a deadlier strain that is not prominent in other parts of the world, leaving that country disproportionately affected by accumulating deaths or by an ineffective vaccine. Phylogenetic trees help in understanding the evolutionary relationships between groups.
In the present context, we use them to identify the earliest strains and to track the spread of COVID-19 across the Middle East. The tree shows that the UAE samples are distinct and form a single clade. This correlates with the UAE's early intervention and lockdown, which appears to have resulted in a unique genome. Gómez-Carballa et al. describe the effect of the worldwide lockdown on SARS-CoV-2 genome variation in the presence of super-spreaders [21,22]. Samples from Qatar also form the majority of one clade, together with many of the Wuhan samples, indicating that they are similar to the Wuhan samples and show little distinction. Egypt also forms a distinct branch earlier than most samples. These examples reflect the global response: the lockdown of each country and the prevention of spread have resulted in SARS-CoV-2 strains of great similarity within each country. Had lockdowns not been enforced, these clades would likely be less distinguishable, as mutations would have spread between countries. Lockdowns also have wider socio-economic effects [17,19]. As we expected, two of the highest branching points attach to Iranian samples, further implicating Iran in the initial spread across the Middle East. The phylogenetic tree therefore supports our hypothesis that most samples originate from the Iranian samples. This is not surprising given the vast number of cases and the early crisis state of the country; however, it is useful to see that the variant analysis shows at the genome level what we suspected. A related study reached the same conclusion using contact tracing of cases related to religious events in the city of Qom, Iran [16].

CRediT authorship contribution statement

Khalid Mubarak Bindayna: Conceptualization, Data curation, Writing - original draft, Writing - review & editing, Visualization. Shane Crinion: Methodology, Software, Writing - original draft.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Strain            Known mutations
EPI_ISL_463740    NSP1_D75E, NSP3_P153L, NSP8_M129I, NSP14_F233L, NS8_V62L, NS8_L84S, N_A208G, N_T393I
EPI_ISL_486889    NSP3_P109L, NSP3_A994D, NSP4_S481L, NSP12_P323L, NSP14_P412S, Spike_D614G, N_G204R, N_R203K
EPI_ISL_507007    NSP2_A361V, NSP3_T1482I, NSP4_T73I, NSP8_R51H, NSP12_P323L, Spike_D614G, Spike_S1147L, NS3_Q57H, N_S194L
EPI_ISL_514303    NSP2_T85I, NSP6_L37F, NSP12_P323L, NSP16_P236S, Spike_D614G, Spike_T95I, NS3_Q57H, NS3_G44V, N_G25C
EPI_ISL_514305    NSP12_M666I, NSP12_P323L, NSP14_E204D, Spike_D614G, NS3_W131C, NS3_K75N, N_S193I, N_G204R, N_R203K
EPI_ISL_514306    NSP2_T85I, NSP7_S25L, NSP12_M666I, NSP12_P323L, NSP14_A320V, Spike_D614G, NS3_Q57H, NS3_K75N, N_S193I, N_G243C, N_G204R, N_R203K
EPI_ISL_479733    NSP3_T428I, NSP5_G15S, NSP8_T148I, NSP12_P323L, Spike_D614G, Spike_Q677H, N_G212V, N_R203K

1. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.
Authors: Pablo Cingolani; Adrian Platts; Le Lily Wang; Melissa Coon; Tung Nguyen; Luan Wang; Susan J Land; Xiangyi Lu; Douglas M Ruden
Journal: Fly (Austin). Date: 2012 Apr-Jun.

2. Is visiting Qom spread CoVID-19 epidemic in the Middle East?
Authors: N Al-Rousan; H Al-Najjar
Journal: Eur Rev Med Pharmacol Sci. Date: 2020-05.

3. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.
Authors: Fabian Sievers; Andreas Wilm; David Dineen; Toby J Gibson; Kevin Karplus; Weizhong Li; Rodrigo Lopez; Hamish McWilliam; Michael Remmert; Johannes Söding; Julie D Thompson; Desmond G Higgins
Journal: Mol Syst Biol. Date: 2011-10-11.

4. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments.
Authors: Andrew J Page; Ben Taylor; Aidan J Delaney; Jorge Soares; Torsten Seemann; Jacqueline A Keane; Simon R Harris
Journal: Microb Genom. Date: 2016-04-29.

5. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10.
Authors: Marc A Suchard; Philippe Lemey; Guy Baele; Daniel L Ayres; Alexei J Drummond; Andrew Rambaut
Journal: Virus Evol. Date: 2018-06-08.

6. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update.
Authors: Enis Afgan; Dannon Baker; Bérénice Batut; Marius van den Beek; Dave Bouvier; Martin Cech; John Chilton; Dave Clements; Nate Coraor; Björn A Grüning; Aysam Guerler; Jennifer Hillman-Jackson; Saskia Hiltemann; Vahid Jalili; Helena Rasche; Nicola Soranzo; Jeremy Goecks; James Taylor; Anton Nekrutenko; Daniel Blankenberg
Journal: Nucleic Acids Res. Date: 2018-07-02.

7. Phylogenetic Analysis and Structural Modeling of SARS-CoV-2 Spike Protein Reveals an Evolutionary Distinct and Proteolytically Sensitive Activation Loop.
Authors: Javier A Jaimes; Nicole M André; Joshua S Chappie; Jean K Millet; Gary R Whittaker
Journal: J Mol Biol. Date: 2020-04-19.

8. Eating Habits and Lifestyle during COVID-19 Lockdown in the United Arab Emirates: A Cross-Sectional Study.
Authors: Leila Cheikh Ismail; Tareq M Osaili; Maysm N Mohamad; Amina Al Marzouqi; Amjad H Jarrar; Dima O Abu Jamous; Emmanuella Magriplis; Habiba I Ali; Haleama Al Sabbah; Hayder Hasan; Latifa M R AlMarzooqi; Lily Stojanovska; Mona Hashim; Reyad R Shaker Obaid; Sheima T Saleh; Ayesha S Al Dhaheri
Journal: Nutrients. Date: 2020-10-29.

9. Impact of COVID-19 lockdown upon the air quality and surface urban heat island intensity over the United Arab Emirates.
Authors: Abduldaem S Alqasemi; Mohamed E Hereher; Gordana Kaplan; Ayad M Fadhil Al-Quraishi; Hakim Saibi
Journal: Sci Total Environ. Date: 2020-12-25.