SARS-CoV-2 outbreak in Iran: The dynamics of the epidemic and evidence on two independent introductions.

Zohreh Fattahi^1,2, Marzieh Mohseni^1,2,3, Khadijeh Jalalvand¹, Fatemeh Aghakhani Moghadam¹, Azam Ghaziasadi^4,5, Fatemeh Keshavarzi¹, Jila Yavarian⁴, Ali Jafarpour^4,5, Seyedeh Elham Mortazavi⁶, Fatemeh Ghodratpour¹, Hanieh Behravan¹, Mohammad Khazeni^4,7, Seyed Amir Momeni⁷, Issa Jahanzad⁸, Abdolvahab Moradi⁹, Alijan Tabarraei⁹, Sadegh Ali Azimi⁹, Ebrahim Kord¹⁰, Seyed Mohammad Hashemi-Shahri¹⁰, Azarakhsh Azaran¹¹, Farid Yousefi¹¹, Zakiye Mokhames¹², Alireza Soleimani¹², Shokouh Ghafari¹³, Masood Ziaee¹³, Shahram Habibzadeh¹⁴, Farhad Jeddi¹⁴, Azar Hadadi¹⁵, Alireza Abdollahi¹⁶, Gholam Abbas Kaydani¹⁷, Saber Soltani^4,5, Talat Mokhtari-Azad⁴, Reza Najafipour¹⁸, Reza Malekzadeh¹⁹, Kimia Kahrizi¹, Seyed Mohammad Jazayeri^4,5, Hossein Najmabadi^1,2.

Abstract

SARS‐CoV‐2 has been spreading rapidly worldwide since December 2019, triggering a pandemic soon after its emergence. Although Iran was among the first countries confronted with rapid spread of the virus, in February 2020, no real‐time SARS‐CoV‐2 whole‐genome tracking was performed in the country during the early phase of the outbreak. To address this issue, we provide 50 whole‐genome sequences of viral isolates obtained from different geographical locations in Iran during March–July 2020. Analysis of the origins, transmission dynamics and genetic diversity of the virus indicated at least two introductions into the country, forming two major clusters defined as B.4 and B.1*. The first entry of the virus might have occurred around very late 2019/early 2020, as suggested by the time to the most recent common ancestor, and was followed by rapid community transmission that led to dominance of the B.4 lineage in the early epidemic until the end of June. The dominance of B.4 then gradually declined, possibly as a result of further entries of the virus, and was followed by a surge of B.1* lineages from mid‐May. Remarkably, variation tracking of the virus showed an increase in the frequency of the D614G mutation, along with the B.1* lineages, which continued until October 2020. The increase in frequency of the D614G mutation and B.1* lineages from mid‐May onwards predicts rapid viral transmission that may push the country into a critical health situation, followed by a considerable change in the composition of viral lineages circulating in the country.

Keywords: COVID-19; Iran; SARS-CoV-2; phylogenetic study; whole genome sequencing

Year: 2021 PMID： 33835709 PMCID： PMC8251331 DOI： 10.1111/tbed.14104

Source DB: PubMed Journal: Transbound Emerg Dis ISSN： 1865-1674 Impact factor: 4.521

INTRODUCTION

The coronavirus disease 2019 (COVID‐19) pandemic caused by SARS‐CoV‐2 (Zhu et al., 2020) had, as of 23 October 2020, exceeded one million deaths and infected 42,334,976 people worldwide. Real‐time whole‐genome sequencing of this emerging virus began early in the outbreak globally, as it can accurately evaluate the magnitude of transmission, offering insights for management of the epidemic (Oude Munnink et al., 2020). As a result, an increasing number of sequences are being deposited in the Global Initiative on Sharing All Influenza Data (GISAID) (Shu & McCauley, 2017). Iran was among the first countries to confront rapid spread of the virus. The first confirmed COVID‐19 patient and death were reported on 19 February 2020, from Qom city. Local transmission to neighbouring provinces was reported just a day later, and the disease soon spread nationwide. The first outbreak peak subsided in April, but soon after the lockdown was relaxed, the country experienced another notable increase in SARS‐CoV‐2 cases in May, which is still sustained. As of 23 October, 556,891 infected cases and 31,985 deaths have been reported officially and there is concern about an increase in the mortality rate in autumn and winter (https://www.worldometers.info/coronavirus/country/iran/). Although the outbreak in Iran began early in February, along with those in Italy and South Korea, no real‐time SARS‐CoV‐2 whole‐genome tracking was performed in the first months of the epidemic. The first virus sequence from Iran was deposited in GISAID (EPI_ISL_424349) on April 4. In the same month, the study by Eden et al. revealed three major substitutions, G1397A, T28688C and G29742T, in genomes of patients with travel history to Iran, which constitute a distinct clade representative of the specific viral diversity present in Iran at that time (Eden et al., 2020).
As of 23 October 2020, only eight complete genomes from Iran were available in GISAID, which is not sufficient for tracking the virus in the country. The only epidemiologic study of the Iranian outbreak used genomic sequences ascertained from travellers to Iran and estimated 21/01/2020 (95% HPD: 05/12/2019–14/02/2020) as the start of the epidemic in Iran, with a doubling time of 3 days (95% HPD: 1.68–16.27) (Ghafari et al., 2020). To address this issue, we performed genome sequencing of 50 SARS‐CoV‐2 samples ascertained from different geographical locations, especially in early time intervals of the epidemic in Iran. We aimed to improve the understanding of the origins, transmission dynamics, circulating lineages and variation of SARS‐CoV‐2 in the early phase of the Iranian epidemic, using molecular and phylogenetic methods.

MATERIALS AND METHODS

Specimen recruitment

We recruited 50 SARS‐CoV‐2 RNA samples, obtained as part of clinical testing in different referral centres of the following provinces: Alborz (n = 2), Ardabil (n = 4), Gilan (n = 6), Tehran (n = 27), Khuzestan (n = 3), Qom (n = 1), Sistan and Baluchestan (n = 3) and South Khorasan (n = 4). All patients were referred between March and July 2020, with clinical presentations of COVID‐19 disease, confirmed by real‐time RT‐PCR assay at those corresponding local centres.

Sequencing and Genome assembly

Whole‐genome sequencing of SARS‐CoV‐2 RNA samples was performed by targeted enrichment using CleanPlex® SARS‐CoV‐2 Research and Surveillance Panel (Paragon Genomics, Inc.). All samples were paired‐end sequenced on an Illumina MiSeq instrument using 300‐cycle MiSeq v2 reagent kits (Illumina, Inc.), generating 5.4 Gb of data (94.5% of bases ≥ Q30). Initially, FASTQ files were assessed by FastQC (Andrews, 2010) and then pre‐processed using Fastp (Chen et al., 2018). The sequences were aligned to the SARS‐CoV‐2 reference genome (NC_045512.2) using Bowtie2 (Langmead & Salzberg, 2012), keeping only the reads mapped in proper pairs. The resultant filtered BAM files were used for assembly of consensus SARS‐CoV‐2 sequences with Samtools mpileup and Bcftools (Li et al., 2009). Finally, the consensus FASTQ files were converted into FASTA format by Seqtk (https://github.com/lh3/seqtk), masking bases with quality lower than 20 to ambiguous nucleotides (N).
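The final masking step can be illustrated with a short Python sketch. This is not the Seqtk implementation; it only shows the idea of replacing bases below a quality threshold with N. It assumes Phred+33 quality encoding, and the function names `mask_low_quality` and `fastq_record_to_fasta` are hypothetical helpers introduced here for illustration.

```python
def mask_low_quality(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> str:
    """Replace every base whose Phred score is below min_q with 'N'.

    Assumes Phred+33 encoding (offset=33); this is an assumption of the sketch.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    return "".join(
        base if ord(q) - offset >= min_q else "N"
        for base, q in zip(seq, qual)
    )


def fastq_record_to_fasta(name: str, seq: str, qual: str, min_q: int = 20) -> str:
    """Format one masked FASTQ record as a two-line FASTA entry."""
    return f">{name}\n{mask_low_quality(seq, qual, min_q)}\n"


if __name__ == "__main__":
    # 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2, '4' encodes Q19.
    print(fastq_record_to_fasta("toy_consensus", "ACGT", "I5#4"), end="")
```

A base with quality exactly 20 is kept, which matches the wording "quality lower than 20" in the text.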

Lineage assignment

In addition to 50 sequenced samples in this project, eight other SARS‐CoV‐2 Iranian sequences from GISAID were subjected to lineage assignment. We applied Pangolin v2.0.7 (Rambaut et al., 2020), CoV‐GLUE (Singer et al., 2020) and NextClade v.0.6.0 (Hadfield et al., 2018) to assign the global lineages present in Iranian SARS‐CoV‐2 outbreak.

Phylogenetic analysis

BEAST v1.10.4 was used to construct a phylogenetic tree and to estimate the time to the most recent common ancestor (TMRCA) (Drummond & Rambaut, 2007). First, consensus sequences were evaluated by NextClade (Hadfield et al., 2018), and sequences containing >5% ambiguous nucleotides and bearing private mutations above the threshold were excluded from downstream analysis. To explore the temporal signal, high‐quality FASTA files were aligned by MAFFT v7.407 using the FFT‐NS‐2 algorithm (Katoh & Standley, 2013); a maximum‐likelihood phylogenetic tree was then built with IQ‐TREE v2.1.1 under the GTR + gamma model (Minh et al., 2020), and the temporal signal was explored with TempEst (Rambaut et al., 2016). Finally, BEAST was used to estimate the TMRCA and construct a Bayesian phylogenetic tree of 45 sequences, plus the Wuhan‐1 patient (EPI_ISL_402125) as outgroup, using a simple model consisting of an HKY + gamma substitution model with codon partitioning (1 + 2, 3), a strict clock and a coalescent exponential growth tree prior. A maximum clade credibility (MCC) tree was then made with 10% burn‐in from two separate Markov chain Monte Carlo runs (Drummond & Rambaut, 2007). To trace possible sources of SARS‐CoV‐2 entry into Iran, an additional phylogenetic tree was constructed based on a total of 261 samples, including a list of high‐quality genomes in GISAID (Rambaut, 2020) from the start of the epidemic until the end of February and random subsets of samples in GISAID from the March–June interval, which were selected using mothur (Schloss et al., 2009).
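The temporal-signal check mentioned above rests on a simple idea: if the sequences evolve in a clock-like manner, root-to-tip genetic distance should increase roughly linearly with sampling date. The following Python sketch shows that regression in its plainest form. It is not TempEst itself, the function name `root_to_tip_regression` is hypothetical, and the dates and distances in the usage example are made-up toy values, not data from this study.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    dates     : sampling dates as decimal years (e.g. 2020.25)
    distances : root-to-tip distances in substitutions per site
    Returns (slope, x_intercept, r_squared), where the slope is a crude rate
    estimate (substitutions/site/year) and the x-intercept is a crude estimate
    of the date of the root.
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    syy = sum((y - mean_y) ** 2 for y in distances)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    x_intercept = -intercept / slope if slope != 0 else float("nan")
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return slope, x_intercept, r_squared


if __name__ == "__main__":
    # Toy, perfectly clock-like data (hypothetical values).
    toy_dates = [2020.2, 2020.3, 2020.4, 2020.5]
    toy_dists = [0.001 * (d - 2019.9) for d in toy_dates]
    print(root_to_tip_regression(toy_dates, toy_dists))
```

A positive slope with a reasonable correlation is usually taken as evidence of sufficient temporal signal before a dated Bayesian analysis is run; the Bayesian TMRCA estimate itself comes from BEAST, not from this regression.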

Variant analysis

Variant analysis was performed by CoV‐GLUE, relative to the reference sequence (NC_045512.1) (Singer et al., 2020). In total, 53 samples were investigated after excluding the sequences bearing private mutations above the threshold. Moreover, to track the frequency of the well‐known D614G mutation after July 2020, Sanger sequencing of an additional 67 SARS‐CoV‐2 positive samples collected during July–October was performed using the following primer pair designed by the ARTIC network (https://github.com/artic‐network/artic‐ncov2019/tree/master/primer_schemes/nCoV‐2019/V3): 5ʹ‐CCAGCAACTGTTTGTGGACCTA‐3ʹ and 5ʹ‐CAGCCCCTATTAAACAGCCTGC‐3ʹ.

RESULTS

Genome assembly and data availability

In this study, we obtained 44 high‐quality SARS‐CoV‐2 sequences (>98% of the genome is complete) from the Iranian outbreak. The remaining six samples covered >82% of the NC_045512.2 genome sequence. The metadata information including the geographical location, collection date, age, gender, CT value, specimen source, percentage of ambiguous nucleotides (N) for each genome, percentage of genome coverage compared to NC_045512.2 and the exact lineages defined for each sample are provided in Table S1. Lineage assignment with Pangolin and CoV‐GLUE, although slightly different for some samples, both yielded B.4 as the dominant lineage circulating in Iranian SARS‐CoV‐2 outbreak; comprising 75.9% and 74%, respectively (see Figure 1a for spectrum of circulating lineages and Table S1 for exact lineage of each sample).

FIGURE 1

(a) Lineage assignment of 58 SARS‐CoV‐2 sequences from the Iranian outbreak. Abundance of SARS‐CoV‐2 lineages over time from March to the end of June 2020 indicates a reduction in dominancy of the B.4 lineage. Trend of circulating lineages assigned by (b) Pangolin v2.0.7, (c) CoV‐GLUE and (d) NextClade v0.6.0

This is consistent with previous reports (Eden et al., 2020), as also the majority (83%) of SARS‐CoV‐2 sequences in GISAID that were primarily exposed in Iran, belonged to the B.4 lineage (Table S2). Additionally, the allocated lineages provided by NextClade showed a dominancy of 19A major clade (77.6%), one of the most prevalent clades during the early phase of outbreak, especially in Asia. The dominancy of this clade in the Iranian outbreak is consistent with Iran being one of the first countries infected by the virus. Therefore, our results confirm B.4 as the dominant lineage in Iranian outbreak during the February–June 2020 interval and introduce B, B.1, B.1.* and B.4 as the circulating lineages in the country. Sequencing more SARS‐CoV‐2 samples from the end of June onwards is required to evaluate whether B.4 still persists as the dominant lineage. However, as shown in Figure 1, the current data already exhibit a trend towards the appearance of other SARS‐CoV‐2 lineages and reduction in dominancy of B.4. As of May, the 20A clade, being the dominant European clade in early 2020, started to appear and more B.1* lineages can be observed in Iranian epidemic. This is explicable by new sources of virus introductions to the country at that time interval. Its appearance could also be due to SARS‐CoV‐2 genome mutations in the existing Iranian B lineages, which were circulating alongside the B.4 lineage in early phase of the epidemic, although with a lower proportion.

SARS‐CoV‐2 entry and circulating lineages

The phylogenetic tree of SARS‐CoV‐2 genomes from the Iranian outbreak clearly revealed two different circulating clusters (Figure 2), suggesting at least two separate introductions into the country.

FIGURE 2

Tempo‐spatial phylogenetic tree of SARS‐CoV‐2 emergence in Iran

The older green cluster comprises 36 genomes, almost all of the B.4 lineage [B.4/19A], carrying the [G1397A‐T28688C‐G29742T] substitutions (Eden et al., 2020). These genomes were spread across different geographical regions: Alborz (n = 2), Gilan (n = 4), Khuzestan (n = 3), Qom (n = 1), Sistan and Baluchestan (n = 3), Semnan (n = 1), South Khorasan (n = 1), Tehran (n = 18) and unknown (n = 3). This indicates that the [B.4/19A] cluster originated in very early 2020 at the latest and began circulating around the country thereafter, reflecting multiple local transmissions. Additionally, there are two samples of the [B/19A] lineage in this older cluster. These samples were collected in early March, showing that at least two different lineages entered the country in the first phase of the pandemic. The [B/19A] samples also carry the G1397A and G29742T substitutions but not T28688C. The red cluster comprises nine genomes, all of the [B.1.*/20A] lineage and collected after May 15, compatible with the pattern shown in Figure 1. The [B.1.*/20A] samples did not show the [G1397A‐T28688C‐G29742T] substitutions but instead harboured [C241T‐C3037T‐C14408T‐A23403G] or [C241T‐C3037T‐C14408T‐A23403G‐G25563T], which are the common patterns of variant co‐occurrence of B.1 and B.1.* lineages in Europe and North America (Mercatelli & Giorgi, 2020).

TMRCA estimates

The TMRCA of the B.4 clade was estimated as 29–12–2019, with a 95% highest posterior density (HPD) interval of [03–11–2019 to 06–02–2020], considering the Wuhan‐1 sample (EPI_ISL_402125) as an outgroup. Additionally, to track the appearance of the [B.1.*/20A] cluster in the country, the TMRCA of the nine B.1 samples was estimated, placing all the other genomic samples in this study as outgroup. This TMRCA was 22–02–2020, with an HPD interval of [12–01–2020 to 29–03–2020]. Clearly, more high‐quality B.1* genomes are required to narrow the credible interval and predict a more precise TMRCA for entry of this lineage. However, the above results still indicate that the new lineage might have been introduced separately, after the entry of B.4, and then gradually increased in the population, becoming detectable in our cohort from mid‐May.
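To put these estimates in context, the short Python sketch below computes two intervals from dates quoted in this article: the gap between the B.4 TMRCA point estimate and the first officially reported case (19 February 2020, as stated in the Introduction), and the width of the B.4 HPD interval. The helper name `days_between` is introduced here only for illustration.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Number of days from start to end (negative if end precedes start)."""
    return (end - start).days


# Dates quoted in the text (day-month-year in the article).
tmrca_b4 = date(2019, 12, 29)       # B.4 TMRCA point estimate
hpd_low = date(2019, 11, 3)         # lower bound of the 95% HPD interval
hpd_high = date(2020, 2, 6)         # upper bound of the 95% HPD interval
first_report = date(2020, 2, 19)    # first officially reported case in Iran

lead_days = days_between(tmrca_b4, first_report)
hpd_width = days_between(hpd_low, hpd_high)

print(f"TMRCA precedes first report by {lead_days} days")   # 52 days
print(f"B.4 HPD interval spans {hpd_width} days")           # 95 days
```

The point estimate therefore falls about seven weeks before the first official report, although the wide HPD interval means the true gap is quite uncertain.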

Sources of SARS‐CoV‐2 entries

Subsequent analysis in the context of 216 genomes from around the world clarified the location of two main clusters among the global samples (Figure 3).

FIGURE 3

Radial phylogenetic tree of SARS‐CoV‐2 genomes from the Iranian outbreak in the context of a set of 216 genomes from around the world. The major B.4 cluster in the Iranian SARS‐CoV‐2 outbreak is highlighted in light green. The [B.1.*/20A] cluster is highlighted in light red. The red circles denote the two B.4 samples from China located near the major Iranian cluster. The green circles denote the five B.4 samples from Australia, Canada and New Zealand located within the major Iranian cluster

As expected, the B.4 cluster was linked to the very early samples collected in January–February, mostly in China. This supports the hypothesis of an early virus introduction to Iran, most likely from China, which is consistent with statements from Iran's health ministry that the virus was brought from China by travellers (https://en.wikipedia.org/wiki/COVID‐19_pandemic_in_Iran). Interestingly, the Iranian B.4 cluster is closely linked to the two B.4 samples collected in mid‐January (19–01–2020 and 18–01–2020) in Hubei/Wuhan and Shandong/Qingdao in China (EPI_ISL_408482 and EPI_ISL_412981). This suggests that the three [G1397A‐T28688C‐G29742T] substitutions might have occurred before their introduction to Iran, subsequently becoming the major lineage and driving the epidemic in the country. Afterwards, the virus was transferred to other countries, such as Canada, Australia and New Zealand, by travellers (Eden et al., 2020). This is now confirmed by locating five samples from these countries within the major Iranian cluster (EPI_ISL_412965, EPI_ISL_413213, EPI_ISL_412975, EPI_ISL_413214 and EPI_ISL_413490). All five samples were collected in late February from patients with a travel history to Iran (Figure 3, Figure 4).

FIGURE 4

The zoomed and collapsed phylogenetic tree of SARS‐CoV‐2 genomes from the Iranian outbreak in the context of a set of 216 genomes from around the world. The two main clusters circulating in Iran are zoomed. The major B.4 cluster in the Iranian SARS‐CoV‐2 outbreak is highlighted in light green and the lines corresponding to Iranian samples within this cluster are also shown in green, while other global samples are shown in red lines. The [B.1.*/20A] cluster is highlighted in light red, and the lines corresponding to Iranian samples within this cluster are also shown in red, while other global samples are shown in blue lines

The [B.1*/20A] cluster localizes in a completely different position, namely among the samples from various parts of the world, prominently Europe. This corroborates the hypothesis of new sources of virus introduction to the country (probably in late February, as suggested by the TMRCA of the B.1* samples) before suspension of flights, as the international travel lockdown started on 23 February 2020 for some neighbouring countries and was expanded to more countries until 8 March 2020. Furthermore, important foreign and Iranian airlines transferring passengers between Iran, Europe and North America were suspended on February 25 and March 8, respectively.

Variant analysis of SARS‐CoV‐2 genomes from Iranian outbreak

Common variants

We detected 14 different variants with >10% frequency in SARS‐CoV‐2 genomes from Iranian outbreak (Table 1). Notably, just four variants, namely G1397A, T28688C, G29742T and G11083T, contributed to >70% of samples; comprising the common co‐occurrence of variants, [G1397A‐T28688C‐G29742T] in B.4 lineage.

TABLE 1

Common and novel variants observed in SARS‐CoV‐2 genomes of the Iranian outbreak

No	Genomic change	Type of mutation	Gene/protein	Amino acid change	No. of samples	Sample lineages	Description
Common SARS‐CoV−2 variants observed in early phase of Iranian outbreak
1	G1397A	Non‐synonymous	nsp2	V198I	43 (81%)	B.4, B	Known coexistence of variants, constituting B.4 lineage, as suggested by Eden et al., 2020.
2	T28688C	Synonymous	N	L139L	41 (77%)	B.4
3	G29742T	Non‐coding	3ʹUTR	NA	43 (81%)	B.4, B
4	G11083T	Non‐synonymous	nsp6	L37F	39 (74%)	B.4, B	The most common mutation in Asia, during December 2019‐March 2020.
5	C241T	Non‐coding	5ʹ‐UTR	NA	10 (19%)	B.1*	Known coexistence of variants, constituting B.1 (G) and B.1.* (GH) clades, according to Mercatelli & Giorgi, 2020.
6	C3037T	Synonymous	nsp3	F106F	10 (19%)	B.1*
7	C14408T	Non‐synonymous	nsp12 (RdRp)	P323L	10 (19%)	B.1*
8	A23403G	Non‐synonymous	S	D614G	13 (24.5%)	B.1* (n = 10), B.4 (n = 3)
9	G25563T	Non‐synonymous	ORF3a	Q57H	10 (19%)	B.1*, B.4, B
Unique SARS‐CoV−2 haplotypes observed in early phase of Iranian outbreak
10	G20887A	Non‐synonymous	nsp16	G77R	9 (17%)	B.4	New coexistence of variants observed in same nine samples also carrying B.4 common variants.
11	C28830T	Non‐synonymous	N	S186F	9 (17%)	B.4
12	C21627T	Non‐synonymous	S	T22I	9 (17%)	B.4
13	G8653T	Non‐synonymous	nsp4	M33I	6 (11%)	B.4	New coexistence of variants observed in same six samples also carrying B.4 common variants.
14	C884T	Non‐synonymous	nsp2	R27C	6 (11%)	B.4
Unique SARS‐CoV−2 variants observed in early phase of Iranian outbreak
15	C28388G	Non‐synonymous	N	Q39E	1	B.1.1/20B	Known variants located at the same position: Q39L/ Q39H/ Q39R/ Q39*
16	G18712A	Non‐synonymous	nsp14	A225T	1	B.4/19A	Known variants located at the same position: A225D / A225S
17	T3926C	Non‐synonymous	nsp3	S403P	1	B.4/19A	Known variants located at the same position: S403L / S403A
18	G6461A	Non‐synonymous	nsp3	V1248M	1	B.4/19A	Known variants located at the same position: V1248G / V1248L

Abbreviations: N, nucleocapsid phosphoprotein; nsp14, 3'‐to‐5' exonuclease; nsp16, 2'‐O‐ribose methyltransferase; nsp2, Non‐Structural protein 2; nsp3, Predicted phosphoesterase, papain‐like proteinase; nsp4, Transmembrane protein; nsp6, Transmembrane protein; ORF, open reading frame; RdRp, RNA‐dependent RNA polymerase; S, Spike glycoprotein.
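The percentages in Table 1 can be reproduced from the sample counts and the 53 genomes included in the variant analysis (see Variant analysis under Materials and Methods). The Python sketch below does this for a subset of the variants; the helper name `percent` and the dictionary layout are choices made here for illustration, while the counts are taken from Table 1.

```python
TOTAL_SAMPLES = 53  # genomes included in the variant analysis, per the Methods


def percent(count: int, total: int = TOTAL_SAMPLES, ndigits: int = 1) -> float:
    """Frequency of a variant as a percentage of all analysed samples."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    return round(100 * count / total, ndigits)


# Sample counts copied from Table 1 (subset).
table1_counts = {
    "G1397A": 43,
    "T28688C": 41,
    "G29742T": 43,
    "G11083T": 39,
    "C241T": 10,
    "A23403G": 13,
    "G20887A": 9,
    "G8653T": 6,
}

for variant, n in table1_counts.items():
    print(f"{variant}: {n}/{TOTAL_SAMPLES} = {percent(n)}%")
```

Rounded to whole numbers, these values match the table (for example 43/53 gives 81.1%, reported as 81%, and 13/53 gives 24.5%).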

G11083T is one of the most frequent mutations observed in Asia from December 2019 to March 2020 (Koyama et al., 2020; Mercatelli and Giorgi, 2020) and, not surprisingly, is observed mostly along with B.4 substitutions. Other less frequent variants were observed in ~19% of samples constituting the known co‐occurrence of variants; [C241T‐C3037T‐C14408T‐A23403G] and [C241T‐C3037T‐C14408T‐A23403G‐G25563T], occurring in clade G (B.1), which is prevalent in Europe, Oceania, South America and Africa and clade GH (B.1.*), prevalent in North America (Mercatelli and Giorgi, 2020).

Novel variants

Remarkably, we observed two specific haplotypes: the co‐occurrence of the [G20887A‐C28830T‐C21627T] and [G8653T‐C884T] variants in 17% and 11% of samples, respectively. Both groups also carried the B.4 [G1397A‐T28688C‐G29742T] variants. These haplotypes are infrequent in CoV‐GLUE and among the 48,635 genomes investigated by Mercatelli & Giorgi (Mercatelli and Giorgi, 2020; Singer et al., 2020). However, further investigations are required to assess their significance for SARS‐CoV‐2 genetic diversity in Iran. Analysis of viral isolates also revealed four samples harbouring unique variants (Table 1) not detected in other sequences from the SARS‐CoV‐2 pandemic (Mercatelli and Giorgi, 2020; Singer et al., 2020). Although the exact variants were unique, different missense variants at the same positions have been identified in other viral sequences around the world.

Variants located in spike protein

Spike is the key glycoprotein mediating entry of the virus to the cell and, therefore, the target of most vaccine strategies (Korber et al., 2020). We thus focussed on variants located in spike protein of viral isolates from Iran and identified spike variants in 28 samples (53%), in which D614G and T22I were occurring at higher frequencies of 24.5% and 17%, respectively (Table 2). T22I mutation was only observed in the [B.4/19A] cluster. D614G, the most prevalent mutation globally, was also the most prevalent spike mutation in viral isolates from Iran, while showing an increasing trend from mid‐May, observed mostly within the [B.1.*/20A] cluster. Remarkably, co‐occurrence of D614G mutation with B.4 [G1397A‐T28688C‐G29742T] and G11083T variants was also observed in three samples.

TABLE 2

Variants located in the spike, observed in SARS‐CoV‐2 genomes of the Iranian outbreak

Genomic change	Type of mutation	Gene/protein	Amino acid change	No. of samples	Sample lineages
A23403G	Non‐synonymous	S (S1)	D614G	13 (24.5%)	B.1/20A, B.1/20B, B.4/19A
C21627T	Non‐synonymous	S (S1)	T22I	9 (17%)	B.4/19A
G22100A	Non‐synonymous	S (S1)	E180K	2 (4%)	B.4/19A
G22592T	Non‐synonymous	S (S1‐RBD Domain)	A344S	1 (2%)	B.4/19A
C23679T	Non‐synonymous	S (S2)	A706V	1 (2%)	B.4/19A
G24348T	Non‐synonymous	S (S2)	S929I	1 (2%)	B.4/19A
G25249T	Non‐synonymous	S (S2)	M1229I	1 (2%)	B.4/19A

Abbreviation: S, spike glycoprotein.
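The correspondence between the genomic changes and the amino acid positions in Table 2 follows from the position of the spike coding sequence in the reference genome. The Python sketch below maps a genomic coordinate to a spike codon number and a position within that codon. It assumes the S gene spans positions 21563 to 25384 (1-based) in NC_045512.2, as in the RefSeq annotation; the function name `spike_codon_position` is hypothetical.

```python
# Assumed 1-based coordinates of the spike (S) coding sequence in NC_045512.2.
SPIKE_CDS_START = 21563
SPIKE_CDS_END = 25384


def spike_codon_position(genome_pos: int) -> tuple:
    """Map a 1-based genomic position to (codon number, position in codon).

    Both returned values are 1-based; position in codon is 1, 2 or 3.
    """
    if not SPIKE_CDS_START <= genome_pos <= SPIKE_CDS_END:
        raise ValueError("position lies outside the spike coding sequence")
    offset = genome_pos - SPIKE_CDS_START
    return offset // 3 + 1, offset % 3 + 1


# Genomic positions and reported amino acid changes from Table 2.
table2 = {
    23403: "D614G",
    21627: "T22I",
    22100: "E180K",
    22592: "A344S",
    23679: "A706V",
    24348: "S929I",
    25249: "M1229I",
}

for pos, aa_change in table2.items():
    codon, within = spike_codon_position(pos)
    print(f"{pos}: codon {codon}, base {within} of the codon ({aa_change})")
```

For instance, position 23403 falls on the second base of codon 614, consistent with the A23403G change being reported as D614G.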

Moreover, Sanger sequencing of additional 67 SARS‐CoV‐2 positive samples confirmed the increase in D614G frequency till October, becoming the dominant mutation in the Iranian outbreak (Figure 5).

FIGURE 5

The frequency of D614G mutation during March–October interval in Iranian SARS‐CoV‐2 outbreak

DISCUSSION

The current study is the first comprehensive analysis of SARS‐CoV‐2 full genomes obtained from the Iranian outbreak. Given the importance of real‐time sequencing of emerging viruses and the lack of full genomes from Iran, this study was designed to provide 50 SARS‐CoV‐2 full genome sequences from the early time interval of the epidemic in Iran. Lineage assignment and phylogenetic analysis of these sequences clarified the origins and transmission dynamics of the SARS‐CoV‐2 outbreak in Iran: two major introductions into the country were detected, forming two major clusters of the virus at different times, followed by rapid community transmission throughout the country. These data confirm B.4 as the dominant lineage in Iran from the start of the epidemic until the end of June. This lineage was first described as a prevalent distinct clade in Australian travellers returning from Iran in early April (Eden et al., 2020). Since then, it has been recognized as the lineage of the Iranian epidemic. Nonetheless, no specific study on Iranian SARS‐CoV‐2 samples had been performed so far, a gap addressed in this study. Furthermore, the B.4 clade TMRCA in this study, along with the previous epidemiologic study that used genomic samples ascertained from travellers to Iran (Ghafari et al., 2020), suggests the possibility of unrecognized transmission of the virus in the country, due to asymptomatic or misdiagnosed patients or limited testing capacity, for more than 1 month prior to the official report of the first COVID‐19 patients in the country. Therefore, the B.4 lineage originated first in late 2019/early 2020, followed by multiple local transmissions, developing into the major SARS‐CoV‐2 clade at the beginning of the outbreak. Our data suggest a reduction in B.4 dominancy, followed by a surge of B.1.* lineages, which have been exported from Europe globally.
This new cluster might be explained by new sources of virus entry before the suspension of flights in late February, or by mutation of the B lineage present in the early phase. In addition to outbreak tracing, these full genome sequences can provide useful information for understanding the genome diversity of viral isolates in Iran, which is helpful in adapting more specific diagnostic tests, therapeutic approaches and vaccines. Generally, RNA viruses are known to have a high mutation rate, explained by the lack of proofreading activity of RNA‐dependent RNA polymerase (RdRp). However, coronaviruses are among the exceptions, with lower mutation rates due to the presence of RdRp‐independent proofreading activity (Ahmadpour et al., 2020; Peck and Lauring, 2018). The mutation rate of 1.12 × 10⁻³ mutations per site per year (Koyama et al., 2020) and the average of 7.23 mutations per sample (Mercatelli and Giorgi, 2020) support a moderate mutation rate for SARS‐CoV‐2. Variation tracking of SARS‐CoV‐2 has shown that some mutations, such as P323L and D614G, are distributed globally, while others have accumulated in specific geographical regions (Kannan et al., 2020). Investigation of prevalent mutations in the early phase of the outbreak in Iran indicated the co‐occurrence of some widespread mutations consistent with the two main lineages in the country. Furthermore, the unique mutations and the haplotypes [G20887A‐C28830T‐C21627T] or [G8653T‐C884T] with [G1397A‐T28688C‐G29742T] were identified in a low proportion of samples. Therefore, none of these can be considered geographically adapted in Iranian SARS‐CoV‐2 samples. Indeed, sequencing of a much larger cohort is required to investigate the significance of these variants. Moreover, the impact of these country‐specific variants (yet to be defined) on the behaviour of the virus (replication efficiency, virulence, etc.) needs further investigation.
Despite the relatively low mutation rate of SARS‐CoV‐2, a total of 353,341 mutations were identified in 48,635 SARS‐CoV‐2 genomes (Mercatelli and Giorgi, 2020). Among these, studying the mutations in the spike protein is necessary, as this immunogenic structural protein mediates virus entry into host cells by interacting with cellular receptors such as angiotensin‐converting enzyme 2 (ACE2) and plays a key role in the induction of neutralizing antibodies (Dearlove et al., 2020; Franco‐Muñoz et al., 2020). This glycoprotein is composed of two functional subunits, S1 and S2. S1 contains a receptor‐binding domain (RBD) through which SARS‐CoV‐2 binds to the ACE2 receptor, while S2 is responsible for virus‐host membrane fusion (Yang et al., 2020). Furthermore, the spike protein is the target for many vaccine candidates currently in development, and therefore, tracking the mutations in this protein (especially in the RBD region) is crucial (Dearlove et al., 2020). Monitoring the spike mutations in Iranian SARS‐CoV‐2 genomes revealed no novel mutations in this genomic region, but identified two commonly known mutations, T22I and D614G. Both of these variants are outside the RBD region, suggesting no negative effect on the efficacy of future vaccines against the viral lineages currently circulating in Iran (Dearlove et al., 2020). While the T22I mutation is less frequent in other regions, the D614G variant is now the most prevalent mutation in the COVID‐19 pandemic (Ahmadpour et al., 2020; Korber et al., 2020; Mercatelli and Giorgi, 2020). Recent studies suggested a fitness advantage for G614, rapidly making it the dominant form in each geographical location (Korber et al., 2020). A significant conformational change in the spike protein may facilitate virus‐host cell membrane fusion.
As a consequence, increased infectivity, transmission and replication fitness have been reported for G614 (Hu et al., 2020), but there is still some debate about its effect on transmission (Grubaugh et al., 2020; van Dorp et al., 2020). Fortunately, the mutation is not related to disease severity and does not reduce the effect of neutralizing antibodies (Korber et al., 2020; Fau et al., 2021). Notably, in accordance with the increase of the [B.1*/20A] clade among the viral isolates of the Iranian epidemic from mid‐May, the D614G variant is becoming dominant in this population. This can partly explain the accelerated transmission of the virus in recent months, although the negative effect of relaxed quarantine policies should not be overlooked. Owing to the strong fitness of G614, we could also observe its co‐occurrence with common B.4 variants, whereas the mutation is known to always co‐occur with the variants defining the G clade (Mercatelli and Giorgi, 2020). In conclusion, genomic sequencing and phylogenetic analysis suggested that SARS‐CoV‐2 entered Iran in very late 2019/early 2020 and circulated among vulnerable patients. The increase in frequency of the D614G mutation and B.1* lineages from mid‐May onwards predicts rapid viral transmission followed by a considerable change in the composition of viral lineages circulating in the country.

CONFLICT OF INTEREST

The authors have no conflict of interest to declare.

ETHICS STATEMENT

The authors confirm that the ethical policies of the journal, as noted on the journal's author guidelines page, have been adhered to and the appropriate ethical review committee approval has been received (Institutional ethical approval number of IR.USWR.REC.1399.094).

27 in total

1. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities.

Authors: Patrick D Schloss; Sarah L Westcott; Thomas Ryabin; Justine R Hall; Martin Hartmann; Emily B Hollister; Ryan A Lesniewski; Brian B Oakley; Donovan H Parks; Courtney J Robinson; Jason W Sahl; Blaz Stres; Gerhard G Thallinger; David J Van Horn; Carolyn F Weber
Journal: Appl Environ Microbiol Date: 2009-10-02 Impact factor: 4.792

2. Fast gapped-read alignment with Bowtie 2.

Authors: Ben Langmead; Steven L Salzberg
Journal: Nat Methods Date: 2012-03-04 Impact factor: 28.547

3. Rapid SARS-CoV-2 whole-genome sequencing and analysis for informed public health decision-making in the Netherlands.

Authors: Aura Timen; Marion Koopmans; Bas B Oude Munnink; David F Nieuwenhuijse; Mart Stein; Áine O'Toole; Manon Haverkate; Madelief Mollers; Sandra K Kamga; Claudia Schapendonk; Mark Pronk; Pascal Lexmond; Anne van der Linden; Theo Bestebroer; Irina Chestakova; Ronald J Overmars; Stefan van Nieuwkoop; Richard Molenkamp; Annemiek A van der Eijk; Corine GeurtsvanKessel; Harry Vennema; Adam Meijer; Andrew Rambaut; Jaap van Dissel; Reina S Sikkema
Journal: Nat Med Date: 2020-07-16 Impact factor: 53.440

4. Spike mutation D614G alters SARS-CoV-2 fitness.

Authors: Jessica A Plante; Yang Liu; Jianying Liu; Hongjie Xia; Bryan A Johnson; Kumari G Lokugamage; Xianwen Zhang; Antonio E Muruato; Jing Zou; Camila R Fontes-Garfias; Divya Mirchandani; Dionna Scharton; John P Bilello; Zhiqiang Ku; Zhiqiang An; Birte Kalveram; Alexander N Freiberg; Vineet D Menachery; Xuping Xie; Kenneth S Plante; Scott C Weaver; Pei-Yong Shi
Journal: Nature Date: 2020-10-26 Impact factor: 49.962

5. fastp: an ultra-fast all-in-one FASTQ preprocessor.

Authors: Shifu Chen; Yanqing Zhou; Yaru Chen; Jia Gu
Journal: Bioinformatics Date: 2018-09-01 Impact factor: 6.937

6. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era.

Authors: Bui Quang Minh; Heiko A Schmidt; Olga Chernomor; Dominik Schrempf; Michael D Woodhams; Arndt von Haeseler; Robert Lanfear
Journal: Mol Biol Evol Date: 2020-05-01 Impact factor: 16.240

7. No evidence for increased transmissibility from recurrent mutations in SARS-CoV-2.

Authors: Lucy van Dorp; Damien Richard; Cedric C S Tan; Liam P Shaw; Mislav Acman; François Balloux
Journal: Nat Commun Date: 2020-11-25 Impact factor: 14.919

8. An emergent clade of SARS-CoV-2 linked to returned travellers from Iran.

Authors: John-Sebastian Eden; Rebecca Rockett; Ian Carter; Hossinur Rahman; Joep de Ligt; James Hadfield; Matthew Storey; Xiaoyun Ren; Rachel Tulloch; Kerri Basile; Jessica Wells; Roy Byun; Nicky Gilroy; Matthew V O'Sullivan; Vitali Sintchenko; Sharon C Chen; Susan Maddocks; Tania C Sorrell; Edward C Holmes; Dominic E Dwyer; Jen Kok
Journal: Virus Evol Date: 2020-04-10

9. Molecular interaction and inhibition of SARS-CoV-2 binding to the ACE2 receptor.

Authors: Jinsung Yang; Simon J L Petitjean; Melanie Koehler; Qingrong Zhang; Andra C Dumitru; Wenzhang Chen; Sylvie Derclaye; Stéphane P Vincent; Patrice Soumillion; David Alsteens
Journal: Nat Commun Date: 2020-09-11 Impact factor: 14.919

10. Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus.

Authors: Bette Korber; Will M Fischer; Sandrasegaram Gnanakaran; Hyejin Yoon; James Theiler; Werner Abfalterer; Nick Hengartner; Elena E Giorgi; Tanmoy Bhattacharya; Brian Foley; Kathryn M Hastie; Matthew D Parker; David G Partridge; Cariad M Evans; Timothy M Freeman; Thushan I de Silva; Charlene McDanal; Lautaro G Perez; Haili Tang; Alex Moon-Walker; Sean P Whelan; Celia C LaBranche; Erica O Saphire; David C Montefiori
Journal: Cell Date: 2020-07-03 Impact factor: 66.850

6 in total

1. A framework for reconstructing SARS-CoV-2 transmission dynamics using excess mortality data.

Authors: Mahan Ghafari; Oliver J Watson; Ariel Karlinsky; Luca Ferretti; Aris Katzourakis
Journal: Nat Commun Date: 2022-05-31 Impact factor: 17.694

2. SARS-COV-2 RBD (Receptor binding domain) mutations and variants (A sectional-analytical study).

Authors: Faezeh Hajizadeh; Sayyad Khanizadeh; Hamidreza Khodadadi; Yaser Mokhayeri; Mehdi Ajorloo; Asra Malekshahi; Ezatoallah Heydari
Journal: Microb Pathog Date: 2022-05-18 Impact factor: 3.848

3. SARS-CoV-2 outbreak in Iran: The dynamics of the epidemic and evidence on two independent introductions.

Authors: Zohreh Fattahi; Marzieh Mohseni; Khadijeh Jalalvand; Fatemeh Aghakhani Moghadam; Azam Ghaziasadi; Fatemeh Keshavarzi; Jila Yavarian; Ali Jafarpour; Seyedeh Elham Mortazavi; Fatemeh Ghodratpour; Hanieh Behravan; Mohammad Khazeni; Seyed Amir Momeni; Issa Jahanzad; Abdolvahab Moradi; Alijan Tabarraei; Sadegh Ali Azimi; Ebrahim Kord; Seyed Mohammad Hashemi-Shahri; Azarakhsh Azaran; Farid Yousefi; Zakiye Mokhames; Alireza Soleimani; Shokouh Ghafari; Masood Ziaee; Shahram Habibzadeh; Farhad Jeddi; Azar Hadadi; Alireza Abdollahi; Gholam Abbas Kaydani; Saber Soltani; Talat Mokhtari-Azad; Reza Najafipour; Reza Malekzadeh; Kimia Kahrizi; Seyed Mohammad Jazayeri; Hossein Najmabadi
Journal: Transbound Emerg Dis Date: 2021-05-22 Impact factor: 4.521

Review 4. Clinical Symptoms and Types of Samples Are Critical Factors for the Molecular Diagnosis of Symptomatic COVID-19 Patients: A Systematic Literature Review.

Authors: Milad Zandi; Abbas Farahani; Armin Zakeri; Sara Akhavan Rezayat; Ramin Mohammadi; Umashankar Das; Jonathan R Dimmock; Shervin Afzali; Mohammadvala Ashtar Nakhaei; Alireza Doroudi; Yousef Erfani; Saber Soltani
Journal: Int J Microbiol Date: 2021-09-06

5. Susceptibility and Severity of COVID-19 Are Both Associated With Lower Overall Viral-Peptide Binding Repertoire of HLA Class I Molecules, Especially in Younger People.

Authors: Hamid Reza Ghasemi Basir; Mohammad Mahdi Majzoobi; Samaneh Ebrahimi; Mina Noroozbeygi; Seyed Hamid Hashemi; Fariba Keramat; Mojgan Mamani; Peyman Eini; Saeed Alizadeh; Ghasem Solgi; Da Di
Journal: Front Immunol Date: 2022-07-07 Impact factor: 8.786

6. Genomic Epidemiology of SARS-CoV-2 Divulge B.1, B.1.36, and B.1.1.7 as the Most Dominant Lineages in First, Second, and Third Wave of SARS-CoV-2 Infections in Pakistan.

Authors: Atia Basheer; Imran Zahoor
Journal: Microorganisms Date: 2021-12-17
