Phylogenetic and amino acid signature analysis of the SARS-CoV-2s lineages circulating in Tunisia.

Mouna Ben Sassi¹, Sana Ferjani², Imen Mkada³, Marwa Arbi⁴, Mouna Safer⁵, Awatef Elmoussi⁶, Salma Abid⁶, Oussema Souiai⁴, Alya Gharbi⁷, Asma Tejouri⁸, Emna Gaies¹, Hanene Eljabri⁹, Samia Ayed¹⁰, Aicha Hechaichi⁵, Riadh Daghfous¹, Riadh Gouider⁷, Jalila Ben Khelil¹⁰, Maher Kharrat⁸, Imen Kacem⁷, Nissaf Ben Alya⁵, Alia Benkahla⁴, Sameh Trabelsi¹, Ilhem Boutiba-Ben Boubaker¹¹.

Abstract

Since the beginning of the Coronavirus disease-2019 pandemic, there has been a growing interest in exploring SARS-CoV-2 genetic variation to understand the origin and spread of the pandemic, improve diagnostic methods and develop appropriate vaccines. The objective of this study was to identify the SARS-CoV-2 lineages circulating in Tunisia and to explore their amino acid signatures in order to follow their genome dynamics. Whole genome sequencing and genetic analyses were performed on fifty-eight SARS-CoV-2 samples collected by the National Influenza Center over one year, between March 2020 and March 2021, using three sampling strategies. Multiple lineage introductions were noted during the initial phase of the pandemic, including B.4, B.1.1, B.1.428.2, B.1.540 and B.1.1.189. Subsequently, lineages B.1.160 (24.1%) and B.1.177 (22.4%) were dominant throughout the year. The Alpha variant (B.1.1.7 lineage) was identified in February 2021 and was first observed in the center of the country. In addition, a clear diversity of lineages was observed in the North of the country. A total of 335 mutations, including 10 deletions, were found. The SARS-CoV-2 proteins ORF1ab, Spike, ORF3a and Nucleocapsid were identified as mutation hotspots, with mutation frequencies exceeding 20%. The 2 most frequent mutations, D614G in the S protein and P314L in Nsp12, appeared simultaneously and are often associated with increased viral infectivity. Interestingly, deletions in coding regions, causing amino acid deletions and frameshifts, were identified in the NSP3, NSP6, S, E, ORF7a, ORF8 and N proteins. These findings contribute to defining the COVID-19 outbreak in Tunisia. Despite the country's limited resources, surveillance of SARS-CoV-2 genomic variation should be continued to control the occurrence of new variants.

Keywords: Amino acid change analysis; Amino acid signature; Coronavirus disease-2019; Lineages phylogenetic; SARS-CoV-2; Tunisia; Whole genome sequencing

Year: 2022 PMID: 35552003 PMCID: PMC9085353 DOI: 10.1016/j.meegid.2022.105300

Source DB: PubMed Journal: Infect Genet Evol ISSN: 1567-1348 Impact factor: 4.393

Introduction

Coronavirus disease-2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a growing public health concern. In some people, it produces an asymptomatic or mild disease that does not require particular medical care. However, in specific groups of patients, particularly the elderly and those with chronic diseases, the infection progresses to severe respiratory distress requiring hospitalization in intensive care units (Thielen et al., 2021). The SARS-CoV-2 genome, like that of other RNA viruses, shows a high mutation rate. Initially, the virus emerged from an animal reservoir in the city of Wuhan, China. Human-to-human transmission was then established, with rapid worldwide spread (Chunyang et al., 2020). Over one year of the COVID-19 pandemic, new SARS-CoV-2 genome mutations emerged constantly and more than 4000 variants were reported (Bian et al., 2021). Epidemiological and phylogenetic studies revealed that significant increases in the rate of infection and/or death were correlated with the emergence of four Variants of Concern (VOCs) belonging to the B.1.1.7, B.1.351, B.1.617.2 and B.1.1.28 lineages. On May 31, 2021, WHO published a new nomenclature for VOCs based on the Greek alphabet: Alpha, Beta, Delta and Gamma, respectively (Parums, 2021). During the initial stage of the pandemic, in the absence of specific treatments that prevent or block viral replication, most countries applied massive prevention strategies. A wide difference in case fatality rates was observed, probably due to differences in demographic composition and in the measures taken by each country to limit viral spread (Rader et al., 2021). Subsequently, following widespread vaccination, SARS-CoV-2 infections and deaths declined and social and economic conditions improved somewhat.
Globally, 196,553,009 confirmed cases of COVID-19 had been identified, with 4,200,412 deaths, as of 30 July 2021. In Tunisia, the first case of COVID-19 was identified on 3 March 2020. Preventive strategies were quickly put in place, in particular lockdown and enhanced contact tracing around all positive cases (Chakroun et al., 2020). As of 25 May 2020, the cumulative number of confirmed cases of COVID-19 was 1051, corresponding to a cumulative incidence of 8.87/100,000 inhabitants and an average daily incidence of around 13 cases (Abid et al., 2020). In view of the critical socio-economic situation, the Tunisian authorities eased the restrictions. The virus then continued to spread at an alarming rate, with 595,532 positive cases and 20,067 deaths recorded by 30 July 2021. The low vaccine administration rate (11%), coupled with the emergence and wide circulation of the different Variants of Concern (VOCs), certainly played an important role in the evolution of the epidemiological situation in Tunisia. The aim of this study was to identify the different SARS-CoV-2 introduction events in Tunisia and to explore the mutation profile of the circulating strains in order to follow their genome dynamics, using a collection of SARS-CoV-2 strains (n = 58) from the National Influenza Center covering one year of the COVID-19 pandemic (March 2020 to March 2021).

Materials and methods

Ethical statement

The PRFCOVID-GP3 project titled “SARS-CoV-2 genome Sequencing and study of host-pathogen molecular interactions in Tunisia: epidemiological, clinical and therapeutic impact” was approved by the medical ethics committee of Razi Hospital of Tunis. All procedures involving human participants were in accordance with the ethical standards of the Medical Ethics Committee of Razi Hospital and with the 1964 Helsinki declaration.

Sampling strategies

In total, 90 SARS-CoV-2 strains obtained through three sampling strategies during one year of the COVID-19 pandemic were considered for this study. Initially, stratified random sampling (n = 32) was performed from February 5 to July 17, 2020 (first, well-controlled phase of the epidemic) according to the following criteria: super-spreading events (n = 1), extreme evolution (n = 3), death (n = 3), local infection (n = 12) and imported infection (n = 13) [France (n = 4), Italy (n = 1), England (n = 1), Egypt (n = 1), Switzerland (n = 1), Turkey (n = 4) and Spain (n = 1)]. Then, from July 18 to December 23, 2020, simple random sampling was applied to the list of samples with positive RT-PCR (n = 32). Finally, from December 24, 2020 to March 20, 2021, sequencing indications followed the national SARS-CoV-2 sequencing strategy, which aimed to identify and monitor the emergence of VOCs in Tunisia (n = 26). Thirty-two samples were excluded due to a high Ct value (n = 10), amplification failure during sequence processing (n = 17) or poor genomic coverage (<60%, n = 5). Thus, 58 samples were included in this study (Supplementary Data Table S1). None of the included samples were collected in May and June 2020, due to the absence of COVID-19 cases in Tunisia.
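The sample accounting in this section can be re-checked with simple arithmetic. The following is a minimal sketch that only re-adds the counts reported above; the variable names are illustrative and are not part of the study's workflow.

```python
# Counts copied from the sampling description; dictionary keys are illustrative labels.
strategies = {
    "stratified_random": 32,   # February 5 to July 17, 2020
    "simple_random": 32,       # July 18 to December 23, 2020
    "national_strategy": 26,   # December 24, 2020 to March 20, 2021
}
stratified_criteria = {
    "super_spread_event": 1,
    "extreme_evolution": 3,
    "death": 3,
    "local_infection": 12,
    "imported_infection": 13,
}
imported_by_country = {
    "France": 4, "Italy": 1, "England": 1, "Egypt": 1,
    "Switzerland": 1, "Turkey": 4, "Spain": 1,
}
exclusions = {
    "high_ct": 10,
    "amplification_failure": 17,
    "coverage_below_60_percent": 5,
}

collected = sum(strategies.values())
excluded = sum(exclusions.values())
included = collected - excluded

# Internal consistency of the reported subtotals.
assert sum(stratified_criteria.values()) == strategies["stratified_random"]
assert sum(imported_by_country.values()) == stratified_criteria["imported_infection"]

print(f"collected={collected}, excluded={excluded}, included={included}")
```

The subtotals are mutually consistent: the stratified criteria add up to 32, the imported cases add up to 13, and the exclusions leave the 58 genomes analysed.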

Viral RNA extraction and Real-time RT-PCR

All nasopharyngeal samples included in this study were collected at the National Influenza Center (NIC), also designated as the National Reference Lab for SARS-CoV-2 and other Respiratory Viruses and hosted at the Microbiology lab of Charles Nicolle Hospital of Tunis. RNA extraction was performed using the Chemagic™ automate and the viral RNA 300 Kit H96 (Perkin Elmer, Hamburg, Germany) according to the manufacturer's instructions. Manual extraction using the Qiagen Viral RNA Mini Kit (QIAGEN, California, USA) was also used, depending on the availability of reagents. SARS-CoV-2 was detected by the Hong Kong RT-PCR assay, a qualitative real-time TaqMan RT-PCR method, using AgPath-ID™ One-Step RT-PCR Reagents and an ABI 7500 instrument (WHO, Laboratory and diagnosis, 2020). According to this assay, a result was considered positive for SARS-CoV-2 when both targets, N and ORF1b-nsp14, reached a defined threshold below 0.2 with a Ct value below 40.

Library preparation, sequencing, and read processing

All samples included in this study had a Ct value <30. Whole genome sequencing (WGS) of SARS-CoV-2 strains was performed using the Illumina® RNA Prep with Enrichment kit and the Illumina Respiratory Virus Oligos Panel at the National Center of Pharmacovigilance Chalbi Belkahia of Tunis. Initially, RNA was quantified using a DeNovix® DS-11 series Spectrophotometer/Fluorometer. Input amounts for Illumina® RNA Prep ranged from 10 to 100 ng of RNA. Tiling Polymerase Chain Reaction cDNA synthesis was performed using the Illumina cDNA Synthesis kit. The cDNA was quantified using the DeNovix® dsDNA High Sensitivity kit. Enrichment Bead-Linked Transposomes were used to tag the cDNA. After tagging, the fragments were purified and amplified to add index adapter sequences for dual indexing. Samples were then enriched as single-plex reactions using the Respiratory Virus Oligos Panel v2 (Illumina, Catalog no. 20044311), which features ~7800 probes designed to detect respiratory viruses, recent flu strains and SARS-CoV-2, as well as human probes that act as positive controls in each reaction. The prepared libraries were diluted to a final loading concentration of 100 pM, according to the iSeq 100 System Denature and Dilute Libraries Guide, and sequenced on the Illumina iSeq 100™ System at 2 × 150 bp read length. FASTQ sequencing data files were input to the DRAGEN RNA Pathogen Detection pipeline® and the IDbyDNA Explify Respiratory Virus Oligos Panel Platform® for analysis and viral detection. These platforms were accessed in BaseSpace Sequence Hub.

Lineage assignment

Clades were assigned to the SARS-CoV-2 genome sequences (n = 58) using the Phylogenetic Assignment of Named Global Outbreak LINeages tool (PANGOLIN), Nextstrain and GISAID nomenclature systems. These systems are updated as new viral lineages are observed. PANGOLIN, available at https://cov-lineages.org, recognizes two root lineages: A and B. Lineage A genomes are characterized by two mutations (8782C > T and 28144T > C) compared to lineage B. From these lineages, sub-lineages (e.g. A.1, A.2, A.3, …) and sub-sub-lineages (e.g. A.1.1) are designated, each defined by additional mutations and specific epidemiological characteristics (Rambaut et al., 2020). Nextstrain clade designations (https://clades.nextstrain.org/) separate viruses that originated in China in 2019 (Clade 19) from those that were subsequently introduced into Europe in early 2020 (Clade 20). Two clades were identified in 2019 (19A and 19B) and 9 more in 2020 (20A to 20I). Subclades within a major clade are designated by specific nucleotide mutations. GISAID (https://www.gisaid.org/) uses specific combinations of genetic markers. Eight clades are currently defined: the early split into S and L, the further evolution of L into V and G, later of G into GH, GR and GV, and more recently of GR into GRY. The lineages assigned by the PANGOLIN nomenclature system (updated on August 29, 2021) were used to discuss viral diversity throughout this manuscript.
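As an illustration of the A/B split described above, the sketch below assigns a root lineage from the bases at the two defining positions. This is a deliberate simplification for illustration only: the actual PANGOLIN tool infers lineages from the whole genome, and the function name and input format here are hypothetical.

```python
# Positions are 1-based coordinates on the reference NC_045512 (a lineage B genome).
# Lineage A differs from lineage B by 8782C>T and 28144T>C, as stated in the text.
LINEAGE_A_MARKERS = {8782: "T", 28144: "C"}
LINEAGE_B_MARKERS = {8782: "C", 28144: "T"}


def root_lineage(bases_at_markers):
    """Return 'A', 'B' or 'undetermined' from the bases observed at the two marker positions.

    `bases_at_markers` is a hypothetical input: a dict mapping position -> base.
    Missing positions are treated as 'N' (unknown).
    """
    observed = {
        pos: str(bases_at_markers.get(pos, "N")).upper()
        for pos in LINEAGE_A_MARKERS
    }
    if observed == LINEAGE_A_MARKERS:
        return "A"
    if observed == LINEAGE_B_MARKERS:
        return "B"
    # Mixed or missing markers cannot be resolved from these two sites alone.
    return "undetermined"


print(root_lineage({8782: "T", 28144: "C"}))
print(root_lineage({8782: "C", 28144: "T"}))
```

A genome carrying only one of the two markers is reported as undetermined rather than forced into either lineage.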

Phylogenetic reconstructions and analysis

A phylogenetic tree was built from the 58 full-length Tunisian sequences and the reference sequence NC_045512 using the approximate maximum-likelihood (ML) method of MEGA X software (Sneath and Sokal, 1973), with 1000 bootstrap replicates.

Genomic analysis

The 58 SARS-CoV-2 genome sequences were aligned using the Clustal W program (Larkin et al., 2007) implemented in MEGA X software (Kumar et al., 2018). Multiple alignments were manually edited by trimming the 5′ and 3′ untranslated regions and removing gaps and low-quality sequences, and were then visualized using MEGA X. In addition, Open Reading Frames (ORFs) were predicted and annotated following the annotation of the SARS-CoV-2 reference genome generated from the Wuhan-Hu-1 sequence (accession number: NC_045512). Each genome was compared to the reference NC_045512, and genomic variants were then identified using Geneious software (Kearse et al., 2012). Frequencies of the identified variants were calculated and plotted according to their position on NC_045512 using GraphPad Prism v8 (GraphPad Software, Inc., San Diego, California, USA). Mutations with frequencies above 20% were considered hotspots.
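The hotspot criterion can be expressed as a short calculation. The following is a minimal sketch, assuming that a mutation's frequency is the number of genomes carrying it divided by the 58 genomes analysed. The example counts are summed from the per-clade superscripts of Table 1, and the function names and mutation labels are illustrative.

```python
N_GENOMES = 58            # genomes analysed in this study
HOTSPOT_THRESHOLD = 0.20  # mutations above 20% are considered hotspots


def mutation_frequencies(counts, n_genomes=N_GENOMES):
    """Convert per-mutation genome counts into frequencies between 0 and 1."""
    return {mutation: count / n_genomes for mutation, count in counts.items()}


def hotspots(counts, n_genomes=N_GENOMES, threshold=HOTSPOT_THRESHOLD):
    """Return the mutations whose frequency is strictly above the threshold."""
    frequencies = mutation_frequencies(counts, n_genomes)
    return {m: f for m, f in frequencies.items() if f > threshold}


# Example counts obtained by adding the per-clade superscripts in Table 1.
example_counts = {
    "S:D614G": 49,      # 6 + 6 + 6 + 17 + 14
    "NSP12:P314L": 49,  # 6 + 6 + 6 + 17 + 14
    "ORF3a:Q57H": 21,   # 6 + 15
    "S:A222V": 14,
    "NSP2:V378I": 2,
}

for mutation, frequency in sorted(hotspots(example_counts).items()):
    print(f"{mutation}: {100 * frequency:.1f}%")
```

With these counts, Q57H comes out at 36.2%, which matches the frequency quoted in the Discussion, while V378I (2 of 58 genomes) falls well below the threshold.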

Results

Lineage analysis

A total of 15 different lineages were identified among the 58 SARS-CoV-2 genome sequences generated in this study. Most sequences belonged to lineages B.1.160 (n = 14; 24.1%) and B.1.177 (n = 13; 22.4%). Most health care workers included in this study (n = 8) carried B.1.160 or B.1.177. A.27 (n = 6; 10.3%), B.1.1.7 (n = 6; 10.3%), B.1.1 (n = 4; 6.9%) and B.1.428.2 (n = 3; 5.2%) were less frequent. B.4, B.1.1.189 and B.1.540 were each identified in two patients (n = 2; 3.4%). The B.55, B.1.9, B.1.177.6, B.1.333, B.1.356 and B.1.597 lineages were identified in one patient each (Fig. 1 ). The geographical distribution of SARS-CoV-2 lineages in Tunisia showed that the B.1.160 and B.1.177 lineages circulated in most governorates. The Alpha variant B.1.1.7 was identified mainly in Kasserine but also in Tunis and Ariana. The governorates of Tunis, Ben Arous and Nabeul showed a higher diversity of lineages than the other governorates. A local cluster of the A.27 lineage was detected in Zaghouan and Sousse (Supplementary data Fig. S1).
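The percentages above follow directly from the lineage counts. A minimal sketch of the calculation, using only the counts reported in this paragraph (the variable names are illustrative):

```python
# Number of sequences per PANGOLIN lineage, as reported in the text.
lineage_counts = {
    "B.1.160": 14, "B.1.177": 13, "A.27": 6, "B.1.1.7": 6,
    "B.1.1": 4, "B.1.428.2": 3,
    "B.4": 2, "B.1.1.189": 2, "B.1.540": 2,
    "B.55": 1, "B.1.9": 1, "B.1.177.6": 1,
    "B.1.333": 1, "B.1.356": 1, "B.1.597": 1,
}

total = sum(lineage_counts.values())

# Share of each lineage among all sequenced genomes, rounded to one decimal place.
percentages = {
    lineage: round(100 * count / total, 1)
    for lineage, count in lineage_counts.items()
}

for lineage, count in lineage_counts.items():
    print(f"{lineage}: n = {count} ({percentages[lineage]}%)")
```

The counts add up to the 58 genomes analysed, spread over 15 lineages.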

Fig. 1

Distribution of the genetic lineages of the 58 Tunisian SARS-CoV-2 sequences during the first year of the pandemic: March 2020 to March 2021. *No COVID-19 cases were detected in Tunisia during June 2020. **According to our sampling strategy, no COVID-19 cases were included during August 2020.

SARS-CoV-2 phylogenetic and clades genetic characterization

A total of 29,424 positions were found in the final dataset. According to the Nextclade nomenclature, the 58 Tunisian isolates were distributed across 7 different SARS-CoV-2 subclades: 19A, 19B, 20A, 20B, 20C, 20E and 20I (Fig. 2 ). The first two subclades, 19A and 19B, clustered together from a single node, C1. The second node, C2, contained sequences belonging mainly to the 2020 clades (20A, 20B, 20C, 20E and 20I). The similarity scores of C1 and C2 sequences relative to the reference sequence ranged between 99.81% and 99.97%. The two main nodes shared 4 non-synonymous mutations, one in the ORF1ab gene (L3606F) and three in the Spike protein (L18F, N501Y and K1191N) (Supplementary Data Tables S3–S4).
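To relate similarity scores to numbers of differing positions, the sketch below computes a simple percent identity between two aligned sequences. It assumes identity is the share of matching positions after skipping gaps and ambiguous bases; the exact definition used by the analysis software is not stated in the text, and both function names are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two aligned sequences of equal length.

    Positions where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped. This is an assumed definition, not necessarily the one used
    by the study's software.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def implied_differences(n_positions, identity_percent):
    """Approximate number of differing positions implied by an identity score."""
    return round(n_positions * (1 - identity_percent / 100))


print(percent_identity("ACGTACGT", "ACGTACGA"))
print(implied_differences(29424, 99.81), implied_differences(29424, 99.97))
```

Over 29,424 positions, the reported range of 99.81% to 99.97% corresponds to roughly 56 down to 9 differing positions under this definition.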

Fig. 2

Phylogenetic analysis of the 58 Tunisian SARS-CoV-2 genome sequences.

Phylogenetic analysis of 58 Tunisian SARS-CoV-2 sequences, compared with the SARS-CoV-2 reference sequence of Wuhan*: NC_045512, inferred by the Neighbor-Joining method. Branches are colored according to the Nextclade clade nomenclature. The evolutionary distances were computed using the Maximum Likelihood method.

The 19A subclade gathered the Wuhan reference sequence with sequences “6736” and “7899”, collected in the first pandemic period (March and April 2020). These sequences showed an identity score of 99.97% when compared to NC_045512. These 2 sequences differed from the reference sequence at 8 locations affecting the non-structural proteins NSP2 (n = 2), NSP4 (n = 2) and NSP6 (n = 1), and the structural N protein (n = 3). Among these, 5 caused changes in the protein sequences of ORF1a (V378I, G3072C and L3606F) and N (M1I and S188P) (Table 1 , Supplementary Data Tables S3–S4). The subclade 19B groups strains isolated between February and March 2021 and has a similarity score ranging from 99.81% to 99.87% relative to the Wuhan reference. The variability profile of the 19B subclade sequences was also characterized by the emergence of mutation sets affecting ORF1ab (n = 12), the S protein (n = 13), ORF3a (n = 2), ORF8 (n = 5) and the N protein (n = 2). The 6-nucleotide deletion in ORF8 was responsible for 2 amino acid deletions, “D119” and “F120”.

Table 1

Non-synonymous mutations among SARS-CoV-2 clades from 58 Tunisian samples.

		20B (N = 6)	20I (Alpha) (N = 6)	20C (N = 6)	20 A (N = 17)	20 E (N = 14)	19 B (N = 7)	19 A (N = 2)
ORF1a	NSP1				E93K¹, R124C¹, G192D¹
	NSP2	G392C¹, E489D¹	L730F⁴	T265I⁶, S318L¹ T346I³, H388Y³ A482V¹	K292R¹, E342G¹, H1141Y¹, M1312I¹	D194N¹	P286L⁶	V378I²
	NSP3	A591V¹	T1001I⁶	P1596L³, T1908I¹, T2154I¹, S2535L¹, A2690V³	P1596L¹, K1895N¹, L2688F¹, M3087I¹⁴, L3201I¹⁰	V559M¹, K1247N¹², S1515F¹, P1659S¹, P1803S²
	NSP4	P1158S¹	A1708D⁶	T3058I¹	T3284I¹, K3353R¹		D2980G⁶, T3082I⁴	G3072C²
	NSP5	I1232V¹, P3359S²	I2230T⁶		S3384L¹, L3711F¹, P4223L¹	P2018S¹	S3386F¹
	NSP6	C2210F², T3716A¹	S3675-⁶, G3676-⁶, F3677-⁶			A2345V¹, A3209V¹, A3497V¹	N3651S⁶	L3606F²
	NSP7	S2500F² S3884L¹				L3606F²
	NSP8					A3623S¹²
	NSP9						P4197S¹
	NSP10			T4304I¹
ORF1b	NSP12	P314L⁶	T132I¹, P314L⁶	P314L⁶	A176S¹⁴, P314L¹⁷, T730I¹, V767L¹⁴, S904L¹	D275Y², P314L¹⁴
	NSP13		K1383R²	M1156I¹ M1499I¹	P976L¹, K1141R¹⁴, E1184D¹⁴, M1352I², S1408L¹	P1001S¹, Y1229C¹	P1000L⁶
	NSP14	T1545A¹	E1871G²		T1540I¹, T1555I¹R1737L¹, T1747N¹, V1840F¹, T2040I¹
	NSP15			M2269I¹	D2179Y¹, Q2247H², E2253G¹, D2263N¹	P2313S¹
	NSP16	Q2635H²				A2559S¹
S-Protein		D111N¹, D614G⁶ A684V², I770V¹ A892S¹	V6A², H69⁵, V70⁵, D138H¹, Y144⁶, N501Y⁶, A570D⁶, D614G⁶, P681H⁶, T716I⁶, S982A⁶, D1118H⁶	D614G⁶, E780Q¹	L5F¹, V120I¹, L176F¹, I233V², G261S¹, S477N¹⁴, D614G¹⁷, D627E¹, S640F¹, D936Y¹, A1020S¹, H1101Y¹	L5F², L18F¹, Q23H¹, A222V¹⁴, T572I¹, D614G¹⁴, G932C¹, H1101Y¹, K1191N³	L18F⁶, V227A⁴, L452R⁶, N501Y⁶, A653V⁵, H655Y⁶, Q677H¹, D796Y⁶, K1191N⁵, G1219V⁶
ORF3a		F230V¹	L140F¹, G174D³	Q57H⁶, A99V¹, G224C¹	Q57H¹⁵, V97I¹, T223I¹	A39T¹, D155Y¹, L101F¹, W131C¹, T223I²	V50A¹
M-Protein					H125Y¹	V70F¹
ORF7a		F6S¹		T14I¹	T120I¹	I10L¹
ORF8		S21N²	Q27* ⁶, R52I⁶, Y73C⁵		H28Y¹, A51S¹, Q72L¹, C83F¹	V5I¹, E64*¹	D119-⁶, F120-⁶, A65S⁴, L84S⁶
ORF9b						S6I¹, Q18H¹
N-Protein		R203K⁶, G204L¹, G204R⁵, 5G321F² T325I¹	D3L⁵, R203K⁶, G204R⁶, S235F⁶	S186Y¹, T205I³, D348H¹	K80E³, G19R¹, Q83R³, M234I¹⁴, A376T¹⁴, Q384H²	Q9H¹, D22Y¹⁴, A220V¹, G236C¹Q418L¹	S202N⁶	M1I², S188P²

Superscript number: number of isolates that harboured mutation; - : deletion; *: stop codon; E: envelope protein; M: membrane glycoprotein; N: nucleocapsid phosphoprotein; ORF: open reading frame; S: spike glycoprotein.

The second major node, C2, revealed that the Tunisian SARS-CoV-2 sequences differed from the reference sequence and were split into 5 clades, 20B, 20I, 20C, 20A and 20E (Fig. 2), all sharing the spike mutation D614G and the NSP12-RdRp mutation P314L (Table 1, Supplementary Data Tables S3–S4). 20A and 20E were the 2 main subclades (in purple and green) (Fig. 2), bringing together sequences of SARS-CoV-2 viruses isolated at different times and locations in Tunisia. The mutations T223I in ORF3a and H1101Y/L5F in the S protein were shared between these subclades. Moreover, in February 2021, the clade 20I (Alpha, V1) emerged (red cluster). This clade shares mutations in the N protein (R203K and G204R) with clade 20B. Unique mutations that constitute the genetic signature of each clade were also observed (Table 1, Supplementary Data Tables S3–S4): 18 in 20I (Alpha, V1), 14 in 19B, 7 in 19A, 6 in 20E, 4 in 20A, 1 in 20C and none in 20B.

SARS-CoV-2 genomic characterization

The multiple sequence alignment of the fifty-eight Tunisian sequences against the Wuhan-Hu-1 reference sequence (NC_045512) revealed a total of 335 mutation events, including 325 single nucleotide polymorphisms (SNPs) and 10 deletions. Among the 325 SNPs, 239 transitions and 86 transversions were observed. These variations represented 134 synonymous and 191 non-synonymous changes (Table 1, Supplementary Data Table S4). Among all amino acid changes, 62 were found in structural proteins: 38 in the spike (S) glycoprotein, 22 in the nucleocapsid (N) and only 2 in the membrane protein (M). No amino acid substitutions were seen in the envelope protein (E). Ninety-six additional amino acid changes were identified in non-structural proteins (NSPs 1–16 in ORF1ab), and 33 in accessory proteins: ORF3a (n = 13), ORF7a (n = 4), ORF8 (n = 13) and ORF9b (n = 3) (Table 1, Supplementary Data Tables S3–S4). Deletions were observed at 10 sites, located in the genomic sequences of NSP3, NSP6, protein S, protein E, ORF7a, ORF8 and protein N. Seven of these deletions resulted in the loss of amino acids and six caused frameshifts (Table 2 ).
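The tallies in this paragraph can be cross-checked, and the transition/transversion distinction made explicit, with a short sketch. The classification rule is the standard one (purine to purine or pyrimidine to pyrimidine is a transition); the function and variable names are illustrative, and the counts are those reported above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Classify a single-nucleotide substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases among A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Counts as reported in the text.
snps, deletions = 325, 10
transitions, transversions = 239, 86
synonymous, non_synonymous = 134, 191
aa_changes = {
    "structural": 38 + 22 + 2,      # S + N + M
    "non_structural": 96,           # NSPs 1-16 in ORF1ab
    "accessory": 13 + 4 + 13 + 3,   # ORF3a + ORF7a + ORF8 + ORF9b
}

assert snps + deletions == 335
assert transitions + transversions == snps
assert synonymous + non_synonymous == snps
assert sum(aa_changes.values()) == non_synonymous

ts_tv_ratio = transitions / transversions
print(substitution_type("C", "T"), substitution_type("A", "C"))
print(f"Ts/Tv ratio: {ts_tv_ratio:.2f}")
```

All reported subtotals are mutually consistent, and the transition-to-transversion ratio is about 2.78.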

Table 2

Deletion characteristics: 58 Tunisian SARS-CoV-2 whole genome sequences.

Deletions	Number of nucleotides	Position (bp)	AA change	Frameshift	Corresponding Protein	Clades	Number of affected sequences	Sequences reference
Deletion 1	7	6833	I-K	Yes	Nsp3	20E	1	55400
Deletion 2	9	11,288	S-G-F	No	Nsp6	20I (Alpha, V1)	6	Q8734/18267/19152/18506/18507/18915
Deletion 3	6	21,765	H-H	No	Protein S	20I (Alpha, V1)	6	Q8734/18267/19152/18506/18507/18915
Deletion 4	3	21,992	Y	No	Protein S	20I (Alpha, V1)	6	Q8734/18267/19152/18506/18507/18915
Deletion 5	4	26,158	V	Yes	Protein E	20B	1	55304
Deletion 6	8	26,161	N-P	Yes	Protein E	19B	6	G6590/19153/G6575bis/5509/4409/14670
Deletion 7	1	27,293	–	Yes	Orf7a	20A	1	55319
Deletion 8	1	27,388	–	Yes	Orf7a	19B	6	G6590/19153/G6575bis/5509/4409/14670
Deletion 9	6	28,248	D-F	No	Orf8	19B	6	G6590/19153/G6575bis/5509/4409/14670
Deletion 10	1	28,271	–	Yes	Protein N	20I (Alpha, V1)	5	18267/19152/18506/18507/18915

Bp: Base pairs; AA: Amino Acids; Nsp: Non-structural protein; Protein S: Spike glycoprotein; Protein E: Envelope protein; Protein N: Nucleocapsid protein.
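The frameshift column of Table 2 follows from the deletion lengths: a deletion within a coding sequence shifts the reading frame when its length is not a multiple of 3. A minimal sketch checking this rule against the table (lengths and frameshift calls are copied from Table 2; the names are illustrative):

```python
# Deletion number -> number of deleted nucleotides (Table 2).
deletion_lengths = {1: 7, 2: 9, 3: 6, 4: 3, 5: 4, 6: 8, 7: 1, 8: 1, 9: 6, 10: 1}

# Deletion number -> frameshift as reported in Table 2.
reported_frameshift = {
    1: True, 2: False, 3: False, 4: False, 5: True,
    6: True, 7: True, 8: True, 9: False, 10: True,
}


def causes_frameshift(n_deleted):
    """A coding-sequence deletion keeps the reading frame only if its length is a multiple of 3."""
    return n_deleted % 3 != 0


predicted = {d: causes_frameshift(n) for d, n in deletion_lengths.items()}
n_frameshifts = sum(predicted.values())

print(predicted == reported_frameshift, n_frameshifts)
```

The rule reproduces every entry of the frameshift column, giving six frameshifting and four in-frame deletions.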

Hyper-variable genomic hotspots

Interestingly, among the 191 non-synonymous mutations, 17 were identified as hotspots, with a mutation frequency above 20% in SARS-CoV-2 genomes derived from Tunisian patients. Eight of them were found in ORF1ab, four in the spike glycoprotein (S), four in the nucleocapsid and one in the accessory protein ORF3a. In addition, the P314L mutation in the ORF1b non-structural protein RNA-dependent RNA polymerase (RdRp), the D614G mutation in Spike and the Q57H mutation in ORF3a each had a frequency exceeding 30% in Tunisian sequences (Fig. 3 ).

Fig. 3

Genomic variation frequency of Tunisian SARS-CoV-2 sequences (n = 58). Genomic variants were identified by comparison with the first diagnosed Wuhan variant, NC_045512, using MEGA X (Kumar et al., 2018). The locations and frequencies of the mutations were plotted along the genomic sequence of NC_045512. The open reading frames (ORFs) of SARS-CoV-2 are shown as rectangles aligned with the nucleotide positions of the coronavirus genome. The frequency of each mutation in the population is represented by color-coded circles. Abbreviations: ORF: Open Reading Frame; E: Envelope; M: Membrane protein; N: Nucleocapsid protein.

Discussion

Faced with the unusual SARS-CoV-2 pandemic, our country has increased its genomic capacity to track the rapid evolution of the virus. A national strategy that includes federated multi-disciplinary research projects (PRF) was implemented during the early pandemic stage. The role of the National Observatory of New and Emerging Diseases was to define sampling strategies during the different waves of the pandemic and to identify strains with particular priority for sequencing. The first strain introduced in Tunisia was completely sequenced in the framework of this national project. The implemented national strategy also allowed the detection of different VOCs as soon as they were introduced in the country (https://www.gisaid.org/). Our data revealed that the Tunisian SARS-CoV-2 epidemic started with multiple introduction events. Indeed, among the 14 different lineages generated in this study, six were identified during the early phase: B.1, B.1.1, B.1.1.189, B.1.428.2, B.1.597 and B.4. During this period, the number of positive cases in Tunisia had reached 75, with 3 deaths. Preventive strategies were implemented, in particular a 75-day lockdown and enhanced contact tracing around all positive cases (NONED, 28 March, 2020). Subsequently, in June 2020, the country recorded zero new cases for 15 consecutive days. These findings support the efficacy of the implemented strategies in containing the spread of SARS-CoV-2 in the country. Following the opening of the borders on June 27, 2020, the epidemic situation in Tunisia changed, with diverse new introductions of SARS-CoV-2 lineages. As in the majority of European countries, B.1.160, B.1.177 and B.1.1.7 were the predominant lineages during the second and third waves (Hodcroft et al., 2021).
Our data revealed a clear difference between the North of the country (Tunis, Nabeul, Ben Arous and Ariana), where diverse lineages circulated, and the South (Gafsa, Djerba, Sidi Bouzid), where only one lineage was present. Of note, Tunis, the most important urban area of Tunisia, is the only governorate where the majority of lineages (9/14) circulated. Given the central situation of Tunis and its intense economic activity, the simultaneous circulation of the different lineages in this area seems plausible. Our results identified 17 hyper-variable genomic hotspots with frequencies above 20%. These hotspots and other mutations constitute genetic signatures that allowed us to classify the sequenced SARS-CoV-2 viruses. Among the 11 described clades (Rambaut et al., 2020), only 7 were identified in our sequence analysis. Sequence analysis identified 239 transitions, i.e. interchanges between two purines (A, G) or between two pyrimidines (C, T), and 86 transversions, i.e. interchanges between a purine and a pyrimidine, which involve a larger structural change. Many SARS-CoV-2 studies have observed the predominance of transitions, particularly C-to-T replacements (Koyama et al., 2020; Wang et al., 2020). The occurrence of such changes is likely to reflect the adaptation of the virus to its hosts (Matyášek and Kovařík, 2020). Among the 191 mutations/aa changes, the 2 most frequent, D614G in the S protein and P314L in Nsp12, appeared simultaneously and are often associated with increased viral infectivity (Korber et al., 2020). They were shared by all classified clades except 19A and 19B, which suggests that they arose during the transition between the first cluster reported in Wuhan and the subsequent clusters that spread globally.
Moreover, the high frequency of these mutations suggests that they probably improved the fitness of the virus (Plante et al., 2021) by facilitating its functional cooperation and maintaining its stability, thus increasing the transmission rate and improving the adaptability of the virus (Laha et al., 2020). The third most frequent mutation, Q57H (36.2%) in ORF3a, introduces an early stop codon resulting in a truncated form of the accessory protein ORF3b (13 amino acids instead of 57). The ORF3b-Q57H SARS-CoV-2 variant was found to be responsible for the fourth epidemic wave of COVID-19 in Hong Kong (Chu et al., 2021). Indeed, the ORF3b protein is an interferon antagonist, and its functional failure might decrease the virulence of the virus and increase its transmissibility (Lam et al., 2020). Chu et al. (2021) demonstrated that variants carrying this mutation could evade induction of cytokine, chemokine and interferon-stimulated gene expression in primary human respiratory cells. This mutation is found in an increasing number of sequences and its appearance coincided with that of the D614G mutation. A link between the two mutations, acting simultaneously, has been hypothesized. Other, less frequent hotspots were often associated with specific clades, such as the R203K mutation in the N protein, found in clades 20B and 20I, which was almost always associated with G204R; both have previously been correlated with enhanced virulence (Wu et al., 2021). The A222V mutation in the S protein, located in an area defined as a possible immune cell epitope, has been described elsewhere in the 20E clade (Bao-zhong Zhang et al., 2020). Moreover, six relatively recent sequences (collected in February and March 2021) corresponded to the 19B clade, which appeared in early 2020 and was expected to disappear over time (Murall et al., 2021), but whose frequency increased worldwide, probably due to convergent mutations affecting the S protein, including D614G (Volz et al., 2021).
In our sequences, we did not observe any D614G mutation in this clade, but rather a combination of the L18F, N501Y, L452R, H655Y, D796Y and G1219V mutations in the S protein. Four of these (L18F, N501Y, L452R and H655Y) were previously described in France (Fourati et al., 2021) and have been suggested to improve the interaction of the S protein with the ACE2 receptor (Weisblum et al., 2020) and to increase resistance to neutralizing antibodies (Choi et al., 2020; Baum et al., 2020). Moreover, 3 deletions, namely an 8-nucleotide deletion in protein E, a frameshift in ORF7a and a 6-nucleotide deletion in ORF8, were also found in all 19B sequences. Protein E is involved in several processes, such as assembly, budding, envelope formation and pathogenesis (Schoeman and Fielding, 2019; Chai et al., 2021). The accessory proteins ORF7a and ORF8 interfere with host proteins and mediate the immune response. Alteration of their function may influence virulence (Tse et al., 2021; Zhang et al., 2020). Any such event resulting in the relative re-emergence of a clade should be closely monitored in the future. Notably, the 3 deletions causing frameshifts in NSP3, protein E and ORF7a each affected only one sequence and will be verified by Sanger sequencing. Likewise, if they are confirmed, it would be interesting to study the functional consequences of these deletions on the virus in vitro and in vivo. The mutation rate drives genome dynamics, flexibility, plasticity and variability, all leading to virus evolution and the spread of polymorphic variants. Most mutations responsible for variant classification occurred in the S protein, but several other emerging mutations were observed in both coding and non-coding sequences of other genes, especially those encoding the NSP12, NSP9 and N proteins (Khailany et al., 2020).
In this study, we identified the Alpha variant (clade 20I, lineage B.1.1.7) and characterized it by its characteristic mutations and deletions affecting the S protein (69 del, 70 del, 144 del, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H) (Meng et al., 2021). The 69–70 deletion probably alters the conformation of the N-terminal domain loop of the S protein and could be associated with increased infectivity (Kemp et al., 2020). In addition, a previously undescribed single-nucleotide deletion in the N protein gene was found in five of the six Alpha variant sequences. This deletion might cause a frameshift in the sequence. According to the Centers for Disease Control and Prevention, the Alpha variant is categorized among the variants of concern (VOCs), a category that includes variants presenting diagnostic interference and increased resistance to therapy or vaccination, in addition to evidence of increased transmissibility and disease severity. This variant first appeared in Tunisia on 2 March 2021, and its highly accelerated transmission (Davies et al., 2021) made it the leading variant during the third wave of the COVID-19 pandemic in Tunisia in spring 2021. It caused many cases and put health facilities under strain. It is important to note that the K1191N mutation, observed in the Alpha variant in the USA (Washington et al., 2021), was also identified in our 20E and 19B clades. As shown here, the appearance of certain mutations and/or variants can be a major event in the evolution of the epidemic and in the spread of genetically polymorphic variants. Continuing to sequence new variants as soon as they appear could therefore be useful for monitoring and managing the epidemic, as well as for treatment and vaccine development.

Data availability

Genome sequences generated in this study were deposited in the GISAID (https://www.gisaid.org) and GenBank (https://www.ncbi.nlm.nih.gov) databases. Accession IDs are available in Supplementary Table Data S2.

Funding

This work was supported by the Tunisian Ministry of Higher Education and Research (Federated Research Projects: PRFCOVID-GP3).

Sequence data

Sequence data from this article have been deposited in GISAID (https://www.gisaid.org) and GenBank (https://www.ncbi.nlm.nih.gov) databases.

CRediT authorship contribution statement

Mouna Ben Sassi: Investigation, Resources, Data curation, Writing – original draft, Validation. Sana Ferjani: Conceptualization, Investigation, Resources, Formal analysis, Data curation, Visualization, Writing – original draft, Validation. Imen Mkada: Conceptualization, Resources, Data curation, Formal analysis, Writing – original draft, Visualization, Validation. Marwa Arbi: Conceptualization, Resources, Data curation, Formal analysis, Writing – original draft, Visualization, Validation. Mouna Safer: Methodology, Formal analysis, Validation. Awatef Elmoussi: Investigation, Resources, Validation. Salma Abid: Investigation, Resources, Validation. Oussema Souiai: Conceptualization, Formal analysis, Resources, Data curation, Writing – review & editing, Validation. Alya Gharbi: Resources, Validation. Asma Tejouri: Validation. Emna Gaies: Validation. Hanene Eljabri: Validation. Samia Ayed: Resources, Validation. Aicha Hechaichi: Validation. Riadh Daghfous: Validation. Riadh Gouider: Project administration, Supervision, Validation. Jalila Ben Khelil: Supervision, Validation. Maher Kharrat: Supervision, Validation. Imen Kacem: Resources, Validation. Nissaf Ben Alya: Methodology, Validation. Alia Benkahla: Conceptualization, Methodology, Investigation, Resources, Data curation, Writing – review & editing, Supervision, Validation. Sameh Trabelsi: Supervision, Validation. Ilhem Boutiba-Ben Boubaker: Conceptualization, Methodology, Investigation, Resources, Data curation, Writing – review & editing, Supervision, Project administration, Validation.

Declaration of Competing Interest

The authors declare that there is no competing interest.

35 in total

1. Clustal W and Clustal X version 2.0.

Authors: M A Larkin; G Blackshields; N P Brown; R Chenna; P A McGettigan; H McWilliam; F Valentin; I M Wallace; A Wilm; R Lopez; J D Thompson; T J Gibson; D G Higgins
Journal: Bioinformatics Date: 2007-09-10 Impact factor: 6.937

2. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors: Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal: Mol Biol Evol Date: 2018-06-01 Impact factor: 16.240

3. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

Authors: Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal: Bioinformatics Date: 2012-04-27 Impact factor: 6.937

4. Mining of epitopes on spike protein of SARS-CoV-2 from COVID-19 patients.

Authors: Bao-Zhong Zhang; Ye-Fan Hu; Lin-Lei Chen; Thomas Yau; Yi-Gang Tong; Jing-Chu Hu; Jian-Piao Cai; Kwok-Hung Chan; Ying Dou; Jian Deng; Xiao-Lei Wang; Ivan Fan-Ngai Hung; Kelvin Kai-Wang To; Kwok Yung Yuen; Jian-Dong Huang
Journal: Cell Res Date: 2020-07-01 Impact factor: 25.617

5. Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity.

Authors: Erik Volz; Verity Hill; John T McCrone; Anna Price; David Jorgensen; Áine O'Toole; Joel Southgate; Robert Johnson; Ben Jackson; Fabricia F Nascimento; Sara M Rey; Samuel M Nicholls; Rachel M Colquhoun; Ana da Silva Filipe; James Shepherd; David J Pascall; Rajiv Shah; Natasha Jesudason; Kathy Li; Ruth Jarrett; Nicole Pacchiarini; Matthew Bull; Lily Geidelberg; Igor Siveroni; Ian Goodfellow; Nicholas J Loman; Oliver G Pybus; David L Robertson; Emma C Thomson; Andrew Rambaut; Thomas R Connor
Journal: Cell Date: 2020-11-19 Impact factor: 41.582

6. Novel SARS-CoV-2 Variant Derived from Clade 19B, France.

Authors: Slim Fourati; Jean-Winoc Decousser; Souraya Khouider; Melissa N'Debi; Vanessa Demontant; Elisabeth Trawinski; Aurélie Gourgeon; Christine Gangloff; Grégory Destras; Antonin Bal; Laurence Josset; Alexandre Soulier; Yannick Costa; Guillaume Gricourt; Bruno Lina; Raphaël Lepeule; Jean-Michel Pawlotsky; Christophe Rodriguez
Journal: Emerg Infect Dis Date: 2021-05 Impact factor: 6.883

7. Effects of SARS-CoV-2 variants on vaccine efficacy and response strategies.

Authors: Lianlian Bian; Fan Gao; Jialu Zhang; Qian He; Qunying Mao; Miao Xu; Zhenglun Liang
Journal: Expert Rev Vaccines Date: 2021-04-14 Impact factor: 5.217

8. Editorial: Revised World Health Organization (WHO) Terminology for Variants of Concern and Variants of Interest of SARS-CoV-2.

Authors: Dinah Parums
Journal: Med Sci Monit Date: 2021-06-21

9. Assessment of sample pooling for SARS-CoV-2 molecular testing for screening of asymptomatic persons in Tunisia.

Authors: Salma Abid; Sana Ferjani; Awatef El Moussi; Asma Ferjani; Mejda Nasr; Ichrak Landolsi; Karima Saidi; Hanène Gharbi; Hajer Letaief; Aicha Hechaichi; Mouna Safer; Nissaf Bouafif Ep Ben Alaya; Ilhem Boutiba-Ben Boubaker
Journal: Diagn Microbiol Infect Dis Date: 2020-07-05 Impact factor: 2.803

10. Effects of SARS-CoV-2 mutations on protein structures and intraviral protein-protein interactions.

Authors: Siqi Wu; Chang Tian; Panpan Liu; Dongjie Guo; Wei Zheng; Xiaoqiang Huang; Yang Zhang; Lijun Liu
Journal: J Med Virol Date: 2020-11-01 Impact factor: 20.693