
SARS-CoV-2 Genome from the Khyber Pakhtunkhwa Province of Pakistan.

Muhammad Tahir Khan^1,2,3, Sajid Ali^4, Anwar Sheed Khan^5, Noor Muhammad^5, Faiza Khalil^6,7, Muhammad Ishfaq^8, Muhammad Irfan^9, Abdullah G Al-Sehemi^10,11, Shabbir Muhammad^10,11,2,3, Arif Malik^1, Taj Ali Khan^12, Dong Qing Wei^2,3.

Abstract

Among viral outbreaks, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is one of the deadliest, and it has triggered the global COVID-19 pandemic. In Pakistan, a total of 6342 deaths had been reported by 5th September 2020, of which 1255 were from the Khyber Pakhtunkhwa (KPK) province. Whole genome sequence analysis is important for understanding disease progression, supporting control measures, and guiding vaccine and therapeutic development. In the current investigation, we sequenced a single SARS-CoV-2 genome (accession no. MT879619) from a male patient with suspected infection in Peshawar, the capital city of KPK, during the first wave of infection. In both the phylogenetic tree and its mutations, the local SARS-CoV-2 strain shows some unique characteristics compared with neighboring Iranian and Chinese isolates. The circulating SARS-CoV-2 strains appear to represent an intermediate evolutionary stage between the Chinese and Iranian isolates. Furthermore, eight complete whole genome sequences submitted to the Global Initiative on Sharing All Influenza Data (GISAID), including the current Pakistani isolate, were investigated for specific mutations and characters. Some novel mutations [NSP2 (D268del), NSP5 (N228K), and NS3 (F105S)] and specific characters were detected in the coding regions, which may affect viral transmission, epidemiology, and disease severity. Computational modeling suggested that most of these mutations may have a stabilizing effect on viral protein structure. In conclusion, sequencing the genomes of local strains is important for better understanding the pathogenicity, immunogenicity, and epidemiology of the causative agent.


Year: 2021 PMID: 33748571 PMCID: PMC7944396 DOI: 10.1021/acsomega.0c05163

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a positive-sense, single-stranded RNA (+ssRNA) virus of the genus Betacoronavirus, which also includes MERS-CoV and SARS-CoV. Among RNA viruses, coronaviruses possess the largest genomes (about 30 kb), which encode structural and accessory proteins, the replicase, and other nonstructural proteins (NSPs).[1−4] The pandemic appears to be spreading worldwide through human-to-human transmission.[5] According to the World Health Organization report of 3rd December 2020, there had been 63,965,092 confirmed cases and 1,488,120 deaths globally. A recent study[6] predicted the trajectory of SARS-CoV-2 and concluded that, at the beginning of the pandemic, Wuhan implemented more active and effective nonpharmaceutical interventions. ORF1a/b, the largest region of the SARS-CoV-2 genome (about two-thirds of its length), is translated into two large polyproteins, pp1a and pp1ab (NSP1–NSP16).[7] The membrane (M), nucleocapsid phosphoprotein (N), envelope (E), and spike (S) proteins are structural proteins that function together with the viral RNA and NSP1–16 to facilitate replication of the virus within the host cell. The complete virion comprises the structural proteins together with the RNA genome. Processing of pp1a and pp1ab yields 16 viral NSPs,[8] which assemble into replication and transcription complexes. These complexes perform multiple functions, ranging from replication to polypeptide processing.[9−17] The SARS-CoV-2 genome continuously accumulates variation, particularly in the S protein, NSP3, and RNA-dependent RNA polymerase (RdRp).[18,19] The S protein is a key factor in viral evolution, pathogenicity, and transmission and might be important for vaccine development. 
Moreover, RdRp might be an important target for designing antivirals against SARS-CoV-2.[19] A recent study detected 12,706 mutations in 46,723 SARS-CoV-2 genomes from patients worldwide, of which 398 were strongly supported recurrent mutations.[20] A virus's mutation rate is driven by multiple factors, including selective pressure, and processes such as 3′-exonuclease activity and replicative repair[21,22] play an important role in its evolution. During a pandemic, whole genome sequence analysis is an important strategy for monitoring disease progression and guiding control and therapeutic efforts.[23] Many important proteins harbor mutations that could make them poor drug targets, and mutations also play a key role in viral transmission and pathogenicity. Therefore, constant molecular surveillance should precede the development of drugs and diagnostic tools for the better management of COVID-19. In the current study, a single SARS-CoV-2 genome from a male patient with suspected infection was sequenced, and its mutations were analyzed together with those of seven other whole genome sequences from other locations in Pakistan, retrieved from GISAID.[24] The genomes harbor some novel mutations and characters that could be useful for analyzing COVID-19 epidemiology and severity.

Results and Discussion

The sequence of SARS-CoV-2 (SARS-CoV-2-KPK-KUST-SJTU/2020) has been submitted to GenBank under accession number MT879619. The sequencing statistics are shown in Table 1. Genome coverage was 99.97%, with 367,235 aligned reads.

Table 1

Statistics of the Whole Genome Sequence of SARS-CoV-2

statistics type	number
aligned bases	45,417,209
aligned reads number	367,235
coverage %	99.97
duplication rate %	31.4093
indel rate %	0.0055
mean read length	125.6
mismatch rate %	0.24
sequencing depth	1438.63
variants number	15
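Table 1 does not state exactly how coverage and sequencing depth were computed. As a minimal sketch, assuming coverage means the percentage of reference positions covered by at least one read and sequencing depth means the mean per-position read depth, both values can be derived from a per-base depth profile. The helper `coverage_stats` and its input format are hypothetical.

```python
from typing import Dict, Sequence


def coverage_stats(depths: Sequence[int], min_depth: int = 1) -> Dict[str, float]:
    """Summarize a per-position read-depth profile (hypothetical helper).

    depths[i] is assumed to be the number of aligned reads covering reference
    position i + 1, listed for every position of the reference genome.
    """
    genome_length = len(depths)
    if genome_length == 0:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        # Percentage of reference positions with at least `min_depth` reads.
        "coverage_pct": 100.0 * covered / genome_length,
        # Mean depth over all positions (bases aligned to the reference / length).
        "mean_depth": sum(depths) / genome_length,
    }


if __name__ == "__main__":
    # Toy profile for a 4-bp "genome"; not data from this study.
    print(coverage_stats([0, 10, 20, 30]))  # {'coverage_pct': 75.0, 'mean_depth': 15.0}
```

Depending on how duplicates, insertions, and clipped bases are counted, such a recomputation need not reproduce the exact values in Table 1.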

A total of fifteen mutations were detected, of which eight were nonsynonymous: four in orf1ab and one each in S, ORF3a, N, and ORF10 (Table 2). The orf1ab mutation L3606F was also reported in a Japanese isolate (accession no. LC528232), whereas 2702Q > H and 5561A > T appear to be novel in Pakistani isolates. Pro4715Leu was detected in NSP12 (RdRp), encoded within orf1ab.
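The synonymous and missense labels in Table 2 follow from translating the reference and alternative codons with the standard genetic code. A minimal sketch, using a hypothetical helper `classify_codon_change`:

```python
from typing import Tuple

# Standard genetic code (NCBI translation table 1); codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> Tuple[str, str, str]:
    """Translate both codons and label the change (hypothetical helper)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"  # premature stop codon
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    # Codon changes taken from Table 2.
    for ref, alt in [("TAC", "TAT"), ("CAG", "CAT"), ("GAT", "GGT"), ("TCG", "TTG")]:
        print(ref, ">", alt, classify_codon_change(ref, alt))
```

For example, 8106CAG > CAT translates to Q > H and is therefore missense, whereas 2151TAC > TAT leaves tyrosine unchanged and is synonymous.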

Table 2

Mutations Detected in SARS-CoV-2 Whole Genome of the KPK Isolate

s.no.	position	Ref:a	Alt:a	gene	variant type	protein position	codon position
1	241	C	T	START:5′UTR	upstream	QHD43415.1	gene-orf1ab
2	2416	C	T	orf1ab	synonymous	717Y > Y	2151TAC > TAT
3	3037	C	T	orf1ab	synonymous	924F > F	2772TTC > TTT
4	8371	G	T	orf1ab	missense	2702Q > Hb	8106CAG > CAT
5	9208	T	C	orf1ab	synonymous	2981S > S	8943TCT > TCC
6	10741	C	T	orf1ab	synonymous	3492D > D	10476GAC > GAT
7	11083	G	T	orf1ab	missense	3606L > F (L37F on NSP6)	10818TTG > TTT
8	12565	G	A	orf1ab	synonymous	4100Q > Q	12300CAG > CAA
9	14408	C	T	orf1ab	missense	4715P > L	14144CCT > CTT
10	16945	G	A	orf1ab	missense	5561A > T	16681GCA > ACA
11	22477	C	T	S	synonymous	305S > S	915TCC > TCT
12	23403	A	G	S	missense	614D > G	1841GAT > GGT
13	25563	G	T	orf3a	missense	57Q > H	171CAG > CAT
14	29253	C	T	N	missense	327S > L	980TCG > TTG
15	29645	G	T	orf10	missense	30V > L	88GTA > TTA

Ref: Reference, Alt: Alteration.

Novel.
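Table 2 reports each variant as a genome position, a position in the coding sequence (CDS), and a protein position. As a sketch, assuming the gene coordinates of the NC_045512.2 reference annotation (our assumption; only the genes that appear in Table 2 are included), the conversion is simple arithmetic with one subtlety: orf1ab is translated with a −1 ribosomal frameshift at nucleotide 13468, so CDS positions downstream of it are shifted by one. The helper `locate` is hypothetical.

```python
from typing import Optional, Tuple

# CDS coordinates on NC_045512.2 (1-based, inclusive); assumed, Table 2 genes only.
GENES = {
    "orf1ab": (266, 21555),
    "S": (21563, 25384),
    "orf3a": (25393, 26220),
    "N": (28274, 29533),
    "orf10": (29558, 29674),
}
# Nucleotide 13468 is read twice at the -1 ribosomal frameshift (ORF1a -> ORF1b).
ORF1AB_FRAMESHIFT = 13468


def locate(genome_pos: int) -> Optional[Tuple[str, int, int, int]]:
    """Map a genome position to (gene, CDS position, amino acid position, codon base)."""
    for gene, (start, end) in GENES.items():
        if start <= genome_pos <= end:
            cds_pos = genome_pos - start + 1
            if gene == "orf1ab" and genome_pos > ORF1AB_FRAMESHIFT:
                cds_pos += 1  # account for the duplicated frameshift nucleotide
            aa_pos = (cds_pos - 1) // 3 + 1
            codon_base = (cds_pos - 1) % 3 + 1  # 1, 2 or 3 within the codon
            return gene, cds_pos, aa_pos, codon_base
    return None  # UTR or intergenic position, e.g. 241 in the 5' UTR


if __name__ == "__main__":
    for pos in (241, 11083, 14408, 23403, 29253):
        print(pos, locate(pos))
```

Under these assumptions the function reproduces the CDS and protein positions listed in Table 2: for example, genome position 11083 maps to CDS position 10818 and residue 3606 of orf1ab, and 14408 maps to 14144 and residue 4715, the latter only after the frameshift correction.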

The S-protein variant D614G detected in the current study is more common in European isolates, such as those from Spain, Belgium, France, Italy, Switzerland, and the Netherlands, where infections appear more severe and fatal and have caused a large death toll (https://www.worldometers.info/coronavirus/);[25] by contrast, most strains from Germany, Kuwait, and Pakistan carry the wild-type 614D in S, and these countries have lower death tolls. The situation in Germany, compared with other European countries, remains uncertain; continuous follow-up will therefore be vital to assess SARS-CoV-2 genomic variation and severity. The 327S > L mutation lies in the C-terminal domain of the N protein, and it was recently observed that 327S > L may increase the stability of the N protein while decreasing its molecular flexibility.[26] The nucleocapsid (N), orf3a, and orf10 genes each harbored only one mutation; these proteins contain many conserved regions in which variants have not been detected in the majority of cases.[27] Among the major targets, the N protein binds RNA and is essential for viral RNA activities such as replication; it primarily promotes the binding and packaging of the viral RNA into the ribonucleoprotein complex (nucleocapsid).[28−31] A single 5′ UTR mutation, 241C > T, was detected in the current study; it was also recently reported in a whole genome sequence from Gilgit, Pakistan (accession no. MT240479). The Q57H mutation in our genome (Table 2) appears very common in Indian isolates:[32] of 128 genomes from Indian patients, all carried Q57H in ORF3a, apart from other mutations, except a single genome (accession no. MT509503) from Junagadh (Table ).

Table 4

Sociodemographic Information of SARS-CoV-2 Genomic Isolates

virus name	accession id	collection date	location	gender	age
HCOV-19/PAKISTAN/NIH-HAS001/2020	EPI_ISL_468163	02/06/2020	islamabad	male	23
HCOV-19/PAKISTAN/NIH-45579/2020	EPI_ISL_468162	02/06/2020	islamabad	female	46
HCOV-19/PAKISTAN/NIH-45090/2020	EPI_ISL_468161	02/06/2020	islamabad	female	49
HCOV-19/PAKISTAN/NIH-45143/2020	EPI_ISL_468160	02/06/2020	islamabad	female	55
HCOV-19/PAKISTAN/NIH-44905/2020	EPI_ISL_468159	02/06/2020	islamabad	male	87
HCOV-19/PAKISTAN/KHI1/2020	EPI_ISL_451958	16/03/2020	karachi	unknown	unknown
HCOV-19/PAKISTAN/GILGIT1/2020	EPI_ISL_417444	04/03/2020	gilgit	female	40
HCOV-19/PAKISTAN/KPK-KUST-SJTU/2020a	EPI_ISL_513925	15/05/2020	peshawar	male	54

Current genome (KPK).

Table 3

Mutations Detected in Whole Genome Sequences of Pakistani Isolates

query	length (nt)	length (aa)	muts (n)c	novel muts (n)	existing muts (n)	novel muts	existing muts & freqc	clade	special charc
hCoV-19/Pakistan/Gilgit1	29,836	9710	4	0	4		(NSP2_V198I, NSP2_R27C, NSP4_P202L, NSP6_L37F)	other
hCoV-19/Pakistan/KHI1	29,819	9709	1	1	0	NSP2 (D268del)b		Ld
hCoV-19/Pakistan/NIH-44905	29,876	9710	5	0	5		(NSP6_M86I, Spike_D830A, NS8_E92K, NS8_L84S, N_S202N)	S
hCoV-19/Pakistan/NIH-45143	29,877	9710	7	2	5	NS3 (F105S)b NSP5 (N228K)b	(NSP3_Q1884H, NSP6_L37F, NSP12_P323L, Spike_D614G, NS3_Q57H)	GH
hCoV-19/Pakistan/NIH-45090	29,880	9710	6	0	6		(NSP3_Q1884H, NSP6_L37F, NSP12_P323L, Spike_D614G, NS3_Q57H, NS8_W45L)	GH
hCoV-19/Pakistan/NIH-45579	29,880	9710	8	0	8		(NSP2_L270F, NSP3_Q1884H, NSP6_L37F, NSP12_P323L, NSP14_T250I, Spike_D614G, NS3_Q57H, N_S202N)	GH
hCoV-19/Pakistan/NIH-HAS001	29,881	9710	5	0	5		(NSP6_M86I, Spike_D830A, NS8 (ORF8)b E92K, NS8_L84S, N_S202N)	S
hCoV-19/Pakistan/KPK-KUST-SJTUa	29,897	9710	7	0	7		(NSP3_Q1884H, NSP6_L37F, NSP12_P323L, NSP13_A237T, Spike_D614G, NS3_Q57H, N_S327L)	GH	special char existb

KPK isolate and mutations.

Novel.

Muts: mutants.

L: reference clade, Char: Characters. Freq: Frequency of each mutation, NSP6_L37F = 5, NSP6_M86I = 2, NSP3_Q1884H = 4, NS3_Q57H = 4, NSP12_P323L = 4, N_S202N = 3, N_S327L = 1, Spike_D614G = 4, Spike_D830A = 2, NS8_L84S = 2, NS8_E92K = 2, NS8_W45L = 1, NSP2_L270F = 1, NSP13_A237T = 1, and NSP14_T250I = 1.
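The frequencies in the footnote count, for each existing mutation, how many of the eight isolates carry it. A minimal sketch of that tally, with the mutation lists transcribed from Table 3 (the NIH-HAS001 entry "NS8 (ORF8) E92K" is written as NS8_E92K); the dictionary name is hypothetical:

```python
from collections import Counter

# Existing (previously reported) mutations per isolate, transcribed from Table 3.
ISOLATE_MUTATIONS = {
    "Gilgit1": ["NSP2_V198I", "NSP2_R27C", "NSP4_P202L", "NSP6_L37F"],
    "KHI1": [],  # carries only the novel NSP2 (D268del)
    "NIH-44905": ["NSP6_M86I", "Spike_D830A", "NS8_E92K", "NS8_L84S", "N_S202N"],
    "NIH-45143": ["NSP3_Q1884H", "NSP6_L37F", "NSP12_P323L", "Spike_D614G",
                  "NS3_Q57H"],
    "NIH-45090": ["NSP3_Q1884H", "NSP6_L37F", "NSP12_P323L", "Spike_D614G",
                  "NS3_Q57H", "NS8_W45L"],
    "NIH-45579": ["NSP2_L270F", "NSP3_Q1884H", "NSP6_L37F", "NSP12_P323L",
                  "NSP14_T250I", "Spike_D614G", "NS3_Q57H", "N_S202N"],
    "NIH-HAS001": ["NSP6_M86I", "Spike_D830A", "NS8_E92K", "NS8_L84S", "N_S202N"],
    "KPK-KUST-SJTU": ["NSP3_Q1884H", "NSP6_L37F", "NSP12_P323L", "NSP13_A237T",
                      "Spike_D614G", "NS3_Q57H", "N_S327L"],
}

# Count each mutation at most once per isolate (dict.fromkeys de-duplicates in order).
freq = Counter(m for muts in ISOLATE_MUTATIONS.values() for m in dict.fromkeys(muts))

if __name__ == "__main__":
    for mutation, count in freq.most_common(5):
        print(f"{mutation}\t{count}")
```

The tally reproduces the footnote values, for example NSP6_L37F = 5, Spike_D614G = 4, and N_S202N = 3.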

The SARS-CoV-2 genome is about 29.8 kb to 29.9 kb long. The structural genes S, E, M, and N lie toward the 3′ end of the genome (Figure 1). The proteins encoded by ORF3a, ORF6, ORF7a, ORF7b, and ORF8 are known as accessory proteins.[33−35]

Figure 1

SARS-CoV-2 genome organization. Mutations in orf1ab, S, orf3a, orf10, and N are marked with red arrows on each segment: four missense mutations were detected in orf1ab and one each in S, orf3a, orf10, and N. All mutations marked with red arrows were detected in the current genome.

A phylogenetic tree of 943 genomes, comprising isolates from our two severely affected neighbors, Iran (25) and China (917), together with the current isolate, all downloaded from GISAID,[24] was constructed with the MAFFT server (https://mafft.cbrc.jp/alignment/server/) and displayed with Archaeopteryx and Phylo.io[36,37] (Figure 2). Phylo.io's distinctive features include scalability to large trees, rooting, identification of the best-matching leaf order, high usability, and a standard HTML5 implementation. The nucleotide-based tree showed many clades and clusters of SARS-CoV-2. The current isolate appears close to the Iranian isolates (Figure 2A) and more divergent from those collected at the National Institute of Health (NIH), Islamabad, Pakistan.
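Distance-based tree methods such as neighbor-joining start from pairwise distances between aligned genomes, and statements such as "close to the Iranian isolates" reflect small distances. As a minimal sketch (not a reimplementation of the MAFFT server's tree building, whose distance model may differ), the uncorrected p-distance and the nearest neighbor of a query can be computed as follows; the function names and the toy alignment are hypothetical.

```python
import math
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an ambiguous 'N' are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else math.nan


def nearest(query: str, alignment: Dict[str, str]) -> Tuple[str, float]:
    """Return the sequence closest to `query`, ignoring undefined (NaN) distances."""
    dists = {
        name: p_distance(alignment[query], seq)
        for name, seq in alignment.items()
        if name != query
    }
    valid = {name: d for name, d in dists.items() if not math.isnan(d)}
    if not valid:
        raise ValueError("no comparable sequences")
    best = min(valid, key=valid.__getitem__)
    return best, valid[best]


if __name__ == "__main__":
    # Hypothetical 10-nt alignment, for illustration only.
    aln = {
        "query": "ACGTACGTAC",
        "iso_A": "ACGTACGTAA",
        "iso_B": "ACGAACTTAA",
        "iso_C": "ACGT-CGTAC",
    }
    print(nearest("query", aln))
```

A full matrix of such distances is what a neighbor-joining or UPGMA step would consume; real analyses usually apply an evolutionary model correction rather than raw p-distances.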

Figure 2

Phylogenetic analysis of SARS-CoV-2-KPK-KUST-SJTU/2020 (accession no. MT879619). (A) SARS-CoV-2-KPK-KUST-SJTU/2020 (red arrow), Iranian isolates (n = 25), and Chinese isolates (n = 917). (B) SARS-CoV-2-KPK-KUST-SJTU/2020 (red arrow) and seven other Pakistani isolates. The long label at the end of each node gives the serial number among the country's isolates, followed by the country name, the isolate name, the GISAID accession ID, and the collection date.

Phylogenetically, the current genome (SARS-CoV-2-KPK-KUST-SJTU/2020) occupies a unique position in the tree (Figure 2). The most common mutations in the Pakistani isolates were NSP6_L37F (5), Spike_D614G (4), NSP12_P323L (4), and NS3_Q57H (4). In infected individuals, G614 has been associated with higher viral loads in the upper respiratory tract but not with disease severity;[38] our prediction nevertheless indicates a stabilizing effect on the protein structure (Figure 3). NS3_Q57H is the clade-determining mutation of clade GH: all GH-clade genomes harbor Q57H in the NS3 protein.[39]
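Clade labels such as GH, S, and L in Table 3 are assigned from marker mutations. The sketch below is a simplified, amino-acid-level approximation of the GISAID scheme as we understand it (S: ORF8 L84S; V: NSP6 L37F plus NS3 G251V; G: spike D614G; GH: G plus NS3 Q57H; GR: G plus N R203K; L: reference-like; O: other). GISAID itself assigns clades from nucleotide markers, so this is not its official implementation, and `assign_clade` is hypothetical.

```python
from typing import Set


def assign_clade(mutations: Set[str]) -> str:
    """Simplified GISAID-style clade call from amino-acid markers (hypothetical)."""
    if "Spike_D614G" in mutations:
        if "NS3_Q57H" in mutations:
            return "GH"
        if "N_R203K" in mutations:
            return "GR"
        return "G"
    if "NS8_L84S" in mutations:
        return "S"
    if {"NSP6_L37F", "NS3_G251V"} <= mutations:
        return "V"
    if mutations & {"NSP6_L37F", "NS3_G251V"}:
        return "O"  # partial clade signature: "other"
    return "L"  # no marker mutation: reference-like clade


if __name__ == "__main__":
    # Marker-relevant mutations of three isolates, taken from Table 3.
    print(assign_clade({"NSP6_M86I", "Spike_D830A", "NS8_L84S"}))  # S
    print(assign_clade({"NSP6_L37F", "Spike_D614G", "NS3_Q57H"}))  # GH
    print(assign_clade({"NSP2_V198I", "NSP6_L37F"}))  # O
```

Checking D614G first matters: several GH isolates also carry NSP6_L37F, which would otherwise be read as a partial V signature.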

Figure 3

Effect of point mutation D614G on spike protein dynamics, predicted with the DynaMut online server. ΔΔG: free energy difference; ΔΔSVib ENCoM: vibrational entropy energy difference. (A) Increase in molecular flexibility (red region) due to the D614G point mutation; the total energy calculated for the mutant (MT) indicates a stabilizing effect on the protein structure. (B) Interactions of amino acids in the wild type (WT) with surrounding residues. (C) Interactions of amino acid G614 (MT) with surrounding residues.

Mutation in NSP6

The orf1ab mutation L3606F (Table 2), which corresponds to position L37F of NSP6 (Table 3), is present in isolates from the USA, China, Hong Kong, France, Singapore, and Italy. In the current study, NSP6 harbors two mutations: L37F (5 isolates) and M86I (2 isolates). SARS-CoV-2 NSP6 associates with the transmembrane proteins NSP3 and NSP4 to form double-membrane vesicles.[41] L37F lies outside the transmembrane region, in a coil segment. Val at position 37 of NSP6 is conserved among all sarbecoviruses except SARS-CoV-2, which carries L37. Replacing the aliphatic Leu with the aromatic Phe might have functional implications, because Phe can engage in cation–π interactions that could affect protein interactions in the L37F mutant. The effect of this mutation on structural dynamics could not be assessed because the Protein Data Bank contains no experimental NSP6 structure or suitable template for homology modeling. The L37F mutation in NSP6 has been linked to a weakened SARS-CoV-2 subtype, which may have aided SARS-CoV-2 transmission and evolution across regions over the course of the pandemic.[42] The effect of NSP3_Q1884H could not be evaluated because neither a complete crystal structure nor a suitable template for homology modeling is available.
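The correspondence between orf1ab and NSP coordinates (for example, L3606F in orf1ab and L37F in NSP6) is an offset from the first residue of each cleavage product. A minimal sketch, assuming the NSP boundaries of the reference pp1ab polyprotein as annotated by NCBI (YP_009724389.1); the boundary table is our transcription, and `orf1ab_to_nsp` is hypothetical.

```python
from typing import Tuple

# NSP boundaries within pp1ab (1-based, inclusive residue ranges); assumed from
# the NCBI annotation. NSP11 (4393-4405) exists only in pp1a and is omitted.
NSP_RANGES = [
    ("NSP1", 1, 180), ("NSP2", 181, 818), ("NSP3", 819, 2763),
    ("NSP4", 2764, 3263), ("NSP5", 3264, 3569), ("NSP6", 3570, 3859),
    ("NSP7", 3860, 3942), ("NSP8", 3943, 4140), ("NSP9", 4141, 4253),
    ("NSP10", 4254, 4392), ("NSP12", 4393, 5324), ("NSP13", 5325, 5925),
    ("NSP14", 5926, 6452), ("NSP15", 6453, 6798), ("NSP16", 6799, 7096),
]


def orf1ab_to_nsp(residue: int) -> Tuple[str, int]:
    """Convert a pp1ab residue number into (NSP name, residue within that NSP)."""
    for name, start, end in NSP_RANGES:
        if start <= residue <= end:
            return name, residue - start + 1
    raise ValueError(f"residue {residue} is outside pp1ab (1-7096)")


if __name__ == "__main__":
    for residue in (3606, 4715, 2702, 5561):
        print(residue, orf1ab_to_nsp(residue))
```

Under these boundaries, orf1ab 3606 maps to NSP6 residue 37 and 4715 to NSP12 residue 323, matching the text; the same arithmetic maps orf1ab 2702 and 5561 to NSP3 residue 1884 and NSP13 residue 237, the positions under which Table 3 lists the KPK isolate's NSP3_Q1884H and NSP13_A237T.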

Mutation in NSP12 (RdRp)

P323L in NSP12 (RdRp) is the second-most common variant among the Pakistani isolates (Table 3). Proline is unusual in that its side chain cyclizes back onto the backbone amide nitrogen, and the resulting bulky pyrrolidine ring strongly shapes the local secondary structure. The P323L substitution may therefore alter structural integrity and might have functional consequences.[43] In the mutant, we predicted decreased RdRp flexibility and a stabilizing effect (Figure 4), which might keep the enzyme stable while it interacts with RNA during infection. P323L lies within residues A250–R365, known as the interface domain of RdRp. This domain is a putative docking site,[43] and mutations in this interface region may affect interactions with antivirals such as filibuvir, simeprevir, and tegobuvir.

Figure 4

Effect of point mutation P323L on NSP12 (RdRp) dynamics, predicted with the DynaMut online server. (A) Decrease in RdRp flexibility (blue region). P323L appears to stabilize the protein structure. Interactions of amino acids in WT and MT with surrounding residues are circled; the MT forms more interactions than the WT.

Mutations in NS8 (ORF8)

Of the three mutations in NS8 (ORF8) of SARS-CoV-2 (NS8_L84S, NS8_E92K, and NS8_W45L), E92K has not been reported in earlier studies.[44,45] L84S, the strain-determining mutation of clade S, was detected in two isolates in the current study (Table 3). ORF8 is remarkably divergent and consists of a predicted Ig-like fold next to an N-terminal signal sequence.[46] ORF8 of both SARS-CoV and SARS-CoV-2 carries a signal sequence for import into the endoplasmic reticulum (ER), and in the ER lumen SARS-CoV-2 ORF8 interacts with ER-associated degradation factors and a variety of other host proteins.[47] ORF8 appears to be secreted, as antibodies against ORF8 are among the major markers in SARS-CoV-2 patients.[48] ORF8 disrupts IFN-I signaling and downregulates MHC-I in cells.[49] The L84S mutation has been associated with decreased stability of ORF8:[44] it destabilizes the fold, which may upregulate host immune activity[50] (Figure 5). We predicted the stability effects of NS8_E92K and NS8_W45L with the DynaMut online server.[40] Both results indicate a destabilizing effect, which might be important for upregulating immune activity and, thus, for successful eradication of the infection.

Figure 5

Effect of point mutations (L84S, E93K, and W45L) on NS8 (ORF8) structure and dynamics. L84S, E93K, and W45L have a destabilizing effect. E93K and W45L exhibited an increase in flexibility, whereas L84S showed a decrease.

Mutations in Nucleocapsid (N) Proteins

In the current study, two N-protein mutations, N_S202N and N_S327L, were detected (Figures , 6 and Table 3). S202N lies in the serine–arginine-rich region (residues 184–204), close to the N-terminal domain (N-NTD). Because no crystal structure of this region is available, its impact on N-protein dynamics could not be explored. Recent studies[51,52] reported that the interaction between Nsp3 and N is mediated by N residues 1–194 (N1a–N1b) and 195–257 (N2a). S202N lies in N2a; however, its impact on the interaction with Nsp3 needs to be evaluated. S327L was detected in the C-terminal domain (residues 247–364) (Figure ), and its impact on N-CTD dynamics is shown in Figure 6. Compared with the wild type, the L327 mutant appears more stable and more flexible. The N protein is 419 amino acids long and is present in the nucleocapsid of SARS-CoV-2. It comprises the N-terminal domain (NTD; residues 46–176), also called the RNA-binding domain, which binds the 3′ end of the viral RNA; a serine–arginine-rich linker (residues 182–247), which interacts directly with RNA and plays a part in cell signaling;[53,54] and the C-terminal domain (CTD; residues 247–364).[55,56] All three domains interact electrostatically with the SARS-CoV-2 genomic RNA and modulate its unwinding. Screening for mutations in local isolates might help identify new targets and design vaccines for the better management of COVID-19.
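The N-protein domain boundaries given above can be turned into a small lookup that places each mutation in its domain. A minimal sketch using exactly the ranges stated in the text, which share residue 247 between the linker and the CTD, so one residue can fall in two ranges; `n_domains` is hypothetical.

```python
from typing import List

# Domain boundaries of the 419-residue N protein as stated in the text
# (1-based, inclusive). The linker and the CTD both include residue 247.
N_DOMAINS = {
    "NTD (RNA-binding)": (46, 176),
    "SR-rich linker": (182, 247),
    "CTD": (247, 364),
}


def n_domains(residue: int) -> List[str]:
    """Return every annotated domain that contains `residue` (hypothetical helper)."""
    if not 1 <= residue <= 419:
        raise ValueError("the N protein has 419 residues")
    # An empty list means the residue lies outside the annotated domains.
    return [name for name, (lo, hi) in N_DOMAINS.items() if lo <= residue <= hi]


if __name__ == "__main__":
    for residue in (202, 327):  # N_S202N and N_S327L
        print(residue, n_domains(residue))
```

With these ranges, S202N falls in the serine–arginine-rich linker and S327L in the CTD, as described above.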

Figure 6

Effect of S327L mutation on N protein structure and dynamics (PDB ID 6yun). Flexibility seems increased due to substitution of leucine in place of serine at position 327. This mutation has a stabilizing effect as shown in blue.

Mutation in NSP13 (Helicase)

Helicases play a key role in viral RNA replication, a vital step in viral propagation and pathogenesis, which makes them ideal targets for antiviral drugs. In the current study, a single substitution, A237T, was detected in NSP13; it shows a stabilizing effect on the helicase structure (Figure 7). Furthermore, molecular flexibility decreases at the mutant residue T237, which forms more interactions. Before a drug is designed, mutations in its target should be screened and characterized so that inhibitors can be better managed.

Figure 7

Mutation A237T in NSP13 (helicase) and its dynamic effect. The structure was downloaded from the SWISS-MODEL server (PRO_0000449630). The MT exhibits lower flexibility than the WT, and the mutation has a stabilizing effect, shown in blue. The MT and WT residues are shown in light green, depicting their interactions with surrounding residues; the MT appears to form more interactions than the WT.

Mutations in ORF10 may alter its binding affinity for human leukocyte antigen alleles and might change the immunogenicity of SARS-CoV-2 ORF10.[57] About 35% of the variants, including V30A, contain both high- and low-binding-affinity epitopes. It has been reported that ORF10 variants may reduce epitope affinity, helping the virus escape the host immune system. However, further investigations are needed to predict the binding affinity of all ORF10 variants for immunological studies. We detected a novel mutation, V30L, in ORF10 (Table 2); its effect needs to be elucidated for a better understanding of immunogenicity. In January 2020, clades 19A/L/V/O (Nextstrain/GISAID nomenclature) were more widespread in Europe than the 20A/G clades; however, the small number of sequenced viral genomes and travel histories in Europe during the early days of the pandemic may have affected these findings. The 20A/G clades are characterized by the D614G mutation in the spike protein, which suggests increased transmissibility but not pathogenicity.[38] A recent study[33] analyzed 95 SARS-CoV-2 genomes and reported no mutation in orf10; similarly, no ORF10 mutation was found among 128 genomes from Indian strains. Therefore, for better management of viral infections, newly diagnosed cases should be continuously assessed through whole genome analysis to explore genome variation.

Mutation (N228K) in NSP5

Because of its vital role in processing the polyproteins translated from SARS-CoV-2 RNA,[58−60] the main protease (Mpro, 3CLpro) is one of the most attractive drug targets in SARS-CoV-2. A single novel mutation, N228K, was detected in the hCoV-19/Pakistan/NIH-45143 isolate from a 55-year-old female patient (Table ). This mutation appears to destabilize the main protease and to decrease residue flexibility (Figure 8).

Figure 8

Mutation (N228K) in NSP5 (main protease) and its dynamic effect.

Domains I and II of Mpro (residues 10–99 and 100–182), which form the substrate-binding site, fold into six-stranded antiparallel β-barrels resembling those of picornavirus 3C proteases (Figure 8). Domain III (residues 198–303), consisting of five helices, regulates Mpro dimerization, primarily through a salt bridge between Glu290 of one protomer and Arg4 of the other.[61] Residues H41 and C145 form the catalytic dyad. Compared with SARS-CoV Mpro, residues T285 and I286 are replaced by A285 and L286 in SARS-CoV-2 Mpro.[58] Replacing S284, T285, and I286 with alanine in Mpro led to a threefold increase in enzymatic activity.[60] Although N228K lies in domain III, far from the catalytic site, it may affect dimerization, on which catalytic activity depends. For the novel mutations whose crystal structures are not available [NSP2 (D268del), NS3 (F105S), and orf1ab (Q2702H)], functional implications are difficult to predict and need further validation. Alternatively, larger data sets on such mutations may help draw conclusions about their effects on viral transmission or virulence.

Conclusions

In conclusion, whole genome sequencing and analysis of locally prevalent isolates in Pakistan revealed some specific mutations and characters. Most mutations detected in the main targets are predicted to have a stabilizing effect. Novel mutations were detected in NSP2 (D268del), NSP5 (N228K), and NS3 (F105S). The 20A/G clades are characterized by the D614G mutation in the spike protein, which suggests increased transmissibility but not pathogenicity. Mutation S327L in the CTD of N increases flexibility while stabilizing the structure. Point mutations L84S, E93K, and W45L in NS8 (ORF8) have a destabilizing effect on the structure; however, E93K and W45L increased flexibility, whereas L84S decreased it. The single novel mutation N228K in Mpro, located in domain III, may have catalytic consequences that need to be validated experimentally. For better management of SARS-CoV-2 infections, whole genome sequence analysis may offer useful information about the transmission, pathogenicity, and severity of SARS-CoV-2 isolates in specific geographic regions.

Materials and Methods

Area of Sample Collection

A single nasopharyngeal swab was collected from a male patient with suspected SARS-CoV-2 infection in Peshawar district, Khyber Pakhtunkhwa (KPK), during the first wave of infection on June 15th, 2020. The patient did not know from whom he had contracted the infection. Four members of his family developed similar symptoms; however, we collected a sample only from this 54-year-old man, who had developed severe symptoms including fever, fatigue, dry cough, bone pain, shortness of breath, and loss of smell, taste, and sensation. The patient also reported loss of sexual desire for more than a month. The patient could not confirm how long the loss of smell and taste lasted. The infection was not fatal, and the patient felt better 22 days after the onset of the first symptoms (dry cough and fever). The sequence (SARS-CoV-2-KPK-KUST-SJTU/2020) was submitted to GenBank under accession number MT879619 (https://www.ncbi.nlm.nih.gov/nuccore/MT879619).

Sample Processing and Confirmation

The sample was collected in accordance with the interim biosafety guidelines. To confirm SARS-CoV-2 infection, the nasopharyngeal swab specimen was placed in 3 mL of normal saline and mixed by inverting the tube at least five times. The diluted specimen was then transferred to a SARS-CoV-2 Xpert cartridge with a sterile dropper and loaded into the GeneXpert System platform (Rapidmicrobiology Xpert Xpress SARS-CoV-2 Point-of-Care Test).[62,63] The GeneXpert Dx System (Cepheid, Sunnyvale, CA) integrates automated sample processing and real-time PCR in a completely closed cartridge that contains the sample-processing reagents and lyophilized real-time RT-PCR reagents. The instrument consists of independent modules, each running a separate test, which avoids cross-contamination. It automates sample preparation, nucleic acid extraction, amplification, and target detection. The Xpert SARS-CoV-2 test targets the E and N2 genes.[64] Results are interpreted automatically by the system's built-in software against pre-established criteria.[65]
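The automatic interpretation step can be pictured as a small decision rule over the two targets and the cartridge's internal sample processing control (SPC). The sketch below follows our understanding of the manufacturer's published scheme for this E/N2 assay (N2 detected: positive; E only: presumptive positive; neither, with a valid SPC: negative; otherwise invalid). It is not Cepheid's software, the function is hypothetical, and the current package insert should be consulted for the authoritative rules.

```python
def interpret_xpert(e_detected: bool, n2_detected: bool, spc_passed: bool) -> str:
    """Approximate result call for a two-target E/N2 assay (hypothetical helper).

    Assumes each target has already been called detected or not detected from
    its Ct value; the real instrument software applies additional checks.
    """
    if n2_detected:
        return "SARS-CoV-2 POSITIVE"  # N2 alone, or N2 together with E
    if e_detected:
        return "SARS-CoV-2 PRESUMPTIVE POS"  # E target only
    if spc_passed:
        return "SARS-CoV-2 NEGATIVE"  # no target; sample processing control valid
    return "INVALID"  # no target and the sample processing control failed


if __name__ == "__main__":
    print(interpret_xpert(e_detected=True, n2_detected=True, spc_passed=True))
```

The rule is ordered so that a detected target takes precedence over the control, since a positive amplification already shows that the sample was processed.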

Viral RNA Extraction and Sequencing

Viral RNA was extracted from the sample with a QIAamp kit (Qiagen, Germany), and amplification was carried out according to the ARTIC nCoV-2019 protocol.[66] RNA quality was checked with a Qubit RNA BR assay kit (Invitrogen), and cDNA was synthesized with the RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific). PCR was carried out using Phusion Flash High-Fidelity PCR Master Mix (Thermo Scientific), and libraries were prepared with the Nextera XT DNA Library Preparation Kit (Illumina, San Diego, CA). Sequencing was performed on an Illumina MiSeq at Rehman Medical Institute, Peshawar, Pakistan.

Data Analysis

The read quality of the FASTQ files was checked with FastQC (v0.11.8). Trimmomatic (v0.39) was used to remove low-quality base calls (Q < 30) and index adapter sequences from both ends of the sequenced reads. The filtered reads were aligned to the Wuhan reference genome (accession no. NC_045512) with the Burrows–Wheeler Aligner (BWA, v0.6) using default settings. PCR duplicates were removed with Picard Tools (v2.21.6). To correct mapping problems caused by small indels, mapped reads were realigned with the Genome Analysis Toolkit (GATK v3.3.0) tools RealignerTargetCreator and IndelRealigner. SNPs and indels were called with GATK HaplotypeCaller, which performs local de novo assembly of haplotypes in regions showing variation, and potential false-positive variants were excluded from the raw call set with GATK VariantFiltration using default settings. Annotation was performed with publicly available tools at China's National Genomics Data Center (NGDC), and variants were detected with Genome-to-Variants [https://bigd.big.ac.cn/ncov/online/tool/variation]. The identified variants were cross-validated manually by loading the sequences into BioEdit[67] and checking each reported variant one by one. Seven other complete whole genome sequences originating in Pakistan were also downloaded, and their variants were analyzed through GISAID[24] (Table 4).
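Trimmomatic steps such as LEADING, TRAILING, and SLIDINGWINDOW remove low-quality bases from read ends and cut reads once the windowed mean quality drops below a threshold. The following is a simplified reimplementation of that idea, not Trimmomatic itself, assuming Phred+33 quality encoding and the Q30 threshold stated above; the window size of 4 and the function names are our own choices.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in qual]


def quality_trim(seq: str, qual: str, min_q: int = 30, window: int = 4) -> Tuple[str, str]:
    """Simplified Trimmomatic-style trimming (hypothetical helper, not Trimmomatic).

    1. Drop leading and trailing bases with quality below `min_q`.
    2. Scan windows of `window` bases and cut the read at the start of the
       first window whose mean quality falls below `min_q`.
    """
    q = phred_scores(qual)
    start, end = 0, len(q)
    while start < end and q[start] < min_q:  # leading low-quality bases
        start += 1
    while end > start and q[end - 1] < min_q:  # trailing low-quality bases
        end -= 1
    for i in range(start, end - window + 1):  # sliding-window check
        if sum(q[i:i + window]) / window < min_q:
            end = i
            break
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    # '#' encodes Q2 and 'I' encodes Q40 under Phred+33.
    print(quality_trim("ACGTACGTAC", "##IIIIII##"))
```

Cutting at the start of the failing window is deliberately conservative; adapter clipping, which Trimmomatic also performed here, is not modeled.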

Mutation Effect on Viral Protein

The effects of the mutations identified in the current study on protein structure and dynamics were explored with the DynaMut server.[40] The server implements two distinct normal-mode methods for assessing how mutations affect protein stability and dynamics through changes in vibrational entropy, and it predicts the impact of a mutation on protein stability by integrating normal-mode dynamics with graph-based signatures. This approach has been reported to outperform other methods in predicting the effects of mutations on protein stability and flexibility (P < 0.001). The results are displayed in separate tabs, one for each analysis of the mutation's effect on protein dynamics and stability.
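DynaMut reports a predicted free energy difference (ΔΔG) and a vibrational entropy difference (ΔΔSVib ENCoM). As we understand the server's convention, a positive ΔΔG indicates stabilization and a negative one destabilization, while a positive ΔΔSVib indicates increased and a negative one decreased molecular flexibility; the qualitative labels used in the figures follow from these signs. The text does not report numerical predictions, so the inputs below are purely illustrative, and `interpret_dynamut` is a hypothetical helper.

```python
from typing import Tuple


def interpret_dynamut(ddg: float, dds_vib: float) -> Tuple[str, str]:
    """Turn DynaMut-style ΔΔG and ΔΔSVib signs into qualitative labels (hypothetical)."""
    if ddg > 0:
        stability = "stabilizing"
    elif ddg < 0:
        stability = "destabilizing"
    else:
        stability = "neutral"
    if dds_vib > 0:
        flexibility = "increased flexibility"
    elif dds_vib < 0:
        flexibility = "decreased flexibility"
    else:
        flexibility = "no change in flexibility"
    return stability, flexibility


if __name__ == "__main__":
    # Illustrative values only; not predictions from this study.
    print(interpret_dynamut(0.8, -0.2))
```

For instance, the P323L result described for Figure 4 (stabilizing, with decreased flexibility) corresponds to a positive ΔΔG together with a negative ΔΔSVib under this convention.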
