Muhammad Tahir Khan, Sajid Ali, Anwar Sheed Khan, Noor Muhammad, Faiza Khalil, Muhammad Ishfaq, Muhammad Irfan, Abdullah G Al-Sehemi, Shabbir Muhammad, Arif Malik, Taj Ali Khan, Dong Qing Wei.
Abstract
Among viral outbreaks, that of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is one of the deadliest, and it has triggered the global COVID-19 pandemic. In Pakistan, 6342 deaths had been reported as of 5 September 2020, of which 1255 were from the Khyber Pakhtunkhwa (KPK) province. Whole genome sequence analysis is important for understanding disease progression and control and for developing vaccines and therapeutics. In the current investigation, we sequenced the SARS-CoV-2 genome (accession no. MT879619) from a single sample taken from a male suspected case in Peshawar, the KPK capital city, during the first wave of infection. In the phylogenetic tree and in its mutations, the local SARS-CoV-2 strain shows some unique characteristics compared to neighboring Iranian and Chinese isolates. The circulating strains of SARS-CoV-2 appear to represent an intermediate stage of evolution relative to strains from China and Iran. Furthermore, eight complete whole genome sequences of Pakistani isolates submitted to the Global Initiative on Sharing All Influenza Data (GISAID), including the current isolate, were investigated for specific mutations and characters. Some novel mutations [NSP2 (D268del), NSP5 (N228K), and NS3 (F105S)] and specific characters were detected in the coding regions, which may affect viral transmission, epidemiology, and disease severity. Computational modeling revealed that a majority of these mutations may have a stabilizing effect on the viral protein structure. In conclusion, genome sequencing of local strains is important for better understanding the pathogenicity, immunogenicity, and epidemiology of causative agents.
Year: 2021 PMID: 33748571 PMCID: PMC7944396 DOI: 10.1021/acsomega.0c05163
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Statistics of the Whole Genome Sequence of SARS-CoV-2
| statistics type | number |
|---|---|
| aligned bases | 45,417,209 |
| aligned reads number | 367,235 |
| coverage % | 99.97 |
| duplication rate % | 31.4093 |
| indel rate % | 0.0055 |
| mean read length | 125.6 |
| mismatch rate % | 0.24 |
| sequencing depth | 1438.63 |
| variants number | 15 |
Mutations Detected in SARS-CoV-2 Whole Genome of the KPK Isolate
| s.no. | position | Ref | Alt | gene | variant type | protein position | codon position |
|---|---|---|---|---|---|---|---|
| 1 | 241 | C | T | START:5′UTR | upstream | QHD43415.1 | gene-orf1ab |
| 2 | 2416 | C | T | orf1ab | synonymous | 717Y > Y | 2151TAC > TAT |
| 3 | 3037 | C | T | orf1ab | synonymous | 924F > F | 2772TTC > TTT |
| 4 | 8371 | G | T | orf1ab | missense | 2702Q > H | 8106CAG > CAT |
| 5 | 9208 | T | C | orf1ab | synonymous | 2981S > S | 8943TCT > TCC |
| 6 | 10741 | C | T | orf1ab | synonymous | 3492D > D | 10476GAC > GAT |
| 7 | 11083 | G | T | orf1ab | missense | 3606L > F (L37F on NSP6) | 10818TTG > TTT |
| 8 | 12565 | G | A | orf1ab | synonymous | 4100Q > Q | 12300CAG > CAA |
| 9 | 14408 | C | T | orf1ab | missense | 4715P > L | 14144CCT > CTT |
| 10 | 16945 | G | A | orf1ab | missense | 5561A > T | 16681GCA > ACA |
| 11 | 22477 | C | T | S | synonymous | 305S > S | 915TCC > TCT |
| 12 | 23403 | A | G | S | missense | 614D > G | 1841GAT > GGT |
| 13 | 25563 | G | T | orf3a | missense | 57Q > H | 171CAG > CAT |
| 14 | 29253 | C | T | N | missense | 327S > L | 980TCG > TTG |
| 15 | 29645 | G | T | orf10 | missense | 30V > L | 88GTA > TTA |
Ref: Reference, Alt: Alteration.
Novel.
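The protein and codon columns of the table above follow from the genomic position by simple coordinate arithmetic. The sketch below illustrates this under stated assumptions. It assumes the reference is NC_045512.2 (the table does not name its reference), takes the gene starts and mature-peptide boundaries from that annotation, and uses the standard genetic code. The helper names (`cds_position`, `codon_number`, `classify`, `nsp_position`) are hypothetical and not taken from the paper's pipeline.

```python
from bisect import bisect_right

# Assumed reference: NC_045512.2 (Wuhan-Hu-1); 1-based start of each coding region.
CDS_START = {"orf1ab": 266, "S": 21563, "orf3a": 25393, "N": 28274, "orf10": 29558}
ORF1AB_SLIP = 13468  # nucleotide read twice at the -1 ribosomal frameshift

# First residue of each mature peptide within the orf1ab polyprotein (pp1ab).
# NSP11 is omitted because it belongs to pp1a only.
NSP_START = [("NSP1", 1), ("NSP2", 181), ("NSP3", 819), ("NSP4", 2764),
             ("NSP5", 3264), ("NSP6", 3570), ("NSP7", 3860), ("NSP8", 3943),
             ("NSP9", 4141), ("NSP10", 4254), ("NSP12", 4393), ("NSP13", 5325),
             ("NSP14", 5926), ("NSP15", 6453), ("NSP16", 6799)]

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def cds_position(gene, pos):
    """Convert a 1-based genomic position to a 1-based position in the CDS."""
    offset = pos - CDS_START[gene] + 1
    # Downstream of the slippery site, orf1ab is shifted by one nucleotide.
    if gene == "orf1ab" and pos > ORF1AB_SLIP:
        offset += 1
    return offset


def codon_number(cds_pos):
    """Return the 1-based codon (amino acid) index containing a CDS position."""
    return (cds_pos + 2) // 3


def classify(ref_codon, alt_codon):
    """Translate both codons and label the change."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif "*" in (ref_aa, alt_aa):
        kind = "stop-altering"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


def nsp_position(aa_pos):
    """Map a pp1ab residue number to (mature peptide, residue within it)."""
    starts = [start for _, start in NSP_START]
    name, start = NSP_START[bisect_right(starts, aa_pos) - 1]
    return name, aa_pos - start + 1


# Row 9 of the table: genomic position 14408, C > T in orf1ab.
cds = cds_position("orf1ab", 14408)
aa = codon_number(cds)
print(cds, aa, classify("CCT", "CTT"), nsp_position(aa))
```

Under these assumptions the same arithmetic links this table to the GISAID-style names used for the Pakistani isolates: polyprotein residue 3606 falls at residue 37 of NSP6 (L37F), and residue 4715 at residue 323 of NSP12 (P323L).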
Sociodemographic Information of SARS-CoV-2 Genomic Isolates
| virus name | accession id | collection date (dd/mm/yyyy) | location | gender | age |
|---|---|---|---|---|---|
| hCoV-19/Pakistan/NIH-HAS001/2020 | EPI_ISL_468163 | 02/06/2020 | Islamabad | male | 23 |
| hCoV-19/Pakistan/NIH-45579/2020 | EPI_ISL_468162 | 02/06/2020 | Islamabad | female | 46 |
| hCoV-19/Pakistan/NIH-45090/2020 | EPI_ISL_468161 | 02/06/2020 | Islamabad | female | 49 |
| hCoV-19/Pakistan/NIH-45143/2020 | EPI_ISL_468160 | 02/06/2020 | Islamabad | female | 55 |
| hCoV-19/Pakistan/NIH-44905/2020 | EPI_ISL_468159 | 02/06/2020 | Islamabad | male | 87 |
| hCoV-19/Pakistan/KHI1/2020 | EPI_ISL_451958 | 16/03/2020 | Karachi | unknown | unknown |
| hCoV-19/Pakistan/Gilgit1/2020 | EPI_ISL_417444 | 04/03/2020 | Gilgit | female | 40 |
| hCoV-19/Pakistan/KPK-KUST-SJTU/2020 | EPI_ISL_513925 | 15/05/2020 | Peshawar | male | 54 |
Current genome (KPK).
Mutations Detected in Whole Genome Sequences of Pakistani Isolates
| query | length (nt) | length (aa) | Muts | novel muts (no.) | existing muts (no.) | novel muts | existing muts & freq | clade | special char |
|---|---|---|---|---|---|---|---|---|---|
| hCoV-19/Pakistan/Gilgit1 | 29,836 | 9710 | 4 | 0 | 4 | | (NSP2_V198I, NSP2_R27C, NSP4_P202L, NSP6_L37F) | other | |
| hCoV-19/Pakistan/KHI1 | 29,819 | 9709 | 1 | 1 | 0 | NSP2 (D268del) | | L | |
| hCoV-19/Pakistan/NIH-44905 | 29,876 | 9710 | 5 | 0 | 5 | | (NSP6_M86I, Spike_D830A, NS8_E92K, NS8_L84S, N_S202N) | S | |
| hCoV-19/Pakistan/NIH-45143 | 29,877 | 9710 | 7 | 2 | 5 | NS3 (F105S) | (NSP3_Q1884H, NSP6_L37F, NSP12_P323L, Spike_D614G, NS3_Q57H) | GH | |
| hCoV-19/Pakistan/NIH-45090 | 29,880 | 9710 | 6 | 0 | 6 | | (NSP3_Q1884H, NSP6_L37F, NSP12_P323L, Spike_D614G, NS3_Q57H, NS8_W45L) | GH | |
| hCoV-19/Pakistan/NIH-45579 | 29,880 | 9710 | 8 | 0 | 8 | | (NSP2_L270F, NSP3_Q1884H, NSP6_L37F, NSP12_P323L, NSP14_T250I, Spike_D614G, NS3_Q57H, N_S202N) | GH | |
| hCoV-19/Pakistan/NIH-HAS001 | 29,881 | 9710 | 5 | 0 | 5 | | (NSP6_M86I, Spike_D830A, NS8 (ORF8) | S | |
| hCoV-19/Pakistan/KPK-KUST-SJTU | 29,897 | 9710 | 7 | 0 | 7 | | (NSP3_Q1884H, NSP6_L37F, NSP12_P323L, NSP13_A237T, Spike_D614G, NS3_Q57H, N_S327L) | GH | special char exist |
KPK isolate and mutations.
Novel.
Muts: mutations.
L: reference clade. Char: characters. Freq: frequency of each mutation among the isolates: NSP6_L37F = 5, NSP6_M86I = 2, NSP3_Q1884H = 4, NS3_Q57H = 4, NSP12_P323L = 4, N_S202N = 3, N_S327L = 1, Spike_D614G = 4, Spike_D830A = 2, NS8_L84S = 2, NS8_E92K = 2, NS8_W45L = 1, NSP2_L270F = 1, NSP13_A237T = 1, and NSP14_T250I = 1.
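The frequencies in the footnote above can be reproduced by tallying the existing mutations listed for each isolate. The following is a minimal sketch of that tally, not the authors' procedure. The dictionary is transcribed by hand from the table, and the NIH-HAS001 entry holds only the two mutations that are legible in this copy, so counts for mutations hidden by that truncated row come out lower than the footnote values.

```python
from collections import Counter

# Existing (previously reported) mutations per isolate, transcribed from the table.
# KHI1 carries only a novel deletion, so its list of existing mutations is empty.
existing_muts = {
    "Gilgit1": ["NSP2_V198I", "NSP2_R27C", "NSP4_P202L", "NSP6_L37F"],
    "KHI1": [],
    "NIH-44905": ["NSP6_M86I", "Spike_D830A", "NS8_E92K", "NS8_L84S", "N_S202N"],
    "NIH-45143": ["NSP3_Q1884H", "NSP6_L37F", "NSP12_P323L", "Spike_D614G",
                  "NS3_Q57H"],
    "NIH-45090": ["NSP3_Q1884H", "NSP6_L37F", "NSP12_P323L", "Spike_D614G",
                  "NS3_Q57H", "NS8_W45L"],
    "NIH-45579": ["NSP2_L270F", "NSP3_Q1884H", "NSP6_L37F", "NSP12_P323L",
                  "NSP14_T250I", "Spike_D614G", "NS3_Q57H", "N_S202N"],
    # Row truncated in this copy: only the legible entries are included.
    "NIH-HAS001": ["NSP6_M86I", "Spike_D830A"],
    "KPK-KUST-SJTU": ["NSP3_Q1884H", "NSP6_L37F", "NSP12_P323L", "NSP13_A237T",
                      "Spike_D614G", "NS3_Q57H", "N_S327L"],
}

# Count in how many isolates each mutation appears.
freq = Counter(mut for muts in existing_muts.values() for mut in muts)

for mut, count in freq.most_common(5):
    print(f"{mut} = {count}")
```

With this input, NSP6_L37F is the most frequent mutation (5 isolates), followed by the four mutations shared by the GH-clade isolates (4 each), in agreement with the footnote.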
Figure 1. SARS-CoV-2 genome organization. Mutations in orf1ab, S, orf3a, orf10, and N detected in the current genome sequencing are marked with a red arrow at each segment. Four missense mutations were detected in orf1ab and one each in the S, orf3a, orf10, and N genes.
Figure 2. Phylogenetic analysis of SARS-CoV-2-KPK-KUST-SJTU/2020 (accession no. MT879619). (A) SARS-CoV-2-KPK-KUST-SJTU/2020 (red arrow), Iranian isolate (number 25), and Chinese isolates (total number 917). (B) SARS-CoV-2-KPK-KUST-SJTU/2020 (red arrow) and seven other isolates. The long name at the end of each node gives the serial number among the country isolates, followed by the country name, the specific name given to each isolate, the GISAID accession ID, and the date of collection.
Figure 3. Effect of the point mutation D614G on spike protein dynamics, predicted with the DynaMut online server. ΔΔG: free energy difference. ΔΔSVib ENCoM: vibrational entropy energy. (A) Increase in molecular flexibility (red region) due to the D614G point mutation. The total energy calculated for the mutant (MT) shows a stabilizing effect on the protein structure. (B) Interactions of the wild-type (WT) amino acid with surrounding residues. (C) Interactions of amino acid G614 (MT) with surrounding residues.
Figure 4. Effect of the point mutation P323L on NSP12 (RdRp) dynamics, predicted with the DynaMut online server. (A) Decrease in RdRp flexibility (blue region). P323L appears to stabilize the protein structure. Interactions of the WT and MT amino acids with surrounding residues are encircled. The MT has more interactions than the WT.
Figure 5. Effect of point mutations (L84S, E93K, and W45L) on NS8 (ORF8) structure and dynamics. L84S, E93K, and W45L have a destabilizing effect. E93K and W45L show an increase in flexibility, while L84S shows a decrease.
Figure 6. Effect of the S327L mutation on N protein structure and dynamics (PDB ID 6yun). Flexibility appears to increase when leucine replaces serine at position 327. This mutation has a stabilizing effect, as shown in blue.
Figure 7. Mutation A237T in NSP13 (helicase) and its dynamic effect. The structure was downloaded from the Swiss-Model server (PRO_0000449630). The MT exhibits lower flexibility than the WT. This mutation has a stabilizing effect, as shown in blue. The MT and WT residues are shown in light green, depicting the interactions with surrounding residues. The MT seems to form more interactions than the WT.
Figure 8. Mutation (N228K) in NSP5 (main protease) and its dynamic effect.