The status and analysis of common mutations found in the SARS-CoV-2 whole genome sequences from Bangladesh.

Sadniman Rahman1, Md Asaduzzaman Shishir2, Md Ismail Hosen3, Miftahul Jannat Khan4, Ashiqul Arefin5, Ashfaqul Muid Khandaker1.   

Abstract

The rapid emergence of SARS-CoV-2 variants through continuous mutation has caused successive waves of infection worldwide and, as a result, a very high death toll. It is therefore important to investigate the diversity and nature of the mutations in SARS-CoV-2 genomes. In this study, the common mutations in the whole genome sequences of SARS-CoV-2 variants from Bangladesh over a defined period were analyzed to better understand their status. A total of 78 complete genome sequences available in the NCBI database were obtained, aligned and analyzed. Single nucleotide polymorphisms (SNPs) were scattered throughout the genomes of the variants, and common SNPs such as 241: C>T in the 5′UTR of Open Reading Frame 1A (ORF1A), 3037: C>T in Non-structural Protein 3 (NSP3), 14,408: C>T in ORF6, and 23,402: A>G and 23,403: A>G in the Spike Protein (S) were observed, but all of them were synonymous mutations. About 97% of the studied genomes showed a block tri-nucleotide alteration (GGG>AAC) at positions 28,881–28,883 of the genome, which was the most common non-synonymous mutation. This block results in two amino acid changes (203–204: RG>KR) in the SR rich motif of the nucleocapsid (N) protein of SARS-CoV-2, introducing a lysine between serine and arginine. The structure of the mutant N protein was predicted through protein modeling; however, no observable difference was found between the mutant and the reference (Wuhan) protein. The protein stability changes upon mutation were then analyzed using the I-Mutant2.0 tool. The change from arginine to lysine at amino acid position 203 showed a reduction of entropy, suggesting a possible impact on the overall stability of the N protein. The non-synonymous to synonymous substitution ratio (dN/dS) was estimated for the common mutations, and the results showed that the overall mean distance among the N-protein variants was statistically significant, supporting the non-synonymous nature of the mutations. Phylogenetic analysis of the selected 78 genomes, compared with the most common genomic variants of this virus across the globe, showed a distinct cluster for the analyzed Bangladeshi sequences. Further studies are warranted to establish any plausible association of these mutations with clinical manifestations.
© 2022 Published by Elsevier Inc.

Keywords:  +ssRNA, positive single-stranded RNA; ACE2, Angiotensin-Converting Enzyme 2; Block mutation; CDK, Cyclin Dependent Kinases; COX2, Cyclooxygenase 2; CTD, C-terminal Domain; CoVs, Coronaviruses; Common mutations; DGHS, Directorate General of Health Services; ECM, Extracellular Matrix Protein; ERGIC, ER-Golgi intermediate compartment; GSK3, Glycogen Synthase Kinase 3; IRF3, Interferon Regulatory Factor 3; NFkB, Nuclear Factor kappa B; NSP, Nonstructural Protein; NTD, N-terminal Domain; ORFs, Open Reading Frames; PLP, Papain-like Protease; RBD, Receptor-Binding Domain; RTC, Replication–Transcription Complex; RdRp, RNA-dependent RNA polymerase; SARS-CoV-2; SNP, Single Nucleotide Polymorphism; SR rich motif; TMPRSS2, Transmembrane Protease Serine 2; sgRNAs, Sub-genomic RNAs

Year:  2022        PMID: 35399222      PMCID: PMC8977224          DOI: 10.1016/j.genrep.2022.101608

Source DB:  PubMed          Journal:  Gene Rep        ISSN: 2452-0144


Introduction

SARS-CoV-2, the causative agent of COVID-19, spreads faster than the earlier SARS and MERS coronaviruses and belongs to the β-coronavirus genus (Naqvi et al., 2020). SARS-CoV-2 pathogenesis involves both the innate and the adaptive immune systems (Morse et al., 2020), leading to the activation of signaling cascades that culminate in the release of cytokines and chemokines and the recruitment of immune cells to the site of infection (Fung and Liu, 2014). Dysregulation of the host's immune response leads to excessive inflammation, an altered adaptive immune response, and sometimes even death (Moens and Meyts, 2020). Furthermore, the emergence of new variants through mutation of the viral genome is giving rise to newer clinical manifestations (Bakhshandeh et al., 2021). Although most mutations in the SARS-CoV-2 genome are predicted to have little effect, a small proportion might affect functional properties and modify infectivity, disease severity or interactions with host immunity (Harvey et al., 2021). The complete genome of SARS-CoV-2 is about 29.9 kb (Wuhan variant), has a GC content of 38% and comprises 12 functional open reading frames (ORFs) (Khailany et al., 2020; Naqvi et al., 2020). ORF1a and ORF1b (5′-3′) encode polyproteins that yield 16 non-structural proteins (NSP1–NSP16) (Alanagreh et al., 2020), among which NSP3 (4955–5900 bp) and NSP5 (10,055–10,977 bp) encode proteases (Fig. 1) that cleave the polypeptides and block the host's innate immune response (Rastogi et al., 2020).
Fig. 1

SARS-CoV-2 whole genome structure and organization.

Worldwide, multiple genomic variants of SARS-CoV-2 harboring different mutations in the spike protein have been detected, such as B.1.1.7 (first detected in the UK, September 2020), B.1.351 (South Africa, December 2020), P.1 (detected in Japan in travelers from Brazil, January 2021), B.1.427/B.1.429 (USA, February 2021) and B.1.617.2 (India, 2021) (Davies and Jarvis, 2021; Zhou et al., 2021; Sabino et al., 2021; McCallum et al., 2021; Adam, 2021). The first positive cases of SARS-CoV-2 infection in Bangladesh were detected by RT-PCR assay in three Bangladeshi individuals on 7 March 2020 (Anwar et al., 2020). Since then, several reports have addressed the genome analysis of SARS-CoV-2 from Bangladesh, the physiological conditions of the patients, the association of comorbidities with severity, and the comparison of global and local mutations (Hasan et al., 2021; Mannan et al., 2021; Rahman et al., 2021). In the present study, we specifically monitored and analyzed 78 curated whole-genome sequences of SARS-CoV-2 submitted from Bangladesh to the NCBI genome databases to understand the commonly found mutations and their nature. Understanding the nature of common mutations over time will help in analyzing the diverse SARS-CoV-2 genomes in the country.

Methodology

This study is based on the analysis of the whole genome sequences of SARS-CoV-2 submitted from Bangladesh to the NCBI database (https://www.ncbi.nlm.nih.gov/sars-cov-2/) from June to October 2020. The sequences used in this study are listed in Table 1 with their accession numbers.
Table 1

Details of mutations identified in the SARS-CoV-2 whole genome sequences submitted from Bangladesh from June 2020 to December 2021.

Position | From | To | Type of mutation | Accession no. | Submitted on | Mutation frequency (%)
205 | T | C | SM | MT876547.1, MT876554.1, MT876599.1, MT876606.1 | 12 August 2020 | 6.55
*241 | C | T | SM | All from BD |  | 100
683 | C | T | SM | MT876433.1 | 12 August 2020 | 1.63
751 | C | T | SM | MT607252.1 | 15 June 2020 | 1.63
1036 | T | C | SM | MT655744.1 | 23 June 2020 | 1.63
1148 | G | T | SM | MT876572.1 | 12 August 2020 | 1.63
1457 | C | T | SM | MT876526.1 | 12 August 2020 | 1.63
1734 | C | T | SM | MT876547.1 | 12 August 2020 | 1.63
1820 | G | A | SM | MT657958.1, MT876547.1 | 23 June 2020, 12 August 2020 | 3.27
2057 | A | G | SM | MT607252.1 | 15 June 2020 | 1.63
2110 | C | A | SM | MT876556.1 | 12 August 2020 | 1.63
2113 | C | T | SM | MT731937.1 | 08 July 2020 | 1.63
2210 | G | T | SM | MT581414.1, MW531680.1 | 31 October 2020, 27 January 2021 | 3.27
2288 | G | A | SM | MT876599.1 | 12 August 2020 | 1.63
2388 | C | T | SM | MT731932.1, MT731933.1, MT731935.1 | 08 July 2020 | 4.91
2805 | A | C | SM | MT876556.1 | 12 August 2020 | 1.63
2910 | C | T | SM | MT876555.1, MW624725.1 | 12 August 2020, 18 February 2021 | 3.27
*3037 | C | T | SM | All from BD |  | 100
3077 | G | A | SM | MT581419.1, MT581416.1 | 31 October 2020 | 3.27
3163 | T | C | SM | MT876556.1 | 12 August 2020 | 1.63
3234 | A | G | SM | MT657958.1 | 23 June 2020 | 1.63
3533 | T | C | SM | MT666099.1 | 25 June 2020 | 1.63
3688 | C | T | SM | MT581419.1, MT581418.1, MT581417.1, MT581416.1, MT581415.1 | 31 October 2020 | 8.19
3754 | A | G | SM | MT666068.1 | 25 June 2020 | 1.63
3871 | G | T | SM | MT581411.1 | 31 October 2020 | 1.63
3961 | C | T | SM | MT601275.1, MT648676.1, MT655744.1, MT655746.1, MT876432.1, MT876555.1, MT876525.1, MT876572.1, MT581413, MT581410.1 | 12 June 2020, 22 June 2020, 23 June 2020, 23 June 2020, 12 August 2020, 12 August 2020, 12 August 2020, 12 August 2020, 31 October 2020, 31 October 2020 | 16.39
4024 | G | A | SM | MT876572.1 | 12 August 2020 | 1.63
4113 | C | T | SM | MT876607.1 | 12 August 2020 | 1.63
4298 | G | T | SM | MT581415.1 | 31 October 2020 | 1.63
4300 | G | T | SM | MT876525.1 | 12 August 2020 | 1.63
4444 | G | T | SM | MT581423.1, MT581422.1, MT581420.1 | 31 October 2020 | 4.91
4503 | A | T | SM | MT664171.1 | 25 June 2020 | 1.63
4522 | G | T | SM | MT601281.1 | 12 June 2020 | 1.63
4579 | T | A | SM | MT667351.1 | 25 June 2020 | 1.63
4778 | A | G | SM | MT876598.1 | 12 August 2020 | 1.63
5037 | G | C | SM | MT876599.1 | 12 August 2020 | 1.63
5366 | A | G | SM | MT655744.1 | 23 June 2020 | 1.63
5621 | C | T | SM | MT607252.1 | 15 June 2020 | 1.63
5832 | C | T | SM | MT655744.1 | 23 June 2020 | 1.63
5950 | G | T | SM | MT876555.1 | 12 August 2020 | 1.63
6120 | C | T | SM | MT731937.1 | 08 July 2020 | 1.63
6359 | G | A | SM | MT664106.1, MT664109.1 | 25 June 2020 | 3.27
6578 | C | T | SM | MT581416.1 | 31 October 2020 | 1.63
6807 | C | T | SM | MT876547.1 | 12 August 2020 | 1.63
7113 | C | T | SM | MT876527.1 | 12 August 2020 | 1.63
7528 | C | T | SM | MT581419.1 | 31 October 2020 | 1.63
8026 | A | G | SM | MT731937.1 | 08 July 2020 | 1.63
8127 | C | T | SM | MT664171.1 | 25 June 2020 | 1.63
8156 | T | G | SM | MT876599.1 | 12 August 2020 | 1.63
8327 | C | T | SM | MT666099.1 | 25 June 2020 | 1.63
8371 | G | T | SM | MT581423.1, MT581422.1, MT581420.1 | 31 October 2020 | 4.91
9050 | C | T | SM | MT731936.1 | 08 July 2020 | 1.63
9223 | C | T | SM | MT657271.1 | 23 June 2020 | 1.63
9246 | C | T | SM | MT655744.1 | 23 June 2020 | 1.63
9502 | C | T | SM | MT731937.1 | 08 July 2020 | 1.63
9532 | C | T | SM | MT876556.1 | 12 August 2020 | 1.63
9565 | C | T | SM | MT666099.1 | 25 June 2020 | 1.63
9828 | A | G | SM | MT731936.1 | 08 July 2020 | 1.63
10,198 | C | T | SM | MT731935.1 | 08 July 2020 | 1.63
10,252 | C | T | SM | MT731932.1, MT581413 | 08 July 2020, 31 October 2020 | 3.27
10,323 | A | G | SM | MT876606.1 | 12 August 2020 | 1.63
10,834 | C | T | SM | MT876525.1 | 12 August 2020 | 1.63
10,870 | G | T | SM | MT876571.1 | 12 August 2020 | 1.63
11,036 | C | A | SM | MT876555.1 | 12 August 2020 | 1.63
11,042 | G | T | SM | MT876554.1 | 12 August 2020 | 1.63
11,083 | G | T | SM | MT731937.1 | 08 July 2020 | 1.63
11,719 | G | A | SM | MT876526.1 | 12 August 2020 | 1.63
11,761 | G | T | SM | MT667351.1 | 25 June 2020 | 1.63
11,824 | C | T | SM | MT876599.1 | 12 August 2020 | 1.63
12,061 | A | G | SM | MT876527.1 | 12 August 2020 | 1.63
12,070 | G | T | SM | MT876607.1 | 12 August 2020 | 1.63
12,085 | C | T | SM | MT731937.1 | 08 July 2020 | 1.63
12,357 | C | T | SM | MT664107.1 | 25 June 2020 | 1.63
12,672 | C | T | SM | MT648676.1 | 22 June 2020 | 1.63
12,936 | A | C | SM | MT876547.1 | 12 August 2020 | 1.63
13,201 | G | T | SM | MT664107.1 | 25 June 2020 | 1.63
13,348 | G | T | SM | MT601281.1 | 12 June 2020 | 1.63
13,812 | G | T | SM | MT731932.1 | 08 July 2020 | 1.63
13,920 | G | A | SM | MT876525.1 | 12 August 2020 | 1.63
14,110 | C | A | SM | MT655744.1 | 23 June 2020 | 1.63
*14,408 | C | T | SM | All from BD |  | 100
14,645 | C | T | SM | MT581410.1, MW624725.1 | 31 October 2020, 18 February 2021 | 3.27
15,324 | C | T | SM | MT581419.1, MT581418.1, MT581417.1, MT581416.1, MT581415.1 | 31 October 2020 | 8.19
15,540 | C | T | SM | MT581413 | 31 October 2020 | 1.63
15,543 | G | T | SM | MT876554.1 | 12 August 2020 | 1.63
15,714 | C | T | SM | MT601281.1 | 12 June 2020 | 1.63
15,738 | C | T | SM | MT876546.1 | 12 August 2020 | 1.63
15,960 | C | T | SM | MT731937.1 | 08 July 2020 | 1.63
15,982 | G | T | SM | MT876525.1 | 12 August 2020 | 1.63
16,596 | C | T | SM | MT876607.1 | 12 August 2020 | 1.63
16,830 | C | T | SM | MT731934.1 | 08 July 2020 | 1.63
16,939 | T | C | SM | MT731936.1 | 08 July 2020 | 1.63
17,259 | G | T | SM | MT667351.1 | 25 June 2020 | 1.63
17,427 | G | T | SM | MT664171.1 | 25 June 2020 | 1.63
17,678 | C | T | SM | MT876554.1 | 12 August 2020 | 1.63
18,105 | G | T | SM | MT664106.1 | 25 June 2020 | 1.63
18,131 | C | T | SM | MT876607.1 | 12 August 2020 | 1.63
18,457 | C | T | SM | MT876607.1 | 12 August 2020 | 1.63
18,735 | A | G | SM | MT601275.1 | 12 June 2020 | 1.63
18,877 | C | T | SM | MT601281.1, MT664107.1 | 12 June 2020, 25 June 2020 | 3.27
19,162 | G | T | SM | MT581419.1 | 31 October 2020 | 1.63
19,273 | C | T | SM | MT655948.1 | 23 June 2020 | 1.63
19,398 | G | A | SM | MT655750.1 | 23 June 2020 | 1.63
19,723 | G | T | SM | MT731932.1 | 08 July 2020 | 1.63
20,436 | C | T | SM | MT655744.1 | 23 June 2020 | 1.63
20,480 | C | T | SM | MT655746.1 | 23 June 2020 | 1.63
20,628 | C | T | SM | MT731936.1 | 08 July 2020 | 1.63
20,679 | G | T | SM | MT876548.1 | 12 August 2020 | 1.63
20,774 | G | A | SM | MT581412 | 31 October 2020 | 1.63
20,893 | G | T | SM | MT664107.1 | 25 June 2020 | 1.63
20,955 | T | C | SM | MT876571.1 | 12 August 2020 | 1.63
21,204 | G | T | SM | MT655746.1 | 23 June 2020 | 1.63
21,216 | C | T | SM | MT876554.1 | 12 August 2020 | 1.63
21,306 | C | T | SM | MT655746.1 | 23 June 2020 | 1.63
21,595 | C | T | SM | MT876431.1 | 12 August 2020 | 1.63
21,639 | C | T | SM | MT655746.1 | 23 June 2020 | 1.63
21,707 | C | T | SM | MT876546.1 | 12 August 2020 | 1.63
21,855 | C | T | SM | MT876555.1 | 12 August 2020 | 1.63
21,941 | G | T | SM | MT655745.1 | 23 June 2020 | 1.63
21,998 | C | T | SM | MT876606.1, MW624725.1 | 12 August 2020, 18 February 2021 | 3.27
22,199 | G | T | SM | MT581413 | 31 October 2020 | 1.63
22,343 | G | C | SM | MT655742.1 | 23 June 2020 | 1.63
22,444 | C | T | SM | MT655742.1, MT664107.1 | 23 June 2020, 25 June 2020 | 3.27
22,501 | T | C | SM | MT876598.1 | 12 August 2020 | 1.63
23,029 | C | T | SM | MT664106.1, MT664109.1 | 25 June 2020 | 3.27
23,095 | A | G | SM | MT664105.1, MT664175.1 | 25 June 2020 | 3.27
23,101 | T | G | SM | MT648676.1 | 22 June 2020 | 1.63
23,202 | C | T | SM | MT581410.1 | 31 October 2020 | 1.63
23,230 | C | T | SM | MT581418.1 | 31 October 2020 | 1.63
23,268 | T | G | SM | MT581418.1 | 31 October 2020 | 1.63
*23,402 | A | G | SM | All from BD except MT664107.1 | October 2020 | 98.36
*23,403 | A | G | SM | All sequences | October 2020 | 100
23,952 | T | G | SM | MT581419.1 | 31 October 2020 | 1.63
23,586 | A | G | SM | MT731933.1 | 08 July 2020 | 1.63
23,587 | G | T | SM | MT648676.1 | 22 June 2020 | 1.63
23,599 | T | G | SM | MT876571.1 | 12 August 2020 | 1.63
23,608 | G | T | SM | MT657271.1 | 23 June 2020 | 1.63
23,934 | C | T | SM | MT655745.1 | 23 June 2020 | 1.63
23,957 | G | A | SM | MT581419.1 | 31 October 2020 | 1.63
24,181 | C | T | SM | MT876555.1 | 12 August 2020 | 1.63
24,383 | A | G | SM | MT876607.1 | 12 August 2020 | 1.63
25,494 | G | T | SM | MT601281.1 | 12 June 2020 | 1.63
25,563 | G | T | SM | MT601281.1 | 12 June 2020 | 1.63
25,597 | T | C | SM | MT655750.1 | 23 June 2020 | 1.63
25,615 | A | G | SM | MT657271.1 | 23 June 2020 | 1.63
25,644 | G | T | SM | MT731935.1 | 08 July 2020 | 1.63
25,713 | T | C | SM | MT876548.1 | 12 August 2020 | 1.63
25,883 | G | T | SM | MT876554.1 | 12 August 2020 | 1.63
25,906 | G | T | SM | MT581423.1, MT581422.1 | 31 October 2020 | 3.27
26,058 | C | T | SM | MT876556.1 | 12 August 2020 | 1.63
26,302 | T | C | SM | MT666099.1 | 25 June 2020 | 1.63
26,735 | C | T | SM | MT601281.1, MT664107.1 | 12 June 2020, 25 June 2020 | 1.63
26,895 | C | T | SM | MT731937.1 | 08 July 2020 | 1.63
27,199 | C | T | SM | MT876606.1 | 12 August 2020 | 1.63
27,316 | A | T | SM | MT601281.1 | 12 June 2020 | 1.63
27,389 | C | T | SM | MT664109.1 | 25 June 2020 | 1.63
27,518–19 | GC | TT | NSM | MT666099.1 | 25 June 2020 | 1.63
27,675 | A | C | SM | MT876554.1 | 12 August 2020 | 1.63
27,944 | C | T | SM | MT607252.1 | 15 June 2020 | 1.63
27,945 | C | G | SM | MT581414.1 | 31 October 2020 | 1.63
28,008 | A | G | SM | MT876599.1 | 12 August 2020 | 1.63
28,085 | G | A | SM | MT655742.1 | 23 June 2020 | 1.63
28,115 | C | T | SM | MT664107.1 | 25 June 2020 | 1.63
28,304 | A | G | SM | MT876556.1 | 12 August 2020 | 1.63
28,321 | G | T | SM | MT601281.1 | 12 June 2020 | 1.63
28,435 | C | T | SM | MT731937.1 | 08 July 2020 | 1.63
28,521 | A | G | SM | MT667351.1 | 25 June 2020 | 1.63
28,687 | C | T | SM | MT581411.1 | 31 October 2020 | 1.63
28,690 | G | A | SM | MT601287.1 | 12 June 2020 | 1.63
28,831 | C | T | SM | MT876571.1 | 12 August 2020 | 1.63
28,854 | C | T | SM | MT601281.1, MT664107.1 | 12 June 2020, 25 June 2020 | 3.27
**28,881–83 | GGG | AAC | NSM | All except for Ref seq, MT601281.1 and MT664107.1 | Sequences submitted in 2021 | 96.72
28,888 | T | C | SM | MT876546.1 | 12 August 2020 | 1.63
28,893 | C | T | SM | MT581415.1 | 31 October 2020 | 1.63
28,903 | G | T | SM | MT581415.1 | 31 October 2020 | 1.63
28,960 | G | T | SM | MT731935.1 | 08 July 2020 | 1.63
29,081 | G | T | SM | MT664107.1 | 25 June 2020 | 1.63
29,085 | C | T | SM | MT664107.1 | 25 June 2020 | 1.63
29,218 | C | T | SM | MT655744.1 | 23 June 2020 | 1.63
29,260 | G | T | SM | MT657958.1 | 23 June 2020 | 1.63
29,296 | C | T | SM | MT876554.1 | 12 August 2020 | 1.63
29,403 | A | G | SM | MT581423.1, MT581422.1, MT581420.1 | 31 October 2020 | 4.91
29,431 | G | T | SM | MT601281.1 | 12 June 2020 | 1.63
29,614 | C | T | SM | MT664107.1 | 25 June 2020 | 1.63
29,661 | T | C | SM | MT876555.1 | 12 August 2020 | 1.63
29,666 | C | T | SM | MT876527.1 | 12 August 2020 | 1.63
29,688 | G | T | SM | MT876571.1 | 12 August 2020 | 1.63
29,736 | G | T | SM | MT657958.1 | 23 June 2020 | 1.63
29,741 | C | T | SM | MT876547.1, MT581414.1 | 12 August 2020, 31 October 2020 | 3.27
29,447 | G | T | SM | MT655742.1 | 23 June 2020 | 1.63
29,753 | T | C | SM | MT876546.1 | 12 August 2020 | 1.63
29,785 | T | C | SM | MT648676.1 | 22 June 2020 | 1.63

Here, SM = Synonymous Mutation; NSM = Non-synonymous Mutation. * marks the nucleotide positions of commonly found SNPs, whereas ** marks the commonly found triple base (block) mutation.

Multiple sequence alignment of the whole genome sequences of SARS-CoV-2 submitted from Bangladesh was performed using MEGA X software. The whole genome sequence of the Wuhan variant (Accession: NC_045512.2) was used as the reference. First, polymorphic and conserved sites were counted from the aligned sequences. Then the mutation positions and the specific mutation subtypes were recorded. The frequency of each mutation among the analyzed genomes was calculated and is presented in Table 1. We further predicted the structure of the N protein with and without the mutations using the SWISS-MODEL tool. The template for building the model was searched with BLAST and HHBlits against the SWISS-MODEL template library (SMTL, last update: 2020-12-30, last included PDB release: 2020-12-25). The target sequence was searched with BLAST against the primary amino acid sequences contained in the SMTL. A total of 67 templates were found. For each identified template, the quality was predicted from features of the target-template alignment. The templates with the highest quality were selected for model building. Models were built based on the target-template alignment using ProMod3 of SWISS-MODEL. The global and per-residue model quality was assessed using the QMEAN scoring function. Further, the I-Mutant2.0 tool was applied to predict the stability changes of the N protein upon mutation (Capriotti et al., 2005). The phylogenetic analysis and the difference between the non-synonymous and synonymous distances per site (dN-dS), averaged over all sequence pairs of each gene, were calculated using MEGA X. The dN-dS analyses were conducted using the Nei-Gojobori model. The genetic relatedness of the coronavirus strains of Bangladesh to other variants was estimated using the Neighbor-Joining method (Saitou and Nei, 1987). The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed (Felsenstein, 1985). The analyses involved 71 and 45 genome sequences, the majority of which were from Bangladesh; the rest represented different variants. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches (Felsenstein, 1985). The evolutionary distances were computed using the Maximum Composite Likelihood method (Tamura et al., 2004) and are in units of the number of base substitutions per site. For the analysis involving 45 nucleotide sequences, all ambiguous positions were removed for each sequence pair (pairwise deletion option), and there were a total of 29,677 positions in the final dataset. Evolutionary analyses were conducted in MEGA X (Kumar et al., 2018).
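The mutation-frequency step described in this section can be illustrated with a minimal Python sketch. It is not the authors' MEGA X workflow: the function name `snp_frequencies` and the toy alignment are hypothetical, and the sketch assumes the genomes are already aligned to the reference so that every row has the same length. It counts, for each alignment column, how many samples carry a base that differs from the reference, and reports that count as a percentage of the samples.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def snp_frequencies(reference, samples):
    """Return {(position, ref_base, alt_base): percent of samples carrying it}.

    `reference` and every entry of `samples` must be rows of the same
    alignment, so they all share one length. Positions are 1-based.
    """
    counts = Counter()
    for seq in samples:
        if len(seq) != len(reference):
            raise ValueError("all sequences must be aligned to the same length")
        for pos, (ref_base, base) in enumerate(zip(reference, seq), start=1):
            # Skip gaps and ambiguity codes; count only clear substitutions.
            if base in VALID_BASES and ref_base in VALID_BASES and base != ref_base:
                counts[(pos, ref_base, base)] += 1
    return {key: 100.0 * n / len(samples) for key, n in counts.items()}


# Toy alignment (hypothetical data, not taken from Table 1).
toy_reference = "ACGTAC"
toy_samples = ["ATGTAC", "ATGTAC", "ACGTAA", "ACGTAC"]
toy_freqs = snp_frequencies(toy_reference, toy_samples)
print(toy_freqs)
```

In the toy data, the C>T change at position 2 is carried by two of four samples and the C>A change at position 6 by one of four. By the same arithmetic, a mutation seen in a single genome out of 61 would have a frequency of about 1.6%.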

Result and discussion

After analyzing all 78 complete genome sequences of SARS-CoV-2 submitted from Bangladesh, a block tri-nucleotide change, GGG>AAC (triple base mutation), was the most commonly observed missense (non-synonymous) mutation, located at positions 28,881–28,883 of the genome (Table 1). The other mutations in the genome were single nucleotide polymorphisms (SNPs); some of them were also common but synonymous, such as 241: C>T in the 5′UTR of ORF1A, 3037: C>T in NSP3 and 14,408: C>T in ORF6 (Table 1). The A>G mutations located in the Spike Glycoprotein of the virus at positions 23,402 and 23,403 were also very frequent (98.36% and 100%, respectively; Table 1). However, these were synonymous mutations with no structural implications. Phylogenetic analysis of the whole genome sequences of 71 SARS-CoV-2 genomes (61 Bangladeshi, 1 Wuhan and 9 of the most common variants) showed that the strains isolated from Bangladesh were more closely related to the Wuhan variant. The tree generated two main clusters, cluster 1 and cluster 2. Cluster 1 comprised the variants that emerged later, viz. the Gamma, Iota, Mu, Kappa, Beta, Delta, Alpha, Eta and Lambda variants. All the analyzed sequences from Bangladesh were in cluster 2 along with the variant from Wuhan. Cluster 2 formed five sub-clusters: sub-cluster 1 (SC1), SC2, SC3, SC4 and SC5. Since the strains were also labeled with their dates, it could be speculated that the fewest mutations occurred in October 2020 compared with June, July and August (Fig. 2A).
Fig. 2

Phylogenetic analysis: the genetic relatedness of corona virus strains of Bangladesh with other variants was estimated by using the Neighbor-Joining method using MEGA X.

The phylogenetic tree revealed that during the period of June 2020 to October 2020, the causative strains of infection in Bangladesh were very similar to the Wuhan variant (Fig. 2A). The sequences from the year 2020 and 2021 resulted in two distinct clusters- C1 and C2 (Fig. 2B). Cluster 2 produced two separate sub-clusters C2a and C2b.

The tree generated from the estimation of genetic relatedness among the initially prevailing strains revealed that the causative strains of infection in Bangladesh were very similar to the Wuhan variant during the period of June 2020 to October 2020 (Fig. 2A). When the sequences of both 2020 and 2021 were considered (two representative sequences from each month), two distinct clusters were found: cluster 1 (C1) and cluster 2 (C2) (Fig. 2B). Cluster 2 produced two separate sub-clusters, C2a and C2b. One local strain (MT655746.1) of June 2020 belonging to cluster 1 seemed to prevail up to January 2021, and its further circulation was not observed, whereas another local strain (MT664175.1) of June 2020 belonging to sub-cluster C2a seemed to circulate for almost one year, up to May 2021. Again, a strain similar to MT581415.1 (October 2020) was observed in September 2021. Sub-cluster C2a contains the sequences of June 2020 to October 2020 and a single sequence from May 2021, which indicates that this particular strain of 2020 continued to circulate until mid-2021. Representative strains from eight (8) months of 2021 were observed in sub-cluster C2b, where different variants that occurred worldwide were also found. Sequence similarity was observed among three (3) strains from June 2020, one (1) strain from July 2021, and the variants that occurred worldwide. It is possible that the strain prevailing in June 2020 continued to circulate until July 2021. Another interesting observation was the prevalence of strains similar to the omicron variant in Bangladesh during the period of January 2021 to April 2021, although the variant was announced in November 2021. This strain could be assumed to have been introduced in January 2021, but it was no longer detected after April 2021. Moreover, the strains of June 2020 could be classified into three types, which continued to circulate until January, May and July 2021. In the current study we focused especially on the most prevalent non-synonymous mutation (28,881–28,883: GGG>AAC) in order to analyze its impact on the pathogenicity of SARS-CoV-2. According to the NCBI reference genome (Wuhan), the 28,881–28,883: GGG>AAC block results in two amino acid changes (203–204: RG>KR) in the nucleocapsid (N) protein of SARS-CoV-2 (Fig. 3A, B). Garvin et al. (2020) and Dey et al. (2021) also noticed similar block mutations in the N protein of SARS-CoV-2 around the globe.
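How a three-base change at 28,881–28,883 alters two adjacent codons can be checked with a short Python sketch. Two inputs are assumptions taken from the public NC_045512.2 annotation rather than from this paper: the N coding sequence is taken to start at position 28,274, and the 12-base reference window at positions 28,877–28,888 is taken to be AGTAGGGGAACT (codons 202–205). The helper names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

N_GENE_START = 28274               # assumption: first base of the N ORF in NC_045512.2
WINDOW_START = 28877               # assumption: first base of codon 202
REFERENCE_WINDOW = "AGTAGGGGAACT"  # assumption: reference bases 28,877-28,888


def codon_number(position, gene_start=N_GENE_START):
    """1-based codon index of a genome position inside the gene."""
    return (position - gene_start) // 3 + 1


def translate(dna):
    """Translate complete codons of an in-frame DNA string."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def apply_block(window, window_start, first_pos, new_bases):
    """Replace len(new_bases) bases of `window` starting at genome position `first_pos`."""
    offset = first_pos - window_start
    return window[:offset] + new_bases + window[offset + len(new_bases):]


mutant_window = apply_block(REFERENCE_WINDOW, WINDOW_START, 28881, "AAC")

print(codon_number(28881), codon_number(28883))  # 203 204
print(translate(REFERENCE_WINDOW))               # SRGT
print(translate(mutant_window))                  # SKRT
```

Under these assumptions the block spans the last two bases of codon 203 and the first base of codon 204, so AGG (R) becomes AAA (K) and GGA (G) becomes CGA (R), which matches the RG>KR change reported in the text.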
Fig. 3

Analysis of the common block mutation: A. Representative image showing the GGG>AAC (28,881–28,883) mutation in the SARS-CoV-2 Bangladesh variants compared with the Wuhan variant (first row), as revealed by alignment with ClustalW. The block mark indicates the site of the triple base mutation. B. The mutation confers two amino acid changes (203–204: RG>KR) in the SR rich motif of the N protein. NTD and CTD represent the N-terminal domain and C-terminal domain, respectively. The wild type (Wuhan) N protein has an intact S-R dipeptide, whereas in the mutated (AAC mutant) N protein a lysine is inserted between the S and R amino acids. Bars on the top indicate the wild type and mutated amino acids, respectively. The bottom bars indicate the insertion of lysine between the S and R amino acids. C. Predicted structure of the N protein from the GGG>AAC mutated sequence (GenBank accession: MT876546.1) using the SWISS-MODEL tool.

We performed dN-dS analysis to estimate the non-synonymous to synonymous substitution ratio (dN/dS) for the N, S and NSP3 genes. For the 61 analyzed sequences, the overall dN-dS p-value was <1.00 for all three genes, indicating selective constraint, in which amino acid changes are disfavored (Table 2).
Table 2

Predicting Non-synonymous to synonymous substitution rate of the common mutations.

Target gene harboring common mutations | dN-dS analysis for overall mean distance: p-Value | dN-dS analysis for overall mean distance: Std error
N | 0.00 | 0.00
S | 0.00 | 0.00
NSP3 | 0.00 | 0.00
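The codon-based dN-dS comparison can be sketched for a single pair of in-frame sequences with a simplified Nei-Gojobori count. This is an illustration only and not the MEGA X implementation used in the study: the function names are hypothetical, changes to stop codons are counted as non-synonymous when counting sites, mutational pathways that pass through a stop codon are ignored, and the Jukes-Cantor correction is applied to the proportions of synonymous and non-synonymous differences.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Number of synonymous sites (0 to 3) in one sense codon."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                # Simplification: a change to a stop codon counts as non-synonymous.
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    total += 1 / 3
    return total


def codon_differences(c1, c2):
    """(synonymous, non-synonymous) differences, averaged over mutational pathways."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    paths = []
    for order in permutations(diff_pos):
        current, syn, nonsyn, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":
                valid = False  # ignore pathways through a stop codon
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            paths.append((syn, nonsyn))
    if not paths:
        return 0.0, float(len(diff_pos))
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion of differences p < 0.75."""
    return -0.75 * log(1 - 4 * p / 3)


def dn_minus_ds(seq1, seq2):
    """Return (dN, dS, dN - dS) for two aligned, in-frame coding sequences."""
    syn_sites = syn_diffs = nonsyn_diffs = 0.0
    n_codons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if any(CODON_TABLE.get(c, "*") == "*" for c in (c1, c2)):
            continue  # skip stop, gapped or ambiguous codons
        n_codons += 1
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        sd, nd = codon_differences(c1, c2)
        syn_diffs += sd
        nonsyn_diffs += nd
    nonsyn_sites = 3 * n_codons - syn_sites
    if syn_sites == 0 or nonsyn_sites == 0:
        raise ValueError("not enough sites to estimate dN and dS")
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    return d_n, d_s, d_n - d_s


# Toy pair of four-codon fragments (illustrative only).
toy_a = "AGTAGGGGAACT"
toy_b = "AGTAAACGAACT"
d_n, d_s, difference = dn_minus_ds(toy_a, toy_b)
print(d_n, d_s, difference)
```

A negative dN-dS value means that non-synonymous changes per non-synonymous site are rarer than synonymous changes per synonymous site. A four-codon toy fragment is far too short for inference and only shows the mechanics; in practice the statistic is averaged over all sequence pairs of a gene, as described in the Methodology.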
Looking at the sequence surrounding these amino acids (Fig. 3B), it appears that the mutation interrupts a serine-arginine (S-R) dipeptide by introducing between them a lysine, which is a basic, polar, hydrophilic and positively charged amino acid. Arginine generally provides more stability to protein structure than lysine (Sokalingam et al., 2012). The incorporation of lysine in the motif could therefore affect the overall distinctive properties of the protein, as reported before (Tylor et al., 2009). In particular, disruption of the serine-arginine dipeptide may hamper the phosphorylation of the SR-rich domain. This phosphorylation event is critical for cellular localization and regulation of N protein synthesis (Maitra et al., 2020). Notably, the GSK3 (glycogen synthase kinase 3) phosphorylation site at Ser202 and a CDK (cyclin dependent kinase) phosphorylation site at Ser206 are in the vicinity of the identified block mutation. We reasoned that this interaction would contribute to a reduction of conformational entropy and might affect protein structure. In this study, the change in N protein stability upon mutation at amino acid positions 203–204 (RG>KR) was predicted using the I-Mutant 2.0 tool. The incorporation of lysine at position 203 was predicted to reduce entropy (∆∆G = −2.26), thus affecting stability (Table 3). The structure of the protein with the block mutation was predicted and compared with the reference sequence (Wuhan variant) using SWISS-MODEL, a protein modeling tool (Fig. 3C). However, no observable difference was found in the block mutation area of the predicted N protein (GGG>AAC). On the other hand, Maitra et al. (2020) found that three miRNAs binding at the mutation site 28,881–3 can regulate the pathogenicity of the mutant. Taken together, these data suggest that the block mutation may regulate the stability and function of the N protein rather than its structure.
Table 3

Prediction of protein stability changes upon mutations using I-Mutant v2.0 (Capriotti et al., 2005).

Position of the amino acid in N-protein | WT | Mutant | DDG | pH | T
203 | R | K | −2.26 | 7.0 | 25
204 | G | R | 0.00 | 7.0 | 25

WT: amino acid in wild-type protein, mutant: new amino acid after mutation, DDG: DG(NewProtein)-DG(WildType) in Kcal/mol, DDG < 0: decrease stability, DDG > 0: increase stability, T: temperature in Celsius degrees, pH: -log[H+].

Kang et al. (2020) reported that the SARS-CoV-2 N protein, which is 419 amino acids (aa) long, consists of three highly conserved parts: an N-terminal domain (NTD) that binds RNA, a C-terminal domain (CTD) for dimerization of the protein, and a linker region called the SR-rich (serine-arginine) motif. The SR-rich motif is located in the middle region, covering amino acids 177–207 between the NTD and CTD (Surjit and Lal, 2010). The common triple base mutation observed here results in the amino acid changes R203K and G204R, which occur in the SR-rich motif (Fig. 3B). Tylor et al. (2009) revealed that the SR rich motif is important for viral replication, N protein multimerization and RNA splicing, so any disruption of the motif could affect the overall structure and function of the N protein. Tylor et al. (2009) also found that a mutated SR rich motif resulted in an extreme reduction of viral infectivity and a remarkable deficiency in viral replication capacity. Mutational analysis also showed that the N protein of the SARS-CoVs suppresses the activity of the cyclin–CDK complex, leading to hypophosphorylation followed by cell cycle arrest (Surjit and Lal, 2010). In addition, the interaction between hnRNPA1 and the N protein through the middle region (aa 161–210) of the N protein is considered to regulate viral RNA synthesis (Luo et al., 2004), so a mutation in this region might hamper viral RNA synthesis. The N protein also interacts with B23, a phosphoprotein in the nucleus, through aa 175–210, i.e. the SR-rich motif, which may play a significant role in centrosome duplication (Zeng et al., 2008). Furthermore, the N protein governs the upregulation of Cyclooxygenase 2 (COX2), an inflammatory agent. The N protein binds to the NFkB response element present in the COX2 promoter region through a 68 aa residue binding domain (aa 136–204) and activates its transcription (Yan et al., 2006). Interestingly, the common block mutation (R203K and G204R) in some of the genomes from Bangladesh occurred in this binding domain. It could therefore be presumed that the triple base block mutation might have affected COX2 transcriptional upregulation, leading to reduced inflammation. However, COX2 expression was not investigated in this study, and further experiments are required to test this prediction.

Conclusion

The GGG>AAC non-synonymous mutation remained the most frequent in the Bangladeshi population during the study period. We predict that the mutation reduces the stability of the N protein because of the lysine introduced between serine and arginine. However, owing to the lack of experimental evidence, many questions regarding the influence of these mutations remain unanswered.

Funding

This research work was funded by the Centennial research grant, University of Dhaka, Bangladesh.

CRediT authorship contribution statement

SR: Conceptualization, Data curation, Formal analysis, Writing - original draft. MAS: Data curation, Formal analysis, Methodology, Writing - original draft, Writing - review & editing. MIH: Data curation, Formal analysis, Methodology, Validation, Visualization, Writing - review & editing. MJK: Data curation, Formal analysis, Writing - original draft, Writing - review & editing. AA: Data curation, Formal analysis, Writing - original draft, Writing - review & editing. AMK: Conceptualization, Methodology, Data curation, Formal analysis, Writing - original draft, Writing - review & editing, Supervision, Validation, Visualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  32 in total

1.  The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors:  N Saitou; M Nei
Journal:  Mol Biol Evol       Date:  1987-07       Impact factor: 16.240

Review 2.  Recent human genetic errors of innate immunity leading to increased susceptibility to infection.

Authors:  Leen Moens; Isabelle Meyts
Journal:  Curr Opin Immunol       Date:  2020-01-11       Impact factor: 7.486

3.  The SR-rich motif in SARS-CoV nucleocapsid protein is important for virus replication.

Authors:  Shaun Tylor; Anton Andonov; Todd Cutts; Jingxin Cao; Elsey Grudesky; Gary Van Domselaar; Xuguang Li; Runtao He
Journal:  Can J Microbiol       Date:  2009-03       Impact factor: 2.419

4.  Crystal structure of SARS-CoV-2 nucleocapsid protein RNA binding domain reveals potential unique drug targeting sites.

Authors:  Sisi Kang; Mei Yang; Zhongsi Hong; Liping Zhang; Zhaoxia Huang; Xiaoxue Chen; Suhua He; Ziliang Zhou; Zhechong Zhou; Qiuyue Chen; Yan Yan; Changsheng Zhang; Hong Shan; Shoudeng Chen
Journal:  Acta Pharm Sin B       Date:  2020-04-20       Impact factor: 11.413

5.  A multi-centre, cross-sectional study on coronavirus disease 2019 in Bangladesh: clinical epidemiology and short-term outcomes in recovered individuals.

Authors:  A Mannan; H M H Mehedi; N U H A Chy; Md O Qayum; F Akter; M A Rob; P Biswas; S Hossain; M Ibn Ayub
Journal:  New Microbes New Infect       Date:  2021-01-08

6.  Evidence of escape of SARS-CoV-2 variant B.1.351 from natural and vaccine-induced sera.

Authors:  Daming Zhou; Wanwisa Dejnirattisai; Piyada Supasa; Chang Liu; Alexander J Mentzer; Helen M Ginn; Yuguang Zhao; Helen M E Duyvesteyn; Aekkachai Tuekprakhon; Rungtiwa Nutalai; Beibei Wang; Guido C Paesen; Cesar Lopez-Camacho; Jose Slon-Campos; Bassam Hallis; Naomi Coombes; Kevin Bewley; Sue Charlton; Thomas S Walter; Donal Skelly; Sheila F Lumley; Christina Dold; Robert Levin; Tao Dong; Andrew J Pollard; Julian C Knight; Derrick Crook; Teresa Lambe; Elizabeth Clutterbuck; Sagida Bibi; Amy Flaxman; Mustapha Bittaye; Sandra Belij-Rammerstorfer; Sarah Gilbert; William James; Miles W Carroll; Paul Klenerman; Eleanor Barnes; Susanna J Dunachie; Elizabeth E Fry; Juthathip Mongkolsapaya; Jingshan Ren; David I Stuart; Gavin R Screaton
Journal:  Cell       Date:  2021-02-23       Impact factor: 41.582

7.  Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence.

Authors:  Ester C Sabino; Lewis F Buss; Maria P S Carvalho; Carlos A Prete; Myuki A E Crispim; Nelson A Fraiji; Rafael H M Pereira; Kris V Parag; Pedro da Silva Peixoto; Moritz U G Kraemer; Marcio K Oikawa; Tassila Salomon; Zulma M Cucunuba; Márcia C Castro; Andreza Aruska de Souza Santos; Vítor H Nascimento; Henrique S Pereira; Neil M Ferguson; Oliver G Pybus; Adam Kucharski; Michael P Busch; Christopher Dye; Nuno R Faria
Journal:  Lancet       Date:  2021-01-27       Impact factor: 79.321

8.  Potentially adaptive SARS-CoV-2 mutations discovered with novel spatiotemporal and explainable AI models.

Authors:  Michael R Garvin; Erica T Prates; Mirko Pavicic; Piet Jones; B Kirtley Amos; Armin Geiger; Manesh B Shah; Jared Streich; Joao Gabriel Felipe Machado Gazolla; David Kainer; Ashley Cliff; Jonathon Romero; Nathan Keith; James B Brown; Daniel Jacobson
Journal:  Genome Biol       Date:  2020-12-23       Impact factor: 13.583

Review 9.  SARS-CoV-2 variants, spike mutations and immune escape.

Authors:  William T Harvey; Alessandro M Carabelli; Ben Jackson; Ravindra K Gupta; Emma C Thomson; Ewan M Harrison; Catherine Ludden; Richard Reeve; Andrew Rambaut; Sharon J Peacock; David L Robertson
Journal:  Nat Rev Microbiol       Date:  2021-06-01       Impact factor: 78.297

Review 10.  Insights into SARS-CoV-2 genome, structure, evolution, pathogenesis and therapies: Structural genomics approach.

Authors:  Ahmad Abu Turab Naqvi; Kisa Fatima; Taj Mohammad; Urooj Fatima; Indrakant K Singh; Archana Singh; Shaikh Muhammad Atif; Gururao Hariprasad; Gulam Mustafa Hasan; Md Imtaiyaz Hassan
Journal:  Biochim Biophys Acta Mol Basis Dis       Date:  2020-06-13       Impact factor: 5.187
