Literature DB >> 33737154

Global and local mutations in Bangladeshi SARS-CoV-2 genomes.

Md Mahbub Hasan¹, Rasel Das², Md Rasheduzzaman³, Md Hamed Hussain⁴, Nazmul Hasan Muzahid⁴, Asma Salauddin³, Meheadi Hasan Rumi³, S M Mahbubur Rashid⁵, Amam Zonaed Siddiki⁶, Adnan Mannan⁷.

Abstract

Coronavirus Disease 2019 (COVID-19) warrants comprehensive investigations of publicly available Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) genomes to gain new insight about their epidemiology, mutations, and pathogenesis. Nearly 0.4 million mutations have been identified so far among the ∼60,000 SARS-CoV-2 genomic sequences. In this study, we compared a total of 371 SARS-CoV-2 published whole genomes reported from different parts of Bangladesh with 467 sequences reported globally to understand the origin of viruses, possible patterns of mutations, and availability of unique mutations. Phylogenetic analyses indicated that SARS-CoV-2 viruses might have transmitted through infected travelers from European countries, and the GR clade was found as predominant in Bangladesh. Our analyses revealed 4604 mutations at the RNA level including 2862 missense mutations, 1192 synonymous mutations, 25 insertions and deletions and 525 other types of mutation. In line with the global trend, D614G mutation in spike glycoprotein was predominantly high (98 %) in Bangladeshi isolates. Interestingly, we found the average number of mutations in ORF1ab, S, ORF3a, M, and N were significantly higher (p < 0.001) for sequences containing the G614 variant compared to those having D614. Previously reported frequent mutations, such as R203K, D614G, G204R, P4715L and I300F at protein levels were also prevalent in Bangladeshi isolates. Additionally, 34 unique amino acid changes were revealed and categorized as originating from different cities. These analyses may increase our understanding of variations in SARS-CoV-2 virus genomes, circulating in Bangladesh and elsewhere.

Entities: CellLine Chemical Disease Gene Mutation Species

Keywords: Bangladesh; COVID-19; Mutation; SARS-CoV-2; Whole-genome sequence

Year: 2021 PMID： 33737154 PMCID： PMC7959702 DOI： 10.1016/j.virusres.2021.198390

Source DB: PubMed Journal: Virus Res ISSN： 0168-1702 Impact factor: 3.303

Introduction

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has been identified as the etiological agent of the disease called Coronavirus Disease 2019 (COVID-19). To elucidate the viral pathogenesis, modern genomic tools are highly crucial and have been employed by researchers around the world. An increasing number of whole-genome datasets of the SARS-CoV-2 virus are now being submitted in publicly accessible databases from different parts of the globe everyday. It is high time to analyze the variations among those sequences which will help future strategic efforts for its preventive measures, such as vaccine design and development of potential novel therapeutics. SARS-CoV-2 consists of positive-sense single-stranded RNA with a genome size ranging from ∼29.8 to 29.9 Kb (Khailany et al., 2020; Mannan et al., 2021) It contains a variable number (9–11) of open-reading frames (ORFs) and the first ORF covers almost two-thirds of the whole genome (Khailany et al., 2020; Kim et al., 2020). The genome encodes 4 structural, 16 non-structural proteins (nsp), and 6 accessory proteins (Astuti, 2020; Chen et al., 2020; O’Meara et al., 2020; Yoshimoto, 2020). According to a recent study on 48,635 SARS-CoV-2 genome sequences, a total of 353,341 mutations have been observed globally compared with that of the reference genome from Wuhan (Accession ID MN908947). Of them, sequences from India, Congo, Bangladesh, and Kazakhstan were reported to show significantly high numbers of mutations per sample compared with the global average (Mercatelli and Giorgi, 2020). Out of these mutations, D614G mutation (causing aspartate to glycine at 614 in S protein) is reported to be the most prevalent mutations in Europe, Oceania, South America, Africa, Middle East, and India (Gong et al., 2020; Raghav et al., 2020; Zhang et al., 2020). Several laboratory experiment-based investigations have reported that the level of angiotensin-converting enzyme 2 (ACE2) expression was distinctly higher by the retroviruses pseudo-typed with G614 compared with that of D614 in cell culture experiments (Dumonteil and Herrera, 2020; Zhang et al., 2020). Another reported mutation of ORF1ab is P4715 L linked with D614G, which is linked with higher fatality rates in 28 countries and 17 states of the United States (Mercatelli and Giorgi, 2020; Toyoshima et al., 2020). Three other mutations namely C14408T (ORF1b), C241T (5′ UTR), C3037T (ORF1a) are reported to be common and coexisting in the same genome, while G11083T has been found mostly in Asian countries (Toyoshima et al., 2020). In Bangladesh, until November 2020, nearly 465,000 people have been infected and 6644 people have died due to COVID-19 (https://iedcr.gov.bd/). During this period, a total of 371 complete and high coverage SARS-CoV-2 whole-genome sequences were submitted in the GISAID database (https://www.gisaid.org). Analyses of these sequences are sparsely reported in the literature. As for instance, one study reported analyses of 60 SARS-CoV-2 genome sequences from Bangladesh and compared them with 6 Southeast Asian countries (Islam et al., 2020). Another group analyzed only 14 isolates and found 42 mutations (Parvez et al., 2020). A separate study identified only 9 variants where unique mutations (UMs) were found prevalent mostly in ORF1ab (Hasan et al., 2020). Another study involved 64 SARS-CoV-2 whole-genome sequences and identified the presence of 180 mutations in the coding regions of the viruses, and mutations at nsp2 were the most prevalent (Ahmed et al., 2020). Due to the small numbers of genome sequences analyzed, most of these findings were not conclusive and truly representative of the whole country. Further comprehensive analyses are therefore necessary to better understand the circulating virus in this highly populated country. The present study was based on SARS-CoV-2 genome sequences submitted in the global publicly accessible GISAID database (GISAID, 2020). According to the GISAID database, clades are characterized by genomic specificity and classified by calculating genome distances of the variants through phylogenetic cluster analysis. Currently, all of the SARS-CoV-2 variants are classified into 7 clades- S, L, V, G, GH, GR, and GV. We used this clade-based classification in this study. We compared a total of 371 whole-genome sequences obtained from isolates collected from Bangladeshi COVID-19 patients with that of 467 global sequences that are from Asia, Africa, Europe, Australia and North American countries using time-resolved phylogenetic analysis. We studied the frequency of mutations in different ORFs of SARS-CoV-2 genomes, having the mutation of D614G. In addition, we explored the origin and distribution of SARS-CoV-2 in Bangladesh along with the circulating variants present in different parts of the country. This was achieved through identifying the pattern of mutations including the UMs. These pave the way to increase our understanding of the distribution of different variants of SARS-CoV-2 virus in different regions and associated mutation events.

Materials and methods

Dataset

A total of 35,723 complete high-coverage genomic RNA sequences of SARS-CoV-2 were submitted to GISAID until June 30, 2020 when the study was initiated. From these downloaded sequences, a custom python script was used to retrieve unique sequences. The same script also removed any sequence, containing “N” and other ambiguous IUPAC codes (Korber et al., 2020a). A total of 8723 complete genomic sequences were returned upon running the script as per the aforementioned logic. To select representative sequences from curated 8723 global sequences and make a comparison against sequences from Bangladesh, priorities were given to those countries that had a higher number of infections in each continent (https://www.worldometers.info/coronavirus/). We selected these sequences considering that each continent must be represented by at least one sequence from each GISAID clade. The number of sequences selected from a country was based on the total number of unique sequences retrieved. This resulted in a dataset of 839 sequences having 467 unique representative sequences from 41 countries (see Supplementary File Table S1).

Phylogenetic analysis

From Bangladesh, 518 whole-genome RNA sequences of SARS-CoV-2 were uploaded to GISAID until November 30, 2020. Only high coverage complete sequences (n = 371) were considered for analysis in this study. All of these 371 sequences were retrieved and aligned with the previously selected 467 representative sequences along with that of Wuhan-Hu-1 (Accession ID MN908947) as a reference sequence (Wu et al., 2020). A list of accession numbers of all sequences, used in this study, is provided in Supplementary File Table S2. To ensure comparability, the flanks of all the sequences were truncated to the consensus range from 56 to 29,797 (Forster et al., 2020), with nucleotide position numbering according to the Wuhan-1 reference sequence, prior to alignment. Multiple Sequence Alignment (MSA) and phylogenetic tree construction were carried out using Molecular Evolutionary Genetic Analysis X (MEGA X) software (version 10.1) (Kumar et al., 2018). The selected sequences were aligned using the MUSCLE software tool (Edgar, 2004). Later, a Neighbor-Joining (NJ) phylogenetic tree was constructed using the Tamura-Nei model (Saitou and Nei, 1987). Tree topology was assessed using a fast bootstrapping function with 1000 replicates. Tree visualization and annotations were performed in the Interactive Tree of Life (iTOL) v5 (Letunic and Bork, 2019).

Mutation analysis

The Genome Detective Coronavirus Typing Tool (version 1.13) was used for variant analyses at the RNA level of SARS-CoV-2 which is specially designed for this virus analysis (https://www.genomedetective.com/app/typingtool/cov/) (Cleemput et al., 2020). For investigating the UM at protein level among 371 genomic sequences from Bangladesh, we used a CoV server hosted by the GISAID server (https://www.gisaid.org/epiflu-applications/covsurver-mutations-app/). The server analyzed our dataset against all available genomic sequences of SARS-CoV-2 including the Wuhan reference sequence deposited on GISAID until November 30, 2020. The spatial map was created using layers downloaded from GeoDASH (The Bangladesh Geospatial Data Sharing Platform; https://geodash.gov.bd/) website on ArcGIS Desktop (Esri Inc., 109 Redlands, California, United States) licensed to King’s College London.

Statistical analysis

Descriptive and inferential statistics were used to analyze different mutations and their correlation with different categorical variables. For correlation, we used one-way ANOVA analysis of variance using SPSS Statistics 25 (IBM, Armonk, New York) licensed to King’s College London.

Results and discussion

Phylogenetic analysis of SARS-CoV-2 circulating in Bangladesh

To understand the SARS-CoV-2 viral transmission in Bangladesh, phylogenetic analysis was performed based on the selected 371 complete virus genomes from different regions of Bangladesh along with selected 467 globally submitted sequences from 41 countries of 6 continents as shown in Fig. 1 . This represents an overall clade distribution of all global sequences along with sequences from Bangladeshi isolates. Analysis of Fig. 1 depicts the GR clade, which is found predominant in Bangladesh. About 86 % of the sequences are grouped to this clade followed by G and GH with ∼6%, respectively. Similar clade distribution has been reported in SARS-CoV-2 isolates, originating from European countries (Hamed et al., 2020). We also compared the sequence data among different regions of Bangladesh, looking at the district-wise distribution of clades as shown in Fig. 2 . As can be seen in Fig. 2, the sequences from all districts largely clustered with that of GR clade except sequences from Chittagong.

Fig. 1

Fig. 2

District-wise distribution of GISAID clades among circulating SARS-CoV-2 strains in Bangladesh. Please be noted that district information of 72 sequences out of 371 deposited from Bangladesh is not available from respective metadata. Districts from where samples were collected for sequencing are shaded pink and the number of the sequences from each district is proportional to the circumference of the piecharts. The spatial map was created using layers downloaded from GeoDASH (The Bangladesh Geospatial Data Sharing Platform; https://geodash.gov.bd/) website on ArcGIS Desktop (Esri Inc.,109 Redlands, California, United States) licensed to King’s College London.

Phylogenetic analysis of 371 Bangladeshi SARS-CoV-2 genomes with 467 representative sequences from 41 different countries worldwide. Sequence having the original D614 in Spike glycoprotein mutation is ‘shaded cyan’ while G614 in yellow. Outside the main tree and labels, there are three different annotation panels sequentially to show the sequences from Bangladesh (Orange Star), followed by the burden of mutation in different ORFs of SARS-CoV-2 as ‘heatmap’ and finally, GISAID clades as a ‘color strip’. The reference sequence, Wuhan-Hu-1 (Accession number MN908947), is indicated with . District-wise distribution of GISAID clades among circulating SARS-CoV-2 strains in Bangladesh. Please be noted that district information of 72 sequences out of 371 deposited from Bangladesh is not available from respective metadata. Districts from where samples were collected for sequencing are shaded pink and the number of the sequences from each district is proportional to the circumference of the piecharts. The spatial map was created using layers downloaded from GeoDASH (The Bangladesh Geospatial Data Sharing Platform; https://geodash.gov.bd/) website on ArcGIS Desktop (Esri Inc.,109 Redlands, California, United States) licensed to King’s College London. We also observed that the major clade distribution of SARS-CoV-2 isolates from Chittagong is consists of both GR and GH (Fig. 2). The GH clade was predominantly found in Saudi Arabia (Mercatelli and Giorgi, 2020), which is clustered together with Chittagong as shown in Fig. 1. Based on this evidence, it is highly likely that the introduction of GH clade in Bangladesh could be of Middle-Eastern origin.

Mutation analysis in Bangladeshi SARS-CoV-2 genomic sequences

Prevalence of global common mutations

Our analyses revealed a total of 4604 mutations observed among 371 Bangladeshi SARS-CoV-2 genomic sequences that include 2862 missense, 1192 synonymous, 25 insertion/deletions and 525 other mutations (Ambiguous mutation and nucleotide mutation in UTR region) (See Table 1 ). Among the identified mutations, 28,881 G > A & 28,882 G > A (R203K; N protein), 23,403 A > G (D614G; S glycoprotein), 28,883 G > C (G204R; N protein), 14,408 C > T (P4715 L; nsp12), and 1163 A > T (I300 F; nsp2) are the most frequently occurring common mutations found in Bangladesh with a frequency of 650, 365, 325, 304 and 243, respectively (Fig. 3 ). Nucleotide mutations of 28,881 G > A and 28,882 G > A resulted in R203K due to codon degeneracy. Notably, no particular mutations occurred at any specific time rather they have been observed over the whole period of disease outbreak. However, this explanation is predicted and needs to be investigated in detail in the future. The RNA-dependent RNA polymerase (i.e. nsp12) is essential for replication and transcription of the viral RNA genome. The observed mutation P4715L in nsp12 was also found in most of the US states (28 out of 31 states from where the sequences were deposited). The same mutation was prevalent in European countries like Spain, France etc. This alteration could affect the pathogenesis triggered by antibody escape variants with epitope loss (Banerjee et al., 2020; Gupta and Mandal, 2020). In a separate study, it has been reported that the G614 type might have originated either in Europe or China (Korber et al., 2020b). They also reported that the original Wuhan D614 form was also predominant in Asian samples. Meanwhile, the G614 form had clearly been established and started expanding in countries outside of China. We noticed that 98 % of genomes from Bangladesh have D614G mutation, which is also dominant in the world. However, the average number of mutations per ORF is varied among genomes containing D614 and G614 that we have studied (n = 838) as revealed by Table 2 . The average number of mutations per genome between D614 and G614 is 6.12 and 10.68, respectively. The average number of mutations in ORF1ab, S, ORF3a, M and N of genomes having mutated G614 (n = 626) are significantly higher (p ≤ 0.001) than those having the original D614 (n = 212) in S glycoprotein (See Table 2 and Fig. 1). Interestingly, the average mutation number is declined in ORF8, having G614 mutation (p < 0.001). This correlation indicates that the genomes containing D614G mutation also bear more mutations which is aligned with the various reports on the link between the transmission and pathogenesis of SARS-CoV-2 (Grubaugh et al., 2020; Korber et al., 2020a; Zhang et al., 2020), and this mutation increases cell entry and transduction due to resistance to proteolytic cleavage (Daniloski et al., 2021; Ozono et al., 2021). However, without further experimental evidence this is not conclusive. Meanwhile, R203K and G204R mutations in N protein were previously reported in Indian, Spanish, Italian, and French samples (Koyama et al., 2020; Maitra et al., 2020). These mutations are located in the site of the SR-rich region which has been reported to be intrinsically disordered (Chang et al., 2014). This region further incorporates a few phosphorylation sites (Surjit et al., 2005), including the GSK3 (Glycogen synthase kinase 3) phosphorylation at Ser202 and a CDK (Cyclin-dependent kinase) phosphorylation site at Ser206 which are located close to the position of this mutation. The ‘SRGTS’ (202–206) and ‘SPAR’ (206–209) sequence motifs are dependent on GSK3 and CDK phosphorylation motifs, respectively. Other variations 28,881 G > A and 28,882 G > A together convert polar to non-polar amino acid (R203K) and 28,883 G > C variation converts nonpolar to polar amino acid (G204R) (See Fig. 3).

Table 1

Number of common gene variants among 371 SARS-CoV-2 genomes from Bangladesh.

Genome segment	Missense mutation	Synonymous mutation	Insertion Deletion	Others	Total
ORF1ab	1141	898	1	4	2044
S	493	118	3	1	615
ORF3a	89	49	7	0	145
E	3	1	0	0	4
M	20	55	0	1	76
ORF6	7	2	0	0	9
ORF7a	11	7	0	0	18
ORF7b	7	1	0	0	8
ORF8	30	16	2	0	48
ORF10	10	5	0	0	15
N	1051	40	2	1	1094
Intergenic	0	0	2	13	15
5′ UTR	0	0	4	414	418
3' UTR	0	0	4	91	95
Total	2862	1192	25	525	4604

Fig. 3

Frequency of common mutations observed in Bangladesh over time. The x-axis of the plot shows the timeline between April 2020 and October 2020 whereas, the Y-axis depicts the frequency of mutations reported from the sequences of the same collection date.

Table 2

Correlation of the average number of mutations per genome among different genomic segments with D614 G mutation.

Genome segment	Average number of mutations per genome		Change (%)	p value
Genome segment	D614 (wild; n = 212)	G614 (n = 626)	Change (%)	p value
5′ UTR	0.34	1.12	70	<0.001
ORF1ab	3.20	4.75	33	<0.001
S	0.51	1.50	66	<0.001
ORF3a	0.40	0.41	2	0.794
E	0.00	0.01	100	0.501
M	0.07	0.17	59	<0.001
ORF6	0.01	0.03	67	0.134
ORF7a	0.02	0.04	50	0.175
ORF7b	0.01	0.01	0	0.585
ORF8	0.45	0.10	−350	<0.001
N	0.58	2.24	74	<0.001
ORF10	0.01	0.03	67	0.092
3' UTR	0.52	0.22	−136	<0.001
Total	6.12	10.68	43

Number of common gene variants among 371 SARS-CoV-2 genomes from Bangladesh. Frequency of common mutations observed in Bangladesh over time. The x-axis of the plot shows the timeline between April 2020 and October 2020 whereas, the Y-axis depicts the frequency of mutations reported from the sequences of the same collection date. Correlation of the average number of mutations per genome among different genomic segments with D614 G mutation.

Unique mutations (UMs) existing in Bangladeshi SARS-CoV-2 isolates

We observed 34 unique shifts from the different proteins of SARS-CoV-2 isolated from Bangladesh (Table 3 ). Surprisingly, most of these UMs were found to be in patients in a specific area with a few exceptions. For example, some UMs as shown in Table 3 are only found in Barisal, Chandpur, Chittagong, Dhaka, Jessore, Moulvibazar, Mymensingh, and Rangpur districts. Interestingly, some UMs are concurrently present in multiple cities. These changes in amino acids might have occurred due to rapid mutation events and/or recombination with existing CoVs in the human body (Hassan et al., 2020). The circulation of a high number of these UMs in different cities indicates the possible emergence of community transmission in the Bangladeshi population (Samyuktha and Kumar, 2020).

Table 3

Unique mutations in 371 SARS-CoV-2 genomes from Bangladesh.

UM	Gene/ORF	District	Remarks
UMs localized to specific district
E194Q	ns3	Barisal	Negatively charged to Glu neutral Gln
Q62E	ns7a	Barisal	Neutral Gln to negatively charged Glu
N377D	nsp2	Chandpur	Polar Asn to acidic Asp
G773A	nsp3	Chandpur	Same type of change
Y660F	Spike	Chandpur	Aromatic Tyr to to aromatic Phe
N11D	n	Chittagong	Polar Asn to acidic Asp
N39Y	ns6	Chittagong	Polar aliphatic Asn to aromatic Tyr
A602S	nsp3	Chittagong	Hydrophobic, and non-polar Ala to hydrophilic Ser
V56A	nsp1	Dhaka	Hydrophobic Val to simple, and non-polar Ala
D85E	nsp4	Dhaka	Both are same type of amino acid
N133B	nsp5	Dhaka	Ambigious
L22I	nsp6	Jessore	Both are same type of amino acids
K84T	nsp9	Jessore	Positively charged ε-amino group containing Lys to hydroxyl group containing Thr
T80I	ns8	Moulvibazar	Hydroxyl group containing Thr to hydrophobic Ile
E242A	ns3	Mymensingh	Acidic Glu to hydrophobic Ala
G42V	ns7a	Rangpur	Glycine convert to hydrophobic valine

UMs shared between districts
P38R	NS8	Pabna, Chandpur, Brahmanbaria	Phenylalanine convert to Arginine which contains positively charged group.
D179N	nsp16	Khulna, Rajshahi	Acidic Asp to polar aliphatic Asn
V469A	nsp2	Mymensingh, Barisal, Chandpur	Hydrophobic Val to another hydrophobic, non-polar Ala
L373M	nsp3	Brahmanbaria, Moulvibazar, Rajshahi	Hydrophobic side chain containing Leu to sulfur containing Met
V120L	nsp6	Habiganj, Pabna	Valine to hydrophobic side chain containing Leu

District level location is not given (NG) in the metadata/dataset
H145N	n	NG	Polar, basic His to polar Asn
P104R	ns3	NG	Phenylalanine convert to positively charged Arg
V121D	nsp1	NG	Hydrophobic Val to acidic Asp
V31E	nsp12	NG	Hydrophobic Val to negatively charged Glu
F217I	nsp14	NG	Aromatic Phe to hydrophobic Ile
D1774H	nsp3	NG	Acidic Asp to polar, basic His
E120K	nsp3	NG	Negatively charged Glu to positively charged ε-amino group containing Lys
D1108N	nsp3	NG	Acidic Asp to polar Asn
V843F	nsp3	NG	Hydrophobic Val to aromatic Phe
A889V	nsp3	NG	Hydrophobic, and non-polar Ala to another hydrophobic Val
G1691C	nsp3	NG	Simple Gly to sulfur-containing, acidic Cys
E42A	nsp4	NG	Negatively charged Glu to hydrophobic, non-polar Ala
I569S	Spike	NG	Hydrophobic side chain containing amino acid Ile to polar Ser

Unique mutations in 371 SARS-CoV-2 genomes from Bangladesh. We observed 2 UMs in nsp1 (See Table 3), but the location of these amino acids was not in the KH domain (K164 and H165 of nsp1), which binds with the 40S subunit of ribosome (Thoms et al., 2020). However, nsp1 acts as a primary virulence factor in SARS-CoV-2 infection, and mutation in this protein could affect the structure and functional properties, thereby altering its virulence properties. Similar to nsp1, 2 UMs were seen in nsp2, but their effects on host cells were merely reported in the literature. Since nsp2 interacts with the host proteins and disrupts the host cell survival signaling pathway (Bianchi et al., 2020), any mutation in nsp2 may play a crucial role in SARS-CoV-2 infections. In general, mutations of nsp3 are responsible for affecting the virus assembly and hence their replication. One would assume that this is due to the disruption of the replicase polyprotein processing into nsps. These nsps assemble with cellular membranes and facilitate virus replication. Among 34 UMs, 9 mutations were located in nsp-3. The UMs found in nsp3 might have some significant impacts in the viral pathogenesis. Firstly, the UMs (A889V and V843F) were found in the main domain of nsp3 that is important for processing endopeptidases from coronaviruses. Secondly, some UMs (e.g., G1691C, A602S and L373M) are found in the topological (cytoplasmic) domain. Thirdly, we observed one UM (L373M) in the ADP-ribose-1′-phosphatase (ADRP) or (Macro) domain of nsp3. It has been shown that mutation of the ADRP domains does not diminish virus replication in mice, but reduces the production of the cytokine IL-6, which is an important proinflammatory molecule (Mielech et al., 2015). However, we could not find any mutation in the active sites and zinc finger motif, attributing normal catalytic activity of nsp3. Overall, our opinion is that the papain‐like protease (PL-PRO) domain is important for the development of antiviral drugs and the actual role of this enzyme is to cleave of several nsps from the polyprotein pp1a and also the biogenesis of the SARS-CoV replicase complex is yet to be explored. It can be postulated that the proteins like nsp3, nsp4 and nsp6 through their transmembrane domains, are involved in the replicative and transcription complex (Coppée et al., 2020). In this study, we observed only two UMs in nsp4 and nsp6, respectively. Meanwhile, nsp5 encodes 3C-like proteinase that cleaves the C-terminus of the replicase polyprotein at 11 sites (Lokhande et al., 2020; Roe et al., 2021). One UM that we found in nsp5 did not fall in its active sites (3304 and 3408 in ORF1ab) warrants further investigation with a large number of sequence datasets.

Conclusion

The present global outbreak of COVID-19, caused by SARS-CoV-2, has already taken more than 2.5 million lives. Bangladeshi citizens are highly vulnerable to COVID-19 as evident by a number of circulating variants in different regions of this country. To combat this deadly disease, we need a better understanding of the pathobiology of this deadly virus. Hence, it is essential to minimize the translational gap between viral genomic information and its clinical consequences for developing effective therapeutic strategies. In this study, we have attempted to explore genomic variations of Bangladeshi SARS-CoV-2 viral isolates while comparing them with a large cohort of global isolates. Our analyses will facilitate the understanding of the origin, mutation patterns, and their possible effect on viral pathogenicity. We have addressed the importance of the variations in the viral genomes and their necessity for therapeutic interventions. The unique insights from this study will undoubtedly be supportive for a better understanding of the molecular mechanism of SARS-CoV-2 which would help to understand pathophysiology of the virus in highly populated countries like Bangladesh.

Author statement

MMH, RD, SMM and AMN designed the study. MMH, RD, ZS and AMN reviewed the literature, extracted the data and checked the extracted data. MMH, RD, MR, MHH, NMH, AS, and MHR planned and conducted the analysis. MMH, RD, SMM, ZS and AMN wrote the first draft and all authors reviewed and approved the final manuscript paper; approved the final version and agreed to be accountable for the work.

Declaration of Competing Interest

Not applicable.

6 in total

1. Prevalence of COVID-19 in Bangladesh, April to October 2020-a cross-sectional study.

Authors: Arifa Nazneen; Rebeca Sultana; Mahmudur Rahman; Mahbubur Rahman; Firdausi Qadri; Nadia Ali Rimi; Md Kamal Hossain; Muhammad Rashedul Alam; Mustafizur Rahman; Nitai Chakraborty; Shariful Amin Sumon; Enayet Hussain; Md Zakiul Hassan; Sazzad Hossain Khan; Md Hasnatuzzaman Prodhan; Arifur Rahman Bablu; Kajal Chandra Banik; Md Habibullah Fahad; Marjahan Akhtar; Syed Moinuddin Satter; Shahriar Ahmed; Ahmed Ehsanur Rahman; Taufiqur Rahman Bhuiyan; A S M Alamgir; Shams El Arifeen; Tahmina Shirin; Sayera Banu; Meerjady Sabrina Flora
Journal: IJID Reg Date: 2021-10-14

2. Extensive genetic diversity with novel mutations in spike glycoprotein of severe acute respiratory syndrome coronavirus 2, Bangladesh in late 2020.

Authors: S Z Afrin; S K Paul; J A Begum; S A Nasreen; S Ahmed; F U Ahmad; M A Aziz; R Parvin; M S Aung; N Kobayashi
Journal: New Microbes New Infect Date: 2021-04-24

3. Molecular and Serological Characterization of the SARS-CoV-2 Delta Variant in Bangladesh in 2021.

Authors: Asish Kumar Ghosh; Marco Kaiser; Md Maruf Ahmed Molla; Tasnim Nafisa; Mahmuda Yeasmin; Rifat Hossain Ratul; Md Mohiuddin Sharif; Arifa Akram; Nur Hosen; Rashid Mamunur; Md Robed Amin; Alimul Islam; Md Ehsanul Hoque; Olfert Landt; Simon D Lytton
Journal: Viruses Date: 2021-11-19 Impact factor: 5.048