Characterization and Structural Prediction of Proteins in SARS-CoV-2 Bangladeshi Variant Through Bioinformatics.

Pinky Debnath¹, Umama Khan², Md Salauddin Khan³.

Abstract

The respiratory disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) became a global epidemic in less than a year, by the first half of 2020. Efficient human-to-human transmission of the virus eventually affected millions of people worldwide. Most worryingly, the infection rate continues to rise, resulting in significant mortality, especially among older people and those with co-morbidities. This enveloped, positive-sense RNA virus chiefly infects the upper respiratory system. The virulence of SARS-CoV-2 is largely governed by its proteins, which mediate entry into the host cell through a fusion mechanism, fusion of infected cells with neighboring uninfected cells to spread the virus, inhibition of host gene expression, cellular differentiation, apoptosis, mitochondrial biogenesis, etc. However, very little is known about the structures and functions of these proteins. The main purpose of this study is therefore to learn more about them through bioinformatics approaches. ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein were selected from the Bangladeshi coronavirus strain G039392, and a number of bioinformatics tools (MEGA-X-V10.1.7, PONDR, ProtScale, ProtParam, SCRIBER, NetSurfP v2.0, IntFOLD, UCSF Chimera, and PyMol) and strategies were used for multiple sequence alignment and phylogenetic analysis with 9 different variants, and for predicting hydropathicity, amino acid composition, protein-binding propensity, protein disorder, and 2D and 3D protein models. The selected proteins were characterized as highly flexible, structurally and electrostatically extremely stable, ordered, biologically active, hydrophobic, and closely related to the proteins of different variants.
This detailed characterization and structural analysis of proteins of the SARS-CoV-2 Bangladeshi variant was performed for the first time to unveil the mechanisms behind its virulence. This appraisal also paves the way for molecular docking and vaccine development targeting the characterized proteins.

Keywords: Bangladeshi covid-19 variant; ORF proteins; SARS-CoV-2; bioinformatics; membrane and envelope protein; structural prediction

Year: 2022 PMID: 35966939 PMCID: PMC9373114 DOI: 10.1177/11786361221115595

Source DB: PubMed Journal: Microbiol Insights ISSN: 1178-6361

Introduction

In December 2019, the world was stunned by an outbreak of pneumonia of unknown cause that originated in Wuhan, Hubei Province, China. By January 7, 2020, Chinese scientists had identified, in patients from Wuhan, a novel coronavirus (CoV) that mainly infects the upper respiratory system. The subsequent efficient human-to-human transmission of the virus ultimately affected millions of people worldwide. As of December 2021, there had been more than 288.7 million confirmed cases and 5.45 million deaths around the world from this devastating virus. Most worryingly, the infection rate continues to rise, resulting in significant mortality, especially among older people and those with co-morbidities. Coronaviruses are very small (diameter, 65-125 nm) and contain single-stranded RNA as their nucleic acid material.[2,3] Along with RNA, this particular virus consists of 12 different proteins, including nonstructural proteins (ORF1a and ORF1b) at the 5′-end, structural proteins (spike surface glycoprotein [S], envelope [E], matrix [M], and nucleocapsid [N]), and multiple lineage-specific accessory proteins (ORF3a, ORF6, ORF7b, ORF8, and ORF10) at the 3′-end. Although these proteins are involved in host receptor recognition, attachment, and entry into host cells, very little is known about their structures and specific functions. ORF10 is located upstream of the 3′-untranslated region (3′-UTR) and apparently encodes a protein 38 amino acids long. The ORF7b protein is a presumed viral accessory protein encoded on subgenomic (sg) RNA 7, whereas ORF7a possesses a distinctive immunoglobulin (Ig)-like domain with a 15-a.a signal peptide sequence at its N terminus, an 81-a.a luminal domain, a 21-a.a transmembrane domain, and a short C-terminal tail.
Also, SARS-CoV ORF6 is characteristically between 42 and 63 amino acids in length and is transcribed into mRNA6, which encodes the SARS 6 protein. Moreover, the membrane glycoprotein is abundant and plays the main role in virion assembly and morphogenesis, and also defines the shape of the viral envelope.[9,10] Lastly, envelope proteins are short polypeptides with a single α-helical transmembrane domain that can form homopentameric ion channels (IC). Bangladesh has not been exempt from the severe coronavirus outbreak, and a large number of Bangladeshi strains have been identified. SARS-CoV-2 was first reported in Bangladesh on 8 March 2020. A new strain (SARS-CoV-2 strain G039392) was identified in January 2021 from a 50-year-old symptomatic male patient in Dhaka, Bangladesh, and was found to be 99.9% identical to the UK variant B.1.1.7. In this study, 6 proteins (ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein) were randomly selected from the novel SARS-CoV-2 Bangladeshi strain G039392 for characterization, so that more detailed studies could elucidate their structures and provide insights into their possible functions. These 6 proteins were selected because of their availability in the online database portal. For comparison, the same 6 proteins were taken from 9 different variants from 9 different countries. At the time of the analysis, all required data were available only for these important virulence-related proteins of SARS-CoV-2, although other proteins, such as the spike and nucleocapsid proteins, are very important for variant definition. Therefore, the major purpose of this analysis is to learn more about these proteins, including their amino acid (a.a) composition, the energy level of chemical bonds, hydropathicity, etc., through bioinformatics approaches that could provide insight into novel functions related to the virulence of Covid-19.
Moreover, structural prediction of 2D and 3D SARS-CoV-2 protein models could pave the way for docking of molecular components that may counteract the devastating properties of this virus. This study therefore uses strictly bioinformatics approaches to theoretically characterize, classify, and construct the putative structures of the 6 selected proteins of SARS-CoV-2 Bangladeshi strain G039392.

Materials and Methods

Sequence alignment and phylogenetic analysis

The reference sequences of the 6 selected proteins (ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein) from SARS-CoV-2 strain G039392, along with those from 9 other variants from 9 different countries, were acquired from NCBI’s Protein Database. Sequences were aligned using MUSCLE in MEGA-X-V10.1.7 with the other settings left at their defaults.[13,14] Alignment reliability was measured by the overall mean distance (⩽0.7 is considered reliable), determined using the p-distance substitution model. The protein trees were constructed using the neighbor-joining method, visualized in MEGA-X-V10.1.7, and tested using the bootstrap method.
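
The overall mean distance under the p-distance model can be illustrated with a minimal sketch. It assumes pairwise deletion of gapped sites and uses a toy 38-residue alignment, not the actual ORF10 sequences; the function names are hypothetical and this is not MEGA's implementation.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a gap ('-') are skipped
    (pairwise deletion; an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def overall_mean_distance(aligned):
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(aligned, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical): ten 38-residue sequences, one of which
# carries a single substitution at position 30.
reference = "MKT" + "A" * 35
variant = reference[:29] + "L" + reference[30:]
alignment = [reference] * 9 + [variant]
mean_d = overall_mean_distance(alignment)
print(f"overall mean distance: {mean_d:.4f}")
```

In this toy case, 9 of the 45 pairs differ at 1 of 38 sites, so the mean is 9 × (1/38) / 45, roughly 0.005, far below the 0.7 reliability threshold.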

Protein characterization

Phosphorylation sites were detected using the DEPP server of PONDR®. ProtScale was used to generate hydrophobicity plots and ProtParam to determine the grand average of hydropathicity (GRAVY). Protein disorder predictions were performed using the PONDR® (Predictor of Natural Disordered Regions) VSL2 and XL1 predictors. Amino acid compositions and aliphatic indexes were analyzed with ProtParam. Finally, the protein-binding propensities of the interacting residues were evaluated using SCRIBER.

Protein secondary structure prediction, 3D modeling, evolution, and validation

The NetSurfP v2.0 server was used to predict protein secondary structure. Transmembrane helices (TH) were predicted with TMHMM and Phobius; the predictions were averaged, and the most consistent range of scores was used for analysis. The IntFOLD web server was used for ab initio modeling of the selected proteins. The models were evaluated according to IntFOLD’s quality and confidence scores, and the best model was then refined using the 3Drefine web server. Among the 5 post-refinement models, the one with the maximum QMEAN Z-score and the most favorable Ramachandran plot was selected. Both UCSF Chimera and PyMol were used to visualize the most favorable 3D protein model.[27,28] The hydrophobicity surfaces were created according to the Kyte-Doolittle scale.

Results

In the phylogenetic analysis, the overall mean distance for ORF10 is 0.01, which corresponds to almost 99.9% identity across the entire alignment (Figure 1G). The ORF10 protein from the Spanish strain differs at position 30, where it has Leu (L) rather than Val (V) (Figure 1A); the conserved region therefore extends from residue 1 to 29. For ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, the mean distance is 0.00, with 100% conserved regions across the entire alignment (Figure 1).

Figure 1.

Multiple sequence alignment and phylogenetic analysis of 6 different protein sequences depicting evolutionary relationships with SARS-CoV-2 variants from 10 different countries. (A–F) Sequence alignment of ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, respectively. (G–L) Neighbor-joining phylogenetic trees of ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, respectively. All proteins appear closely related to the proteins of different variants from different countries.

Phosphorylation

A single phosphorylation site (a phosphorylated serine) was identified in ORF7a, corresponding to 1 of its 7 serines (14.29%), whereas the other proteins did not show any phosphorylation sites (Table 1).

Table 1.

Phosphorylation sites of the selected proteins.

Protein name	Number of phosphorylated serines	Number of phosphorylated threonines	Number of phosphorylated tyrosines
ORF10 protein	0 out of 2 (0.00%)	0 out of 2 (0.00%)	0 out of 3 (0.00%)
ORF7b protein	0 out of 2 (0.00%)	0 out of 1 (0.00%)	0 out of 1 (0.00%)
ORF7a protein	1 out of 7 (14.29%)	0 out of 10 (0.00%)	0 out of 5 (0.00%)
ORF6 protein	0 out of 4 (0.00%)	0 out of 3 (0.00%)	0 out of 2 (0.00%)
Membrane glycoprotein	0 out of 15 (0.00%)	0 out of 13 (0.00%)	0 out of 9 (0.00%)
Envelope protein	0 out of 8 (0.00%)	0 out of 4 (0.00%)	0 out of 4 (0.00%)

Hydropathicity

In the ORF10 protein, the hydrophobicity plot revealed 2 hydrophobic regions spanning residues 3 to 20 and 28 to 35, along with 2 hydrophilic regions: residues 21 to 27 and residue 36 (Figure 2A). In the ORF7b protein, there is a single hydrophobic region and a single hydrophilic region, spanning residues 3 to 32 and 33 to 41, respectively (Figure 2B). In the ORF7a protein, the hydrophobic regions are 3 to 16, 25, 28 to 31, 47 and 48, 54 to 61, 63 to 67, 69, 72, 84 to 88, and 98 to 115; the hydrophilic regions are 17 to 24, 26 and 27, 32 to 46, 49 to 53, 62, 68, 70 and 71, 76 to 83, 89 to 97, and 116 to 119; and the neutral positions are 73 and 75 (Figure 2C). In the ORF6 protein, the hydrophobic regions are 3 to 7, 9 to 20, 23 to 28, 31 to 39, and 42; the hydrophilic regions are 8, 21 and 22, 29, 40 and 41, and 43 to 59; and position 30 is neutral (Figure 2D). For the membrane glycoprotein, positions 8 to 11, 15, 21 to 39, 46 to 71, 74 and 75, 77 to 102, 104, 110, 117 to 122, 124, 126 to 132, 134, 137 to 146, 149 to 151, 168 to 171, 182 and 183, 189, 193 to 195, and 217 to 220 are hydrophobic; positions 3 to 7, 12 to 14, 16 to 20, 40 to 45, 72 and 73, 76, 103, 105 to 109, 111 to 116, 123, 125, 133, 135 and 136, 147 and 148, 152 to 167, 173 to 180, 184 to 188, 190 to 192, and 196 to 216 are hydrophilic; and positions 172 and 181 are neutral (Figure 2E). In the envelope protein, the hydrophobic residues are 3 to 5, 11 to 54, 56 to 58, 60, and 72 and 73, and the hydrophilic residues are 6 to 10, 55, 59, and 61 to 71 (Figure 2F). The GRAVY scores are 0.64, 1.45, 0.32, 0.23, 0.45, and 1.13 for ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, respectively (Figure 2G).

Figure 2.

Hydrophobicity plots and GRAVY scores of the 6 selected proteins. (A-F) Hydrophobicity plots of ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, respectively, generated according to the Kyte-Doolittle hydropathy scale. (G) GRAVY scores of ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein. The numerical value of each score is displayed above its corresponding box. The proteins are recognized as mostly hydrophobic.
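
As a rough illustration of how a Kyte-Doolittle hydropathy profile and a GRAVY score are obtained, the sketch below averages the standard Kyte-Doolittle values over a sliding window and over the whole sequence. The window size of 5 is an assumption (it is consistent with the reported regions starting at residue 3, but the source does not state it), the function names are hypothetical, and the toy sequence is not one of the studied proteins.

```python
# Standard Kyte-Doolittle hydropathy values for the 20 common amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def hydropathy_profile(sequence, window=5):
    """Sliding-window mean hydropathy.

    Returns (position, score) pairs, where position is the 1-based index
    of the residue at the center of each window.
    """
    if window % 2 == 0 or window > len(sequence):
        raise ValueError("window must be odd and no longer than the sequence")
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        score = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((start + half + 1, score))
    return profile


toy = "MLLVIAFSDNKE"  # hypothetical sequence, for illustration only
for position, score in hydropathy_profile(toy):
    label = "hydrophobic" if score > 0 else "hydrophilic" if score < 0 else "neutral"
    print(position, round(score, 2), label)
print("GRAVY:", round(gravy(toy), 2))
```

Positive window scores mark hydrophobic positions and negative scores hydrophilic ones; a positive GRAVY indicates an overall hydrophobic protein.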

Protein disorder

The protein disorder plots indicated that disorder scores were higher for the C-terminal half than for the N-terminal half in almost all selected proteins (Figure 3). Almost all proteins showed disorder scores indicating moderately flexible to highly flexible residues. However, the membrane glycoprotein is more disordered than the other proteins. No protein showed scores of ⩽0.1, which would indicate rigidity.

Figure 3.

Per-residue disorder plots for ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein of SARS-CoV-2. All proteins were found to be highly flexible and ordered. Scores ⩾0.5 indicate disordered residues, while scores of 0.25 to 0.5 and 0.1 to 0.25 suggest highly flexible and moderately flexible residues, respectively. Scores ⩽0.1 indicate rigidity.
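
The score bands in this caption can be expressed as a small classifier. This is a sketch only: how the shared boundary values 0.25 and 0.5 are assigned is an assumption, the function names are hypothetical, and the example scores are invented rather than taken from the PONDR output.

```python
from collections import Counter


def classify_disorder(score):
    """Map a per-residue disorder score (0 to 1) to the caption's bands."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("disorder scores are expected in the range 0 to 1")
    if score >= 0.5:
        return "disordered"
    if score >= 0.25:  # boundary handling at 0.25 is an assumption
        return "highly flexible"
    if score > 0.1:
        return "moderately flexible"
    return "rigid"


def summarize(scores):
    """Count residues falling in each band."""
    return Counter(classify_disorder(s) for s in scores)


example_scores = [0.12, 0.18, 0.31, 0.44, 0.52, 0.27]  # invented values
print(summarize(example_scores))
```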

Amino acids composition and protein-binding propensity

The ORF10 protein has asparagine (N) as its most abundant residue, whereas in ORF7b, ORF7a, membrane glycoprotein, and envelope protein, leucine (L) is the most abundant. For ORF6, it is isoleucine (I). The overall amino acid composition of all 6 proteins is presented in Table 2. Binding propensity influences electrostatic and aromatic interactions and varies widely among amino acid residues. Several fluctuations in protein-binding propensity were observed in both the C-terminal and N-terminal halves (Supplemental Figure 1).

Table 2.

Amino acid composition of ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein in percentage (%).

Amino acid	ORF10 (%)	ORF7b (%)	ORF7a (%)	ORF6 (%)	Membrane glycoprotein (%)	Envelope protein (%)
Ala (A)	5.3	4.7	7.4	1.6	8.6	5.3
Arg (R)	5.3	0.0	4.1	1.6	6.3	4.0
Asn (N)	13.2	2.3	1.7	6.6	5.0	6.7
Asp (D)	2.6	4.7	1.7	6.6	2.7	1.3
Cys (C)	2.6	4.7	5.0	0.0	1.8	4.0
Gln (Q)	2.6	2.3	4.1	4.9	1.8	0.0
Glu (E)	0.0	7.0	6.6	8.2	3.2	2.7
Gly (G)	2.6	0.0	3.3	0.0	6.3	1.3
His (H)	0.0	4.7	2.5	1.6	2.3	0.0
Ile (I)	7.9	11.6	6.6	16.4	9.0	4.0
Leu (L)	10.5	25.6	12.4	13.1	15.8	18.7
Lys (K)	0.0	0.0	5.8	6.6	3.2	2.7
Met (M)	5.3	4.7	0.8	4.9	1.8	1.3
Phe (F)	10.5	14.0	8.3	4.9	5.0	6.7
Pro (P)	2.6	0.0	5.0	1.6	2.3	2.7
Ser (S)	5.3	4.7	5.8	6.6	6.8	10.7
Thr (T)	5.3	2.3	8.3	4.9	5.9	5.3
Trp (W)	0.0	2.3	0.0	1.6	3.2	0.0
Tyr (Y)	7.9	2.3	4.1	3.3	4.1	5.3
Val (V)	10.5	2.3	6.6	4.9	5.4	17.3
Pyl (O)	0.0	0.0	0.0	0.0	0.0	0.0
Sec (U)	0.0	0.0	0.0	0.0	0.0	0.0
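
The percentages in Table 2 are residue counts divided by sequence length. A minimal sketch of that calculation follows, using a hypothetical 38-residue toy sequence rather than the real ORF10 sequence; the function name is an assumption.

```python
from collections import Counter


def composition_percent(sequence):
    """Percentage of each amino acid in a sequence, rounded to one decimal."""
    counts = Counter(sequence)
    return {aa: round(100 * n / len(sequence), 1) for aa, n in sorted(counts.items())}


# Toy 38-residue sequence (hypothetical): 5 Asn and 33 Ala.
toy = "N" * 5 + "A" * 33
print(composition_percent(toy))
```

For a 38-residue protein such as ORF10, the 13.2% reported for Asn corresponds to 5 residues (5/38 ≈ 13.16%).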

Aliphatic index and transmembrane helix

Aliphatic index values of more than 100 indicated that these proteins are highly thermostable over a wide temperature range. For ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, the aliphatic index values were 107.63, 156.51, 100.74, 130.98, 120.86, and 144, respectively (Figure 4A). Transmembrane helix prediction scores of less than 1 indicated that these helices are less likely to interact with membrane lipids. For ORF10 and ORF7a, the predicted TH-spanning residues are 3 to 29 and 6 to 33, respectively. For ORF7b, the predicted TH residues are 4 to 23 and 93 to 119. For ORF6 and envelope protein, the predicted TH-spanning residues are 5 to 38 and 11 to 61, respectively. Finally, for the membrane glycoprotein, the transmembrane helix residues are 18 to 59 and 61 to 105 (Figure 4B).

Figure 4.

(A, B) Aliphatic indexes and transmembrane helix prediction scores of ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, respectively. All proteins showed aliphatic index values of more than 100, indicating that they are highly thermostable, and transmembrane helix prediction scores of less than 1.
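
The aliphatic index reported by ProtParam follows Ikai's formula, X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), where X is the mole percentage of each residue. As a consistency check, the sketch below applies that formula to the rounded percentages in Table 2; small differences from the reported values are expected because the table is rounded to one decimal. The dictionary and function names are illustrative.

```python
# Mole percentages of Ala, Val, Ile, and Leu taken from Table 2.
TABLE2 = {
    "ORF10": {"A": 5.3, "V": 10.5, "I": 7.9, "L": 10.5},
    "ORF7b": {"A": 4.7, "V": 2.3, "I": 11.6, "L": 25.6},
    "ORF7a": {"A": 7.4, "V": 6.6, "I": 6.6, "L": 12.4},
    "ORF6": {"A": 1.6, "V": 4.9, "I": 16.4, "L": 13.1},
    "Membrane glycoprotein": {"A": 8.6, "V": 5.4, "I": 9.0, "L": 15.8},
    "Envelope protein": {"A": 5.3, "V": 17.3, "I": 4.0, "L": 18.7},
}

# Aliphatic index values reported in the Results.
REPORTED = {
    "ORF10": 107.63, "ORF7b": 156.51, "ORF7a": 100.74,
    "ORF6": 130.98, "Membrane glycoprotein": 120.86, "Envelope protein": 144.0,
}


def aliphatic_index(percent, a=2.9, b=3.9):
    """Ikai's aliphatic index from mole percentages of Ala, Val, Ile, Leu."""
    return percent["A"] + a * percent["V"] + b * (percent["I"] + percent["L"])


for name, percent in TABLE2.items():
    value = aliphatic_index(percent)
    print(f"{name}: computed {value:.2f}, reported {REPORTED[name]}")
```

For ORF10, for example, this gives 5.3 + 2.9 × 10.5 + 3.9 × 18.4 = 107.51, close to the reported 107.63.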

Protein secondary structures

For ORF10, ORF7b, and envelope protein, the α-helix spanning residues are 11 to 21, 4 to 35, and 4 to 64, respectively (Figure 5A, B, and F). In ORF6, the α-helix spanning residues are 4 to 21, 26 and 27, 29 to 44, and 48 to 51 (Figure 5D). In ORF7a, the α-helix spanning residues are 90 to 96 and 99 and 100, and the β-sheet spanning residues are 28 to 33, 40 and 41, 53 to 66, and 72 to 79 (Figure 5C). In the membrane glycoprotein, the α-helix spanning residues are 10 to 19, 22 to 36, 40 to 70, 75 to 106, and 161 to 163, and the β-sheet spanning residues are 112, 118 to 123, 128 to 132, 139 to 146, 148 to 151, 154 to 159, 167 to 172, 175 to 185, and 193 to 201 (Figure 5E).

Figure 5.

Secondary structure prediction of 6 selected proteins. (A-F) Secondary structure of ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein respectively.

Protein modeling and validation

Initially, the models generated by the IntFOLD web server that had low P-values and high quality scores were subjected to refinement. The final selection was then made according to the QMEAN Z-score and Ramachandran plot score (Supplemental Table S1 and Figure 6A-F). With respect to hydrophobic and hydrophilic properties, the majority of the protein surfaces were found to be hydrophobic (Figure 6G-L). The Ramachandran plot scores for all proteins were more than 90%, except for the membrane glycoprotein, for which the score was 87.6% (Figure 6M-R).

Figure 6.

Protein modeling and 3D hydrophobicity surface maps of the 6 selected proteins. (A-F) Ribbon diagrams of ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, respectively. (G-L) Hydrophobicity surface maps of ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, respectively. All 6 protein models were found to be highly flexible and stable. Blue represents hydrophilic regions, orange represents hydrophobic regions, and whitish-blue indicates semi-hydrophobic/hydrophilic character. (M-R) Ramachandran plots of ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, respectively.

Discussion

The phylogenetic data in this study suggest that the ORF10 protein from Spain is the most distantly related to all other ORF10 proteins, whereas high similarity was detected among the remaining ORF10 proteins of SARS-CoV-2. For ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein, no distant relationship was found among the strains selected from the different countries. All of these proteins showed 100% conserved regions; thus, no mutations were detected in them. This also indicates that these proteins share a strong phylogenetic relationship with their common ancestors. The conserved regions of ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein could play a fundamental role in the assembly of particular proteins, the formation of protein structure, and/or virulence functions by facilitating precise protein interactions. However, relationships between specific sequences cannot be fully defined on the basis of sequence data alone.
The selected SARS-CoV-2 proteins can be predicted to be highly ordered, because intrinsically disordered proteins tend to be phosphorylated, which leads to disorder-to-order and order-to-disorder transitions. Phosphorylation controls the function of a protein and cell signaling by changing the conformation of the phosphorylated protein, which maintains its catalytic property. Thus, the activation or inactivation of proteins depends largely on phosphorylation. Prediction of phosphorylation sites in the 6 selected SARS-CoV-2 proteins showed that phosphorylated serine, threonine, or tyrosine was mostly absent, although ORF7a had a single phosphorylated serine. Phosphorylation of a protein can modify its activities, including modulation of its intrinsic biological properties, sub-cellular location, docking with other proteins, and half-life. It also determines the level and duration of the response given by a protein, which acts as an input to signal integration. Moreover, phosphorylation sites are more likely to be evolutionarily conserved than other interfacial residues.
The hydropathy index of amino acids is mainly used to predict the function of a structurally or functionally unknown protein. The distribution of hydropathy clusters in a protein suggests that the locations of these clusters are largely conserved within a given group of proteins. In the present study, the 6 selected SARS-CoV-2 proteins had hydropathy indices that tended toward hydrophobicity. According to the literature, hydrophobic proteins are more soluble and can therefore function independently by avoiding undesirable interactions with water molecules. In addition, such hydrophobicity is vital for protein folding, which keeps the protein stable and biologically active.
Predicting protein disorder, including identifying proteins that are partially or wholly unstructured, is an enormous challenge in structural proteomics and in subsequent function prediction. In the current study, almost all proteins showed disorder scores indicating moderately flexible to highly flexible residues, and no protein showed scores indicating rigidity. This result agrees with the interpretation presented in other research. However, the membrane glycoprotein was more disordered than the other proteins in this study. Disordered regions in specific proteins could contain short linear peptide motifs that may play a significant role in protein function. Once predicted, avoiding prospective disordered regions can improve the expression, foldability, and stability of the expressed protein.
Protein-binding propensity adds to the knowledge of protein-protein interactions, docking, and the annotation of functional properties of a protein at the molecular level. In addition, a high aliphatic index corresponds to increased thermostability of globular proteins. All 6 selected SARS-CoV-2 proteins showed an aliphatic index of more than 100, which indicates that they are highly thermostable over a wide range of temperatures. Additionally, all 6 selected proteins showed transmembrane helix prediction scores of less than 1; transmembrane helices are of great importance in the study of membrane proteins. Because of the significance of protein structural class prediction, major efforts have been made to find prediction models that establish the structural class and predict protein secondary structure from the sequence of a specific protein.[41,42] Secondary structure prediction for the 6 selected SARS-CoV-2 proteins revealed that the ORF7a protein and the membrane glycoprotein each contain both α-helix and β-strand regions. The structural class is one of the most important features because of its role in the analysis of protein function, the prediction of protein folding rates, and the choice of a suitable approach for determining protein tertiary structure.[43-45]
Structure-based antibodies against SARS-CoV-2 could be a way to suppress the infection rate of this virus. Targeting the specific viral proteins that enable invasion of the human body by attaching directly to host cells would be a suitable approach. A recent study has demonstrated that SARS-CoV-2 invades host cells via the CD147-spike protein route and that this invasion is mediated by a transmembrane glycoprotein from the immunoglobulin superfamily. An anti-CD147 humanized antibody named Meplazumab is able to block CD147 and thereby prevent SARS-CoV-2 from entering host cells. Thus, critical characterization and functional analysis of the structural proteins of SARS-CoV-2 is essential from a therapeutic perspective.[48-50]
Different proteins of SARS-CoV-2 play significant roles in expressing its virulence in the host. The ORF10 protein interacts with multiple human proteins after entering the body to take control of different molecular mechanisms. Mutations in ORF10 present a new level of severity in the infection rate. The ORF7b protein is an integral membrane protein encoded within subgenomic RNA7. During infection, it accumulates in the Golgi compartment, where it associates with both cis- and trans-Golgi markers. The ORF7a protein, in contrast, hinders bone marrow stromal antigen 2 virion tethering through a novel mechanism of glycosylation interference. The ORF6 protein is able to inhibit beta interferon (IFN-β) expression by halting its synthesis and signaling. Protein-protein and protein-RNA interactions are important for competent virion assembly. The membrane glycoprotein plays a vital role here, as the formation of virus-like particles (VLPs) in SARS-CoV-2 involves only the membrane glycoprotein and the envelope protein.
Several in silico studies of SARS-CoV-2 have presented structural and functional perspectives on the novel virus, focusing on its virulence and transmission in humans.[56-58] The present study explored theoretical modeling and sequence- and structure-based functional characterization of the 6 selected proteins. Phylogenetic analysis of these proteins revealed a close evolutionary relationship with proteins of distant origins. In this study, stable tertiary structures of the proteins were predicted, which gives a first indication of how these 3D structures may interact with enzymes or host receptors. Hydrophobicity surface maps of the proteins were also created to clearly show their hydrophobic and hydrophilic regions. The 6 selected proteins of the SARS-CoV-2 Bangladeshi variant were characterized as highly flexible, structurally and electrostatically extremely stable, ordered, biologically active, hydrophobic, and closely related to the proteins of different variants. Studying these diverse proteins of SARS-CoV-2 has already yielded some clues about how they interact with human cells, but much remains to be assessed. Further comprehensive assessment with broad-scale data is required to clarify the results generated in this study.

Conclusions

The analysis provides detailed information on the characterization and structure of proteins of the SARS-CoV-2 Bangladeshi variant, performed for the first time to shed light on the mechanisms behind the virulence of this virus. Collectively, the present study provides a basis for characterizing the proteins of novel viruses theoretically and structurally. The 6 selected proteins were characterized as stable, ordered, and hydrophobic, and they share strong phylogenetic relationships with proteins of other closely related SARS-CoV-2 variants. Finally, the tertiary protein models constructed in this study are of high quality and stability. This analysis can offer a foundation for the further analysis needed to evaluate the biological function, interactions, and relevance to viral properties of the 6 proteins of SARS-CoV-2. The predicted structures should be useful for investigating each protein's interactions and functions through advanced computational analysis, for understanding viral pathogenesis, for studying potential vaccines, and especially for averting epidemics and pandemics.
Supplemental material, sj-docx-1-mbi-10.1177_11786361221115595 for Characterization and Structural Prediction of Proteins in SARS-CoV-2 Bangladeshi Variant Through Bioinformatics by Pinky Debnath, Umama Khan and Md. Salauddin Khan in Microbiology Insights

54 in total

1. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors: Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal: Mol Biol Evol Date: 2018-06-01 Impact factor: 16.240

2. Coronavirus Infections-More Than Just the Common Cold.

Authors: Catharine I Paules; Hilary D Marston; Anthony S Fauci
Journal: JAMA Date: 2020-02-25 Impact factor: 56.272

3. Severe Acute Respiratory Syndrome Coronavirus ORF7a Inhibits Bone Marrow Stromal Antigen 2 Virion Tethering through a Novel Mechanism of Glycosylation Interference.

Authors: Justin K Taylor; Christopher M Coleman; Sandra Postel; Jeanne M Sisk; John G Bernbaum; Thiagarajan Venkataraman; Eric J Sundberg; Matthew B Frieman
Journal: J Virol Date: 2015-09-16 Impact factor: 5.103

4. Phosphorylation in protein-protein binding: effect on stability and function.

Authors: Hafumi Nishi; Kosuke Hashimoto; Anna R Panchenko
Journal: Structure Date: 2011-12-07 Impact factor: 5.006

5. Protein disorder prediction: implications for structural proteomics.

Authors: Rune Linding; Lars Juhl Jensen; Francesca Diella; Peer Bork; Toby J Gibson; Robert B Russell
Journal: Structure Date: 2003-11 Impact factor: 5.006

6. Toward the estimation of the absolute quality of individual protein structure models.

Authors: Pascal Benkert; Marco Biasini; Torsten Schwede
Journal: Bioinformatics Date: 2010-12-05 Impact factor: 6.937

7. IntFOLD: an integrated web resource for high performance protein structure and function prediction.

Authors: Liam J McGuffin; Recep Adiyaman; Ali H A Maghrabi; Ahmad N Shuid; Danielle A Brackenridge; John O Nealon; Limcy S Philomina
Journal: Nucleic Acids Res Date: 2019-07-02 Impact factor: 16.971

Review 8. COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses.

Authors: Muhammad Adnan Shereen; Suliman Khan; Abeer Kazmi; Nadia Bashir; Rabeea Siddique
Journal: J Adv Res Date: 2020-03-16 Impact factor: 10.479

9. Genome Sequence of a SARS-CoV-2 Strain from Bangladesh That Is Nearly Identical to United Kingdom SARS-CoV-2 Variant B.1.1.7.

Authors: Mohammad Enayet Hossain; M Mahfuzur Rahman; M Shaheen Alam; Yeasir Karim; Ananya Ferdous Hoque; Sezanur Rahman; Mohammed Ziaur Rahman; Mustafizur Rahman
Journal: Microbiol Resour Announc Date: 2021-02-25

10. Comparison study on statistical features of predicted secondary structures for protein structural class prediction: From content to position.

Authors: Qi Dai; Yan Li; Xiaoqing Liu; Yuhua Yao; Yunjie Cao; Pingan He
Journal: BMC Bioinformatics Date: 2013-05-04 Impact factor: 3.169