Dwaipayan Chaudhuri, Satyabrata Majumder, Joyeeta Datta, Kalyan Giri.
Abstract
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), an enveloped RNA virus, is transmitted by droplet infection and thus affects the respiratory system. Different SARS-CoV-2 genomes with a moderate level of mutation have been reported globally, which makes the virus harder to combat. Mutational profiling and the relevant evolutionary aspects of the coronavirus proteins, namely spike glycoprotein, membrane protein, envelope protein, nucleoprotein, ORF1ab, ORF3a, ORF6, ORF7a, ORF7b and ORF8, were studied by in silico experiments. The protein sequences were clustered and the relative abundance of residues was calculated to assess protein conservation and to identify representative sequences for phylogenetic and ancestral reconstruction. Mutational profiling and mutation analysis were used to study the effect of mutations on protein stability and their functional implications. This study indicates the effect of mutations on the proteins and their relevance in evolution. It leads towards a better understanding of the variation and diversification of SARS-CoV-2 and may aid the design of therapeutic agents that take the highly variable regions into account.
Keywords: Clustering; Mutation profiling; Mutation stability; SARS-CoV-2 proteins; Shannon entropy
Year: 2021 PMID: 33890205 PMCID: PMC8061876 DOI: 10.1007/s10930-021-09988-3
Source DB: PubMed Journal: Protein J ISSN: 1572-3887 Impact factor: 2.371
Fig. 1 Shannon entropy calculation of the aligned protein sequences of SARS-CoV-2. The bars show the amino acid variability (entropy) at each position. A lower entropy value indicates a highly conserved amino acid residue at that position
Fig. 2 Phylogenetic trees of eight SARS-CoV-2 proteins: (a) spike, (b) ORF3a, (c) envelope, (d) membrane protein, (e) ORF6, (f) ORF7a, (g) ORF8, (h) nucleocapsid. The trees are given in the text in Newick format. Representative sequences derived from the clustering data were used for phylogenetic tree construction. The results show that the initial sequences (YP009724390, YP009724391, YP009724392, YP009724393, YP009724394, YP009724395, YP009724396, YP009724397) belong to the same strain for all proteins. The circle in each case indicates the point at which the phylogenetic tree starts to diverge. The rectangular boxes denote the representative sequence (ancestral sequence)
Fig. 3 Independent model of mutational profiling using the EVmutation server. The X-axis represents the amino acid sequence of the named protein with the corresponding positions, and the Y-axis represents the substituting amino acid (KRHEDNQTSCGAVLIMPYFW). The colour gradient from intense blue towards red indicates a decreasing harmful effect of the mutation on the protein
Fig. 4 Epistatic model of mutational profiling, considering both local and global effects of mutation, using the EVmutation server. The X-axis represents the amino acid sequence of the named protein with the corresponding positions, and the Y-axis represents the substituting amino acid (KRHEDNQTSCGAVLIMPYFW). The colour gradient from intense blue towards red indicates a decreasing harmful effect of the mutation on the protein
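The per-position Shannon entropy described in the Fig. 1 caption can be illustrated with a minimal sketch. It assumes a base-2 logarithm and treats every character in a column (including gap symbols) as a distinct state; the source does not specify either choice, and the function name `column_entropies` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropies(aligned_seqs):
    """Return the Shannon entropy (in bits) of each column of an alignment.

    Assumption: all sequences are already aligned to the same length.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        total = len(column)
        # H = -sum(p * log2(p)) over the residues observed in this column
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h + 0.0)  # turn -0.0 into 0.0 for conserved columns
    return entropies


# Toy alignment (illustrative only, not data from the study)
toy_alignment = ["MKV", "MKL", "MRV", "MRL"]
print(column_entropies(toy_alignment))
```

In the toy alignment the first column is fully conserved and therefore has zero entropy, while the other two columns each hold two equally frequent residues and reach one bit.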
Number of mutations and mutational tolerance in SARS-CoV-2 proteins
| Name of the protein | Stabilizing mutations (mild) | Stabilizing mutations (strong) | Destabilizing mutations (mild) | Destabilizing mutations (strong) | Total no. of mutations | Mutations tolerated (yes) | Mutations tolerated (no) | % of destabilizing mutations | % of tolerated mutations |
|---|---|---|---|---|---|---|---|---|---|
| Envelope (E) | 0 | 1 | 1 | 1 | 3 | 1 | 2 | 66.7 | 33.3 |
| Membrane protein (M) | 0 | 0 | 7 | 4 | 11 | 7 | 3 | 100 | 63.6 |
| Nucleocapsid (N) | 17 | 3 | 34 | 10 | 64 | 33 | 31 | 68.75 | 51.6 |
| Spike (S) | 10 | 1 | 36 | 24 | 71 | 54 | 17 | 84.5 | 76.05 |
| ORF1ab: NSP1 | 3 | 0 | 8 | 7 | 18 | 3 | 15 | | |
| ORF1ab: NSP2 | 19 | 3 | 32 | 26 | 80 | 55 | 25 | | |
| ORF1ab: NSP3 | 26 | 4 | 62 | 55 | 147 | 105 | 23 | | |
| ORF1ab: NSP4 | 9 | 2 | 15 | 11 | 37 | 14 | 12 | | |
| ORF1ab: NSP5 | 7 | 0 | 13 | 4 | 23 | 12 | 7 | | |
| ORF1ab: NSP6 | 2 | 0 | 6 | 11 | 18 | 12 | 5 | | |
| ORF1ab: NSP7 | 1 | 0 | 3 | 1 | 5 | 0 | 10 | | |
| ORF1ab: NSP8 | 5 | 1 | 14 | 4 | 24 | 14 | 4 | | |
| ORF1ab: NSP9 | 1 | 0 | 5 | 3 | 9 | 5 | 2 | | |
| ORF1ab: NSP10 | 0 | 0 | 2 | 2 | 4 | 2 | 32 | | |
| ORF1ab: NSP12 (RdRP) | 7 | 0 | 23 | 17 | 47 | 15 | 24 | | |
| ORF1ab: NSP13 | 11 | 1 | 19 | 18 | 49 | 25 | 17 | | |
| ORF1ab: NSP14 | 6 | 0 | 17 | 16 | 39 | 22 | 12 | | |
| ORF1ab: NSP15 | 6 | 1 | 8 | 12 | 27 | 15 | 13 | | |
| ORF1ab: NSP16 | 1 | 0 | 13 | 8 | 22 | 9 | 17 | | |
| ORF1ab: total | 104 | 12 | 240 | 195 | 551 | 308 | 218 | 78.94 | 55.9 |
| ORF3a | 6 | 1 | 17 | 17 | 41 | 6 | 35 | 82.9 | 14.6 |
| ORF6 | 2 | 0 | 0 | 3 | 5 | 1 | 4 | 60 | 20 |
| ORF7a | 2 | 1 | 6 | 7 | 16 | 2 | 14 | 81.25 | 14.3 |
| ORF7b | 0 | 0 | 1 | 1 | 2 | Not applicable | Not applicable | 100 | - |
| ORF8 | 2 | 0 | 5 | 8 | 15 | Not applicable | Not applicable | 86.67 | - |
| Total (all proteins) | 162 (mild + strong) | | 617 (mild + strong) | | 779 | 414 | 338 | 79.20 | 53.14 |
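The two percentage columns of the table above can be reproduced from the counts with a short sketch. The formulas are an inference from the table's own numbers, not something the source states: the share of destabilizing mutations is taken as (mild + strong destabilizing) / total, and the share of tolerated mutations as tolerated-yes / total. Most rows agree with this reading, but not every row does, so treat it as an assumption. The function name `mutation_percentages` is hypothetical.

```python
def mutation_percentages(destab_mild, destab_strong, tolerated_yes, total):
    """Return (% destabilizing, % tolerated) for one protein.

    Assumed formulas (inferred from the table, not stated in the source):
      % destabilizing = 100 * (mild + strong destabilizing) / total
      % tolerated     = 100 * tolerated_yes / total
    """
    if total <= 0:
        raise ValueError("total number of mutations must be positive")
    pct_destabilizing = 100.0 * (destab_mild + destab_strong) / total
    pct_tolerated = 100.0 * tolerated_yes / total
    return pct_destabilizing, pct_tolerated


# Spike (S) row of the table: 36 mild + 24 strong destabilizing, 54 tolerated, 71 total
spike_destab, spike_tol = mutation_percentages(36, 24, 54, 71)
print(f"Spike: {spike_destab:.2f}% destabilizing, {spike_tol:.2f}% tolerated")
```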
The Gibbs free energy changes of four SARS-CoV-2 proteins due to the corresponding mutational changes
| Name of the protein | Amino acid position | Wild-type amino acid | Mutant amino acid | ΔΔG at 25 °C | ΔΔG at 37 °C |
|---|---|---|---|---|---|
| Membrane protein (M) | 2 | A | V | −0.39 | −0.29 |
| Membrane protein (M) | 2 | A | S | −0.46 | −0.45 |
| Membrane protein (M) | 3 | D | G | −0.49 | −0.22 |
| Membrane protein (M) | 38 | A | S | −0.29 | −0.27 |
| Membrane protein (M) | 70 | V | I | −1.27 | −1.11 |
| Membrane protein (M) | 70 | V | S | −3.66 | −3.36 |
| Membrane protein (M) | 85 | A | S | −0.76 | −0.74 |
| Membrane protein (M) | 155 | H | Y | −0.22 | −0.21 |
| Membrane protein (M) | 175 | T | M | −0.59 | −0.57 |
| Membrane protein (M) | 190 | D | N | −1.85 | −1.65 |
| Membrane protein (M) | 209 | D | Y | −1.85 | −1.73 |
| Envelope (E) | 36 | A | V | 1.47 | 1.59 |
| Envelope (E) | 37 | L | H | −1.76 | −1.74 |
| Envelope (E) | 71 | L | P | −1.34 | −1.27 |
| ORF6 | 8 | Q | H | −1.9 | −1.87 |
| ORF6 | 9 | V | F | −2.36 | −2.43 |
| ORF6 | 34 | S | N | 0.85 | 1 |
| ORF6 | 42 | K | N | −1.41 | −1.29 |
| ORF6 | 61 | D | Y | 0.97 | 1.07 |
| ORF7b | 19 | F | L | −1.72 | −1.63 |
| ORF7b | 28 | F | Y | −0.89 | −0.86 |
Test for a statistically significant difference between the ΔΔG data sets at 25 °C and 37 °C for eight SARS-CoV-2 proteins (Envelope and ORF7b excluded because of their low sample size)
| Protein | T value | p value | Statistical significance |
|---|---|---|---|
| Membrane (M) | 0.26132 | 0.796515 | Non-significant |
| Nucleocapsid (N) | 0.60242 | 0.547981 | Non-significant |
| Spike (S) | 0.43077 | 0.66731 | Non-significant |
| ORF1ab | | | |
| NSP1 | 0.24095 | 0.81104 | Non-significant |
| NSP2 | 0.50945 | 0.61115 | Non-significant |
| NSP3 | 0.66357 | 0.507488 | Non-significant |
| NSP4 | 0.14688 | 0.883634 | Non-significant |
| NSP5 (protease) | 0.43088 | 0.668568 | Non-significant |
| NSP6 | 0.06324 | 0.949926 | Non-significant |
| NSP7 | 0.11161 | 0.913886 | Non-significant |
| NSP8 | 0.35656 | 0.723051 | Non-significant |
| NSP9 | 0.18162 | 0.85816 | Non-significant |
| NSP10 | 0.29084 | 0.780965 | Non-significant |
| NSP12 (RdRp) | 0.34436 | 0.731363 | Non-significant |
| NSP13 | 0.2186 | 0.827428 | Non-significant |
| NSP14 | 0.43463 | 0.665065 | Non-significant |
| NSP15 | 0.18243 | 0.855952 | Non-significant |
| NSP16 | 0.37798 | 0.707346 | Non-significant |
| ORF3a | 0.14018 | 0.888874 | Non-significant |
| ORF6 | 0.06503 | 0.949744 | Non-significant |
| ORF7a | 0.1184 | 0.906516 | Non-significant |
| ORF8 | 0.14751 | 0.883788 | Non-significant |
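As an illustration of how a T value of this kind can be obtained, the sketch below computes an unpaired two-sample t statistic with pooled variance for the membrane protein, using the ΔΔG values at 25 °C and 37 °C listed in the Gibbs free energy table. The source does not state which t-test was applied, so the pooled-variance form is an assumption, and the function name `pooled_t_statistic` is hypothetical. The p value is omitted because the standard library has no t-distribution.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for Student's two-sample t-test.

    Assumption: unpaired samples with equal variances (pooled estimate).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    dof = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / dof
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, dof


# Membrane protein (M) ΔΔG values copied from the table of free energy changes
ddg_25 = [-0.39, -0.46, -0.49, -0.29, -1.27, -3.66, -0.76, -0.22, -0.59, -1.85, -1.85]
ddg_37 = [-0.29, -0.45, -0.22, -0.27, -1.11, -3.36, -0.74, -0.21, -0.57, -1.65, -1.73]

t_value, dof = pooled_t_statistic(ddg_37, ddg_25)
print(f"t = {t_value:.5f}, degrees of freedom = {dof}")
```

Under this assumption the statistic for the membrane protein comes out at about 0.26, in line with the Membrane (M) row of the table, which suggests (but does not prove) that a test of this form was used.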